Dataset
Greek New Testament For Data Analysis
(2023)
  • Keith L. Yoder
Description
Updated: 12 June 2023
Corrections were made to five records in Matthew (18:33, 23:23) and Luke (11:42, 13:16, 15:32) that displayed an incorrect lemma for ἐσθίω ("eat"). Ten instances of δέω ("bind, tie") in Matthew 9:38, Luke (5:12, 8:38, 9:40, 10:2, 22:32), Acts (8:24, 10:2), Romans 1:10, and 2 Corinthians 5:20 were corrected to δέομαι ("plead, beseech"), and then all 22 instances of δέομαι were regularized to the Passive voice as per BDAG (3rd Edition). Finally, the PoS indicators for 4,938 "name" records (4,658 proper nouns, 268 proper adjectives, and 12 proper adverbs) were updated to NP, AP, and DP respectively.

Informational Note of 5 March 2020
I have recently posted an extract of my working Hebrew Old Testament (Tanakh), which I use for similar data analysis purposes, on this repository at https://works.bepress.com/klyoder/49/.

Updated: 18 December 2017
This Excel data file (compatible with Excel 2007 and later versions) is an extract of my working Greek New Testament database, which I use for statistical and data analysis. It originated in the early 2000s from UBS3 data files in beta code that I obtained from CCAT, and has since evolved through countless changes and corrections. A flat-file table display such as Excel 2007+ is the format best suited to Autofilter and VBA applications, without involving a more complex XML format. The file may be opened with Excel 2007 or later versions, or with the freeware spreadsheet packages OpenOffice Calc 3.3 or later, or LibreOffice Calc 3.3.3 or later. Values for the individual Greek words are presented in a stripped Latin-character transliteration; see the "FullWord" entry below.

Each record in the database contains the following fields:
  • ID - numeric record identifier, 1 through 138,019
  • Ref - reference address in format 00.ABC_11:22.33, where 00 is the book index, ABC is the two- or three-letter book abbreviation, 11 is the chapter number, 22 the verse number, and 33 the word number; all numeric segments less than 10 are padded with a leading zero
  • FullWord - Greek word stripped of all diacritics/accents and in compact SBL style transliteration, to prevent machine reverse engineering back to copyrighted text
  • Lemma - dictionary form of the word in Greek polytonic Unicode (UTF-8) characters
  • Cap - numeric indicator of capitalization, 2 for both lemma and word, 1 for word only (UBS3 criteria)
  • TC - text critical indicator, 1 for single brackets, 2 for double brackets, 3 for single brackets within double bracketed text
  • IPQ - three character indicator for Indent-Punctuation-Quotation; the first two positions are largely unused, but a number in the third position indicates the word is quoted from Old Testament text, based on UBS criteria
  • PoS - one or two character indicator for part of speech: Adjective and Adjective-Proper, Conjunction, aDverb and aDverb-Proper, Interjection, Noun and Noun-Proper, Preposition, pRonoun-Articular, pRonoun-Demonstrative, pRonoun-indeFinite, pRonoun-Interrogative, pRonoun-Personal/Possessive, pRonoun-Relative, pRonoun-Xreciprocal, Verb, X-particle
  • Person - numeric indicator 1, 2, or 3 for first, second or third person
  • Tense - single character indicator for tense of verb forms: Aorist, Future, Imperfect, Present, X-perfect, Y-pluperfect
  • Voice - single character indicator for voice of verb forms: Active, Middle, Passive
  • Mood - single character indicator for mood of verb forms: D-imperative, Indicative, iNfinitive, Optative, Participle, Subjunctive
  • Case - single character indicator for case endings: Accusative, Dative, Genitive, Nominative, Vocative
  • Number - single character for number: Singular, Plural
  • Gender - single character for gender: Masculine, Feminine, Neuter
  • Extra - miscellaneous one or two character indicator: Comparative, Superlative, Historical Present (HP indicators are complete only for the gospels of Mark, Luke, and John), and Alpha-Privative; multiple indicators may be present for a single record, separated by a comma
  • Sylls - numeric syllable count of the full word; initial iota followed by another vowel is always counted as a separate syllable
  • Chars - numeric (Greek-) character count of the full word, which may differ from character count of the transliterated form displayed in the FullWord field.
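
For anyone reading the file programmatically rather than in a spreadsheet, the Ref format and the single-character verb indicators described above could be decoded along the following lines. This is only a minimal sketch in Python: the names `parse_ref`, `ParsedRef`, and `describe_verb` are invented for illustration, the sample value "01.MAT_05:03.01" is made up to match the stated pattern and is not copied from the file, and the letter codes are read off the capitalized letters in the field descriptions. The pattern also tolerates digits in the book abbreviation, which is an assumption.

```python
import re
from typing import NamedTuple


class ParsedRef(NamedTuple):
    book_index: int
    book: str
    chapter: int
    verse: int
    word: int


# Assumption: numeric segments carry at least two digits (zero-padded below 10)
# and the abbreviation is two or three alphanumeric characters.
REF_PATTERN = re.compile(r"(\d{2})\.([A-Za-z0-9]{2,3})_(\d{2,}):(\d{2,})\.(\d{2,})")


def parse_ref(ref: str) -> ParsedRef:
    """Split a Ref value such as 00.ABC_11:22.33 into its five parts."""
    match = REF_PATTERN.fullmatch(ref.strip())
    if match is None:
        raise ValueError(f"unrecognized Ref value: {ref!r}")
    index, book, chapter, verse, word = match.groups()
    return ParsedRef(int(index), book, int(chapter), int(verse), int(word))


# Codes inferred from the capitalized letters in the field descriptions.
TENSE = {"A": "Aorist", "F": "Future", "I": "Imperfect",
         "P": "Present", "X": "Perfect", "Y": "Pluperfect"}
VOICE = {"A": "Active", "M": "Middle", "P": "Passive"}
MOOD = {"D": "Imperative", "I": "Indicative", "N": "Infinitive",
        "O": "Optative", "P": "Participle", "S": "Subjunctive"}


def describe_verb(tense: str, voice: str, mood: str) -> str:
    """Spell out a Tense/Voice/Mood triple; raises KeyError on unknown codes."""
    return " ".join([TENSE[tense], VOICE[voice], MOOD[mood]])
```

With these definitions, `parse_ref("01.MAT_05:03.01")` yields a tuple whose `chapter` is 5 and `word` is 1, and `describe_verb("A", "P", "I")` returns "Aorist Passive Indicative".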

Transliteration scheme (after SBL) for FullWord field:
  • α=a, β=b, γ=g or γ=n before γ/κ/ξ/χ, δ=d, ε=e, ζ=z, η=ē, θ=th, ι=i, κ=k, λ=l, μ=m, ν=n, ξ=x, ο=o, π=p, ρ=r, σ/ς=s, τ=t, υ=u in diphthongs αυ/ευ/ου/υι otherwise υ=y, φ=ph, χ=ch, ψ=ps, ω=ō
  • Rough breathing on initial vowel or diphthong = "h" at beginning of word, except for initial ῥ=rh; medial ρρ = rrh.
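
The two rules above can be expressed compactly in code. The following Python sketch applies them to a polytonic Unicode word such as a value from the Lemma field; it is an illustration of the scheme as stated here, not the procedure used to build the file. The function name `transliterate` is invented, the input is lowercased (whether FullWord preserves capitals is not assumed), and characters outside the table are passed through unchanged.

```python
import unicodedata

ROUGH_BREATHING = "\u0314"  # combining mark left after NFD decomposition
VOWELS = set("αεηιουω")
NASAL_TRIGGERS = {"γ", "κ", "ξ", "χ"}  # γ before one of these is written n
SIMPLE = {
    "α": "a", "β": "b", "δ": "d", "ε": "e", "ζ": "z", "η": "ē", "θ": "th",
    "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x", "ο": "o",
    "π": "p", "σ": "s", "ς": "s", "τ": "t", "φ": "ph", "χ": "ch",
    "ψ": "ps", "ω": "ō",
}


def transliterate(word: str) -> str:
    """Transliterate one Greek word following the scheme listed above."""
    letters = []     # base letters with every diacritic removed
    rough_at = None  # index of the letter carrying the first rough breathing
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch == ROUGH_BREATHING:
            if rough_at is None and letters:
                rough_at = len(letters) - 1
        elif not unicodedata.combining(ch):
            letters.append(ch)

    out = []
    for i, ch in enumerate(letters):
        prev = letters[i - 1] if i > 0 else ""
        nxt = letters[i + 1] if i + 1 < len(letters) else ""
        if ch == "γ":
            out.append("n" if nxt in NASAL_TRIGGERS else "g")
        elif ch == "υ":
            # u inside the diphthongs αυ/ευ/ου/υι, otherwise y
            in_diphthong = prev in {"α", "ε", "ο"} or nxt == "ι"
            out.append("u" if in_diphthong else "y")
        elif ch == "ρ":
            initial_rough = i == 0 and rough_at == 0
            out.append("rh" if initial_rough or prev == "ρ" else "r")
        else:
            out.append(SIMPLE.get(ch, ch))

    # "h" is prefixed only when the breathing sits on the initial vowel or diphthong.
    aspirated = rough_at is not None and all(
        letter in VOWELS for letter in letters[: rough_at + 1]
    )
    return ("h" if aspirated else "") + "".join(out)
```

For example, `transliterate("ἄγγελος")` gives "angelos", `transliterate("υἱός")` gives "huios", and `transliterate("ῥῆμα")` gives "rhēma".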

Any researcher who wants a copy of this file with the FullWord field displayed in Greek Unicode characters should contact me privately by email - click the "Contact" button underneath the picture at the top of my Works page.
Publication Date
Summer 2023
Citation Information
Keith L. Yoder. "Greek New Testament For Data Analysis" (2023)
Available at: http://works.bepress.com/klyoder/32/