T'OMIM: Tanakh Observable Matches of Intertextual Mimesis
Description
T'OMIM (Tanakh Observable Matches of Intertextual Mimesis, from Hebrew תאומים meaning "twins") is an open-access dataset of labeled parallel passages in the Hebrew Bible, compiled for computational and literary research on inner-biblical intertextuality. The archive pairs two distinct corpora of known parallels: 554 narrative verse pairs drawn from the Chronicles synoptic tradition, cataloged by Bendavid (2013) and Endres et al. (1998), and 256 poetic half-verse pairs identified by Berlin (2008), Fokkelman (2001), Kugel (1981), Watson (1994), and Tsumura (2023). Each corpus is provided at two levels of granularity. Verse-level tables contain the paired Hebrew texts with their source citations. Word-level tables expand each passage into its constituent tokens, preserving the full morphological annotation of the ETCBC Biblia Hebraica Stuttgartensia Amstelodamensis (van Peursen, Sikkel, and Roorda 2015): part of speech, verbal stem and tense, gender, number, person, lexeme, English gloss, and hierarchical syntactic structure. The four resulting tables are distributed as Apache Parquet files under a CC-BY-4.0 license, suitable for training and evaluating models for semantic similarity, text reuse detection, and intertextual retrieval in Biblical Hebrew.
Files
| File Name | Rows | Description |
|---|---|---|
| narrative_pairs_verse.parquet | 554 | Narrative parallel pairs with Hebrew verse text |
| narrative_pairs_word.parquet | 25,572 | Word-level ETCBC annotation for narrative pairs |
| poetic_pairs_verse.parquet | 256 | Poetic parallel pairs with Hebrew half-verse text |
| poetic_pairs_word.parquet | 2,437 | Word-level ETCBC annotation for poetic pairs |
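As a minimal sketch, the four tables could be loaded with pandas and checked against the row counts listed above. The helper names (`load_tables`, `row_count_mismatches`) and the assumption that the files sit in the current directory are illustrative and not part of the dataset; `pandas.read_parquet` also needs a Parquet engine such as pyarrow to be installed.

```python
from pathlib import Path

# Row counts as stated in the Files table.
EXPECTED_ROWS = {
    "narrative_pairs_verse.parquet": 554,
    "narrative_pairs_word.parquet": 25_572,
    "poetic_pairs_verse.parquet": 256,
    "poetic_pairs_word.parquet": 2_437,
}


def load_tables(data_dir="."):
    """Read all four Parquet files into a {file_name: DataFrame} dict."""
    import pandas as pd  # imported here so the check below works without pandas

    return {name: pd.read_parquet(Path(data_dir) / name) for name in EXPECTED_ROWS}


def row_count_mismatches(tables):
    """Return {file_name: (expected, actual)} for every table whose length is off."""
    return {
        name: (EXPECTED_ROWS[name], len(table))
        for name, table in tables.items()
        if len(table) != EXPECTED_ROWS[name]
    }


# Hypothetical location of the downloaded files; adjust as needed.
data_dir = Path(".")
if all((data_dir / name).exists() for name in EXPECTED_ROWS):
    tables = load_tables(data_dir)
    print(row_count_mismatches(tables) or "all row counts match")
```

An empty result from `row_count_mismatches` suggests the download is complete; it says nothing about the contents of the rows.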
Verse-Level Tables
Both narrative_pairs_verse.parquet and poetic_pairs_verse.parquet share the same schema:
| Column | Type | Description |
|---|---|---|
| pair_id | int | Sequential identifier for each parallel pair (0-indexed) |
| source_ref | string | Verse or half-verse reference for the source passage (e.g., "2 Sam 3:2" or "2 Sam 22:14a") |
| target_ref | string | Verse or half-verse reference for the target passage (e.g., "1 Chr 3:1" or "Ps 18:14a") |
| source_text | string | Vocalized Hebrew text of the source passage |
| target_text | string | Vocalized Hebrew text of the target passage |
| reference | string | Scholarly source for the parallel identification |
Narrative pairs use full verse references; poetic pairs use half-verse references (with a/b suffix indicating the half-verse division).
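The reference format described above can be split into its parts with a small parser. This is a sketch that assumes every `source_ref` and `target_ref` value follows the shape of the examples in the table ("Book chapter:verse" with an optional a/b suffix); the `Ref` type and `parse_ref` function are illustrative names, and verse ranges or other variants are not handled.

```python
import re
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    book: str            # abbreviated book name, e.g. "2 Sam"
    chapter: int
    verse: int
    half: Optional[str]  # "a" or "b" for half-verse references, else None


# Book (may contain spaces and a leading digit), chapter:verse, optional a/b suffix.
_REF_PATTERN = re.compile(
    r"^(?P<book>.+?)\s+(?P<chapter>\d+):(?P<verse>\d+)(?P<half>[ab])?$"
)


def parse_ref(ref: str) -> Ref:
    """Split a reference string such as "2 Sam 22:14a" into its components."""
    match = _REF_PATTERN.match(ref.strip())
    if match is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    return Ref(
        match["book"], int(match["chapter"]), int(match["verse"]), match["half"]
    )


print(parse_ref("2 Sam 22:14a"))  # Ref(book='2 Sam', chapter=22, verse=14, half='a')
print(parse_ref("1 Chr 3:1"))     # Ref(book='1 Chr', chapter=3, verse=1, half=None)
```

A `half` of `None` marks a full-verse (narrative) reference, so the same parser can be applied to both verse-level tables.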
Word-Level Tables
Both narrative_pairs_word.parquet and poetic_pairs_word.parquet share the same 58-column schema. Each row represents a single word token from one side of a parallel pair. The pair_id and side columns link each word back to its pair in the corresponding verse-level table.
Pair linkage columns
| Column | Type | Description |
|---|---|---|
| pair_id | int | Matches pair_id in the verse-level table |
| side | string | "source" or "target" |
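One way to use the `pair_id` and `side` linkage is to aggregate the word-level rows per pair and merge the result back onto the verse-level table. The sketch below counts tokens on each side; the function names and the `source_tokens` / `target_tokens` columns are illustrative, and the toy frames contain placeholder values rather than rows from the dataset.

```python
import pandas as pd


def tokens_per_side(words: pd.DataFrame) -> pd.DataFrame:
    """Count word tokens per pair, with one column per side."""
    counts = (
        words.groupby(["pair_id", "side"])
        .size()
        .unstack("side", fill_value=0)
        # Guarantee both columns exist even if one side is absent from the input.
        .reindex(columns=["source", "target"], fill_value=0)
    )
    counts = counts.rename(
        columns={"source": "source_tokens", "target": "target_tokens"}
    )
    return counts.reset_index()


def attach_token_counts(verses: pd.DataFrame, words: pd.DataFrame) -> pd.DataFrame:
    """Left-join per-pair token counts onto a verse-level table via pair_id."""
    return verses.merge(
        tokens_per_side(words), on="pair_id", how="left", validate="one_to_one"
    )


# Placeholder data with the documented linkage columns only (not real rows).
toy_verses = pd.DataFrame({"pair_id": [0, 1], "source_ref": ["ref A", "ref B"]})
toy_words = pd.DataFrame(
    {
        "pair_id": [0, 0, 0, 1, 1],
        "side": ["source", "source", "target", "source", "target"],
    }
)
linked = attach_token_counts(toy_verses, toy_words)
print(linked)
```

The `validate="one_to_one"` argument makes the merge fail loudly if `pair_id` is not unique in either table, which is a cheap guard when combining files from different corpora.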
Verse and book locators
| Column | Type | Description |
|---|---|---|
| abbrev_ref | string | Abbreviated verse reference (e.g., "Gen 1:1") |
| verse_ref | string | Full verse reference |
| book | string | Book name (English; e.g., "Genesis", "1 Samuel") |
| chapter | int | Chapter number |
| verse | int | Verse number |
Data Source and Extended Annotation
The Hebrew text and morphological annotations in T'OMIM are derived from the ETCBC Biblia Hebraica Stuttgartensia Amstelodamensis (BHSA) dataset (van Peursen, Sikkel, and Roorda 2015; DOI:10.17026/DANS-Z6Y-SKYH). T'OMIM includes a curated subset of ETCBC word-level features relevant to parallel passage analysis. The full BHSA dataset contains additional annotation layers not included here, such as clause type and function, phrase-level syntax, discourse domain, and text-critical qere/ketiv variants. Researchers requiring these features can join T'OMIM's structural identifiers (the nd and in_* columns) directly to the BHSA dataset via the Text-Fabric framework (https://github.com/ETCBC/bhsa).
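The join described above might look like the following sketch. It rests on two assumptions that the description does not confirm: that the `nd` column holds the Text-Fabric node number of each word, and that the BHSA version loaded by Text-Fabric matches the one T'OMIM was built from. The helper names are illustrative. `load_bhsa_api` requires the `text-fabric` package and downloads the BHSA data on first use, so it is defined here but not called.

```python
def load_bhsa_api():
    """Load BHSA through Text-Fabric and return its core API object."""
    from tf.app import use  # provided by the text-fabric package

    return use("ETCBC/bhsa").api


def clause_types_for_words(api, word_nodes):
    """Map each word node to the `typ` feature of its enclosing clause.

    Assumes `word_nodes` are Text-Fabric word node numbers, for example
    values taken from the `nd` column (an unverified assumption).
    """
    result = {}
    for node in word_nodes:
        # L.u walks "up" from a node to the nodes of the given type that contain it.
        clauses = api.L.u(node, otype="clause")
        result[node] = api.F.typ.v(clauses[0]) if clauses else None
    return result


# Usage, once text-fabric is installed and `words` is a word-level table:
#   api = load_bhsa_api()
#   clause_types = clause_types_for_words(api, words["nd"].tolist())
```

The returned dictionary can then be mapped back onto the word-level table, for example with `words["nd"].map(clause_types)`.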
Files (2.2 MB)
| MD5 | Size |
|---|---|
| 44637f92b651394e898431b78c2b42ca | 131.6 kB |
| 73091e314651049592bf20442930a17b | 1.7 MB |
| 82b743f0047adedf8015d09f7982af14 | 30.9 kB |
| 9d1c46318e5ef9fb3ddc5d8468aa7d6d | 292.8 kB |
Additional details
Software
- Development Status: Active
References
- van Peursen, W. T., C. Sikkel, and D. Roorda. "Hebrew Text Database ETCBC4b." DANS Data Station Social Sciences and Humanities, 2015. https://doi.org/10.17026/DANS-Z6Y-SKYH.
- Fokkelman, Joannes Petrus. Reading Biblical Poetry: An Introductory Guide. Translated by Ineke Smit. Westminster John Knox Press, 2001.
- Berlin, Adele. The Dynamics of Biblical Parallelism. Revised and Expanded Edition. Eerdmans, 2008.
- Kugel, James L. The Idea of Biblical Poetry: Parallelism and Its History. Yale University Press, 1981.
- Watson, Wilfred G. E. Traditional Techniques in Classical Hebrew Verse. Journal for the Study of the Old Testament Supplement Series 170. Sheffield Academic Press, 1994.
- Tsumura, David Toshio. Vertical Grammar of Parallelism in Biblical Hebrew. Ancient Israel and Its Literature 47. SBL Press, 2023.
- Bendavid, Aba, ed. Parallels in the Bible [in Hebrew]. Carta, 2013.
- Endres, John Carol, Corrine L. Patton, William R. Millar, John Barclay Burns, Corrine L. Carvalho, Pauline A. Viviano, and Jim Fitzgerald, eds. Chronicles and Its Synoptic Parallels in Samuel, Kings, and Related Biblical Texts. Liturgical Press, 1998.