Published March 22, 2026 | Version v1
Dataset Open

T'OMIM: Tanakh Observable Matches of Intertextual Mimesis

  • 1. University of Notre Dame

Description

T'OMIM (Tanakh Observable Matches of Intertextual Mimesis, from Hebrew תאומים meaning "twins") is an open-access dataset of labeled parallel passages in the Hebrew Bible, compiled for computational and literary research on inner-biblical intertextuality. The archive combines two distinct corpora of known parallels: 554 narrative verse pairs drawn from the Chronicles synoptic tradition, as cataloged by Bendavid (2013) and Endres et al. (1998), and 256 poetic half-verse pairs identified by Berlin (2008), Fokkelman (2001), Kugel (1981), Watson (1994), and Tsumura (2023). Each corpus is provided at two levels of granularity. Verse-level tables contain the paired Hebrew texts with their source citations. Word-level tables expand each passage into its constituent tokens and carry over morphological annotation from the ETCBC Biblia Hebraica Stuttgartensia Amstelodamensis (van Peursen, Sikkel, and Roorda 2015): part of speech, verbal stem and tense, gender, number, person, lexeme, English gloss, and hierarchical syntactic structure. The four resulting tables are distributed as Apache Parquet files under a CC-BY-4.0 license and are suitable for training and evaluating models of semantic similarity, text reuse detection, and intertextual retrieval in Biblical Hebrew.

Files

File Name | Rows | Description
--- | --- | ---
narrative_pairs_verse.parquet | 554 | Narrative parallel pairs with Hebrew verse text
narrative_pairs_word.parquet | 25,572 | Word-level ETCBC annotation for narrative pairs
poetic_pairs_verse.parquet | 256 | Poetic parallel pairs with Hebrew half-verse text
poetic_pairs_word.parquet | 2,437 | Word-level ETCBC annotation for poetic pairs
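
As a quick sanity check after download, the row counts listed above can be compared against the files themselves. The following is a minimal sketch, assuming pandas with a Parquet engine such as pyarrow is installed and that all four files sit in one local directory. The helper names `load_tables` and `count_mismatches` are illustrative and not part of the dataset.

```python
from pathlib import Path

import pandas as pd

# Row counts as listed in the Files table of this record.
EXPECTED_ROWS = {
    "narrative_pairs_verse.parquet": 554,
    "narrative_pairs_word.parquet": 25572,
    "poetic_pairs_verse.parquet": 256,
    "poetic_pairs_word.parquet": 2437,
}


def load_tables(directory):
    """Read the four Parquet files from a local directory (assumed layout)."""
    return {name: pd.read_parquet(Path(directory) / name) for name in EXPECTED_ROWS}


def count_mismatches(tables, expected=EXPECTED_ROWS):
    """Return {file name: (actual rows, expected rows)} for tables that disagree."""
    return {
        name: (len(tables[name]), rows)
        for name, rows in expected.items()
        if len(tables[name]) != rows
    }
```

Calling `count_mismatches(load_tables("path/to/download"))` returns an empty dict when every file has the listed number of rows; the directory path here is a placeholder.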

Verse-Level Tables

Both narrative_pairs_verse.parquet and poetic_pairs_verse.parquet share the same schema:

Column | Type | Description
--- | --- | ---
pair_id | int | Sequential identifier for each parallel pair (0-indexed)
source_ref | string | Verse or half-verse reference for the source passage (e.g., "2 Sam 3:2" or "2 Sam 22:14a")
target_ref | string | Verse or half-verse reference for the target passage (e.g., "1 Chr 3:1" or "Ps 18:14a")
source_text | string | Vocalized Hebrew text of the source passage
target_text | string | Vocalized Hebrew text of the target passage
reference | string | Scholarly source for the parallel identification

Narrative pairs use full verse references; poetic pairs use half-verse references, in which an a or b suffix marks the half-verse.
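
For sorting or filtering by book, chapter, or half-verse, the reference strings can be split into their parts. The sketch below assumes every source_ref and target_ref follows the pattern of the examples given in the schema ("2 Sam 3:2", "Ps 18:14a"); the `Ref` type and `parse_ref` function are hypothetical helpers, not something shipped with the dataset.

```python
import re
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    book: str            # abbreviated book name, e.g. "2 Sam"
    chapter: int
    verse: int
    half: Optional[str]  # "a" or "b" for poetic half-verses, else None


# Book (optionally with a leading numeral), chapter:verse, optional a/b suffix.
_REF_RE = re.compile(
    r"^(?P<book>(?:\d\s+)?[^\d:]+?)\s+(?P<chapter>\d+):(?P<verse>\d+)(?P<half>[ab])?$"
)


def parse_ref(ref: str) -> Ref:
    """Split a reference such as '2 Sam 22:14a' into its components."""
    m = _REF_RE.match(ref.strip())
    if m is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    return Ref(m["book"], int(m["chapter"]), int(m["verse"]), m["half"])
```

References that do not fit the assumed pattern raise a `ValueError` rather than being silently misparsed, which makes any deviation in the actual data easy to spot.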

Word-Level Tables

Both narrative_pairs_word.parquet and poetic_pairs_word.parquet share the same 58-column schema. Each row represents a single word token from one side of a parallel pair. The pair_id and side columns link each word back to its pair in the corresponding verse-level table.

Pair linkage columns

Column | Type | Description
--- | --- | ---
pair_id | int | Matches pair_id in the verse-level table
side | string | "source" or "target"
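
The linkage through pair_id and side can be illustrated with a small aggregation: counting the word tokens on each side of a pair and attaching the counts to the verse-level table. This is a sketch using pandas that relies only on the pair_id and side columns documented above; the function names and the output column names `source_tokens` and `target_tokens` are assumptions made for the example.

```python
import pandas as pd


def token_counts(words: pd.DataFrame) -> pd.DataFrame:
    """Count word tokens per pair and side; one output row per pair_id."""
    counts = (
        words.groupby(["pair_id", "side"])
        .size()
        .unstack("side", fill_value=0)
        .reindex(columns=["source", "target"], fill_value=0)
    )
    counts.columns = ["source_tokens", "target_tokens"]
    return counts.reset_index()


def attach_counts(verses: pd.DataFrame, words: pd.DataFrame) -> pd.DataFrame:
    """Left-join token counts onto a verse-level table via pair_id."""
    return verses.merge(
        token_counts(words), on="pair_id", how="left", validate="one_to_one"
    )
```

Because pair_id is described as a 0-indexed sequential identifier in each verse-level table, the narrative and poetic corpora should probably be joined separately, each word-level table against its own verse-level table.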

Verse and book locators

Column | Type | Description
--- | --- | ---
abbrev_ref | string | Abbreviated verse reference (e.g., "Gen 1:1")
verse_ref | string | Full verse reference
book | string | Book name (English; e.g., "Genesis", "1 Samuel")
chapter | int | Chapter number
verse | int | Verse number

Data Source and Extended Annotation

The Hebrew text and morphological annotations in T'OMIM are derived from the ETCBC Biblia Hebraica Stuttgartensia Amstelodamensis (BHSA) dataset (van Peursen, Sikkel, and Roorda 2015; DOI:10.17026/DANS-Z6Y-SKYH). T'OMIM includes a curated subset of ETCBC word-level features relevant to parallel passage analysis. The full BHSA dataset contains additional annotation layers not included here, such as clause type and function, phrase-level syntax, discourse domain, and text-critical qere/ketiv variants. Researchers requiring these features can join T'OMIM's structural identifiers (the nd and in_* columns) directly to the BHSA dataset via the Text-Fabric framework (https://github.com/ETCBC/bhsa).
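
One way such a join might look is sketched below. It assumes that the nd column holds the BHSA word node number and that the locally loaded BHSA version uses the same node numbering as the one T'OMIM was built from; neither point is confirmed by this record. The helper names are hypothetical. The Text-Fabric calls used (`use`, `L.u`, `F.typ.v`) are part of its public API, and `typ` is the BHSA feature that stores clause type.

```python
import pandas as pd


def add_clause_feature(words: pd.DataFrame, clause_of, feature_value,
                       column="clause_typ") -> pd.DataFrame:
    """Add a clause-level BHSA feature to word rows, keyed on the nd column.

    clause_of maps a word node to its clause node; feature_value maps a
    clause node to the feature value. Returns a copy of the input.
    """
    out = words.copy()
    out[column] = [feature_value(clause_of(int(nd))) for nd in out["nd"]]
    return out


def bhsa_lookups():
    """Build the two lookups from BHSA via Text-Fabric (downloads on first use)."""
    from tf.app import use  # lazy import; requires the text-fabric package

    A = use("ETCBC/bhsa")  # assumption: node numbering matches T'OMIM's nd values
    F, L = A.api.F, A.api.L

    def clause_of(node):
        # L.u returns the nodes of the given type that contain this word.
        return L.u(node, otype="clause")[0]

    return clause_of, F.typ.v
```

With Text-Fabric installed, `add_clause_feature(words, *bhsa_lookups())` would attach the clause type to each word row. Keeping the lookups as plain callables means the same function can be reused for other features, or tested without loading BHSA at all.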

Files (2.2 MB)

Checksum | Size
--- | ---
md5:44637f92b651394e898431b78c2b42ca | 131.6 kB
md5:73091e314651049592bf20442930a17b | 1.7 MB
md5:82b743f0047adedf8015d09f7982af14 | 30.9 kB
md5:9d1c46318e5ef9fb3ddc5d8468aa7d6d | 292.8 kB
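
The listed MD5 checksums can be used to confirm that a download is intact. A minimal sketch using only the Python standard library follows; the function names are illustrative, and which checksum belongs to which file has to be taken from the record page itself.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:44637f92b651394e898431b78c2b42ca")` returns True only if the file at `local_path` is byte-identical to the published file with that checksum.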

Additional details

Software

Development Status
Active

References

  • van Peursen, W. T., C. Sikkel, and D. Roorda. "Hebrew Text Database ETCBC4b." DANS Data Station Social Sciences and Humanities, 2015. https://doi.org/10.17026/DANS-Z6Y-SKYH.
  • Fokkelman, Joannes Petrus. Reading Biblical Poetry: An Introductory Guide. Translated by Ineke Smit. Westminster John Knox Press, 2001.
  • Berlin, Adele. The Dynamics of Biblical Parallelism. Revised and Expanded Edition. Eerdmans, 2008.
  • Kugel, James L. The Idea of Biblical Poetry: Parallelism and Its History. Yale University Press, 1981.
  • Watson, Wilfred G. E. Traditional Techniques in Classical Hebrew Verse. Journal for the Study of the Old Testament Supplement Series 170. Sheffield Academic Press, 1994.
  • Tsumura, David Toshio. Vertical Grammar of Parallelism in Biblical Hebrew. Ancient Israel and Its Literature 47. SBL Press, 2023.
  • Bendavid, Aba, ed. Parallels in the Bible [in Hebrew]. Carta, 2013.
  • Endres, John Carol, Corrine L. Patton, William R. Millar, John Barclay Burns, Corrine L. Carvalho, Pauline A. Viviano, and Jim Fitzgerald, eds. Chronicles and Its Synoptic Parallels in Samuel, Kings, and Related Biblical Texts. Liturgical Press, 1998.