Buddhist Classics AI Translation Series Vol.5: Complete Tengyur (Tibetan-Chinese-English, 丹珠尔 v2.1)
Authors/Creators
- Independent Research Collective
Description
This is **Volume 5** of the comprehensive *Buddhist Classics AI Translation Series*, featuring the **complete Tengyur (Tibetan Buddhist Commentarial Canon)** (*bsTan 'gyur*, 丹珠尔), the authoritative collection of Indian Buddhist treatises preserved in Tibetan translation.
---
## About the Tengyur
The **Tengyur** (Tib. *bsTan 'gyur*, "Translated Treatises"; 丹珠尔) is the commentarial canon of Tibetan Buddhism, comprising:
- **3,460 texts** across 212 volumes (Degé edition)
- **Indian Buddhist scholarship** (2nd-17th centuries CE): Nāgārjuna, Asaṅga, Vasubandhu, Dharmakīrti, Atiśa, etc.
- **Systematic coverage**: Tantric commentaries, Prajñāpāramitā exegesis, Madhyamaka, Yogācāra, Buddhist logic, poetics, medicine, linguistics
**Historical Significance**:
- **Complements Kangyur**: While Kangyur contains Buddha's words, Tengyur preserves Indian masters' explanations
- **Post-Tang Dynasty Indian Buddhism**: Represents 8th-17th century scholasticism (largely absent from Chinese Canon)
- **Limited overlap with Chinese Canon**: 91 texts (~2.6% by title count, ~7.3% by volume; ~80 million Chinese characters in total)
- **First complete Chinese translation** in human history
---
## Unprecedented Achievement: Completion of South Asian Buddhist Literature in Chinese
### **Historic Milestone**
**With the publication of Volume 5, Chinese Buddhist literature has achieved comprehensive coverage of all major South Asian Buddhist textual traditions for the first time since Zhu Shixing's 3rd-century CE journey to Khotan.**
**Four Previously Untranslated Corpora (Now Complete)**:
1. ✅ **Nyingma Gyubum** (宁玛十万续, Vol.2): Old Translation tantras
2. ✅ **Kangyur** (甘珠尔, Vol.3): Tibetan Buddhist Canon
3. ✅ **Tengyur** (丹珠尔, Vol.5): Commentarial Canon
4. ✅ **Pāli Canon** (巴利文大藏经, Vol.4): Theravāda scriptures + commentaries
**Together with the Chinese Buddhist Canon** (汉文大藏经), these five collections represent:
- **~2,300 years of Buddhist thought** (5th century BCE - 17th century CE)
- **Complete doctrinal spectrum**: Hīnayāna, Mahāyāna, Vajrayāna, Theravāda
- **Geographic breadth**: India, Sri Lanka, Nepal, Kashmir, Oḍḍiyāna, Tibet, China
- **~500 million characters** (Tibetan-Chinese-English parallel texts in this series)
**Civilizational Significance**:
> "We now possess, in modern accessible languages, a relatively complete portrait of a lost South Asian Buddhist civilization spanning two millennia. This is both a moment of sorrow—for this civilization endured too much in its homeland—and a moment of joy—for its legacy reunites with Chinese readers, continuing the transmission interrupted in the mid-Tang Dynasty."
---
## Dataset Scope and Structure
### **Source Edition**
**Degé Tengyur** (德格版丹珠尔):
- **212 volumes**, 3,460 texts
- **Digital source**: Nitartha Digital Library (2023 downloads, provided by two practitioner contributors)
- **Cross-referenced with**: BDRC (Buddhist Digital Resource Center), ACIP (Asian Classics Input Project)
**Structural Organization**:
1. **Praise and Homage Texts** (佛赞, ~10 texts)
2. **Tantric Commentaries** (密续注疏, ~1,200 texts)
    - Kriyā, Caryā, Yoga, Anuttarayoga Tantras
3. **Prajñāpāramitā Commentaries** (般若注疏, ~150 texts)
    - *Abhisamayālaṃkāra* system
4. **Madhyamaka** (中观, ~200 texts)
    - Nāgārjuna, Āryadeva, Candrakīrti, Śāntideva
5. **Yogācāra** (唯识, ~150 texts)
    - Maitreya, Asaṅga, Vasubandhu
6. **Pramāṇa** (因明, ~100 texts)
    - Dignāga, Dharmakīrti
7. **Liberal Arts** (明处, ~300 texts)
    - Poetics, grammar, medicine, astrology
8. **Vinaya Commentaries** (律注, ~50 texts)
9. **Abhidharma Commentaries** (论藏注, ~80 texts)
10. **Miscellaneous** (杂集, ~1,220 texts)
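As a quick consistency check, the approximate section counts listed above can be summed and compared with the stated total of 3,460 texts. The sketch below only restates the figures from this description; the variable and function names are illustrative and not part of the dataset.

```python
# Approximate text counts per section, copied from the list above.
SECTION_COUNTS = {
    "Praise and Homage Texts": 10,
    "Tantric Commentaries": 1200,
    "Prajñāpāramitā Commentaries": 150,
    "Madhyamaka": 200,
    "Yogācāra": 150,
    "Pramāṇa": 100,
    "Liberal Arts": 300,
    "Vinaya Commentaries": 50,
    "Abhidharma Commentaries": 80,
    "Miscellaneous": 1220,
}
STATED_TOTAL = 3460  # Degé text count given in this description


def section_shares(counts):
    """Return each section's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(SECTION_COUNTS.values())
shares = section_shares(SECTION_COUNTS)

print(f"sum of sections: {total} (stated total: {STATED_TOTAL})")
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<30} {share:6.1%}")
```

Because the per-section figures are rounded estimates, the shares are indicative only.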
---
### **Version 2.0 Enhancements**
**Major Upgrade from Version 1.05**:
- **Complete Gemini 2.0 translations**: Full Tibetan-Chinese-English parallel corpus
- **Retained Claude versions**: Original high-quality translations preserved for comparison
- **Expanded coverage**: Previously untranslated grammatical treatises, medical texts, poetics
- **Cross-edition validation**: Nitartha Digital Library (2023) vs. original data sources (2017-2024)
**Compilation History**:
- **2017**: Initial project (selected Mahāmudrā texts, Saraha's *dohā* songs, Madhyamaka treatises)
- **2024**: Systematic expansion (2.5 volumes of Mahāmudrā corpus compiled)
- **January 2025**: Version 1.05 released
- **August 2025**: Version 2.0 with full Gemini 2.0 coverage
---
## Historical Context: Development of the Tengyur
### **Early Period (7th-9th centuries)**
**Tibetan Empire Era**:
- **King Trisong Detsen** (赤松德赞, 742-797): Large-scale translation projects
- **King Ralpachen** (赤热巴巾, 815-841): Standardized translation terminology
- **Translation teams**: Indian paṇḍitas + Tibetan lotsāwas
- **Legacy**: "Old Translation" (旧译) texts later compiled into Nyingma Gyubum
---
### **Later Diffusion (11th-14th centuries)**
**New Translation Period**:
- **11th century**: Atiśa, Rinchen Zangpo, Marpa, etc.
- **Massive translation wave**: Sakya, Kagyu, Kadam, Jonang founders participate
- **13th century**: Sakya school initiates systematic compilation
- **1310 CE**: Narthang Kangyur completed (near modern Shigatse, Bailan County)
**Butön Rinchen Drub** (布顿·仁钦珠, 1290-1364):
- First comprehensive catalog of Tengyur
- Established standard structure and classification
- **Ming-Qing additions**: Pramāṇa and grammar sections continue to expand
---
### **Editions and Versions**
**Major Tengyur Editions**:
- **Narthang** (纳塘版, ~1310): Earliest printed edition
- **Beijing** (北京版, 15th century): Imperial sponsorship
- **Degé** (德格版, 18th century): Most authoritative, used in this translation
- **Peking** (北京版, 20th century): Modern critical edition
**Differences from Chinese Canon**:
- **Minimal overlap**: Only 91 texts (2.6% by count, 7.3% by volume)
- **Post-Tang materials**: Represents Indian Buddhism after Chinese transmission ceased (~850 CE)
- **Includes Chinese-origin texts**: Some Tengyur texts are Tibetan translations of Chinese works (e.g., *Dasheng qixin lun* 《大乘起信论》)
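The by-count overlap figure follows directly from the two numbers given in this description (91 overlapping texts out of 3,460). The short sketch below reproduces that arithmetic. The by-volume figure cannot be recomputed here, because the underlying character counts of the overlapping texts are not stated.

```python
OVERLAP_TEXTS = 91   # texts shared with the Chinese Canon, per this description
TOTAL_TEXTS = 3460   # Degé Tengyur text count, per this description


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


overlap_by_count = percent(OVERLAP_TEXTS, TOTAL_TEXTS)
print(f"overlap by title count: {overlap_by_count:.1f}%")
```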
---
## Translation Methodology
### **AI Models and Quality Tiers**
**Tier 1 (Highest Quality) - Original Version 1.05**:
- **Claude 3.5 Sonnet**: Major treatises, Madhyamaka, Yogācāra, Mahāmudrā
- **Manual processing**: Critical doctrinal terms reviewed
- **Preserved in Version 2.0** for comparison
**Tier 2 (Full Coverage) - Version 2.0**:
- **Gemini 2.0**: Complete corpus (3,460 texts)
- **Tibetan-Chinese-English parallel**
- **Automated workflow**: Software written by a lay collaborator in Beijing
- **Validation**: Segment overlaps for quality assurance
**Translation Prompts**:
- **Standard**: "Please provide complete, literal Chinese translation. No paraphrasing or abbreviation. If repetitions exist, translate fully. For verse sections, maintain parallel structure. For seed-syllables and mantras, display: (Devanāgarī, romanization, literal meaning if available) in continuous format."
- **Later simplified**: Removed "literal meaning" requirement due to AI inconsistency
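The "segment overlaps" validation mentioned under Tier 2 is not described in detail. One plausible reading is that consecutive translation segments share a few source lines, so that the repeated lines can be compared afterwards. The following is a minimal sketch under that assumption; `split_with_overlap` and `overlaps_agree` are hypothetical helpers, not the project's actual software.

```python
def split_with_overlap(units, size, overlap):
    """Split a list of units (e.g. source lines) into windows of `size`
    where each window repeats the last `overlap` units of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    segments = []
    start = 0
    while start < len(units):
        segments.append(units[start:start + size])
        if start + size >= len(units):
            break  # the last window already reaches the end of the text
        start += step
    return segments


def overlaps_agree(prev_translated, next_translated, overlap):
    """Return True when the tail of one translated segment matches the head
    of the next. Assumes translations stay aligned line by line, which is a
    simplification for model output."""
    if overlap == 0:
        return True
    return prev_translated[-overlap:] == next_translated[:overlap]


source_lines = [f"line {i}" for i in range(10)]
for segment in split_with_overlap(source_lines, size=4, overlap=1):
    print(segment)
```

In practice an exact match is a strict criterion; a similarity threshold on the overlapping lines may be more realistic.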
---
### **Special Challenges: Sanskrit and Mantras**
**Seed-Syllable Rendering Issues**:
- **Goal**: Display (Tibetan, Devanāgarī, romanization, Chinese gloss) for all *bīja* syllables
- **Reality**: AI frequently omits Devanāgarī or gloss; romanization often inaccurate
- **Affected texts**: All tantric commentaries, dhāraṇī collections
- **User advisory**:
    - ⚠️ **All Sanskrit in this translation requires manual verification**
    - Recommend: Use specialized tools (e.g., Digital Sanskrit Buddhist Canon) for accurate rendering
    - Multiple scripts needed: Oḍḍiyāna, Kashmir, Nepali, Sinhalese variants (not provided in this edition)
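Since the model frequently drops one of the intended components of a seed-syllable rendering, a simple script check can flag renderings that need attention. The sketch below tests only which scripts are present, using Unicode code-point ranges; it does not judge whether the romanization or gloss is correct. The function names are illustrative assumptions.

```python
# Components the rendering is meant to contain, per the stated goal above.
REQUIRED = {"tibetan", "devanagari", "latin", "cjk"}


def scripts_present(text):
    """Return the set of scripts found in `text`, based on Unicode blocks."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if 0x0F00 <= cp <= 0x0FFF:          # Tibetan block
            found.add("tibetan")
        elif 0x0900 <= cp <= 0x097F:        # Devanagari block
            found.add("devanagari")
        elif 0x4E00 <= cp <= 0x9FFF:        # CJK Unified Ideographs
            found.add("cjk")
        elif ch.isalpha() and (cp <= 0x024F or 0x1E00 <= cp <= 0x1EFF):
            found.add("latin")              # includes IAST letters such as ā, ṃ
    return found


def missing_components(rendering):
    """Return the required scripts that do not appear in `rendering`."""
    return REQUIRED - scripts_present(rendering)


print(sorted(missing_components("(ཨོཾ, ॐ, oṃ, 唵)")))
print(sorted(missing_components("(ཨོཾ, oṃ)")))
```

Renderings flagged this way would still need manual verification against the Tibetan source.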
**Examples of AI Errors**:
- **Term confusion**: "Hevajra" (喜金刚) vs. "Cakrasaṃvara" (胜乐金刚) mixed
- **Name conflation**: "Candrakīrti" (月称) vs. "Candragomin" (月官)
- **Mantra variants**: "Six-Syllable Mantra" vs. "Six-Syllable Great Bright Mantra" treated as identical
- **Romanization**: Inconsistent IAST/Tibetan Wylie hybrid systems
**Recommendation**:
> "For scholarly work, all Sanskrit examples must be retranslated from Tibetan sources. Consider this translation a 'first draft' requiring expert review."
---
### **Biographical and Colophon Challenges**
**Name Variant Issues**:
- **Same person, multiple names**: Translation teams used inconsistent Sanskrit-Tibetan name pairs
- **Example**: Nāgārjuna = *Klu sgrub* (龙树) but also *Nāgārjuna* (那伽阿尔朱那) in different texts
- **Solution**: Cross-reference colophons, historical catalogs (e.g., *Blue Annals*)
**Historical Detail Reconstruction**:
- **Colophon analysis**: Many contain precise dates, locations, patron names
- **Editorial notes**: Highlighted in main text where significant
- **Qing Dynasty additions**: Some texts fall outside the Degé mainline and are not included (e.g., Atiśa's *Laghuprayoga* collection, which was printed separately and duplicates existing content)
---
## Linguistic and Philosophical Features
### **Madhyamaka-Yogācāra Synthesis**
**Tengyur as Philosophical Encyclopedia**:
- **Nāgārjuna's corpus**: *Mūlamadhyamakakārikā*, *Ratnāvalī*, *Śūnyatāsaptati*, etc.
- **Asaṅga-Vasubandhu**: *Abhidharmasamuccaya*, *Mahāyānasaṃgraha*, *Triṃśikā*
- **Śāntideva**: *Bodhicaryāvatāra* (plus autocommentary)
- **Candrakīrti**: *Madhyamakāvatāra*, *Prasannapadā*
- **Dharmakīrti**: *Pramāṇavārttika* system
**Cross-Tradition Dialogue**:
- **Chinese Chan vs. Indian Madhyamaka**: Tengyur provides Indian side of debate
- **Huayan-Yogācāra links**: *Daśabhūmika* commentaries show shared foundations
- **Tibetan synthesis**: How Tsongkhapa, Longchenpa, Sakya Paṇḍita interpreted Indian masters
---
### **Pramāṇa (Buddhist Logic)**
**Dignāga-Dharmakīrti Tradition**:
- **150+ treatises** on epistemology, logic, debate
- **Critical for Tibetan scholasticism**: Gelug monastic curriculum centers on these
- **Largely absent from Chinese Buddhism**: Only fragments translated in Tang Dynasty
**Practical Applications**:
- **Debate manuals**: How to construct syllogisms (*prayoga*)
- **Logical fallacies**: Classification systems (*hetvābhāsa*)
- **Valid cognition**: Perception (*pratyakṣa*) vs. inference (*anumāna*)
---
### **Tantric Exegesis**
**1,200+ Tantric Commentaries**:
- **Guhyasamāja**: ~50 commentaries (Nāgārjuna, Āryadeva, Candrakīrti)
- **Cakrasaṃvara**: ~80 commentaries (Luipa, Ghaṇṭāpa, Kāṇha)
- **Hevajra**: ~60 commentaries (Vajragarbha, Saroruha)
- **Kālacakra**: ~30 commentaries (Kālacakrapāda, Nāropa)
**Operational Precision**:
- **Sādhana step-by-step**: Generation stage (*utpattikrama*) instructions
- **Completion stage** (*sampannakrama*): Subtle body (*tsa-lung-tigle*) yogas
- **Empowerment protocols** (*abhiṣeka*): Ritual manuals
**Example (Mahāmudrā Corpus)**:
- **2.5 volumes** (this translation): Saraha, Tilopa, Nāropa, Maitrīpa
- **Sanskrit originals**: *Dohākoṣa* preserved in Apabhraṃśa (via Tibetan)
- **See also**: Appendix - Mahāmudrā AI Songs (Hindi-based musical reconstruction)
---
## Critical Content and Editorial Policy
### **Linguistic Complexity**
**Sanskrit-Tibetan Translation Layers**:
- **Original Sanskrit** (8th-13th century)
- **Tibetan translation** (9th-17th century)
- **Chinese translation** (2024-2025, AI-assisted)
- **English translation** (2025, AI-assisted)
**Challenges**:
- **Technical terminology density**: Often 50+ Sanskrit loanwords per page
- **Poetic/verse sections**: Maintaining meter in Chinese is only partially successful
- **Commentarial structure**: Nested root-text + commentary + sub-commentary
**Translation Philosophy**:
- **Literal translation prioritized**: Preserves doctrinal precision
- **Not fluent literary Chinese**: Academic orientation
- **Rationale**: Enables philological analysis, cross-tradition comparison
---
### **Quality and Limitations**
**What This Translation Is**:
- ✅ **Complete coverage**: First full Chinese Tengyur
- ✅ **Research foundation**: Enables systematic study of post-Tang Indian Buddhism
- ✅ **Comparative resource**: Tibetan interpretations of Sanskrit sources
**What This Translation Is Not**:
- ❌ **Polished literary edition**: Contains AI errors, awkward phrasing
- ❌ **Authoritative reference**: Not peer-reviewed by Tibetologists
- ❌ **Practice manual (tantric sections)**: Requires qualified lama guidance + empowerment
**User Advisory**:
- **Scholarly work**: Always verify against Tibetan sources
- **Sanskrit citations**: Retranslate all mantras/seed-syllables
- **Personal study**: Compare Claude vs. Gemini versions for clarity
- **Do not report individual errors to editors**: Project scope precludes sentence-level corrections
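For the Claude vs. Gemini comparison suggested above, a line-level similarity check can point to passages where the two versions diverge most. This is a rough sketch using Python's standard `difflib`; it assumes both versions of a text are split into lines that correspond one to one, which may not hold for every file, and the threshold value is arbitrary.

```python
import difflib


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def divergent_lines(claude_lines, gemini_lines, threshold=0.5):
    """Yield (index, claude_line, gemini_line, ratio) for paired lines whose
    similarity falls below `threshold`."""
    for i, (c, g) in enumerate(zip(claude_lines, gemini_lines)):
        ratio = similarity(c, g)
        if ratio < threshold:
            yield i, c, g, ratio


claude_version = ["一切法皆空", "诸法无自性"]
gemini_version = ["一切法皆空", "万法本来无有自性可得"]
for index, c, g, ratio in divergent_lines(claude_version, gemini_version):
    print(f"line {index}: similarity {ratio:.2f}\n  Claude: {c}\n  Gemini: {g}")
```

A low ratio signals disagreement between the models, not which version is right; the Tibetan source remains the reference.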
---
## Technical Specifications
- **Total size**: ~80-100 million characters (Tibetan + Chinese + English)
- **Text count**: 3,460 treatises
- **Volume structure**: 212 volumes (Degé edition)
- **File formats**: Plain text (.txt), Markdown (.md), compressed archives (.7z)
- **Encoding**: UTF-8
- **Metadata**: Degé volume/text numbers, author attributions, colophons, translation model
**File Naming Convention**:
- Format: `[Section]-[Volume].[Text_Number]-[Author]_[Title]_[Model].txt`
- Example: `Madhyamaka-018.045-Candrakirti_Madhyamakavatara_c3.5s.txt`
- Models: `c3.5s` (Claude 3.5 Sonnet), `g2.0` (Gemini 2.0)
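The naming convention above can be parsed mechanically. The following sketch splits a file name into its fields with a regular expression. It is based only on the stated format and the single example given, so real file names that deviate from the pattern (for instance, sections or authors containing unexpected separators) would need the pattern adjusted. `parse_filename` and `ParsedName` are illustrative names.

```python
import re
from typing import NamedTuple, Optional

# [Section]-[Volume].[Text_Number]-[Author]_[Title]_[Model].txt
FILENAME_RE = re.compile(
    r"^(?P<section>[^-]+)-(?P<volume>\d+)\.(?P<text_number>\d+)-"
    r"(?P<author>[^_]+)_(?P<title>.+)_(?P<model>c3\.5s|g2\.0)\.txt$"
)

MODEL_NAMES = {"c3.5s": "Claude 3.5 Sonnet", "g2.0": "Gemini 2.0"}


class ParsedName(NamedTuple):
    section: str
    volume: int
    text_number: int
    author: str
    title: str
    model: str


def parse_filename(name: str) -> Optional[ParsedName]:
    """Return the fields of a corpus file name, or None if it does not match."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    return ParsedName(
        section=m["section"],
        volume=int(m["volume"]),
        text_number=int(m["text_number"]),
        author=m["author"],
        title=m["title"],
        model=m["model"],
    )


parsed = parse_filename("Madhyamaka-018.045-Candrakirti_Madhyamakavatara_c3.5s.txt")
print(parsed)
print(MODEL_NAMES[parsed.model])
```

Grouping parsed names by `(volume, text_number)` is one way to pair the Claude and Gemini versions of the same text.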
---
## Appendix: Mahāmudrā AI Songs (Musical Reconstruction)
### **Project Background**
**Cultural Context**:
- **Mahāmudrā dohā tradition**: 84 Mahāsiddhas sang realization songs (8th-12th century)
- **Languages**: Apabhraṃśa, Old Bengali, Sanskrit
- **Tibetan preservation**: ~150 songs in Tengyur (*dohā* collections)
**Experimental Goal**:
> "To simulate the lived experience of these songs through AI-generated music, primarily using Hindi (closest modern descendant of Apabhraṃśa) with melodic structures informed by North Indian classical traditions."
---
### **Technical Approach**
**Music Generation**:
- **AI system**: Suno AI (music generation model)
- **Lyrics**: Tibetan dohās → Chinese translation → Hindi adaptation
- **Style prompts**: "North Indian devotional (*bhajan*), *rāga*-inspired, male vocals, meditative tempo"
- **Instrumentation**: Sitar, tabla, bansuri, tanpura (AI-synthesized)
**Cultural Disclaimer**:
> "These AI-generated songs are **not** authentic historical reconstructions. They are creative experiments to evoke the cultural milieu of the Mahāsiddha tradition. Do not use for ritual purposes without consulting qualified teachers."
---
### **Musicological Notes**
**Why Hindi?**:
- **Apabhraṃśa → Modern Hindi**: Linguistic continuity (vs. extinct Apabhraṃśa)
- **Prosody**: Hindi retains metrical structures compatible with original dohās
- **Devotional tradition**: *Bhajan*/*Kirtan* styles preserve similar aesthetic
**Limitations**:
- **Regional variants ignored**: Oḍḍiyāna, Kashmir, Bengal had distinct musical traditions
- **Tantric context lost**: Original performances likely part of *gaṇacakra* feasts
- **AI voice**: Cannot replicate human vocal ornaments (*gamakas*)
**Ethnomusicological Value**:
- ✅ **Hypothesis generation**: Suggests possible melodic frameworks
- ❌ **Not evidence**: Cannot confirm actual historical performance practice
- 🎵 **Aesthetic experience**: May help readers "feel" cultural atmosphere
---
## Intended Use Cases
### **Academic Research**
- **Buddhist philosophy**: Comparative Madhyamaka-Yogācāra studies
- **Tantric Buddhism**: Ritual structure, deity yoga, subtle body theories
- **Buddhist logic**: Pramāṇa tradition, debate methodologies
- **Translation studies**: Sanskrit→Tibetan→Chinese transmission analysis
- **History of science**: Indian astronomy, medicine, linguistics
### **Religious Practice**
- **Madhyamaka study**: Foundation for Tibetan Buddhist philosophy
- **Tantric practice**: Sādhana instructions (with teacher guidance + empowerment)
- **Mahāmudrā**: Meditation manuals (Kagyu/Gelug traditions)
### **AI and NLP Applications**
- **Low-resource language modeling**: Tibetan-Sanskrit-Chinese parallel corpus
- **Domain-specific training**: Buddhist philosophical reasoning
- **Machine translation research**: Multi-stage translation analysis
---
## Known Issues and Update Status
In the G2.0 (Gemini 2.0) translation edition produced between July and November 2025 (volumes 1, 2, 3, 5, 6, 7, 8, 11, 12, etc.), entire paragraphs amounting to approximately 1% of the text were accidentally omitted in translation. To address this, we have written a dedicated program that performs electronic collation and supplementary translation. As of November 26, 2025, this remedial work has not yet been fully completed.
Under normal circumstances, the upgraded complete volumes will first be released at https://huggingface.co/datasets/ospx1u/buddhist-classics-vol1-12/tree/main and subsequently published on zenodo.org. Other data repositories will be updated on a case-by-case basis or may not be updated at all.
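The collation program itself is not published in this description. As a rough illustration of how omitted paragraphs might be detected, the sketch below flags source paragraphs whose translation is empty or implausibly short. It assumes the source and translation are already aligned paragraph by paragraph, and the `min_ratio` threshold is an arbitrary placeholder rather than a tuned value.

```python
def find_suspect_paragraphs(pairs, min_ratio=0.1):
    """Return indexes of (source, translation) pairs whose translation is
    missing or far shorter than the source.

    pairs     -- list of (source_paragraph, translated_paragraph) strings
    min_ratio -- minimum translated/source length ratio considered plausible
    """
    suspects = []
    for i, (source, translation) in enumerate(pairs):
        source_len = len(source.strip())
        translated_len = len(translation.strip())
        if source_len == 0:
            continue  # nothing to translate, so nothing can be missing
        if translated_len / source_len < min_ratio:
            suspects.append(i)
    return suspects


aligned = [
    ("x" * 100, "y" * 40),  # plausible translation
    ("x" * 100, ""),        # translation omitted entirely
    ("x" * 100, "y" * 5),   # suspiciously short
]
print(find_suspect_paragraphs(aligned))
```

Flagged paragraphs would then be candidates for supplementary translation and manual review.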
Files
- Size: 317.0 MB
- Checksum: md5:d6ac9e7075ee4ed4781539847a5c814c
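After downloading, the archive can be checked against the md5 checksum listed above. This is a minimal sketch using Python's standard `hashlib`; the local file name in the usage comment is a placeholder, since the archive name is not given here.

```python
import hashlib

EXPECTED_MD5 = "d6ac9e7075ee4ed4781539847a5c814c"  # checksum listed above


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks so that
    large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (placeholder file name):
#   if md5_of_file("downloaded_archive.7z") != EXPECTED_MD5:
#       print("checksum mismatch: re-download the file")
```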