Buddhist Classics AI Translation Series Vol.5: Complete Tengyur (Tibetan-Chinese-English, 丹珠尔 v2.0)
Authors/Creators
- Independent Research Collective
Description
This is **Volume 5** of the *Buddhist Classics AI Translation Series*. It contains the **complete Tengyur** (Tib. *bsTan 'gyur*, 丹珠尔), the commentarial canon of Tibetan Buddhism and the authoritative collection of Indian Buddhist treatises preserved in Tibetan translation.
---
## About the Tengyur
The **Tengyur** (Tib. *bsTan 'gyur*, "Translated Treatises"; 丹珠尔) is the commentarial canon of Tibetan Buddhism, comprising:
- **3,460 texts** across 212 volumes (Degé edition)
- **Indian Buddhist scholarship** (2nd-17th centuries CE): Nāgārjuna, Asaṅga, Vasubandhu, Dharmakīrti, Atiśa, etc.
- **Systematic coverage**: Tantric commentaries, Prajñāpāramitā exegesis, Madhyamaka, Yogācāra, Buddhist logic, poetics, medicine, linguistics
**Historical Significance**:
- **Complements Kangyur**: While Kangyur contains Buddha's words, Tengyur preserves Indian masters' explanations
- **Post-Tang Dynasty Indian Buddhism**: Represents 8th-17th century scholasticism (largely absent from Chinese Canon)
- **~7.3% overlap with Chinese Canon** by volume (91 texts, or ~2.6% by title count; ~80 million Chinese characters total)
- **First complete Chinese translation** in human history
---
## Unprecedented Achievement: Completion of South Asian Buddhist Literature in Chinese
### **Historic Milestone**
**With the publication of Volume 5, Chinese Buddhist literature has achieved comprehensive coverage of all major South Asian Buddhist textual traditions for the first time since Zhu Shixing's 3rd-century CE journey to Khotan.**
**Four Previously Untranslated Corpora (Now Complete)**:
1. ✅ **Nyingma Gyubum** (宁玛十万续, Vol.2): Old Translation tantras
2. ✅ **Kangyur** (甘珠尔, Vol.3): Tibetan Buddhist Canon
3. ✅ **Tengyur** (丹珠尔, Vol.5): Commentarial Canon
4. ✅ **Pāli Canon** (巴利文大藏经, Vol.4): Theravāda scriptures + commentaries
**Together with the Chinese Buddhist Canon** (汉文大藏经), these five collections represent:
- **~2,200 years of Buddhist thought** (5th century BCE - 17th century CE)
- **Complete doctrinal spectrum**: Hīnayāna, Mahāyāna, Vajrayāna, Theravāda
- **Geographic breadth**: India, Sri Lanka, Nepal, Kashmir, Oḍḍiyāna, Tibet, China
- **~500 million characters** (Tibetan-Chinese-English parallel texts in this series)
**Civilizational Significance**:
> "We now possess, in modern accessible languages, a relatively complete portrait of a lost South Asian Buddhist civilization spanning two millennia. This is both a moment of sorrow—for this civilization endured too much in its homeland—and a moment of joy—for its legacy reunites with Chinese readers, continuing the transmission interrupted in the mid-Tang Dynasty."
---
## Dataset Scope and Structure
### **Source Edition**
**Degé Tengyur** (德格版丹珠尔):
- **212 volumes**, 3,460 texts
- **Digital source**: Nitartha Digital Library (downloaded in 2023 and provided by two practitioner contributors)
- **Cross-referenced with**: BDRC (Buddhist Digital Resource Center), ACIP (Asian Classics Input Project)
**Structural Organization**:
1. **Praise and Homage Texts** (佛赞, ~10 texts)
2. **Tantric Commentaries** (密续注疏, ~1,200 texts)
   - Kriyā, Caryā, Yoga, Anuttarayoga Tantras
3. **Prajñāpāramitā Commentaries** (般若注疏, ~150 texts)
   - *Abhisamayālaṃkāra* system
4. **Madhyamaka** (中观, ~200 texts)
   - Nāgārjuna, Āryadeva, Candrakīrti, Śāntideva
5. **Yogācāra** (唯识, ~150 texts)
   - Maitreya, Asaṅga, Vasubandhu
6. **Pramāṇa** (因明, ~100 texts)
   - Dignāga, Dharmakīrti
7. **Liberal Arts** (明处, ~300 texts)
   - Poetics, grammar, medicine, astrology
8. **Vinaya Commentaries** (律注, ~50 texts)
9. **Abhidharma Commentaries** (论藏注, ~80 texts)
10. **Miscellaneous** (杂集, ~1,220 texts)
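
The approximate section counts above can be checked against the stated Degé total with a few lines of Python. This is only a sanity check on the rounded figures given in this description; the dictionary keys and the `section_shares` helper are illustrative names and not part of the dataset.

```python
# Approximate per-section text counts, copied from the list above.
SECTION_COUNTS = {
    "Praise and Homage Texts": 10,
    "Tantric Commentaries": 1200,
    "Prajnaparamita Commentaries": 150,
    "Madhyamaka": 200,
    "Yogacara": 150,
    "Pramana": 100,
    "Liberal Arts": 300,
    "Vinaya Commentaries": 50,
    "Abhidharma Commentaries": 80,
    "Miscellaneous": 1220,
}

DEGE_TOTAL = 3460  # text count stated for the Degé edition


def section_shares(counts):
    """Return the summed count and each section's share in percent."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


total, shares = section_shares(SECTION_COUNTS)
print(total == DEGE_TOTAL)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Because the per-section figures are rounded estimates, the shares are indicative only.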
---
### **Version 2.0 Enhancements**
**Major Upgrade from Version 1.05**:
- **Complete Gemini 2.0 translations**: Full Tibetan-Chinese-English parallel corpus
- **Retained Claude versions**: Original high-quality translations preserved for comparison
- **Expanded coverage**: Previously untranslated grammatical treatises, medical texts, poetics
- **Cross-edition validation**: Nitartha Digital Library (2023) vs. original data sources (2017-2024)
**Compilation History**:
- **2017**: Initial project (selected Mahāmudrā texts, Saraha's *dohā* songs, Madhyamaka treatises)
- **2024**: Systematic expansion (2.5 volumes of Mahāmudrā corpus compiled)
- **January 2025**: Version 1.05 released
- **August 2025**: Version 2.0 with full Gemini 2.0 coverage
---
## Historical Context: Development of the Tengyur
### **Early Period (7th-9th centuries)**
**Tibetan Empire Era**:
- **King Trisong Detsen** (赤松德赞, 742-797): Large-scale translation projects
- **King Ralpachen** (赤热巴巾, 815-841): Standardized translation terminology
- **Translation teams**: Indian paṇḍitas + Tibetan lotsāwas
- **Legacy**: "Old Translation" (旧译) texts later compiled into Nyingma Gyubum
---
### **Later Diffusion (11th-14th centuries)**
**New Translation Period**:
- **11th century**: Atiśa, Rinchen Zangpo, Marpa, etc.
- **Massive translation wave**: Sakya, Kagyu, Kadam, Jonang founders participate
- **13th century**: Sakya school initiates systematic compilation
- **1310 CE**: Narthang Kangyur completed (near modern Shigatse, Bailan County)
**Butön Rinchen Drub** (布顿·仁钦珠, 1290-1364):
- First comprehensive catalog of Tengyur
- Established standard structure and classification
- **Ming-Qing additions**: Pramāṇa and grammar sections continue to expand
---
### **Editions and Versions**
**Major Tengyur Editions**:
- **Narthang** (纳塘版, ~1310): Earliest printed edition
- **Beijing** (北京版, 15th century): Imperial sponsorship
- **Degé** (德格版, 18th century): Most authoritative, used in this translation
- **Peking** (北京版, 20th century): Modern critical edition
**Differences from Chinese Canon**:
- **Minimal overlap**: Only 91 texts (2.6% by count, 7.3% by volume)
- **Post-Tang materials**: Represents Indian Buddhism after Chinese transmission ceased (~850 CE)
- **Includes Chinese-origin texts**: Some Tengyur texts are Tibetan translations of Chinese works (e.g., *Dasheng qixin lun* 《大乘起信论》)
---
## Translation Methodology
### **AI Models and Quality Tiers**
**Tier 1 (Highest Quality) - Original Version 1.05**:
- **Claude 3.5 Sonnet**: Major treatises, Madhyamaka, Yogācāra, Mahāmudrā
- **Manual processing**: Critical doctrinal terms reviewed
- **Preserved in Version 2.0** for comparison
**Tier 2 (Full Coverage) - Version 2.0**:
- **Gemini 2.0**: Complete corpus (3,460 texts)
- **Tibetan-Chinese-English parallel**
- **Automated workflow**: Software written by a lay collaborator in Beijing
- **Validation**: Consecutive segments overlap, so the repeated passages can be compared for quality assurance
**Translation Prompts**:
- **Standard**: "Please provide complete, literal Chinese translation. No paraphrasing or abbreviation. If repetitions exist, translate fully. For verse sections, maintain parallel structure. For seed-syllables and mantras, display: (Devanāgarī, romanization, literal meaning if available) in continuous format."
- **Later simplified**: Removed "literal meaning" requirement due to AI inconsistency
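
The segment-overlap validation mentioned under Tier 2 could look roughly like the sketch below. The project's actual software is not described here, so everything in this block is an assumption: the function names, the window `size`, the `overlap` width, and the premise that translations come back line-aligned with the source.

```python
def split_with_overlap(lines, size=40, overlap=5):
    """Split source lines into windows that share `overlap` lines.

    Returns a list of (start_index, window) pairs. Hypothetical helper.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    segments = []
    for start in range(0, len(lines), step):
        segments.append((start, lines[start:start + size]))
        if start + size >= len(lines):
            break  # the last window already reaches the end of the text
    return segments


def boundary_agreement(prev_translation, next_translation, overlap):
    """Fraction of the shared lines that were translated identically.

    Assumes both translations are line-aligned with their source windows.
    """
    if overlap == 0:
        return 1.0
    tail = prev_translation[-overlap:]
    head = next_translation[:overlap]
    same = sum(a == b for a, b in zip(tail, head))
    return same / overlap


source = [f"line {i}" for i in range(10)]
segments = split_with_overlap(source, size=4, overlap=1)
print([start for start, _ in segments])
```

A low agreement score at a boundary would only flag the passage for review; it does not say which of the two renderings is correct.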
---
### **Special Challenges: Sanskrit and Mantras**
**Seed-Syllable Rendering Issues**:
- **Goal**: Display (Tibetan, Devanāgarī, romanization, Chinese gloss) for all *bīja* syllables
- **Reality**: AI frequently omits Devanāgarī or gloss; romanization often inaccurate
- **Affected texts**: All tantric commentaries, dhāraṇī collections
- **User advisory**:
  - ⚠️ **All Sanskrit in this translation requires manual verification**
  - Recommend: Use specialized tools (e.g., Digital Sanskrit Buddhist Canon) for accurate rendering
  - Multiple scripts needed: Oḍḍiyāna, Kashmir, Nepali, Sinhalese variants (not provided in this edition)
**Examples of AI Errors**:
- **Term confusion**: "Hevajra" (喜金刚) vs. "Cakrasaṃvara" (胜乐金刚) mixed
- **Name conflation**: "Candrakīrti" (月称) vs. "Candragomin" (月官)
- **Mantra variants**: "Six-Syllable Mantra" vs. "Six-Syllable Great Bright Mantra" treated as identical
- **Romanization**: Inconsistent IAST/Tibetan Wylie hybrid systems
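
Omitted scripts in a seed-syllable group can be detected mechanically, even though accuracy cannot. The sketch below checks a parenthesized group for the four components named in the goal above (Tibetan, Devanāgarī, romanization, Chinese gloss) using Unicode block ranges. The function name and the choice of ranges are assumptions for illustration; the check says nothing about whether the romanization or gloss is correct.

```python
import re

# Unicode block ranges used as a rough script test (illustrative choice).
SCRIPT_PATTERNS = {
    "tibetan": re.compile(r"[\u0F00-\u0FFF]"),
    "devanagari": re.compile(r"[\u0900-\u097F]"),
    # Basic Latin letters plus the Latin blocks that hold IAST diacritics.
    "romanization": re.compile(r"[A-Za-z\u00C0-\u024F\u1E00-\u1EFF]"),
    "chinese_gloss": re.compile(r"[\u4E00-\u9FFF]"),
}


def missing_scripts(group):
    """List the expected components that do not appear in `group`."""
    return [name for name, pattern in SCRIPT_PATTERNS.items()
            if not pattern.search(group)]


print(missing_scripts("ཨོཾ, ॐ, oṃ, 唵"))
print(missing_scripts("ཧཱུྃ, hūṃ"))
```

Groups with a non-empty result could be queued for the manual verification recommended above.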
**Recommendation**:
> "For scholarly work, all Sanskrit examples must be retranslated from Tibetan sources. Consider this translation a 'first draft' requiring expert review."
---
### **Biographical and Colophon Challenges**
**Name Variant Issues**:
- **Same person, multiple names**: Translation teams used inconsistent Sanskrit-Tibetan name pairs
- **Example**: Nāgārjuna = *Klu sgrub* (龙树) but also *Nāgārjuna* (那伽阿尔朱那) in different texts
- **Solution**: Cross-reference colophons, historical catalogs (e.g., *Blue Annals*)
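
One way to apply such cross-referencing in software is an alias table that maps every attested variant to a single canonical form. The sketch below uses only the Nāgārjuna variants quoted above; the table layout, the helper names, and the diacritic-folding step are assumptions, and a real table would have to be compiled from colophons and catalogs by hand.

```python
import unicodedata

# Canonical name -> variants seen in the corpus (only the example from above).
ALIASES = {
    "Nāgārjuna": ["Klu sgrub", "龙树", "那伽阿尔朱那", "Nagarjuna"],
}


def _key(name):
    """Fold case, diacritics, and spacing so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def build_index(aliases):
    """Map every folded variant (and the canonical form) to the canonical name."""
    index = {}
    for canonical, variants in aliases.items():
        for variant in [canonical, *variants]:
            index[_key(variant)] = canonical
    return index


def canonical_name(name, index):
    """Return the canonical name, or None if the variant is unknown."""
    return index.get(_key(name))


index = build_index(ALIASES)
print(canonical_name("klu sgrub", index))
print(canonical_name("Candragomin", index))
```

Unknown names return `None` rather than a best guess, which keeps distinct authors such as Candrakīrti and Candragomin from being merged silently.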
**Historical Detail Reconstruction**:
- **Colophon analysis**: Many contain precise dates, locations, patron names
- **Editorial notes**: Highlighted in main text where significant
- **Qing Dynasty additions**: Some texts fall outside the Degé mainline and are not included (e.g., Atiśa's *Laghuprayoga* collection, which was printed separately and duplicates existing content)
---
## Linguistic and Philosophical Features
### **Madhyamaka-Yogācāra Synthesis**
**Tengyur as Philosophical Encyclopedia**:
- **Nāgārjuna's corpus**: *Mūlamadhyamakakārikā*, *Ratnāvalī*, *Śūnyatāsaptati*, etc.
- **Asaṅga-Vasubandhu**: *Abhidharmasamuccaya*, *Mahāyānasaṃgraha*, *Triṃśikā*
- **Śāntideva**: *Bodhicaryāvatāra* (plus Indian commentaries, e.g., Prajñākaramati's *Pañjikā*)
- **Candrakīrti**: *Madhyamakāvatāra* (plus autocommentary), *Prasannapadā*
- **Dharmakīrti**: *Pramāṇavārttika* system
**Cross-Tradition Dialogue**:
- **Chinese Chan vs. Indian Madhyamaka**: Tengyur provides Indian side of debate
- **Huayan-Yogācāra links**: *Daśabhūmika* commentaries show shared foundations
- **Tibetan synthesis**: How Tsongkhapa, Longchenpa, Sakya Paṇḍita interpreted Indian masters
---
### **Pramāṇa (Buddhist Logic)**
**Dignāga-Dharmakīrti Tradition**:
- **~100 treatises** on epistemology, logic, debate
- **Critical for Tibetan scholasticism**: Gelug monastic curriculum centers on these
- **Largely absent from Chinese Buddhism**: Only fragments translated in Tang Dynasty
**Practical Applications**:
- **Debate manuals**: How to construct syllogisms (*prayoga*)
- **Logical fallacies**: Classification systems (*hetvābhāsa*)
- **Valid cognition**: Perception (*pratyakṣa*) vs. inference (*anumāna*)
---
### **Tantric Exegesis**
**1,200+ Tantric Commentaries**:
- **Guhyasamāja**: ~50 commentaries (Nāgārjuna, Āryadeva, Candrakīrti)
- **Cakrasaṃvara**: ~80 commentaries (Luipa, Ghaṇṭāpa, Kāṇha)
- **Hevajra**: ~60 commentaries (Vajragarbha, Saroruha)
- **Kālacakra**: ~30 commentaries (Kālacakrapāda, Nāropa)
**Operational Precision**:
- **Sādhana step-by-step**: Generation stage (*utpattikrama*) instructions
- **Completion stage** (*sampannakrama*): Subtle body (*tsa-lung-tigle*) yogas
- **Empowerment protocols** (*abhiṣeka*): Ritual manuals
**Example (Mahāmudrā Corpus)**:
- **2.5 volumes** (this translation): Saraha, Tilopa, Nāropa, Maitrīpa
- **Indic originals**: *Dohākoṣa* verses composed in Apabhraṃśa, transmitted here via their Tibetan translations
- **See also**: Appendix - Mahāmudrā AI Songs (Hindi-based musical reconstruction)
---
## Critical Content and Editorial Policy
### **Linguistic Complexity**
**Sanskrit-Tibetan Translation Layers**:
- **Original Sanskrit** (8th-13th century)
- **Tibetan translation** (9th-17th century)
- **Chinese translation** (2024-2025, AI-assisted)
- **English translation** (2025, AI-assisted)
**Challenges**:
- **Technical terminology density**: Often 50+ Sanskrit loanwords per page
- **Poetic/verse sections**: Maintain meter in Chinese (partially successful)
- **Commentarial structure**: Nested root-text + commentary + sub-commentary
**Translation Philosophy**:
- **Literal translation prioritized**: Preserves doctrinal precision
- **Not fluent literary Chinese**: Academic orientation
- **Rationale**: Enables philological analysis, cross-tradition comparison
---
### **Quality and Limitations**
**What This Translation Is**:
- ✅ **Complete coverage**: First full Chinese Tengyur
- ✅ **Research foundation**: Enables systematic study of post-Tang Indian Buddhism
- ✅ **Comparative resource**: Tibetan interpretations of Sanskrit sources
**What This Translation Is Not**:
- ❌ **Polished literary edition**: Contains AI errors, awkward phrasing
- ❌ **Authoritative reference**: Not peer-reviewed by Tibetologists
- ❌ **Practice manual (tantric sections)**: Requires qualified lama guidance + empowerment
**User Advisory**:
- **Scholarly work**: Always verify against Tibetan sources
- **Sanskrit citations**: Retranslate all mantras/seed-syllables
- **Personal study**: Compare Claude vs. Gemini versions for clarity
- **Do not report individual errors to editors**: Project scope precludes sentence-level corrections
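
For the Claude vs. Gemini comparison suggested above, a simple line-by-line similarity pass can point to the passages where the two versions diverge most. This is a minimal sketch using Python's standard `difflib`; it assumes the two files are line-aligned, which may not hold for every text, and the `threshold` value and function name are arbitrary illustrative choices.

```python
import difflib


def divergent_lines(claude_lines, gemini_lines, threshold=0.6):
    """Return (line_number, similarity, claude_text, gemini_text) for line
    pairs whose similarity ratio falls below `threshold`."""
    report = []
    pairs = zip(claude_lines, gemini_lines)
    for number, (a, b) in enumerate(pairs, start=1):
        ratio = difflib.SequenceMatcher(None, a, b).ratio()
        if ratio < threshold:
            report.append((number, round(ratio, 2), a, b))
    return report


claude = ["月称造入中论", "喜金刚"]
gemini = ["月称造入中论", "胜乐金刚"]
for number, ratio, a, b in divergent_lines(claude, gemini):
    print(number, ratio, a, b)
```

A flagged line only shows that the models disagree; deciding which rendering is right still requires the Tibetan source.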
---
## Technical Specifications
- **Total size**: ~80-100 million characters (Tibetan + Chinese + English)
- **Text count**: 3,460 treatises
- **Volume structure**: 212 volumes (Degé edition)
- **File formats**: Plain text (.txt), Markdown (.md), compressed archives (.7z)
- **Encoding**: UTF-8
- **Metadata**: Degé volume/text numbers, author attributions, colophons, translation model
**File Naming Convention**:
- Format: `[Section]-[Volume].[Text_Number]-[Author]_[Title]_[Model].txt`
- Example: `Madhyamaka-018.045-Candrakirti_Madhyamakavatara_c3.5s.txt`
- Models: `c3.5s` (Claude 3.5 Sonnet), `g2.0` (Gemini 2.0)
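
The naming convention can be parsed with a single regular expression. The sketch below is built from the one example given above, so it assumes that section names contain no hyphen, author names contain no underscore, and the extension is `.txt`; the `TextFile` class and `parse_filename` function are hypothetical names.

```python
import re
from dataclasses import dataclass

# Model suffixes listed in the naming convention.
MODELS = {"c3.5s": "Claude 3.5", "g2.0": "Gemini 2.0"}

FILENAME_RE = re.compile(
    r"^(?P<section>[^-]+)-(?P<volume>\d+)\.(?P<text_number>\d+)-"
    r"(?P<author>[^_]+)_(?P<title>.+)_(?P<model>c3\.5s|g2\.0)\.txt$"
)


@dataclass
class TextFile:
    section: str
    volume: int
    text_number: int
    author: str
    title: str
    model: str


def parse_filename(name):
    """Parse a corpus filename; return None if it does not fit the pattern."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    return TextFile(
        section=parts["section"],
        volume=int(parts["volume"]),
        text_number=int(parts["text_number"]),
        author=parts["author"],
        title=parts["title"],
        model=parts["model"],
    )


info = parse_filename("Madhyamaka-018.045-Candrakirti_Madhyamakavatara_c3.5s.txt")
print(info)
print(MODELS[info.model])
```

Titles may contain underscores under this pattern, because the model suffix is matched from the end of the name.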
---
## Appendix: Mahāmudrā AI Songs (Musical Reconstruction)
### **Project Background**
**Cultural Context**:
- **Mahāmudrā dohā tradition**: 84 Mahāsiddhas sang realization songs (8th-12th century)
- **Languages**: Apabhraṃśa, Old Bengali, Sanskrit
- **Tibetan preservation**: ~150 songs in Tengyur (*dohā* collections)
**Experimental Goal**:
> "To simulate the lived experience of these songs through AI-generated music, primarily using Hindi (closest modern descendant of Apabhraṃśa) with melodic structures informed by North Indian classical traditions."
---
### **Technical Approach**
**Music Generation**:
- **AI system**: Suno AI (music generation model)
- **Lyrics**: Tibetan dohās → Chinese translation → Hindi adaptation
- **Style prompts**: "North Indian devotional (*bhajan*), *rāga*-inspired, male vocals, meditative tempo"
- **Instrumentation**: Sitar, tabla, bansuri, tanpura (AI-synthesized)
**Cultural Disclaimer**:
> "These AI-generated songs are **not** authentic historical reconstructions. They are creative experiments to evoke the cultural milieu of the Mahāsiddha tradition. Do not use for ritual purposes without consulting qualified teachers."
---
### **Musicological Notes**
**Why Hindi?**:
- **Apabhraṃśa → Modern Hindi**: Linguistic continuity (vs. extinct Apabhraṃśa)
- **Prosody**: Hindi retains metrical structures compatible with original dohās
- **Devotional tradition**: *Bhajan*/*Kirtan* styles preserve similar aesthetic
**Limitations**:
- **Regional variants ignored**: Oḍḍiyāna, Kashmir, Bengal had distinct musical traditions
- **Tantric context lost**: Original performances likely part of *gaṇacakra* feasts
- **AI voice**: Cannot replicate human vocal ornaments (*gamakas*)
**Ethnomusicological Value**:
- ✅ **Hypothesis generation**: Suggests possible melodic frameworks
- ❌ **Not evidence**: Cannot confirm actual historical performance practice
- 🎵 **Aesthetic experience**: May help readers "feel" cultural atmosphere
---
## Intended Use Cases
### **Academic Research**
- **Buddhist philosophy**: Comparative Madhyamaka-Yogācāra studies
- **Tantric Buddhism**: Ritual structure, deity yoga, subtle body theories
- **Buddhist logic**: Pramāṇa tradition, debate methodologies
- **Translation studies**: Sanskrit→Tibetan→Chinese transmission analysis
- **History of science**: Indian astronomy, medicine, linguistics
### **Religious Practice**
- **Madhyamaka study**: Foundation for Tibetan Buddhist philosophy
- **Tantric practice**: Sādhana instructions (with teacher guidance + empowerment)
- **Mahāmudrā**: Meditation manuals (Kagyu/Gelug traditions)
### **AI and NLP Applications**
- **Low-resource language modeling**: Tibetan-Chinese-English parallel corpus
- **Domain-specific training**: Buddhist philosophical reasoning
- **Machine translation research**: Multi-stage translation analysis
Files (420.3 MB)
| Checksum | Size |
|---|---|
| md5:0daf3db6e315eb020253337cbc2a3695 | 315.5 MB |
| md5:fcdf30180ac7c3ec39ab1d7274a8586d | 104.8 MB |