Buddhist Classics AI Translation Series Vol.5: Complete Tengyur (Tibetan-Chinese-English, 丹珠尔 v2.0)

Buddhist Classics AI Translation Series

doi:10.5281/zenodo.17507881

Published November 2, 2025 | Version 2.0

Dataset Open

Buddhist Classics AI Translation Series (Data collector)¹

1. Independent Research Collective

This is **Volume 5** of the comprehensive *Buddhist Classics AI Translation Series*, featuring the **complete Tengyur (Tibetan Buddhist Commentarial Canon)** (*bsTan 'gyur*, 丹珠尔), the authoritative collection of Indian Buddhist treatises preserved in Tibetan translation.

---

## About the Tengyur

The **Tengyur** (Tib. *bsTan 'gyur*, "Translated Treatises"; 丹珠尔) is the commentarial canon of Tibetan Buddhism, comprising:

- **3,460 texts** across 212 volumes (Degé edition)
- **Indian Buddhist scholarship** (2nd-17th centuries CE): Nāgārjuna, Asaṅga, Vasubandhu, Dharmakīrti, Atiśa, etc.
- **Systematic coverage**: Tantric commentaries, Prajñāpāramitā exegesis, Madhyamaka, Yogācāra, Buddhist logic, poetics, medicine, linguistics

**Historical Significance**:
- **Complements Kangyur**: While Kangyur contains Buddha's words, Tengyur preserves Indian masters' explanations
- **Post-Tang Dynasty Indian Buddhism**: Represents 8th-17th century scholasticism (largely absent from Chinese Canon)
- **Small overlap with the Chinese Canon**: 91 texts (~2.6% by title count, ~7.3% by volume; ~80 million Chinese characters total)
- **First complete Chinese translation** in human history

---

## Unprecedented Achievement: Completion of South Asian Buddhist Literature in Chinese

### **Historic Milestone**

**With the publication of Volume 5, Chinese Buddhist literature has achieved comprehensive coverage of all major South Asian Buddhist textual traditions for the first time since Zhu Shixing's 3rd-century CE journey to Khotan.**

**Four Previously Untranslated Corpora (Now Complete)**:
1. ✅ **Nyingma Gyubum** (宁玛十万续, Vol.2): Old Translation tantras
2. ✅ **Kangyur** (甘珠尔, Vol.3): Tibetan Buddhist Canon
3. ✅ **Tengyur** (丹珠尔, Vol.5): Commentarial Canon
4. ✅ **Pāli Canon** (巴利文大藏经, Vol.4): Theravāda scriptures + commentaries

**Together with the Chinese Buddhist Canon** (汉文大藏经), these five collections represent:
- **~2,300 years of Buddhist thought** (5th century BCE - 17th century CE)
- **Complete doctrinal spectrum**: Hīnayāna, Mahāyāna, Vajrayāna, Theravāda
- **Geographic breadth**: India, Sri Lanka, Nepal, Kashmir, Oḍḍiyāna, Tibet, China
- **~500 million characters** (Tibetan-Chinese-English parallel texts in this series)

**Civilizational Significance**:
> "We now possess, in modern accessible languages, a relatively complete portrait of a lost South Asian Buddhist civilization spanning two millennia. This is both a moment of sorrow—for this civilization endured too much in its homeland—and a moment of joy—for its legacy reunites with Chinese readers, continuing the transmission interrupted in the mid-Tang Dynasty."

---

## Dataset Scope and Structure

### **Source Edition**

**Degé Tengyur** (德格版丹珠尔):
- **212 volumes**, 3,460 texts
- **Digital source**: Nitartha Digital Library (downloaded in 2023 and provided by two practitioner contributors)
- **Cross-referenced with**: BDRC (Buddhist Digital Resource Center), ACIP (Asian Classics Input Project)

**Structural Organization**:
1. **Praise and Homage Texts** (佛赞, ~10 texts)
2. **Tantric Commentaries** (密续注疏, ~1,200 texts)
   - Kriyā, Caryā, Yoga, Anuttarayoga Tantras
3. **Prajñāpāramitā Commentaries** (般若注疏, ~150 texts)
   - *Abhisamayālaṃkāra* system
4. **Madhyamaka** (中观, ~200 texts)
   - Nāgārjuna, Āryadeva, Candrakīrti, Śāntideva
5. **Yogācāra** (唯识, ~150 texts)
   - Maitreya, Asaṅga, Vasubandhu
6. **Pramāṇa** (因明, ~100 texts)
   - Dignāga, Dharmakīrti
7. **Liberal Arts** (明处, ~300 texts)
   - Poetics, grammar, medicine, astrology
8. **Vinaya Commentaries** (律注, ~50 texts)
9. **Abhidharma Commentaries** (论藏注, ~80 texts)
10. **Miscellaneous** (杂集, ~1,220 texts)
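
The section counts above are approximations, so it can be useful to check that they add up to the stated total of 3,460 texts. The sketch below is only a sanity check on the figures in this outline; the dictionary keys and the function name are illustrative and not part of the dataset.

```python
# Approximate per-section text counts, copied from the structural outline.
SECTION_COUNTS = {
    "Praise and Homage Texts": 10,
    "Tantric Commentaries": 1200,
    "Prajnaparamita Commentaries": 150,
    "Madhyamaka": 200,
    "Yogacara": 150,
    "Pramana": 100,
    "Liberal Arts": 300,
    "Vinaya Commentaries": 50,
    "Abhidharma Commentaries": 80,
    "Miscellaneous": 1220,
}

# Total number of texts stated for the Degé edition in this description.
EXPECTED_TOTAL = 3460


def total_texts(counts: dict) -> int:
    """Sum the approximate per-section counts."""
    return sum(counts.values())


if __name__ == "__main__":
    total = total_texts(SECTION_COUNTS)
    print(f"{total} texts listed across sections; stated total is {EXPECTED_TOTAL}")
```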

---

### **Version 2.0 Enhancements**

**Major Upgrade from Version 1.05**:
- **Complete Gemini 2.0 translations**: Full Tibetan-Chinese-English parallel corpus
- **Retained Claude versions**: Original high-quality translations preserved for comparison
- **Expanded coverage**: Previously untranslated grammatical treatises, medical texts, poetics
- **Cross-edition validation**: Nitartha Digital Library (2023) vs. original data sources (2017-2024)

**Compilation History**:
- **2017**: Initial project (selected Mahāmudrā texts, Saraha's *dohā* songs, Madhyamaka treatises)
- **2024**: Systematic expansion (2.5 volumes of Mahāmudrā corpus compiled)
- **January 2025**: Version 1.05 released
- **August 2025**: Version 2.0 with full Gemini 2.0 coverage

---

## Historical Context: Development of the Tengyur

### **Early Period (7th-9th centuries)**

**Tibetan Empire Era**:
- **King Trisong Detsen** (赤松德赞, 742-797): Large-scale translation projects
- **King Ralpachen** (赤热巴巾, 815-841): Standardized translation terminology
- **Translation teams**: Indian paṇḍitas + Tibetan lotsāwas
- **Legacy**: "Old Translation" (旧译) texts later compiled into Nyingma Gyubum

---

### **Later Diffusion (11th-14th centuries)**

**New Translation Period**:
- **11th century**: Atiśa, Rinchen Zangpo, Marpa, etc.
- **Massive translation wave**: Sakya, Kagyu, Kadam, Jonang founders participate
- **13th century**: Sakya school initiates systematic compilation
- **1310 CE**: Narthang Kangyur completed (near modern Shigatse, Bailan County)

**Butön Rinchen Drub** (布顿·仁钦珠, 1290-1364):
- First comprehensive catalog of Tengyur
- Established standard structure and classification
- **Ming-Qing additions**: Pramāṇa and grammar sections continue to expand

---

### **Editions and Versions**

**Major Tengyur Editions**:
- **Narthang** (纳塘版, ~1310): Earliest printed edition
- **Beijing** (北京版, 15th century): Imperial sponsorship
- **Degé** (德格版, 18th century): Most authoritative, used in this translation
- **Peking** (北京版, 20th century): Modern critical edition

**Differences from Chinese Canon**:
- **Minimal overlap**: Only 91 texts (2.6% by count, 7.3% by volume)
- **Post-Tang materials**: Represents Indian Buddhism after Chinese transmission ceased (~850 CE)
- **Includes Chinese-origin texts**: Some Tengyur texts are Tibetan translations of Chinese works (e.g., *Dasheng qixin lun* 《大乘起信论》)
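
The by-count overlap figure follows directly from the two numbers given here (91 shared texts out of 3,460). A minimal sketch of that arithmetic is shown below; the function name is illustrative. The 7.3% by-volume figure cannot be reproduced the same way, because it depends on per-text character counts that this description does not list.

```python
def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


if __name__ == "__main__":
    # 91 overlapping texts out of 3,460 Tengyur texts (figures from this section).
    print(f"Overlap by title count: {share_percent(91, 3460):.1f}%")
```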

---

## Translation Methodology

### **AI Models and Quality Tiers**

**Tier 1 (Highest Quality) - Original Version 1.05**:
- **Claude 3.5 Sonnet**: Major treatises, Madhyamaka, Yogācāra, Mahāmudrā
- **Manual processing**: Critical doctrinal terms reviewed
- **Preserved in Version 2.0** for comparison

**Tier 2 (Full Coverage) - Version 2.0**:
- **Gemini 2.0**: Complete corpus (3,460 texts)
- **Tibetan-Chinese-English parallel**
- **Automated workflow**: Software written by a lay collaborator in Beijing
- **Validation**: Overlapping segments used for quality assurance
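
The workflow software itself is not described in this record. As a rough illustration of what splitting a text into overlapping segments might look like, the sketch below cuts a string into fixed-size windows that share a few characters with their neighbours, so that the translations of the shared portion could later be compared. The character-based splitting, the window size, and the function name are all assumptions, not the project's actual method.

```python
def split_with_overlap(text: str, size: int, overlap: int) -> list:
    """Split `text` into windows of `size` characters.

    Consecutive windows share `overlap` characters (hypothetical parameters;
    the real pipeline's segmentation rules are not documented here).
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    segments = []
    start = 0
    while start < len(text):
        segments.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return segments


if __name__ == "__main__":
    for segment in split_with_overlap("abcdefghij", size=4, overlap=2):
        print(segment)
```

A downstream check would translate each window and compare the renderings of the shared characters; large disagreements in the shared portion would flag a segment for review.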

**Translation Prompts**:
- **Standard**: "Please provide complete, literal Chinese translation. No paraphrasing or abbreviation. If repetitions exist, translate fully. For verse sections, maintain parallel structure. For seed-syllables and mantras, display: (Devanāgarī, romanization, literal meaning if available) in continuous format."
- **Later simplified**: Removed "literal meaning" requirement due to AI inconsistency

---

### **Special Challenges: Sanskrit and Mantras**

**Seed-Syllable Rendering Issues**:
- **Goal**: Display (Tibetan, Devanāgarī, romanization, Chinese gloss) for all *bīja* syllables
- **Reality**: AI frequently omits Devanāgarī or gloss; romanization often inaccurate
- **Affected texts**: All tantric commentaries, dhāraṇī collections
- **User advisory**:
  - ⚠️ **All Sanskrit in this translation requires manual verification**
  - Recommend: Use specialized tools (e.g., Digital Sanskrit Buddhist Canon) for accurate rendering
  - Multiple scripts needed: Oḍḍiyāna, Kashmir, Nepali, Sinhalese variants (not provided in this edition)
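
Because the AI output frequently omits one of the expected renderings, a first-pass check can at least detect which scripts are present in a seed-syllable entry. The sketch below is a heuristic that classifies characters by their Unicode names; it does not verify that the Devanāgarī or the romanization is correct. The function names and the four script labels are illustrative assumptions.

```python
import unicodedata

# Script label -> prefix of the Unicode character name.
SCRIPT_PREFIXES = {
    "Tibetan": "TIBETAN",
    "Devanagari": "DEVANAGARI",
    "Latin": "LATIN",
    "Chinese": "CJK",
}


def scripts_present(text: str) -> set:
    """Return the script labels that have at least one character in `text`."""
    found = set()
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for unnamed characters
        for script, prefix in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                found.add(script)
    return found


def missing_scripts(rendering: str) -> set:
    """Return the expected scripts that do not appear in `rendering`."""
    return set(SCRIPT_PREFIXES) - scripts_present(rendering)


if __name__ == "__main__":
    # A rendering with Tibetan, romanization, and a Chinese gloss but no Devanāgarī.
    print(missing_scripts("ཨོཾ oṃ 唵"))
```

An entry that passes this check still needs manual verification; the check only catches omissions.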

**Examples of AI Errors**:
- **Term confusion**: "Hevajra" (喜金刚) vs. "Cakrasaṃvara" (胜乐金刚) mixed
- **Name conflation**: "Candrakīrti" (月称) vs. "Candragomin" (月官)
- **Mantra variants**: "Six-Syllable Mantra" vs. "Six-Syllable Great Bright Mantra" treated as identical
- **Romanization**: Inconsistent IAST/Tibetan Wylie hybrid systems

**Recommendation**:
> "For scholarly work, all Sanskrit examples must be retranslated from Tibetan sources. Consider this translation a 'first draft' requiring expert review."

---

### **Biographical and Colophon Challenges**

**Name Variant Issues**:
- **Same person, multiple names**: Translation teams used inconsistent Sanskrit-Tibetan name pairs
- **Example**: Nāgārjuna = *Klu sgrub* (龙树) but also *Nāgārjuna* (那伽阿尔朱那) in different texts
- **Solution**: Cross-reference colophons, historical catalogs (e.g., *Blue Annals*)
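
One way to apply such cross-referencing in practice is a small alias table that maps each variant to a single canonical form. The sketch below uses only the Nāgārjuna variants mentioned above; the table structure and function names are hypothetical, and a real table would have to be built by hand from colophons and catalogs.

```python
# Canonical name -> known variants (only the example from this section).
ALIASES = {
    "Nāgārjuna": {"Klu sgrub", "龙树", "那伽阿尔朱那"},
}


def build_lookup(aliases: dict) -> dict:
    """Build a case-insensitive variant -> canonical-name lookup table."""
    lookup = {}
    for canonical, variants in aliases.items():
        lookup[canonical.casefold()] = canonical
        for variant in variants:
            lookup[variant.casefold()] = canonical
    return lookup


def canonical_name(name: str, lookup: dict):
    """Return the canonical form of `name`, or None if it is not in the table."""
    return lookup.get(name.strip().casefold())


if __name__ == "__main__":
    table = build_lookup(ALIASES)
    for variant in ("Klu sgrub", "龙树", "那伽阿尔朱那"):
        print(variant, "->", canonical_name(variant, table))
```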

**Historical Detail Reconstruction**:
- **Colophon analysis**: Many contain precise dates, locations, patron names
- **Editorial notes**: Highlighted in main text where significant
- **Qing Dynasty additions**: Some texts fall outside the Degé mainline and are not included (e.g., Atiśa's *Laghuprayoga* collection, which was printed separately and duplicates existing content)

---

## Linguistic and Philosophical Features

### **Madhyamaka-Yogācāra Synthesis**

**Tengyur as Philosophical Encyclopedia**:
- **Nāgārjuna's corpus**: *Mūlamadhyamakakārikā*, *Ratnāvalī*, *Śūnyatāsaptati*, etc.
- **Asaṅga-Vasubandhu**: *Abhidharmasamuccaya*, *Mahāyānasaṃgraha*, *Triṃśikā*
- **Śāntideva**: *Bodhicaryāvatāra* (plus autocommentary)
- **Candrakīrti**: *Madhyamakāvatāra*, *Prasannapadā*
- **Dharmakīrti**: *Pramāṇavārttika* system

**Cross-Tradition Dialogue**:
- **Chinese Chan vs. Indian Madhyamaka**: Tengyur provides Indian side of debate
- **Huayan-Yogācāra links**: *Daśabhūmika* commentaries show shared foundations
- **Tibetan synthesis**: How Tsongkhapa, Longchenpa, Sakya Paṇḍita interpreted Indian masters

---

### **Pramāṇa (Buddhist Logic)**

**Dignāga-Dharmakīrti Tradition**:
- **150+ treatises** on epistemology, logic, debate
- **Critical for Tibetan scholasticism**: Gelug monastic curriculum centers on these
- **Largely absent from Chinese Buddhism**: Only fragments translated in Tang Dynasty

**Practical Applications**:
- **Debate manuals**: How to construct syllogisms (*prayoga*)
- **Logical fallacies**: Classification systems (*hetvābhāsa*)
- **Valid cognition**: Perception (*pratyakṣa*) vs. inference (*anumāna*)

---

### **Tantric Exegesis**

**1,200+ Tantric Commentaries**:
- **Guhyasamāja**: ~50 commentaries (Nāgārjuna, Āryadeva, Candrakīrti)
- **Cakrasaṃvara**: ~80 commentaries (Luipa, Ghaṇṭāpa, Kāṇha)
- **Hevajra**: ~60 commentaries (Vajragarbha, Saroruha)
- **Kālacakra**: ~30 commentaries (Kālacakrapāda, Nāropa)

**Operational Precision**:
- **Sādhana step-by-step**: Generation stage (*utpattikrama*) instructions
- **Completion stage** (*sampannakrama*): Subtle body (*tsa-lung-tigle*) yogas
- **Empowerment protocols** (*abhiṣeka*): Ritual manuals

**Example (Mahāmudrā Corpus)**:
- **2.5 volumes** (this translation): Saraha, Tilopa, Nāropa, Maitrīpa
- **Indic originals**: *Dohākoṣa* composed in Apabhraṃśa (translated here via the Tibetan)
- **See also**: Appendix - Mahāmudrā AI Songs (Hindi-based musical reconstruction)

---

## Critical Content and Editorial Policy

### **Linguistic Complexity**

**Sanskrit-Tibetan Translation Layers**:
- **Original Sanskrit** (8th-13th century)
- **Tibetan translation** (9th-17th century)
- **Chinese translation** (2024-2025, AI-assisted)
- **English translation** (2025, AI-assisted)

**Challenges**:
- **Technical terminology density**: Often 50+ Sanskrit loanwords per page
- **Poetic/verse sections**: Maintain meter in Chinese (partially successful)
- **Commentarial structure**: Nested root-text + commentary + sub-commentary

**Translation Philosophy**:
- **Literal translation prioritized**: Preserves doctrinal precision
- **Not fluent literary Chinese**: Academic orientation
- **Rationale**: Enables philological analysis, cross-tradition comparison

---

### **Quality and Limitations**

**What This Translation Is**:
- ✅ **Complete coverage**: First full Chinese Tengyur
- ✅ **Research foundation**: Enables systematic study of post-Tang Indian Buddhism
- ✅ **Comparative resource**: Tibetan interpretations of Sanskrit sources

**What This Translation Is Not**:
- ❌ **Polished literary edition**: Contains AI errors, awkward phrasing
- ❌ **Authoritative reference**: Not peer-reviewed by Tibetologists
- ❌ **Practice manual (tantric sections)**: Requires qualified lama guidance + empowerment

**User Advisory**:
- **Scholarly work**: Always verify against Tibetan sources
- **Sanskrit citations**: Retranslate all mantras/seed-syllables
- **Personal study**: Compare Claude vs. Gemini versions for clarity
- **Do not report individual errors to editors**: Project scope precludes sentence-level corrections
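
For readers who want to compare the Claude and Gemini versions systematically, a simple similarity score can point to the passages where the two renderings diverge most. The sketch below uses Python's standard `difflib`; it assumes the two versions have already been split into aligned segments, which the dataset does not guarantee, and the 0.6 threshold is an arbitrary illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def divergent_pairs(first: list, second: list, threshold: float = 0.6) -> list:
    """Return indices of aligned segments whose similarity is below `threshold`.

    `first` and `second` are assumed to be aligned one-to-one (an assumption
    about preprocessing, not a property of the published files).
    """
    if len(first) != len(second):
        raise ValueError("segment lists must have the same length")
    return [
        index
        for index, (a, b) in enumerate(zip(first, second))
        if similarity(a, b) < threshold
    ]
```

Segments flagged this way are candidates for checking against the Tibetan source; a low score shows disagreement between the models, not which one is right.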

---

## Technical Specifications

- **Total size**: ~80-100 million characters (Tibetan + Chinese + English)
- **Text count**: 3,460 treatises
- **Volume structure**: 212 volumes (Degé edition)
- **File formats**: Plain text (.txt), Markdown (.md), compressed archives (.7z)
- **Encoding**: UTF-8
- **Metadata**: Degé volume/text numbers, author attributions, colophons, translation model

**File Naming Convention**:
- Format: `[Section]-[Volume].[Text_Number]-[Author]_[Title]_[Model].txt`
- Example: `Madhyamaka-018.045-Candrakirti_Madhyamakavatara_c3.5s.txt`
- Models: `c3.5s` (Claude 3.5), `g2.0` (Gemini 2.0)
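
The naming convention can be parsed with a regular expression. The sketch below is written against the single example given above; actual filenames in the archives may deviate from the pattern (for instance, titles containing hyphens or non-ASCII characters), so the function returns `None` rather than failing. The field names and function name are illustrative.

```python
import re
from typing import Optional

# [Section]-[Volume].[Text_Number]-[Author]_[Title]_[Model].txt
FILENAME_RE = re.compile(
    r"(?P<section>[^-]+)-(?P<volume>\d+)\.(?P<text_number>\d+)-"
    r"(?P<author>[^_]+)_(?P<title>.+)_(?P<model>[^_]+)\.txt"
)

# Model codes as listed in the naming convention.
MODEL_NAMES = {"c3.5s": "Claude 3.5", "g2.0": "Gemini 2.0"}


def parse_filename(name: str) -> Optional[dict]:
    """Split a filename into its fields, or return None if it does not match."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["volume"] = int(fields["volume"])
    fields["text_number"] = int(fields["text_number"])
    fields["model_name"] = MODEL_NAMES.get(fields["model"], "unknown")
    return fields


if __name__ == "__main__":
    example = "Madhyamaka-018.045-Candrakirti_Madhyamakavatara_c3.5s.txt"
    print(parse_filename(example))
```

Grouping parsed files by `(volume, text_number)` gives a straightforward way to pair the Claude and Gemini versions of the same text, where both exist.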

---

## Appendix: Mahāmudrā AI Songs (Musical Reconstruction)

### **Project Background**

**Cultural Context**:
- **Mahāmudrā dohā tradition**: 84 Mahāsiddhas sang realization songs (8th-12th century)
- **Languages**: Apabhraṃśa, Old Bengali, Sanskrit
- **Tibetan preservation**: ~150 songs in Tengyur (*dohā* collections)

**Experimental Goal**:
> "To simulate the lived experience of these songs through AI-generated music, primarily using Hindi (closest modern descendant of Apabhraṃśa) with melodic structures informed by North Indian classical traditions."

---

### **Technical Approach**

**Music Generation**:
- **AI system**: Suno AI (music generation model)
- **Lyrics**: Tibetan dohās → Chinese translation → Hindi adaptation
- **Style prompts**: "North Indian devotional (*bhajan*), *rāga*-inspired, male vocals, meditative tempo"
- **Instrumentation**: Sitar, tabla, bansuri, tanpura (AI-synthesized)

**Cultural Disclaimer**:
> "These AI-generated songs are **not** authentic historical reconstructions. They are creative experiments to evoke the cultural milieu of the Mahāsiddha tradition. Do not use for ritual purposes without consulting qualified teachers."

---

### **Musicological Notes**

**Why Hindi?**:
- **Apabhraṃśa → Modern Hindi**: Linguistic continuity (vs. extinct Apabhraṃśa)
- **Prosody**: Hindi retains metrical structures compatible with original dohās
- **Devotional tradition**: *Bhajan*/*Kirtan* styles preserve similar aesthetic

**Limitations**:
- **Regional variants ignored**: Oḍḍiyāna, Kashmir, Bengal had distinct musical traditions
- **Tantric context lost**: Original performances likely part of *gaṇacakra* feasts
- **AI voice**: Cannot replicate human vocal ornaments (*gamakas*)

**Ethnomusicological Value**:
- ✅ **Hypothesis generation**: Suggests possible melodic frameworks
- ❌ **Not evidence**: Cannot confirm actual historical performance practice
- 🎵 **Aesthetic experience**: May help readers "feel" cultural atmosphere

---

## Intended Use Cases

### **Academic Research**
- **Buddhist philosophy**: Comparative Madhyamaka-Yogācāra studies
- **Tantric Buddhism**: Ritual structure, deity yoga, subtle body theories
- **Buddhist logic**: Pramāṇa tradition, debate methodologies
- **Translation studies**: Sanskrit→Tibetan→Chinese transmission analysis
- **History of science**: Indian astronomy, medicine, linguistics

### **Religious Practice**
- **Madhyamaka study**: Foundation for Tibetan Buddhist philosophy
- **Tantric practice**: Sādhana instructions (with teacher guidance + empowerment)
- **Mahāmudrā**: Meditation manuals (Kagyu/Gelug traditions)

### **AI and NLP Applications**
- **Low-resource language modeling**: Tibetan-Chinese-English parallel corpus
- **Domain-specific training**: Buddhist philosophical reasoning
- **Machine translation research**: Multi-stage translation analysis

Notes (Jinyu Chinese)

---

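## Chinese Description

### About the Tengyur

The **Tengyur** (丹珠尔; Tib. བསྟན་འགྱུར།, "translated commentaries on the Buddha's word") is the treatise canon of Tibetan Buddhism, comprising:
- **3,460 treatises** in 212 volumes (Degé edition)
- **Indian Buddhism** (2nd-17th centuries): Nāgārjuna, Asaṅga, Vasubandhu, Dharmakīrti, Atiśa, and others
- **Systematic coverage**: Tantric commentaries, Prajñāpāramitā exegesis, Madhyamaka, Yogācāra, Pramāṇa, poetics, medicine, linguistics

### Historical Significance of This Volume

**The first complete Chinese translation of the Tengyur in the history of Chinese Buddhism**

**Historic milestone**:
With this volume, Chinese-language Buddhist literature for the first time fully covers the four previously untranslated South Asian Buddhist corpora:
1. ✅ **Nyingma Gyubum** (宁玛十万续, Vol.2)
2. ✅ **Kangyur** (甘珠尔, Vol.3)
3. ✅ **Tengyur** (丹珠尔, Vol.5)
4. ✅ **Pāli Canon** (巴利文大藏经, Vol.4)

**Together with the Chinese Buddhist Canon** (汉文大藏经), these five collections represent:
- **2,300 years of Buddhist thought** (5th century BCE - 17th century CE)
- **Complete doctrinal spectrum**: Hīnayāna, Mahāyāna, Vajrayāna, Theravāda
- **Geographic breadth**: India, Sri Lanka, Nepal, Kashmir, Oḍḍiyāna, Tibet, China
- **~500 million characters** (Tibetan-Chinese-English parallel texts in this series)

**Civilizational significance**:
> "We can now see, in modern accessible languages, a relatively complete picture of two millennia of a vanished South Asian Buddhist civilization. This is a moment of sorrow, because this civilization endured so much in its homeland, and also a moment of joy, because its legacy is reunited with Chinese readers, continuing the transmission interrupted since the mid-Tang."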

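### Relationship to the Chinese Buddhist Canon

**Minimal overlap**:
- Only 91 texts overlap (2.6% by title count, 7.3% by volume)
- The Tengyur represents **Indian Buddhism after the mid-Tang** (after 850 CE)
- Areas absent from the Chinese transmission:
  - Pramāṇa (150+ texts)
  - Later Madhyamaka (Candrakīrti and others)
  - Anuttarayoga commentaries (1,200+ texts)

### Version History

- **Version 1.0** (2017): Saraha's dohā songs, selected Mahāmudrā texts
- **Version 1.05** (January 2025): Systematic expansion (2.5 volumes of Mahāmudrā corpus)
- **Version 2.0** (August 2025): Complete trilingual Gemini 2.0 edition

**Data sources**:
- Nitartha Digital Library (downloaded in 2023)
- Provided by two fellow practitioners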

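### Translation Method

**AI models**:
- **Claude 3.5 Sonnet**: Major treatises (Version 1.05, retained)
- **Gemini 2.0**: Complete coverage (new in Version 2.0)
- Tibetan-Chinese-English trilingual parallel text

**Translation requirements**:
"Complete literal translation; no paraphrase or abridgment; translate repetitions in full; keep verse passages parallel where possible; for seed-syllables and mantras, display (Devanāgarī, romanization, literal meaning)."

### Special Issues

**Difficulties in rendering Sanskrit**:
- ⚠️ **All Sanskrit requires manual verification**
- AI confusions: Hevajra (喜金刚) vs. Cakrasaṃvara (胜乐金刚), Candrakīrti (月称) vs. Candragomin (月官)
- Inaccurate romanization, limited script variety
- Recommendation: Reprocess with specialized tools (e.g., DSBC)

**Complex naming system**:
- The same person appears under multiple translated names
- Example: 龙树 = Klu sgrub = 那伽阿尔朱那 (all Nāgārjuna)
- Solution: Consult colophons and the *Blue Annals*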

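### Content Highlights

**Madhyamaka-Yogācāra synthesis**:
- Nāgārjuna's collected works: *Mūlamadhyamakakārikā* (《中论》), *Ratnāvalī* (《宝鬘论》), etc.
- Asaṅga-Vasubandhu: *Mahāyānasaṃgraha* (《摄大乘论》), *Triṃśikā* (《唯识三十颂》)
- Śāntideva: *Bodhicaryāvatāra* (《入菩萨行论》)
- Candrakīrti: *Madhyamakāvatāra* (《入中论》), *Prasannapadā* (《明句论》)

**Pramāṇa**:
- Dignāga-Dharmakīrti tradition
- 150+ works on logic and epistemology
- Rare in the Chinese transmission

**Tantric commentaries**:
- 1,200+ texts
- 50 commentaries on the *Guhyasamāja*, 80 on the *Cakrasaṃvara*
- Detailed ritual manuals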

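### Usage Recommendations

**Academic research**:
- Verify against the Tibetan source text
- Sanskrit must be retranslated
- Compare the Claude and Gemini versions

**Religious practice**:
- Consult a qualified teacher
- Tantric texts require empowerment
- Mahāmudrā: consult the 2.5-volume corpus

**Do not report individual errors**:
- The project's scale does not allow sentence-level corrections
- Use AI to improve passages yourself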

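### Appendix: Mahāmudrā Dohā AI Songs

**Cultural reconstruction experiment**:
- Songs of the 84 Mahāsiddhas (8th-12th century)
- AI music generation (Suno AI)
- Primarily in Hindi (closest to the old Apabhraṃśa language)

**Disclaimer**:
> "These AI songs are **not** authentic historical reconstructions; they only simulate a cultural atmosphere. Do not use them in rituals."

### Copyright

- Source texts: Public domain (ancient literature)
- AI translations: CC BY 4.0 license
- Use for AI training is explicitly permitted

### Citation Format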

Files (420.3 MB)

| Name | MD5 | Size |
|---|---|---|
| 佛典AI译丛第五卷：丹珠尔 2.0版.7z | 0daf3db6e315eb020253337cbc2a3695 | 315.5 MB |
| 佛典AI译丛第五卷：丹珠尔番外大手印道歌集 AI歌曲.7z | fcdf30180ac7c3ec39ab1d7274a8586d | 104.8 MB |
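
The MD5 checksums listed for the two archives can be used to confirm that a download is intact. The sketch below computes a file's MD5 with Python's standard `hashlib` and compares it with the listed value. The filenames are copied from the listing and may differ on a local system (for example, if the full-width colon is replaced on download); the function names are illustrative.

```python
import hashlib

# Checksums as published in the file listing of this record.
EXPECTED_MD5 = {
    "佛典AI译丛第五卷：丹珠尔 2.0版.7z": "0daf3db6e315eb020253337cbc2a3695",
    "佛典AI译丛第五卷：丹珠尔番外大手印道歌集 AI歌曲.7z": "fcdf30180ac7c3ec39ab1d7274a8586d",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed_name: str) -> bool:
    """Check a local file against the checksum published for `listed_name`."""
    return md5_of_file(path) == EXPECTED_MD5[listed_name]
```

Reading in chunks keeps memory use low for the 315.5 MB archive. A call such as `matches_listing(local_path, "佛典AI译丛第五卷：丹珠尔 2.0版.7z")` returns `True` only if the local copy matches the published checksum.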
