Published November 29, 2025 | Version 2.1
Dataset Open

Buddhist Classics AI Translation Series Vol.5: Complete Tengyur (Tibetan-Chinese-English, 丹珠尔 v2.1)

  • 1. Independent Research Collective

Description

This is **Volume 5** of the comprehensive *Buddhist Classics AI Translation Series*, featuring the **complete Tengyur (Tibetan Buddhist Commentarial Canon)** (*bsTan 'gyur*, 丹珠尔), the authoritative collection of Indian Buddhist treatises preserved in Tibetan translation.

---

## About the Tengyur

The **Tengyur** (Tib. *bsTan 'gyur*, "Translated Treatises"; 丹珠尔) is the commentarial canon of Tibetan Buddhism, comprising:

- **3,460 texts** across 212 volumes (Degé edition)
- **Indian Buddhist scholarship** (2nd-17th centuries CE): Nāgārjuna, Asaṅga, Vasubandhu, Dharmakīrti, Atiśa, etc.
- **Systematic coverage**: Tantric commentaries, Prajñāpāramitā exegesis, Madhyamaka, Yogācāra, Buddhist logic, poetics, medicine, linguistics

**Historical Significance**:
- **Complements Kangyur**: While Kangyur contains Buddha's words, Tengyur preserves Indian masters' explanations
- **Post-Tang Dynasty Indian Buddhism**: Represents 8th-17th century scholasticism (largely absent from Chinese Canon)
- **~7.3% overlap with Chinese Canon by volume** (91 texts, or about 2.6% by title count; ~80 million Chinese characters total)
- **First complete Chinese translation** in human history

---

## Unprecedented Achievement: Completion of South Asian Buddhist Literature in Chinese

### **Historic Milestone**

**With the publication of Volume 5, Chinese Buddhist literature has achieved comprehensive coverage of all major South Asian Buddhist textual traditions for the first time since Zhu Shixing's 3rd-century CE journey to Khotan.**

**Four Previously Untranslated Corpora (Now Complete)**:
1. ✅ **Nyingma Gyubum** (宁玛十万续, Vol.2): Old Translation tantras
2. ✅ **Kangyur** (甘珠尔, Vol.3): Tibetan Buddhist Canon
3. ✅ **Tengyur** (丹珠尔, Vol.5): Commentarial Canon
4. ✅ **Pāli Canon** (巴利文大藏经, Vol.4): Theravāda scriptures + commentaries

**Together with the Chinese Buddhist Canon** (汉文大藏经), these five collections represent:
- **~2,300 years of Buddhist thought** (5th century BCE - 17th century CE)
- **Complete doctrinal spectrum**: Hīnayāna, Mahāyāna, Vajrayāna, Theravāda
- **Geographic breadth**: India, Sri Lanka, Nepal, Kashmir, Oḍḍiyāna, Tibet, China
- **~500 million characters** (Tibetan-Chinese-English parallel texts in this series)

**Civilizational Significance**:
> "We now possess, in modern accessible languages, a relatively complete portrait of a lost South Asian Buddhist civilization spanning two millennia. This is both a moment of sorrow—for this civilization endured too much in its homeland—and a moment of joy—for its legacy reunites with Chinese readers, continuing the transmission interrupted in the mid-Tang Dynasty."

---

## Dataset Scope and Structure

### **Source Edition**

**Degé Tengyur** (德格版丹珠尔):
- **212 volumes**, 3,460 texts
- **Digital source**: Nitartha Digital Library (2023 downloads, provided by two practitioner contributors)
- **Cross-referenced with**: BDRC (Buddhist Digital Resource Center), ACIP (Asian Classics Input Project)

**Structural Organization**:
1. **Praise and Homage Texts** (佛赞, ~10 texts)
2. **Tantric Commentaries** (密续注疏, ~1,200 texts)
   - Kriyā, Caryā, Yoga, Anuttarayoga Tantras
3. **Prajñāpāramitā Commentaries** (般若注疏, ~150 texts)
   - *Abhisamayālaṃkāra* system
4. **Madhyamaka** (中观, ~200 texts)
   - Nāgārjuna, Āryadeva, Candrakīrti, Śāntideva
5. **Yogācāra** (唯识, ~150 texts)
   - Maitreya, Asaṅga, Vasubandhu
6. **Pramāṇa** (因明, ~100 texts)
   - Dignāga, Dharmakīrti
7. **Liberal Arts** (明处, ~300 texts)
   - Poetics, grammar, medicine, astrology
8. **Vinaya Commentaries** (律注, ~50 texts)
9. **Abhidharma Commentaries** (论藏注, ~80 texts)
10. **Miscellaneous** (杂集, ~1,220 texts)
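
As a quick consistency check, the approximate section counts listed above can be summed and compared with the 3,460-text total given for the Degé edition. The dictionary keys below are shorthand labels chosen for this sketch, not identifiers used in the dataset.

```python
# Approximate text counts per section, copied from the list above.
# The keys are illustrative labels, not names used in the dataset files.
SECTION_COUNTS = {
    "Praise and Homage": 10,
    "Tantric Commentaries": 1200,
    "Prajnaparamita Commentaries": 150,
    "Madhyamaka": 200,
    "Yogacara": 150,
    "Pramana": 100,
    "Liberal Arts": 300,
    "Vinaya Commentaries": 50,
    "Abhidharma Commentaries": 80,
    "Miscellaneous": 1220,
}

DEGE_TOTAL = 3460  # total number of texts stated for the Degé edition

total = sum(SECTION_COUNTS.values())
print(total, total == DEGE_TOTAL)
```

Since the per-section figures are rounded estimates, the check only confirms that they were chosen to add up to the stated total.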

---

### **Version 2.0 Enhancements**

**Major Upgrade from Version 1.05**:
- **Complete Gemini 2.0 translations**: Full Tibetan-Chinese-English parallel corpus
- **Retained Claude versions**: Original high-quality translations preserved for comparison
- **Expanded coverage**: Previously untranslated grammatical treatises, medical texts, poetics
- **Cross-edition validation**: Nitartha Digital Library (2023) vs. original data sources (2017-2024)

**Compilation History**:
- **2017**: Initial project (selected Mahāmudrā texts, Saraha's *dohā* songs, Madhyamaka treatises)
- **2024**: Systematic expansion (2.5 volumes of Mahāmudrā corpus compiled)
- **2025 January**: Version 1.05 released
- **2025 August**: Version 2.0 with full Gemini 2.0 coverage

---

## Historical Context: Development of the Tengyur

### **Early Period (7th-9th centuries)**

**Tibetan Empire Era**:
- **King Trisong Detsen** (赤松德赞, 742-797): Large-scale translation projects
- **King Ralpachen** (赤热巴巾, 815-841): Standardized translation terminology
- **Translation teams**: Indian paṇḍitas + Tibetan lotsāwas
- **Legacy**: "Old Translation" (旧译) texts later compiled into Nyingma Gyubum

---

### **Later Diffusion (11th-14th centuries)**

**New Translation Period**:
- **11th century**: Atiśa, Rinchen Zangpo, Marpa, etc.
- **Massive translation wave**: Sakya, Kagyu, Kadam, Jonang founders participate
- **13th century**: Sakya school initiates systematic compilation
- **1310 CE**: Narthang Kangyur completed (near modern Shigatse, Bailan County)

**Butön Rinchen Drub** (布顿·仁钦珠, 1290-1364):
- First comprehensive catalog of Tengyur
- Established standard structure and classification
- **Ming-Qing additions**: Pramāṇa and grammar sections continue to expand

---

### **Editions and Versions**

**Major Tengyur Editions**:
- **Narthang** (纳塘版, ~1310): Earliest printed edition
- **Beijing** (北京版, 15th century): Imperial sponsorship
- **Degé** (德格版, 18th century): Most authoritative, used in this translation
- **Peking** (北京版, 20th century): Modern critical edition

**Differences from Chinese Canon**:
- **Minimal overlap**: Only 91 texts (2.6% by count, 7.3% by volume)
- **Post-Tang materials**: Represents Indian Buddhism after Chinese transmission ceased (~850 CE)
- **Includes Chinese-origin texts**: Some Tengyur texts are Tibetan translations of Chinese works (e.g., *Dasheng qixin lun* 《大乘起信论》)
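
The by-count figure follows directly from the two numbers quoted in this description (91 overlapping texts out of 3,460). A minimal sketch of the arithmetic is shown below; the by-volume figure of 7.3% cannot be recomputed here because per-text lengths are not given in this description.

```python
OVERLAP_TEXTS = 91   # texts shared with the Chinese Canon, as stated above
TOTAL_TEXTS = 3460   # texts in the Degé Tengyur, as stated above


def overlap_share_by_count(overlap: int, total: int) -> float:
    """Return the overlap as a percentage of the total number of texts."""
    return 100.0 * overlap / total


share = overlap_share_by_count(OVERLAP_TEXTS, TOTAL_TEXTS)
print(f"{share:.1f}% of texts by count")
```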

---

## Translation Methodology

### **AI Models and Quality Tiers**

**Tier 1 (Highest Quality) - Original Version 1.05**:
- **Claude 3.5 Sonnet**: Major treatises, Madhyamaka, Yogācāra, Mahāmudrā
- **Manual processing**: Critical doctrinal terms reviewed
- **Preserved in Version 2.0** for comparison

**Tier 2 (Full Coverage) - Version 2.0**:
- **Gemini 2.0**: Complete corpus (3,460 texts)
- **Tibetan-Chinese-English parallel**
- **Automated workflow**: Software written by a lay collaborator in Beijing
- **Validation**: Segment overlaps for quality assurance
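
The description does not say how the segment overlaps are produced, so the following is only a minimal sketch of one way to do it: split a text into fixed-size windows in which each segment repeats the tail of the previous one, so that the two translations of the shared stretch can be compared. The function name, the window size, and the overlap size are all assumptions made for illustration.

```python
from typing import List


def split_with_overlap(lines: List[str], size: int = 40, overlap: int = 5) -> List[List[str]]:
    """Split `lines` into segments of up to `size` lines.

    Each segment after the first begins with the last `overlap` lines of
    the previous segment. Both defaults are illustrative, not the values
    used by the project's own software.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    segments: List[List[str]] = []
    for start in range(0, len(lines), step):
        segments.append(lines[start:start + size])
        if start + size >= len(lines):
            break  # this window already reaches the end of the text
    return segments


demo = [f"line {i}" for i in range(10)]
segments = split_with_overlap(demo, size=4, overlap=1)
for seg in segments:
    print(seg)
```

With `size=4` and `overlap=1`, the ten demo lines fall into three segments, and the last line of each segment reappears as the first line of the next.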

**Translation Prompts**:
- **Standard**: "Please provide complete, literal Chinese translation. No paraphrasing or abbreviation. If repetitions exist, translate fully. For verse sections, maintain parallel structure. For seed-syllables and mantras, display: (Devanāgarī, romanization, literal meaning if available) in continuous format."
- **Later simplified**: Removed "literal meaning" requirement due to AI inconsistency

---

### **Special Challenges: Sanskrit and Mantras**

**Seed-Syllable Rendering Issues**:
- **Goal**: Display (Tibetan, Devanāgarī, romanization, Chinese gloss) for all *bīja* syllables
- **Reality**: AI frequently omits Devanāgarī or gloss; romanization often inaccurate
- **Affected texts**: All tantric commentaries, dhāraṇī collections
- **User advisory**: 
  - ⚠️ **All Sanskrit in this translation requires manual verification**
  - Recommend: Use specialized tools (e.g., Digital Sanskrit Buddhist Canon) for accurate rendering
  - Multiple scripts needed: Oḍḍiyāna, Kashmir, Nepali, Sinhalese variants (not provided in this edition)
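
Because renderings often drop one of the requested scripts, a simple automated check can flag entries for the manual verification recommended above. The sketch below tests which scripts appear in a rendered string by Unicode block (Tibetan U+0F00 to U+0FFF, Devanāgarī U+0900 to U+097F). The function names and sample strings are hypothetical, and the check says nothing about whether the romanization or the syllable itself is correct.

```python
from typing import Set

# Unicode block ranges (inclusive) for the two scripts of interest.
SCRIPT_RANGES = {
    "tibetan": (0x0F00, 0x0FFF),
    "devanagari": (0x0900, 0x097F),
}


def scripts_present(text: str) -> Set[str]:
    """Return the names of the scripts that occur at least once in `text`."""
    found = set()
    for ch in text:
        code = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code <= high:
                found.add(name)
    return found


def missing_scripts(text: str) -> Set[str]:
    """Return the scripts expected in a seed-syllable rendering but absent."""
    return set(SCRIPT_RANGES) - scripts_present(text)


complete = "ཧཱུྃ (हूँ, hūṃ)"   # sample rendering with both scripts
incomplete = "ཧཱུྃ (hūṃ)"      # sample rendering without Devanāgarī
print(missing_scripts(complete))    # set()
print(missing_scripts(incomplete))  # {'devanagari'}
```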

**Examples of AI Errors**:
- **Term confusion**: "Hevajra" (喜金刚) vs. "Cakrasaṃvara" (胜乐金刚) mixed
- **Name conflation**: "Candrakīrti" (月称) vs. "Candragomin" (月官)
- **Mantra variants**: "Six-Syllable Mantra" vs. "Six-Syllable Great Bright Mantra" treated as identical
- **Romanization**: Inconsistent IAST/Tibetan Wylie hybrid systems

**Recommendation**:
> "For scholarly work, all Sanskrit examples must be retranslated from Tibetan sources. Consider this translation a 'first draft' requiring expert review."

---

### **Biographical and Colophon Challenges**

**Name Variant Issues**:
- **Same person, multiple names**: Translation teams used inconsistent Sanskrit-Tibetan name pairs
- **Example**: Nāgārjuna = *Klu sgrub* (龙树) but also *Nāgārjuna* (那伽阿尔朱那) in different texts
- **Solution**: Cross-reference colophons, historical catalogs (e.g., *Blue Annals*)
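
One way to apply such cross-references is an alias table that maps every attested form of a name to a single canonical key. The sketch below uses only the Nāgārjuna variants mentioned above; the table layout and function names are assumptions, and a real table would have to be compiled by hand from colophons and catalogs.

```python
import unicodedata
from typing import Optional

# Variant forms mapped to one canonical name.
# Only the example from the text above is included here.
ALIASES = {
    "Nāgārjuna": "Nāgārjuna",
    "Klu sgrub": "Nāgārjuna",
    "龙树": "Nāgārjuna",
    "那伽阿尔朱那": "Nāgārjuna",
}


def _key(name: str) -> str:
    """Normalize a name for lookup: Unicode NFC, trimmed, case-insensitive."""
    return unicodedata.normalize("NFC", name).strip().casefold()


_LOOKUP = {_key(variant): canonical for variant, canonical in ALIASES.items()}


def canonical_name(name: str) -> Optional[str]:
    """Return the canonical form of `name`, or None if it is not in the table."""
    return _LOOKUP.get(_key(name))


print(canonical_name("klu sgrub "))
print(canonical_name("那伽阿尔朱那"))
print(canonical_name("Candrakīrti"))
```

Names missing from the table return `None` rather than a guess, which keeps unresolved variants visible for manual review.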

**Historical Detail Reconstruction**:
- **Colophon analysis**: Many contain precise dates, locations, patron names
- **Editorial notes**: Highlighted in main text where significant
- **Qing Dynasty additions**: Some texts not in Degé mainline (e.g., Atiśa's *Laghuprayoga* collection, separately printed, duplicates existing content—not included)

---

## Linguistic and Philosophical Features

### **Madhyamaka-Yogācāra Synthesis**

**Tengyur as Philosophical Encyclopedia**:
- **Nāgārjuna's corpus**: *Mūlamadhyamakakārikā*, *Ratnāvalī*, *Śūnyatāsaptati*, etc.
- **Asaṅga-Vasubandhu**: *Abhidharmasamuccaya*, *Mahāyānasaṃgraha*, *Triṃśikā*
- **Śāntideva**: *Bodhicaryāvatāra* (plus commentaries)
- **Candrakīrti**: *Madhyamakāvatāra*, *Prasannapadā*
- **Dharmakīrti**: *Pramāṇavārttika* system

**Cross-Tradition Dialogue**:
- **Chinese Chan vs. Indian Madhyamaka**: Tengyur provides Indian side of debate
- **Huayan-Yogācāra links**: *Daśabhūmika* commentaries show shared foundations
- **Tibetan synthesis**: How Tsongkhapa, Longchenpa, Sakya Paṇḍita interpreted Indian masters

---

### **Pramāṇa (Buddhist Logic)**

**Dignāga-Dharmakīrti Tradition**:
- **150+ treatises** on epistemology, logic, debate
- **Critical for Tibetan scholasticism**: Gelug monastic curriculum centers on these
- **Largely absent from Chinese Buddhism**: Only fragments translated in Tang Dynasty

**Practical Applications**:
- **Debate manuals**: How to construct syllogisms (*prayoga*)
- **Logical fallacies**: Classification systems (*hetvābhāsa*)
- **Valid cognition**: Perception (*pratyakṣa*) vs. inference (*anumāna*)

---

### **Tantric Exegesis**

**1,200+ Tantric Commentaries**:
- **Guhyasamāja**: ~50 commentaries (Nāgārjuna, Āryadeva, Candrakīrti)
- **Cakrasaṃvara**: ~80 commentaries (Luipa, Ghaṇṭāpa, Kāṇha)
- **Hevajra**: ~60 commentaries (Vajragarbha, Saroruha)
- **Kālacakra**: ~30 commentaries (Kālacakrapāda, Nāropa)

**Operational Precision**:
- **Sādhana step-by-step**: Generation stage (*utpattikrama*) instructions
- **Completion stage** (*sampannakrama*): Subtle body (*tsa-lung-tigle*) yogas
- **Empowerment protocols** (*abhiṣeka*): Ritual manuals

**Example (Mahāmudrā Corpus)**:
- **2.5 volumes** (this translation): Saraha, Tilopa, Nāropa, Maitrīpa
- **Indic originals**: *Dohākoṣa* songs composed in Apabhraṃśa (translated here via the Tibetan)
- **See also**: Appendix - Mahāmudrā AI Songs (Hindi-based musical reconstruction)

---

## Critical Content and Editorial Policy

### **Linguistic Complexity**

**Sanskrit-Tibetan Translation Layers**:
- **Original Sanskrit** (8th-13th century)
- **Tibetan translation** (9th-17th century)
- **Chinese translation** (2024-2025, AI-assisted)
- **English translation** (2025, AI-assisted)

**Challenges**:
- **Technical terminology density**: Often 50+ Sanskrit loanwords per page
- **Poetic/verse sections**: Maintain meter in Chinese (partially successful)
- **Commentarial structure**: Nested root-text + commentary + sub-commentary

**Translation Philosophy**:
- **Literal translation prioritized**: Preserves doctrinal precision
- **Not fluent literary Chinese**: Academic orientation
- **Rationale**: Enables philological analysis, cross-tradition comparison

---

### **Quality and Limitations**

**What This Translation Is**:
- ✅ **Complete coverage**: First full Chinese Tengyur
- ✅ **Research foundation**: Enables systematic study of post-Tang Indian Buddhism
- ✅ **Comparative resource**: Tibetan interpretations of Sanskrit sources

**What This Translation Is Not**:
- ❌ **Polished literary edition**: Contains AI errors, awkward phrasing
- ❌ **Authoritative reference**: Not peer-reviewed by Tibetologists
- ❌ **Practice manual (tantric sections)**: Requires qualified lama guidance + empowerment

**User Advisory**:
- **Scholarly work**: Always verify against Tibetan sources
- **Sanskrit citations**: Retranslate all mantras/seed-syllables
- **Personal study**: Compare Claude vs. Gemini versions for clarity
- **Do not report individual errors to editors**: Project scope precludes sentence-level corrections

---

## Technical Specifications

- **Total size**: ~80-100 million characters (Tibetan + Chinese + English)
- **Text count**: 3,460 treatises
- **Volume structure**: 212 volumes (Degé edition)
- **File formats**: Plain text (.txt), Markdown (.md), compressed archives (.7z)
- **Encoding**: UTF-8
- **Metadata**: Degé volume/text numbers, author attributions, colophons, translation model

**File Naming Convention**:
- Format: `[Section]-[Volume].[Text_Number]-[Author]_[Title]_[Model].txt`
- Example: `Madhyamaka-018.045-Candrakirti_Madhyamakavatara_c3.5s.txt`
- Models: `c3.5s` (Claude 3.5), `g2.0` (Gemini 2.0)
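
The naming convention can be parsed mechanically, for example to pair the Claude and Gemini versions of the same text. The following is a minimal sketch rather than project tooling: it assumes that section names contain no hyphen and that author names contain no underscore, which holds for the example above but is not guaranteed for every file. `TextFile` and `parse_filename` are names invented for this sketch.

```python
import re
from typing import NamedTuple, Optional

# Model tags listed in the naming convention above.
MODEL_NAMES = {"c3.5s": "Claude 3.5", "g2.0": "Gemini 2.0"}

# [Section]-[Volume].[Text_Number]-[Author]_[Title]_[Model].txt
FILENAME_RE = re.compile(
    r"^(?P<section>[^-]+)-(?P<volume>\d+)\.(?P<text_number>\d+)-"
    r"(?P<author>[^_]+)_(?P<title>.+)_(?P<model>[^_]+)\.txt$"
)


class TextFile(NamedTuple):
    section: str
    volume: int
    text_number: int
    author: str
    title: str
    model: str


def parse_filename(name: str) -> Optional[TextFile]:
    """Parse a file name; return None if it does not follow the convention."""
    m = FILENAME_RE.match(name)
    if m is None or m["model"] not in MODEL_NAMES:
        return None
    return TextFile(
        section=m["section"],
        volume=int(m["volume"]),
        text_number=int(m["text_number"]),
        author=m["author"],
        title=m["title"],
        model=m["model"],
    )


info = parse_filename("Madhyamaka-018.045-Candrakirti_Madhyamakavatara_c3.5s.txt")
print(info)
```

Grouping parsed files by `(section, volume, text_number)` would then show which texts are available in both model versions.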

---

## Appendix: Mahāmudrā AI Songs (Musical Reconstruction)

### **Project Background**

**Cultural Context**:
- **Mahāmudrā dohā tradition**: 84 Mahāsiddhas sang realization songs (8th-12th century)
- **Languages**: Apabhraṃśa, Old Bengali, Sanskrit
- **Tibetan preservation**: ~150 songs in Tengyur (*dohā* collections)

**Experimental Goal**:
> "To simulate the lived experience of these songs through AI-generated music, primarily using Hindi (closest modern descendant of Apabhraṃśa) with melodic structures informed by North Indian classical traditions."

---

### **Technical Approach**

**Music Generation**:
- **AI system**: Suno AI (music generation model)
- **Lyrics**: Tibetan dohās → Chinese translation → Hindi adaptation
- **Style prompts**: "North Indian devotional (*bhajan*), *rāga*-inspired, male vocals, meditative tempo"
- **Instrumentation**: Sitar, tabla, bansuri, tanpura (AI-synthesized)

**Cultural Disclaimer**:
> "These AI-generated songs are **not** authentic historical reconstructions. They are creative experiments to evoke the cultural milieu of the Mahāsiddha tradition. Do not use for ritual purposes without consulting qualified teachers."

---

### **Musicological Notes**

**Why Hindi?**:
- **Apabhraṃśa → Modern Hindi**: Linguistic continuity (vs. extinct Apabhraṃśa)
- **Prosody**: Hindi retains metrical structures compatible with original dohās
- **Devotional tradition**: *Bhajan*/*Kirtan* styles preserve similar aesthetic

**Limitations**:
- **Regional variants ignored**: Oḍḍiyāna, Kashmir, Bengal had distinct musical traditions
- **Tantric context lost**: Original performances likely part of *gaṇacakra* feasts
- **AI voice**: Cannot replicate human vocal ornaments (*gamakas*)

**Ethnomusicological Value**:
- ✅ **Hypothesis generation**: Suggests possible melodic frameworks
- ❌ **Not evidence**: Cannot confirm actual historical performance practice
- 🎵 **Aesthetic experience**: May help readers "feel" cultural atmosphere

---

## Intended Use Cases

### **Academic Research**
- **Buddhist philosophy**: Comparative Madhyamaka-Yogācāra studies
- **Tantric Buddhism**: Ritual structure, deity yoga, subtle body theories
- **Buddhist logic**: Pramāṇa tradition, debate methodologies
- **Translation studies**: Sanskrit→Tibetan→Chinese transmission analysis
- **History of science**: Indian astronomy, medicine, linguistics

### **Religious Practice**
- **Madhyamaka study**: Foundation for Tibetan Buddhist philosophy
- **Tantric practice**: Sādhana instructions (with teacher guidance + empowerment)
- **Mahāmudrā**: Meditation manuals (Kagyu/Gelug traditions)

### **AI and NLP Applications**
- **Low-resource language modeling**: Tibetan-Chinese-English parallel corpus
- **Domain-specific training**: Buddhist philosophical reasoning
- **Machine translation research**: Multi-stage translation analysis

 

In the g2.0 translation edition (volumes 1, 2, 3, 5, 6, 7, 8, 11, 12, etc.) produced between July and November 2025, slightly under 1% of the text consists of entire paragraphs that were accidentally omitted in translation. To address this, we have written a dedicated program to perform electronic collation and supplementary translation. As of November 26, 2025, this remedial work has not yet been fully completed. Under normal circumstances, the upgraded complete volumes will first be released at:
https://huggingface.co/datasets/ospx1u/buddhist-classics-vol1-12/tree/main and subsequently published on zenodo.org.
Other data repositories will be updated on a case-by-case basis or may not be updated at all.
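
The collation program itself is not published in this description. As a rough illustration of the kind of check involved, the sketch below assumes that source and translation are stored as paragraph-aligned lists and flags positions where the translation is missing, empty, or implausibly short relative to the source paragraph. The alignment assumption, the function name, and the `min_ratio` threshold are all hypothetical.

```python
from typing import List, Sequence


def find_suspect_paragraphs(
    source: Sequence[str],
    translation: Sequence[str],
    min_ratio: float = 0.05,
) -> List[int]:
    """Return indices of source paragraphs whose translation looks omitted.

    A paragraph is flagged when its aligned translation is absent, empty,
    or shorter than `min_ratio` times the source length in characters.
    Character-length ratios are a crude signal across different scripts.
    """
    suspects: List[int] = []
    for i, src in enumerate(source):
        src_len = len(src.strip())
        if src_len == 0:
            continue  # nothing to translate in a blank source paragraph
        tgt = translation[i] if i < len(translation) else ""
        if len(tgt.strip()) < min_ratio * src_len:
            suspects.append(i)
    return suspects


source_paras = ["a" * 100, "b" * 100, "c" * 100]
translated_paras = ["x" * 40, ""]  # second is empty, third is missing
print(find_suspect_paragraphs(source_paras, translated_paras))  # [1, 2]
```

Flagged indices would still need to be checked by hand or retranslated, since a short translation is not always an omission.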

Notes (Jinyu Chinese)

---

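## Chinese Notes

### About the Tengyur

The **Tengyur** (丹珠尔; Tibetan: བསྟན་འགྱུར།, glossed here as "translated commentaries on the Buddha's word") is the treatise canon of Tibetan Buddhism, comprising:
- **3,460 treatises** in 212 volumes (Degé edition)
- **Indian Buddhism** (2nd-17th centuries): Nāgārjuna, Asaṅga, Vasubandhu, Dharmakīrti, Atiśa, and others
- **Systematic coverage**: Tantric commentaries, Prajñāpāramitā exegesis, Madhyamaka, Yogācāra, Pramāṇa, poetics, medicine, grammar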

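### Historical Significance of This Volume

**The first complete Chinese translation of the Tengyur in the history of Chinese Buddhism**

**Historic milestone**:
With this volume, Chinese-language Buddhist literature covers, for the first time, all four previously untranslated South Asian Buddhist corpora:
1. ✅ **Nyingma Gyubum** (宁玛十万续, Vol.2)
2. ✅ **Kangyur** (甘珠尔, Vol.3)
3. ✅ **Tengyur** (丹珠尔, Vol.5)
4. ✅ **Pāli Canon** (巴利文大藏经, Vol.4)

**Together with the Chinese Buddhist Canon** (汉文大藏经), these five collections represent:
- **2,300 years of Buddhist thought** (5th century BCE - 17th century CE)
- **Complete doctrinal spectrum**: Hīnayāna, Mahāyāna, Vajrayāna, Theravāda
- **Geographic breadth**: India, Sri Lanka, Nepal, Kashmir, Oḍḍiyāna, Tibet, China
- **About 500 million characters** (Tibetan-Chinese-English parallel texts in this series)

**Civilizational significance**:
> "We can now see, in modern and widely used languages, a relatively complete picture of two thousand years of a vanished South Asian Buddhist civilization. This is a moment of sorrow, because this civilization endured too much in its homeland, and also a moment of joy, because its legacy is reunited with Chinese readers, continuing the transmission interrupted since the mid-Tang."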

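### Relationship to the Chinese Buddhist Canon

**Minimal overlap**:
- Only 91 texts overlap (2.6% by title count, 7.3% by length)
- The Tengyur represents **Indian Buddhism after the mid-Tang** (after 850 CE)
- Areas missing from the Chinese tradition:
  - Pramāṇa (150+ texts)
  - Later Madhyamaka (Candrakīrti and others)
  - Anuttarayoga commentaries (1,200+ texts)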

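### Version History

- **Version 1.0** (2017): Saraha's dohā songs, selected Mahāmudrā texts
- **Version 1.05** (January 2025): Systematic expansion (2.5 volumes of Mahāmudrā corpus)
- **Version 2.0** (August 2025): Complete Gemini 2.0 trilingual edition

**Data source**:
- Nitartha Digital Library (downloaded 2023)
- Provided by two fellow practitioners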

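### Translation Method

**AI models**:
- **Claude 3.5 Sonnet**: Major treatises (Version 1.05, retained)
- **Gemini 2.0**: Full coverage (new in Version 2.0)
- Tibetan-Chinese-English parallel text

**Translation requirements**:
"Translate completely and literally, without paraphrase or abbreviation; translate repeated passages in full; keep verse sections parallel as far as possible; for seed-syllables and mantras, display (Devanāgarī, romanization, literal meaning)."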

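### Special Issues

**Difficulties rendering Sanskrit**:
- ⚠️ **All Sanskrit requires manual verification**
- AI confusions: Hevajra vs. Cakrasaṃvara, Candrakīrti vs. Candragomin
- Inaccurate romanization and limited script variety
- Recommendation: reprocess with specialized tools (e.g., DSBC)

**Complex naming system**:
- One person may appear under several translated names
- Example: 龙树 = Klu sgrub = 那伽阿尔朱那 (all Nāgārjuna)
- Solution: consult colophons and the *Blue Annals*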

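### Content Highlights

**Madhyamaka-Yogācāra synthesis**:
- Nāgārjuna's corpus: *Mūlamadhyamakakārikā* (《中论》), *Ratnāvalī* (《宝鬘论》), etc.
- Asaṅga-Vasubandhu: *Mahāyānasaṃgraha* (《摄大乘论》), *Triṃśikā* (《唯识三十颂》)
- Śāntideva: *Bodhicaryāvatāra* (《入菩萨行论》)
- Candrakīrti: *Madhyamakāvatāra* (《入中论》), *Prasannapadā* (《明句论》)

**Pramāṇa**:
- Dignāga-Dharmakīrti tradition
- 150+ works on logic and epistemology
- Rare in the Chinese tradition

**Tantric commentaries**:
- 1,200+ texts
- 50 commentaries on *Guhyasamāja*, 80 on *Cakrasaṃvara*
- Detailed ritual procedures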

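### Usage Recommendations

**Academic research**:
- Check against the Tibetan source text
- Sanskrit must be retranslated
- Compare the Claude and Gemini versions

**Practice**:
- Consult a qualified teacher
- Tantric texts require empowerment
- Mahāmudrā: see the 2.5-volume corpus

**Do not report individual errors**:
- The project's scale does not allow sentence-level corrections
- Readers can use AI to improve passages themselves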

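### Appendix: Mahāmudrā Dohā AI Songs

**Cultural reconstruction experiment**:
- Dohā songs of the 84 Mahāsiddhas (8th-12th centuries)
- AI music generation (Suno AI)
- Mainly in Hindi (closest to ancient Apabhraṃśa)

**Disclaimer**:
> "These AI songs are **not** authentic historical reconstructions; they only simulate the cultural atmosphere. Do not use them in rituals."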

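### Copyright

- Source texts: public domain (ancient literature)
- AI translations: CC BY 4.0 license
- Use for AI training is explicitly permitted

### Citation Format

### Project Website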

https://github.com/Buddhist-Classics-AI-Translation-Series/Buddhist-translations

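In the g2.0 translations of volumes 1, 2, 3, 5, 6, 7, 8, 11, 12, and others, produced between July and November 2025, slightly under 1% of the text has whole paragraphs missing from the translation. A dedicated program was written to collate the files electronically and supply the missing translations; as of November 26, 2025, this work is not fully complete.
Upgraded full volumes are normally released first at
https://huggingface.co/datasets/ospx1u/buddhist-classics-vol1-12/tree/main
and then on zenodo.org. Other data repositories are updated case by case, or not at all.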

Files (317.0 MB)

md5:d6ac9e7075ee4ed4781539847a5c814c