Published March 2, 2026 | Version v2
Dataset
Open
KNoTE
Authors/Creators
- Park, Seonyeong (Data curator) [1]
- Kim, Gayeon (Data manager) [1]
- Ji, Haein (Data collector) [1]
- Lee, Hagyeong (Annotator) [2]
- Lee, Byeongjoo (Annotator) [3]
- Jeong, Chaeyeon (Annotator) [4]
- Lee, Jae-yeol (Annotator) [1]
- Jo, Gyungmin (Annotator) [1]
- Lim, Iro (Annotator) [1]
- Ismayilov, Orkhan (Annotator) [1]
- Kim, Byungjun (Supervisor) [1]
Description
Project Overview
KNoTE (Korean Novel TEI Encoded) is a dataset of 33 works of modern Korean fiction, dated 1906 to 1954, encoded in TEI XML.
Rather than a plain text conversion, each work is encoded following the TEI (Text Encoding Initiative) P5 guidelines, with detailed metadata, character descriptions, Hanja/Hangul language variants, and semantic tagging.
Key Features
- TEI Standard: Fully compliant with TEI P5 (<teiHeader>, <body>, <div>).
- Characters: Linked via xml:id and ref (e.g., <persName ref="#YB">).
- Linguistic Mapping: Original Hanja and modern Hangul mapped via <foreign xml:lang="zh">.
- Entities: Places (<placeName>), Dates (<date>), and Occupations (<occupation>).
- Scholarly Metadata: Includes source descriptions, publication history, and revision logs.
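The character linking and Hanja mapping described above can be read back with a few lines of Python. The following is a minimal sketch, assuming only the standard library `xml.etree.ElementTree`; the `SAMPLE` string is a hypothetical miniature document modeled on the dataset's structure, and the helper names (`index_persons`, `resolve_mentions`, `hanja_glosses`) are illustrative, not part of the project's tooling.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"tei": TEI_NS}
XML_ID = f"{{{XML_NS}}}id"      # qualified name of xml:id
XML_LANG = f"{{{XML_NS}}}lang"  # qualified name of xml:lang

# Hypothetical miniature TEI document (an assumption, modeled on the dataset).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person xml:id="ZH">
            <persName xml:lang="ko">진화</persName>
            <persName xml:lang="zh">鎭華</persName>
          </person>
        </listPerson>
      </particDesc>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>수대(<foreign xml:lang="zh">數代</foreign>) 전래하는 주택을 훼철(<foreign xml:lang="zh">毁撤</foreign>)치 아니 못할 <persName ref="#ZH">진화</persName>는 정문을 나왔다.</p>
      </div>
    </body>
  </text>
</TEI>"""


def index_persons(root):
    """Map each person's xml:id to a {language code: name} dictionary."""
    persons = {}
    for person in root.iterfind(".//tei:listPerson/tei:person", NS):
        names = {n.get(XML_LANG): n.text for n in person.findall("tei:persName", NS)}
        persons[person.get(XML_ID)] = names
    return persons


def resolve_mentions(root, persons):
    """Return (surface text, person id, known names) for each persName in the body."""
    body = root.find(".//tei:body", NS)
    mentions = []
    for el in body.iter(f"{{{TEI_NS}}}persName"):
        ref = el.get("ref")
        if ref is None:
            continue
        pid = ref.lstrip("#")  # "#ZH" points at xml:id="ZH"
        mentions.append((el.text, pid, persons.get(pid)))
    return mentions


def hanja_glosses(root):
    """Collect the text of every <foreign xml:lang="zh"> element in the body."""
    body = root.find(".//tei:body", NS)
    return [el.text for el in body.iter(f"{{{TEI_NS}}}foreign")
            if el.get(XML_LANG) == "zh"]


root = ET.fromstring(SAMPLE)
persons = index_persons(root)
for surface, pid, names in resolve_mentions(root, persons):
    print(surface, "->", pid, names)
print(hanja_glosses(root))
```

To run this against a downloaded file instead of the inline sample, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`. A `ref` that has no matching `xml:id` yields `None` in the third tuple slot, which makes dangling references easy to spot.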
XML Structure Example (Snippet)
The dataset uses a hierarchical structure to capture both the content and the context of the literature:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>낙오자</title>
        <author>이익상</author>
        <respStmt>
          <resp>TEI 인코딩</resp>
          <name>지해인<idno type="ISNI">0000 0005 2802 5223</idno></name>
          <email>cihayin [at] gmail.com</email>
        </respStmt>
        <respStmt>
          <resp>TEI 검수</resp>
          <name>박선영<idno type="ORCID">0009-0001-1340-0455</idno></name>
          <email>sun09125 [at] gmail.com</email>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>한국학중앙연구원 인문정보학과</publisher>
      </publicationStmt>
      <sourceDesc>
        <bibl type="digitalSource" xml:lang="ko">
          <title level="a">낙오자</title>
          <author>이익상</author>
          <publisher>Wikisource(한국어)</publisher>
          <idno type="wikisource">https://ko.wikisource.org/wiki/낙오자</idno>
          <idno type="wikisource-info">https://ko.wikisource.org/w/index.php?title=낙오자&amp;action=info</idno>
          <note type="acquisition">작업자 지해인이 위키문헌 항목에서 raw data를 취득함.</note>
        </bibl>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc>
        <p>본 전자본은 TEI P5 지침(TEI Lite)에 따라 구조화함.</p>
      </projectDesc>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="ko">Korean</language>
      </langUsage>
      <textClass>
        <keywords scheme="local">
          <term>근현대 한국문학</term>
          <term>단편소설</term>
        </keywords>
      </textClass>
      <particDesc>
        <listPerson>
          <person xml:id="ZH">
            <persName xml:lang="ko">진화</persName>
            <persName xml:lang="zh">鎭華</persName>
          </person>
          <person xml:id="M">
            <persName xml:lang="ko">M</persName>
          </person>
          <personGrp xml:id="EP">
            <persName xml:lang="ko">모든 사람</persName>
            <note>진화가 본 길가에 지나가는 모든 사람</note>
          </personGrp>
        </listPerson>
      </particDesc>
    </profileDesc>
    <revisionDesc>
      <change when="2025-12-07" who="#지해인">작업자 지해인이 TEI 인코딩을 완료함.</change>
      <change when="2026-02-18" who="#박선영">작업자 박선영이 TEI 검수를 완료함.</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>일 개월을 지나지 못하여 자기 수대(<foreign xml:lang="zh">數代</foreign>) 전래하는 주택을 훼철(<foreign xml:lang="zh">毁撤</foreign>)치 아니 못할 운명에 당한 <persName ref="#ZH">진화</persName>는 책보를 곁에 끼고 <orgName>C사</orgName> 정문을 나왔다.
        문 앞에서 한 번 주저하며 뒤에 있는 현관을 돌아다보며, <said aloud="true" direct="false" who="#ZH">이곳에 다시 발을 들여놓으면 <rs ref="#ZH">나</rs>는 사람이 아니</said>라고 중얼거리며 나왔다.
        <said aloud="false" direct="false" mode="thought" who="#ZH">위선자, 협잡배들이 가면을 쓰고 권력하에서 굽실굽실 아첨하는 것을 차마 볼 수 없다</said>고 <persName ref="#ZH">진화(<foreign xml:lang="zh">鎭華</foreign>)</persName>는 생각했다.
        <rs ref="#ZH">그</rs>는 머리를 들어 가로에 분주히 다니는 <persName ref="#EP">모든 사람</persName> 얼굴을 의미 있게 쳐다보았다. 다 평화로운 듯하다.
        <said who="#EP" aloud="false" direct="true" mode="thought" agent="#ZH"><rs ref="#ZH">너</rs>는 <rs ref="#ZH" type="epithet">낙오자</rs>이다······.</said> 조소하는 것 같다.
        <rs ref="#ZH">그</rs>의 머리에서는 한 달 지나면 집을 헐어야 하는 것이 간단없이 울리어 온다.</p>
      </div>
    </body>
  </text>
</TEI>
List of Works
| No. | Author | Title (English / Transliteration) | Date |
|---|---|---|---|
| 1 | Yi In-jik | Tears of Blood (Hyeol-ui Nu) | 1906 |
| 2 | Yi Hae-jo | The Iron World (Cheol-segye) | 1908 |
| 3 | Yi Kwang-su | The Heartless (Mujeong - Short Story) | 1910 |
| 4 | Yi Hae-jo | Blood of Flowers (Hwa-ui Hyeol) | 1911.04 |
| 5 | Kim Myeong-sun | The Girl of Mystery (Uisim-ui Sonyeo) | 1917.11 |
| 6 | Na Hye-seok | Kyung-hee | 1918.03 |
| 7 | Na Hye-seok | To the Revived Granddaughter | 1918.09 |
| 8 | Kim Dong-in | The Sorrows of the Weak | 1919.02~03 |
| 9 | Yi Ik-sang | The Straggler (Nagoja) | 1919.07.14 |
| 10 | Hyun Jin-geon | A Poor Wife (Bincheo) | 1921.01 |
| 11 | Na Hye-seok | Gyu-won | 1921.07 |
| 12 | Hyun Jin-geon | A Society That Drives You to Drink | 1921.11 |
| 13 | Choi Seo-hae | Nostalgia (Hyangsu) | 1924.04 |
| 14 | Hyun Jin-geon | A Lucky Day (Unsu Joeun Nal) | 1924.06 |
| 15 | Kim Dong-in | Potato (Gamja) | 1925.01 |
| 16 | Hyun Jin-geon | Director B and the Love Letters | 1925.02 |
| 17 | Na Do-hyang | The Watermill (Mullebang-a) | 1925.09 |
| 18 | Bang Jeong-hwan | For Our Friends | 1927.02 |
| 19 | Bang Jeong-hwan | The Eternal Shirt (Mannyeon Shirt) | 1927.03 |
| 20 | Bang Jeong-hwan | The Gold Watch | 1929.01~02 |
| 21 | Kim Dong-in | Dr. K’s Research | 1929.12 |
| 22 | Kim Nam-cheon | Water (Mul) | 1933.06 |
| 23 | Chae Man-sik | Ready-made Life | 1934.05~07 |
| 24 | Kang Kyeong-ae | Salt (Sogeum) | 1934.05~10 |
| 25 | Gye Yong-mook | Adada the Idiot (Baekchi Adada) | 1935 |
| 26 | Kim Yu-jeong | The Camellias (Dongbaek-kkot) | 1936.05 |
| 27 | Yi Sang | The Wings (Nalgae) | 1936.09 |
| 28 | Yi Hyo-seok | When Buckwheat Flowers Bloom | 1936.10 |
| 29 | Chae Man-sik | Uncle Chi-suk | 1938 |
| 30 | Jeong In-taek | Melancholy (Uuljeung) | 1940.09 |
| 31 | Kim Sa-ryang | The Man Met in the Detention Center | 1941 |
| 32 | Ji Ha-ryeon | The Journey (Dojeong) | 1946.07 |
| 33 | Kang So-cheon | The Photo Studio that Takes Pictures of Dreams | 1954.03 |
Files
(2.7 MB)
| Checksum | Size |
|---|---|
| md5:087d20c927fd3d087db20016612d01d1 | 320.0 kB |
| md5:157b2aff4b9a77cf734e2c11f9a67c12 | 299.2 kB |
| md5:40c9841b66d5aed0654e60e20fe42c8d | 31.8 kB |
| md5:deccf536cf6a0c1d22e3d573f720d3d5 | 394.6 kB |
| md5:1229b615335d814a95b52d92f94275e2 | 27.7 kB |
| md5:b14e6a7f291acdf75552851ab84d4326 | 105.4 kB |
| md5:42ea4e1946b377a4320de64b8b428522 | 23.5 kB |
| md5:ce4b7a787deb40f13e2692aeedebea9d | 203.8 kB |
| md5:bea0a9a56545ef8bfb5a6757fc0f0ef6 | 8.7 kB |
| md5:442b3f628ab299f2ca5f5569ecc4753b | 59.1 kB |
| md5:d49addd34aab0417faba3f9a26da8ac4 | 49.2 kB |
| md5:767d96d38c8b1b26258b522c3e3376fa | 40.5 kB |
| md5:9ca91364b40132e9c90faf7f1bf93c82 | 23.3 kB |
| md5:778ba44e1f49b1bf866dd58e240bfe0d | 40.9 kB |
| md5:c1978961c610d53a33c8c69461a2840a | 37.9 kB |
| md5:c07673446c602f6462467fc747fa636b | 23.1 kB |
| md5:fe3f5dc3a8c28e281a16e18cbbf7faab | 55.8 kB |
| md5:67637cccd45e36d04cd31848bd2c2abb | 29.5 kB |
| md5:8ea4eef25812e6c36abc901ccf814d70 | 28.5 kB |
| md5:5d1e1b8845ee81218766f8600445d3eb | 41.0 kB |
| md5:e8b9f62c3e6a8a98a7cc3a3ccd407287 | 68.0 kB |
| md5:5db1000923f730f7132c21b2d3a82310 | 32.7 kB |
| md5:29062df9c305f0efe19c94db08ef30fd | 102.9 kB |
| md5:5eae7a1e0c80f5791b9356b1a36d70f4 | 166.6 kB |
| md5:24c9f1a4ba633d00f7046bba65a9a53b | 49.6 kB |
| md5:64634c13814a659ebd3032e9496d4663 | 29.5 kB |
| md5:3c7fb79743a5bee4f26b9b1c0073ba72 | 86.4 kB |
| md5:a92966d24afa953de740c55d96827b55 | 34.4 kB |
| md5:f159e8352231f95c4003cfa295dfeaf7 | 64.7 kB |
| md5:9d089fb1e7cb39cb5d77fd77cab2c8f4 | 55.4 kB |
| md5:f76b0dadc1d632698beeaa251deb56da | 64.7 kB |
| md5:6bb53be24949d2fd02fad6c008186d5d | 79.8 kB |
| md5:0014bc5808559fc07280cf21518fcd65 | 25.5 kB |
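The md5 checksums listed for the files can be used to confirm that a download arrived intact. Below is a minimal sketch using Python's standard `hashlib`; the local path `downloaded_file.xml` is a placeholder, and which checksum belongs to which file has to be read from the record itself, since the pairing is not assumed here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Placeholder path (an assumption); pair it with the checksum
    # shown for that same file on the record page.
    path = Path("downloaded_file.xml")
    expected = "md5:087d20c927fd3d087db20016612d01d1"
    if path.exists():
        print("OK" if verify(path, expected) else "MISMATCH")
    else:
        print(f"{path} not found; download it first")
```

md5 is adequate for detecting accidental corruption in transit, which is the purpose it serves here; it is not a defense against deliberate tampering.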
Additional details
Funding
- Academy of Korean Studies
- TEI/XML Construction Methodology for Korean Modern Literature in Humanities Data Design Education AKSR2025-RE10
Dates
- Issued: 2026-02-18 (the date of v1 release)
Software
- Repository URL: https://github.com/AKS-DHLAB/ModernKoreanNovelsTEI
- Programming language: Python, XML
- Development Status: Active