Published April 27, 2026 | Version v4
Dataset
Open
KNoTE dataset
Authors/Creators
- Kim, Gayeon (Data manager)¹
- Park, Seonyeong (Data curator)¹
- Ji, Haein (Data collector)¹
- Lee, Hagyeong (Annotator)²
- Lee, Byeongjoo (Annotator)³
- Jeong, Chaeyeon (Annotator)⁴
- Lee, Jae-yeol (Annotator)¹
- Jo, Gyungmin (Annotator)¹
- Lim, Iro (Annotator)¹
- Ismayilov, Orkhan (Annotator)¹
- Kim, Byungjun (Supervisor)¹
Description
Project Overview
KNoTE (Korean Novel TEI Encoded) is a dataset of 33 works of modern Korean fiction, published between 1906 and 1954, encoded in TEI XML.
Rather than converting the texts to plain text, each work is encoded according to the TEI (Text Encoding Initiative) P5 guidelines, with detailed metadata, character descriptions, linguistic variants (Hanja/Hangul), and semantic tagging.
Key Features
- TEI Standard: Fully compliant with TEI P5 (<teiHeader>, <body>, <div>).
- Characters: Linked via xml:id and ref (e.g., <persName ref="#YB">).
- Linguistic Mapping: Original Hanja and modern Hangul mapped via <foreign xml:lang="zh">.
- Entities: Places (<placeName>), Dates (<date>), and Occupations (<occupation>).
- Scholarly Metadata: Includes source descriptions, publication history, and revision logs.
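The linking described in the features above can be explored with Python's standard library. The following is a minimal sketch, not the project's own tooling: the inline `SAMPLE` document is a small illustrative fragment rather than a file from the dataset, and the helper names `character_index`, `mention_counts`, and `hanja_glosses` are hypothetical. It assumes that characters are declared as `<person xml:id="...">` entries and referenced from the body with `ref="#id"`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"tei": TEI_NS}

# Tiny illustrative document (an assumption for this sketch, not a dataset file).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person xml:id="ZH">
            <persName xml:lang="ko">진화</persName>
            <persName xml:lang="zh">鎭華</persName>
          </person>
        </listPerson>
      </particDesc>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>수대(<foreign xml:lang="zh">數代</foreign>) 전래하는 주택을 잃게 된
          <persName ref="#ZH">진화</persName>는 정문을 나왔다.
          <rs ref="#ZH">그</rs>는 머리를 들었다.</p>
      </div>
    </body>
  </text>
</TEI>"""


def character_index(root):
    """Map each declared person's xml:id to its names, keyed by xml:lang."""
    index = {}
    for person in root.iterfind(".//tei:listPerson/tei:person", NS):
        pid = person.get(f"{{{XML_NS}}}id")
        index[pid] = {
            name.get(f"{{{XML_NS}}}lang"): (name.text or "").strip()
            for name in person.iterfind("tei:persName", NS)
        }
    return index


def mention_counts(root):
    """Count body elements whose ref attribute points at a local id (#...)."""
    counts = Counter()
    body = root.find(".//tei:body", NS)
    for element in body.iter():
        ref = element.get("ref")
        if ref and ref.startswith("#"):
            counts[ref[1:]] += 1
    return counts


def hanja_glosses(root):
    """Collect the text of every <foreign xml:lang="zh"> element in the body."""
    body = root.find(".//tei:body", NS)
    return [
        (element.text or "").strip()
        for element in body.iter(f"{{{TEI_NS}}}foreign")
        if element.get(f"{{{XML_NS}}}lang") == "zh"
    ]


root = ET.fromstring(SAMPLE)
characters = character_index(root)
mentions = mention_counts(root)
glosses = hanja_glosses(root)
print(characters, dict(mentions), glosses)
```

To try this on a downloaded file, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`. The `xml:id` and `xml:lang` attributes live in the XML namespace, which is why they are looked up with the fully qualified key rather than a plain `id` or `lang`.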
XML Structure Example (Snippet)
The dataset uses a hierarchical structure to capture both the content and the context of the literature:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <!-- Work: "The Straggler" (Nagoja) by Yi Ik-sang -->
      <titleStmt>
        <title>낙오자</title>
        <author>이익상</author>
        <!-- resp: TEI encoding -->
        <respStmt>
          <resp>TEI 인코딩</resp>
          <name>지해인<idno type="ISNI">0000 0005 2802 5223</idno></name>
          <email>cihayin [at] gmail.com</email>
        </respStmt>
        <!-- resp: TEI review -->
        <respStmt>
          <resp>TEI 검수</resp>
          <name>박선영<idno type="ORCID">0009-0001-1340-0455</idno></name>
          <email>sun09125 [at] gmail.com</email>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>한국학중앙연구원 인문정보학과</publisher>
      </publicationStmt>
      <sourceDesc>
        <bibl type="digitalSource" xml:lang="ko">
          <title level="a">낙오자</title>
          <author>이익상</author>
          <publisher>Wikisource(한국어)</publisher>
          <idno type="wikisource">https://ko.wikisource.org/wiki/낙오자</idno>
          <!-- The ampersand in the query string must be escaped as &amp; -->
          <idno type="wikisource-info">https://ko.wikisource.org/w/index.php?title=낙오자&amp;action=info</idno>
          <note type="acquisition">작업자 지해인이 위키문헌 항목에서 raw data를 취득함.</note>
        </bibl>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc>
        <p>본 전자본은 TEI P5 지침(TEI Lite)에 따라 구조화함.</p>
      </projectDesc>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="ko">Korean</language>
      </langUsage>
      <textClass>
        <keywords scheme="local">
          <term>근현대 한국문학</term>
          <term>단편소설</term>
        </keywords>
      </textClass>
      <!-- Characters are declared here and referenced from the body via ref="#id" -->
      <particDesc>
        <listPerson>
          <person xml:id="ZH">
            <persName xml:lang="ko">진화</persName>
            <persName xml:lang="zh">鎭華</persName>
          </person>
          <person xml:id="M">
            <persName xml:lang="ko">M</persName>
          </person>
          <personGrp xml:id="EP">
            <persName xml:lang="ko">모든 사람</persName>
            <note>진화가 본 길가에 지나가는 모든 사람</note>
          </personGrp>
        </listPerson>
      </particDesc>
    </profileDesc>
    <revisionDesc>
      <change when="2025-12-07" who="#지해인">작업자 지해인이 TEI 인코딩을 완료함.</change>
      <change when="2026-02-18" who="#박선영">작업자 박선영이 TEI 검수를 완료함.</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>일 개월을 지나지 못하여 자기 수대(<foreign xml:lang="zh">數代</foreign>) 전래하는 주택을 훼철(<foreign xml:lang="zh">毁撤</foreign>)치 아니 못할 운명에 당한 <persName ref="#ZH">진화</persName>는 책보를 곁에 끼고 <orgName>C사</orgName> 정문을 나왔다.
          문 앞에서 한 번 주저하며 뒤에 있는 현관을 돌아다보며, <said aloud="true" direct="false" who="#ZH">이곳에 다시 발을 들여놓으면 <rs ref="#ZH">나</rs>는 사람이 아니</said>라고 중얼거리며 나왔다.
          <said aloud="false" direct="false" mode="thought" who="#ZH">위선자, 협잡배들이 가면을 쓰고 권력하에서 굽실굽실 아첨하는 것을 차마 볼 수 없다</said>고 <persName ref="#ZH">진화(<foreign xml:lang="zh">鎭華</foreign>)</persName>는 생각했다.
          <rs ref="#ZH">그</rs>는 머리를 들어 가로에 분주히 다니는 <persName ref="#EP">모든 사람</persName> 얼굴을 의미 있게 쳐다보았다. 다 평화로운 듯하다.
          <said who="#EP" aloud="false" direct="true" mode="thought" agent="#ZH"><rs ref="#ZH">너</rs>는 <rs ref="#ZH" type="epithet">낙오자</rs>이다······.</said> 조소하는 것 같다.
          <rs ref="#ZH">그</rs>의 머리에서는 한 달 지나면 집을 헐어야 하는 것이 간단없이 울리어 온다.</p>
      </div>
    </body>
  </text>
</TEI>

List of Works
| No. | Author (English) | Author (Korean) | Title (English / Transliteration) | Title (Korean) | Year |
|---|---|---|---|---|---|
| 1 | Yi In-jik | 이인직 | Tears of Blood (Hyeol-ui Nu) | 혈의 누 | 1906 |
| 2 | Yi Hae-jo | 이해조 | The Iron World (Cheol-segye) | 철세계 | 1908 |
| 3 | Yi Kwang-su | 이광수 | The Heartless (Mujeong - Short Story) | 무정(단편) | 1910 |
| 4 | Yi Hae-jo | 이해조 | Blood of Flowers (Hwa-ui Hyeol) | 화의 혈 | 1911 |
| 5 | Kim Myeong-sun | 김명순 | The Girl of Mystery (Uisim-ui Sonyeo) | 의심의 소녀 | 1917 |
| 6 | Na Hye-seok | 나혜석 | Kyung-hee | 경희 | 1918 |
| 7 | Na Hye-seok | 나혜석 | To the Revived Granddaughter | 회생한 손녀에게 | 1918 |
| 8 | Kim Dong-in | 김동인 | The Sorrows of the Weak | 약한 자의 슬픔 | 1919 |
| 9 | Yi Ik-sang | 이익상 | The Straggler (Nagoja) | 낙오자 | 1919 |
| 10 | Hyun Jin-geon | 현진건 | A Poor Wife (Bincheo) | 빈처 | 1921 |
| 11 | Na Hye-seok | 나혜석 | Gyu-won | 규원 | 1921 |
| 12 | Hyun Jin-geon | 현진건 | A Society That Drives You to Drink | 술 권하는 사회 | 1921 |
| 13 | Choi Seo-hae | 최서해 | Nostalgia (Hyangsu) | 향수 | 1924 |
| 14 | Hyun Jin-geon | 현진건 | A Lucky Day (Unsu Joeun Nal) | 운수 좋은 날 | 1924 |
| 15 | Kim Dong-in | 김동인 | Potato (Gamja) | 감자 | 1925 |
| 16 | Hyun Jin-geon | 현진건 | Director B and the Love Letters | B사감과 러브레터 | 1925 |
| 17 | Na Do-hyang | 나도향 | The Watermill (Mullebang-a) | 물레방아 | 1925 |
| 18 | Bang Jeong-hwan | 방정환 | For Our Friends | 동무를 위하여 | 1927 |
| 19 | Bang Jeong-hwan | 방정환 | The Eternal Shirt (Mannyeon Shirt) | 만년 셔츠 | 1927 |
| 20 | Bang Jeong-hwan | 방정환 | The Gold Watch | 금시계 | 1929 |
| 21 | Kim Dong-in | 김동인 | Dr. K's Research | K박사의 연구 | 1929 |
| 22 | Kim Nam-cheon | 김남천 | Water (Mul) | 물 | 1933 |
| 23 | Chae Man-sik | 채만식 | Ready-made Life | 레디메이드 인생 | 1934 |
| 24 | Kang Kyeong-ae | 강경애 | Salt (Sogeum) | 소금 | 1934 |
| 25 | Gye Yong-mook | 계용묵 | Adada the Idiot (Baekchi Adada) | 백치 아다다 | 1935 |
| 26 | Kim Yu-jeong | 김유정 | The Camellias (Dongbaek-kkot) | 동백꽃 | 1936 |
| 27 | Yi Sang | 이상 | The Wings (Nalgae) | 날개 | 1936 |
| 28 | Yi Hyo-seok | 이효석 | When Buckwheat Flowers Bloom | 메밀꽃 필 무렵 | 1936 |
| 29 | Chae Man-sik | 채만식 | Uncle Chi-suk | 치숙 | 1938 |
| 30 | Jeong In-taek | 정인택 | Melancholy (Uuljeung) | 우울증 | 1940 |
| 31 | Kim Sa-ryang | 김사량 | The Man Met in the Detention Center | 유치장에서 만난 사나이 | 1941 |
| 32 | Ji Ha-ryeon | 지하련 | The Journey (Dojeong) | 도정 | 1946 |
| 33 | Kang So-cheon | 강소천 | The Photo Studio that Takes Pictures of Dreams | 꿈을 찍는 사진관 | 1954 |
Files
(2.9 MB)
| Checksum | Size |
|---|---|
| md5:333015b2e754709150942f3d73346ef0 | 353.3 kB |
| md5:7ffa8acfc76f56bf0986500340c957b3 | 337.7 kB |
| md5:3a312431ca64a0db9f32b8d34af9ab10 | 32.4 kB |
| md5:b515babfc79b404dd49afb6ebec04b82 | 420.4 kB |
| md5:0df6343132c56f60f3988b8f0e327841 | 31.1 kB |
| md5:8be177c37dd404a0fb8bd651b6ab9bb4 | 110.9 kB |
| md5:58f5822c4495ca153d12a1adcac10be1 | 24.2 kB |
| md5:d0a37a52f9381963640fc38e978c25b0 | 223.5 kB |
| md5:7681283cef229631e8f1bc051a7ae9e0 | 9.9 kB |
| md5:0fb20080b38cb197614bf33ff8f8ff50 | 63.8 kB |
| md5:4ebb6e2168642cf6c1f70f2f3c29e47b | 52.5 kB |
| md5:128aa0470e3aa00989471d959cfc230e | 44.0 kB |
| md5:1efc316d3d4eba31cec1c5e8041877f1 | 26.0 kB |
| md5:962612dfef6bf4fedb94513b42218480 | 44.4 kB |
| md5:8f6976f6548a53f47217b53f10f15843 | 42.2 kB |
| md5:be9fe83b86bc21715a5b9e326127d2e7 | 25.7 kB |
| md5:ba8b0b5ab56212220876184059bf499f | 60.5 kB |
| md5:240a4e18b0dac1e459636f9290c8d962 | 32.6 kB |
| md5:ca38d091757397313d18c3ff19f1372e | 30.3 kB |
| md5:d372124d9844f307cd5a51425a14be3f | 44.1 kB |
| md5:376f56109f1d396ad79ca52352b4d3bd | 71.4 kB |
| md5:a65c10ac31eeba8cba706569b3e4004e | 32.8 kB |
| md5:f04835230220f9cac9dc07b5144b4e33 | 112.8 kB |
| md5:b699f481acbc333e07930f10ee2f6d82 | 178.0 kB |
| md5:eccda4e26323b7089b2583bd03bc158b | 52.6 kB |
| md5:aa303e2ee6c05e13fb0bf97539188136 | 32.8 kB |
| md5:5684d077fbb0ae7f2d3ab032ac2085b5 | 94.0 kB |
| md5:4af120204b240d96e6c4c2987ba67704 | 37.7 kB |
| md5:cae054015ef8d1f32a9076652384158e | 69.8 kB |
| md5:01c421c3ede1e9ceee2221e887b60f72 | 59.3 kB |
| md5:74cf74fc3690329f0f0b3bd17e4470a6 | 68.1 kB |
| md5:05e074240bb6c4aedf2a5ad5cc69ed88 | 86.7 kB |
| md5:10c0ff6afba89854f264e5a7c65cc654 | 27.4 kB |
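The file listing above gives an md5 checksum for each file, which can be used to confirm that a download is intact. The sketch below shows one way to do that with Python's `hashlib`. The helper names `md5_of` and `matches_listing` and the temporary `demo.xml` file are hypothetical and exist only for this illustration; real use would pass the path of a downloaded dataset file together with its listed checksum.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or bare hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected


# Demonstration on a throwaway file (not a dataset file).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.xml"
    demo.write_bytes(b"hello")
    # Known MD5 of the five bytes "hello".
    ok = matches_listing(demo, "md5:5d41402abc4b2a76b9719d911017c592")
    # A checksum from the listing, which this demo file does not match.
    bad = matches_listing(demo, "md5:333015b2e754709150942f3d73346ef0")

print(ok, bad)
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.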
Additional details
Funding
- Academy of Korean Studies
- TEI/XML Construction Methodology for Korean Modern Literature in Humanities Data Design Education AKSR2025-RE10
Dates
- Issued
- 2026-02-18: The date of v1 release
Software
- Repository URL
- https://github.com/AKS-DHLAB/KNoTE
- Programming language
- Python, XML
- Development Status
- Active