Published March 2, 2026 | Version v2
Dataset Open

KNoTE

Description

Project Overview

KNoTE (Korean Novel TEI Encoded) is a dataset of 33 works of modern Korean fiction, dated 1906 to 1954, encoded in TEI XML.

Rather than a plain-text conversion, each work is encoded according to the TEI (Text Encoding Initiative) P5 guidelines, with detailed metadata, character descriptions, Hanja/Hangul script variants, and semantic tagging.

Key Features

  • TEI Standard: Fully compliant with TEI P5 (<teiHeader>, <body>, <div>).
  • Characters: Linked via xml:id and ref (e.g., <persName ref="#YB">).
  • Linguistic Mapping: Original Hanja and modern Hangul mapped via <foreign xml:lang="zh">.
  • Entities: Places (<placeName>), Dates (<date>), and Occupations (<occupation>).
  • Scholarly Metadata: Includes source descriptions, publication history, and revision logs.
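As a rough illustration of the character linking described in the list above, the following Python sketch indexes the xml:id values declared in <listPerson> and checks every <persName ref="..."> in the body against that index. Only the element and attribute names come from this record. The miniature sample document, the unresolved #XX pointer, the pointer written without "#", and the function names build_person_index and check_person_refs are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical miniature document; it is not a file from the dataset.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person xml:id="YB">
            <persName xml:lang="ko">YB</persName>
          </person>
        </listPerson>
      </particDesc>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p><persName ref="#YB">YB</persName> met <persName ref="#XX">X</persName>
           and <persName ref="YB">YB</persName> left.</p>
      </div>
    </body>
  </text>
</TEI>"""


def build_person_index(root):
    """Map each xml:id under <listPerson> to its names, keyed by xml:lang."""
    index = {}
    for entry in root.iterfind(".//tei:listPerson/*", NS):
        entry_id = entry.get(XML_ID)
        if entry_id is None:
            continue
        names = {}
        for name in entry.iterfind("tei:persName", NS):
            names[name.get(XML_LANG, "und")] = "".join(name.itertext()).strip()
        index[entry_id] = names
    return index


def check_person_refs(root, index):
    """Return (surface text, ref value, status) for each body <persName ref>."""
    report = []
    for mention in root.iterfind(".//tei:body//tei:persName[@ref]", NS):
        ref = mention.get("ref")
        surface = "".join(mention.itertext()).strip()
        if not ref.startswith("#"):
            status = "malformed"   # local pointers are written as "#" + xml:id
        elif ref[1:] in index:
            status = "ok"
        else:
            status = "dangling"    # no entry with that xml:id was declared
        report.append((surface, ref, status))
    return report


root = ET.fromstring(SAMPLE)
index = build_person_index(root)
report = check_person_refs(root, index)
for row in report:
    print(row)
```

A check of this kind is one way to catch pointers that lack the leading "#" or that name an identifier missing from the header.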

XML Structure Example (Snippet)

The dataset uses a hierarchical structure to capture both the content and the context of the literature:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>낙오자</title>
        <author>이익상</author>
        <respStmt>
          <resp>TEI 인코딩</resp>
          <name>지해인<idno type="ISNI">0000 0005 2802 5223</idno></name>
          <email>cihayin [at] gmail.com</email>
        </respStmt>
        <respStmt>
          <resp>TEI 검수</resp>
          <name>박선영<idno type="ORCID">0009-0001-1340-0455</idno></name>
          <email>sun09125 [at] gmail.com</email>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>한국학중앙연구원 인문정보학과</publisher>
      </publicationStmt>
      <sourceDesc>
        <bibl type="digitalSource" xml:lang="ko">
          <title level="a">낙오자</title>
          <author>이익상</author>
          <publisher>Wikisource(한국어)</publisher>
          <idno type="wikisource">https://ko.wikisource.org/wiki/낙오자</idno>
          <idno type="wikisource-info">https://ko.wikisource.org/w/index.php?title=낙오자&amp;action=info</idno>
          <note type="acquisition">작업자 지해인이 위키문헌 항목에서 raw data를 취득함.</note>
        </bibl>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc>
        <p>본 전자본은 TEI P5 지침(TEI Lite)에 따라 구조화함.</p>
      </projectDesc>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="ko">Korean</language>
      </langUsage>
      <textClass>
        <keywords scheme="local">
          <term>근현대 한국문학</term>
          <term>단편소설</term>
        </keywords>
      </textClass>
      <particDesc>
        <listPerson>
          <person xml:id="ZH">
            <persName xml:lang="ko">진화</persName>
            <persName xml:lang="zh">鎭華</persName>
          </person>
          <person xml:id="M">
            <persName xml:lang="ko">M</persName>
          </person>
          <personGrp xml:id="EP">
            <persName xml:lang="ko">모든 사람</persName>
            <note>진화가 본 길가에 지나가는 모든 사람</note>
          </personGrp>
        </listPerson>
      </particDesc>
    </profileDesc>
    <revisionDesc>
      <change when="2025-12-07" who="#지해인">작업자 지해인이 TEI 인코딩을 완료함.</change>
      <change when="2026-02-18" who="#박선영">작업자 박선영이 TEI 검수를 완료함.</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>일 개월을 지나지 못하여 자기 수대(<foreign xml:lang="zh">數代</foreign>) 전래하는 주택을 훼철(<foreign xml:lang="zh">毁撤</foreign>)치 아니 못할 운명에 당한 <persName ref="#ZH">진화</persName>는 책보를 곁에 끼고 <orgName>C사</orgName> 정문을 나왔다. 문 앞에서 한 번 주저하며 뒤에 있는 현관을 돌아다보며, <said aloud="true" direct="false" who="#ZH">이곳에 다시 발을 들여놓으면 <rs ref="#ZH">나</rs>는 사람이 아니</said>라고 중얼거리며 나왔다. <said aloud="false" direct="false" mode="thought" who="#ZH">위선자, 협잡배들이 가면을 쓰고 권력하에서 굽실굽실 아첨하는 것을 차마 볼 수 없다</said>고 <persName ref="#ZH">진화(<foreign xml:lang="zh">鎭華</foreign>)</persName>는 생각했다. <rs ref="#ZH">그</rs>는 머리를 들어 가로에 분주히 다니는 <persName ref="#EP">모든 사람</persName> 얼굴을 의미 있게 쳐다보았다. 다 평화로운 듯하다. <said who="#EP" aloud="false" direct="true" mode="thought" agent="#ZH"><rs ref="#ZH">너</rs>는 <rs ref="#ZH" type="epithet">낙오자</rs>이다······.</said> 조소하는 것 같다. <rs ref="#ZH">그</rs>의 머리에서는 한 달 지나면 집을 헐어야 하는 것이 간단없이 울리어 온다.</p>
      </div>
    </body>
  </text>
</TEI>
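The inline Hanja glosses in the snippet above follow the pattern "Hangul(<foreign xml:lang="zh">Hanja</foreign>)". A minimal Python sketch for pulling out those pairs is shown below. It assumes that the Hangul reading is the whitespace-delimited token immediately before the opening parenthesis, which holds for the three glosses in the snippet but is a heuristic rather than a rule stated by the dataset. The SAMPLE string is abridged and spliced from the paragraph above, and the function name hanja_glosses is made up for the example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Abridged from the <p> in the snippet above (spliced, not a verbatim sentence).
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    '자기 수대(<foreign xml:lang="zh">數代</foreign>) 전래하는 주택을 '
    '훼철(<foreign xml:lang="zh">毁撤</foreign>)치 아니 못할 운명에 당한 '
    '<persName ref="#ZH">진화(<foreign xml:lang="zh">鎭華</foreign>)</persName>'
    '는 생각했다.</p>'
)


def hanja_glosses(elem):
    """Return (hangul, hanja) pairs for each <foreign xml:lang="zh"> under elem.

    Pairs are collected parent by parent, so the order can differ from strict
    document order when glosses are nested at different depths.
    """
    pairs = []
    for parent in elem.iter():
        # Text that directly precedes the next child: the parent's own text
        # first, then the tail of whichever child was visited last.
        preceding = parent.text or ""
        for child in parent:
            if child.tag == TEI + "foreign" and child.get(XML_LANG) == "zh":
                # Heuristic: drop the "(" and keep the last token before it.
                words = preceding.rstrip().rstrip("(").split()
                hangul = words[-1] if words else ""
                pairs.append((hangul, "".join(child.itertext())))
            preceding = child.tail or ""
    return pairs


pairs = hanja_glosses(ET.fromstring(SAMPLE))
for hangul, hanja in pairs:
    print(hangul, hanja)
```

The same loop can be run over a whole <body>. Glosses that do not follow the parenthesis pattern would come back with an empty or wrong Hangul token and would need manual review.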

List of Works

No. | Author | Title (English / Transliteration) | Date
1 | Yi In-jik | Tears of Blood (Hyeol-ui Nu) | 1906
2 | Yi Hae-jo | The Iron World (Cheol-segye) | 1908
3 | Yi Kwang-su | The Heartless (Mujeong - Short Story) | 1910
4 | Yi Hae-jo | Blood of Flowers (Hwa-ui Hyeol) | 1911.04
5 | Kim Myeong-sun | The Girl of Mystery (Uisim-ui Sonyeo) | 1917.11
6 | Na Hye-seok | Kyung-hee | 1918.03
7 | Na Hye-seok | To the Revived Granddaughter | 1918.09
8 | Kim Dong-in | The Sorrows of the Weak | 1919.02~03
9 | Yi Ik-sang | The Straggler (Nagoja) | 1919.07.14
10 | Hyun Jin-geon | A Poor Wife (Bincheo) | 1921.01
11 | Na Hye-seok | Gyu-won | 1921.07
12 | Hyun Jin-geon | A Society That Drives You to Drink | 1921.11
13 | Choi Seo-hae | Nostalgia (Hyangsu) | 1924.04
14 | Hyun Jin-geon | A Lucky Day (Unsu Joeun Nal) | 1924.06
15 | Kim Dong-in | Potato (Gamja) | 1925.01
16 | Hyun Jin-geon | Director B and the Love Letters | 1925.02
17 | Na Do-hyang | The Watermill (Mullebang-a) | 1925.09
18 | Bang Jeong-hwan | For Our Friends | 1927.02
19 | Bang Jeong-hwan | The Eternal Shirt (Mannyeon Shirt) | 1927.03
20 | Bang Jeong-hwan | The Gold Watch | 1929.01~02
21 | Kim Dong-in | Dr. K’s Research | 1929.12
22 | Kim Nam-cheon | Water (Mul) | 1933.06
23 | Chae Man-sik | Ready-made Life | 1934.05~07
24 | Kang Kyeong-ae | Salt (Sogeum) | 1934.05~10
25 | Gye Yong-mook | Adada the Idiot (Baekchi Adada) | 1935
26 | Kim Yu-jeong | The Camellias (Dongbaek-kkot) | 1936.05
27 | Yi Sang | The Wings (Nalgae) | 1936.09
28 | Yi Hyo-seok | When Buckwheat Flowers Bloom | 1936.10
29 | Chae Man-sik | Uncle Chi-suk | 1938
30 | Jeong In-taek | Melancholy (Uuljeung) | 1940.09
31 | Kim Sa-ryang | The Man Met in the Detention Center | 1941
32 | Ji Ha-ryeon | The Journey (Dojeong) | 1946.07
33 | Kang So-cheon | The Photo Studio that Takes Pictures of Dreams | 1954.03

Files

01_이인직_혈의 누.xml

Files (2.7 MB)

Checksum | Size
md5:087d20c927fd3d087db20016612d01d1 | 320.0 kB
md5:157b2aff4b9a77cf734e2c11f9a67c12 | 299.2 kB
md5:40c9841b66d5aed0654e60e20fe42c8d | 31.8 kB
md5:deccf536cf6a0c1d22e3d573f720d3d5 | 394.6 kB
md5:1229b615335d814a95b52d92f94275e2 | 27.7 kB
md5:b14e6a7f291acdf75552851ab84d4326 | 105.4 kB
md5:42ea4e1946b377a4320de64b8b428522 | 23.5 kB
md5:ce4b7a787deb40f13e2692aeedebea9d | 203.8 kB
md5:bea0a9a56545ef8bfb5a6757fc0f0ef6 | 8.7 kB
md5:442b3f628ab299f2ca5f5569ecc4753b | 59.1 kB
md5:d49addd34aab0417faba3f9a26da8ac4 | 49.2 kB
md5:767d96d38c8b1b26258b522c3e3376fa | 40.5 kB
md5:9ca91364b40132e9c90faf7f1bf93c82 | 23.3 kB
md5:778ba44e1f49b1bf866dd58e240bfe0d | 40.9 kB
md5:c1978961c610d53a33c8c69461a2840a | 37.9 kB
md5:c07673446c602f6462467fc747fa636b | 23.1 kB
md5:fe3f5dc3a8c28e281a16e18cbbf7faab | 55.8 kB
md5:67637cccd45e36d04cd31848bd2c2abb | 29.5 kB
md5:8ea4eef25812e6c36abc901ccf814d70 | 28.5 kB
md5:5d1e1b8845ee81218766f8600445d3eb | 41.0 kB
md5:e8b9f62c3e6a8a98a7cc3a3ccd407287 | 68.0 kB
md5:5db1000923f730f7132c21b2d3a82310 | 32.7 kB
md5:29062df9c305f0efe19c94db08ef30fd | 102.9 kB
md5:5eae7a1e0c80f5791b9356b1a36d70f4 | 166.6 kB
md5:24c9f1a4ba633d00f7046bba65a9a53b | 49.6 kB
md5:64634c13814a659ebd3032e9496d4663 | 29.5 kB
md5:3c7fb79743a5bee4f26b9b1c0073ba72 | 86.4 kB
md5:a92966d24afa953de740c55d96827b55 | 34.4 kB
md5:f159e8352231f95c4003cfa295dfeaf7 | 64.7 kB
md5:9d089fb1e7cb39cb5d77fd77cab2c8f4 | 55.4 kB
md5:f76b0dadc1d632698beeaa251deb56da | 64.7 kB
md5:6bb53be24949d2fd02fad6c008186d5d | 79.8 kB
md5:0014bc5808559fc07280cf21518fcd65 | 25.5 kB
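
The md5 values in the file listing can be used to confirm that a downloaded copy is intact. The Python sketch below shows one way to do that with the standard library. The function names md5_hex and matches_checksum are made up for the example, and because the listing here does not show which checksum belongs to which file name, the path in the usage comment is a placeholder rather than a real pairing.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_hex(path) == expected


# Usage (placeholder path; substitute the checksum listed for that file):
# matches_checksum("downloaded_file.xml", "md5:<checksum from the listing>")
```

MD5 is adequate for detecting accidental corruption during download. It is not a safeguard against deliberate tampering.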

Additional details

Funding

Academy of Korean Studies
TEI/XML Construction Methodology for Korean Modern Literature in Humanities Data Design Education AKSR2025-RE10

Dates

Issued
2026-02-18
The date of v1 release

Software

Repository URL
https://github.com/AKS-DHLAB/ModernKoreanNovelsTEI
Programming language
Python, XML
Development Status
Active