Published April 27, 2026 | Version v4
Dataset | Open Access

KNoTE dataset

Description

Project Overview

KNoTE (Korean Novel TEI Encoded) dataset

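This dataset does not simply convert the texts to plain text. It encodes modern Korean literary works (1906 to 1954) according to the TEI (Text Encoding Initiative) P5 Guidelines. Each file includes detailed bibliographic metadata, a list of characters, parallel Hanja/Hangul forms, and semantic tagging of names, entities, and speech.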

Key Features

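  • TEI Standard: Encoded following the TEI P5 Guidelines (the <encodingDesc> cites TEI Lite), using the standard <teiHeader>, <text>/<body>, and <div> hierarchy.
  • Characters: Declared once in <particDesc> with an xml:id, and pointed to from the text via ref or who (e.g., <persName ref="#YB">, <said who="#ZH">).
  • Linguistic Mapping: Hanja glosses that follow Hangul words in the source, e.g. 수대(數代), are wrapped in <foreign xml:lang="zh">. Character names also carry parallel Korean (ko) and Hanja (zh) <persName> forms.
  • Entities: Places (<placeName>), Organizations (<orgName>), Dates (<date>), and Occupations (<occupation>).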
  • Scholarly Metadata: Includes source descriptions, publication history, and revision logs.
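The character-linking scheme can be exercised with Python's standard-library xml.etree.ElementTree. The following is a minimal sketch and is not part of the KNoTE repository. The function names load_participants and count_pointers are hypothetical, and SAMPLE is an abridged, non-schema-valid fragment modeled on the 낙오자 snippet. The sketch collects the <person>/<personGrp> entries from <particDesc> and counts how often each xml:id is targeted by a ref or who pointer inside <text>.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespaces: TEI elements, plus the predefined xml: prefix used by xml:id / xml:lang.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, abridged sample modeled on the 낙오자 file (not schema-valid TEI).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><particDesc><listPerson>
    <person xml:id="ZH"><persName xml:lang="ko">진화</persName><persName xml:lang="zh">鎭華</persName></person>
    <personGrp xml:id="EP"><persName xml:lang="ko">모든 사람</persName></personGrp>
  </listPerson></particDesc></profileDesc></teiHeader>
  <text><body><div><p><persName ref="#ZH">진화</persName>는 ... <rs ref="#ZH">그</rs>는 ...
    <said who="#EP" direct="true"><rs ref="#ZH">너</rs>는 낙오자이다</said></p></div></body></text>
</TEI>"""


def load_participants(root):
    """Map each participant's xml:id to its names, keyed by xml:lang."""
    people = {}
    for tag in ("person", "personGrp"):
        for entry in root.iterfind(f".//tei:particDesc//tei:{tag}", NS):
            people[entry.get(XML_ID)] = {
                name.get(XML_LANG, ""): (name.text or "").strip()
                for name in entry.findall("tei:persName", NS)
            }
    return people


def count_pointers(root):
    """Count ref/who pointers inside <text>, keyed by the target xml:id."""
    counts = Counter()
    text = root.find("tei:text", NS)
    for el in text.iter():
        for attr in ("ref", "who"):
            # A pointer attribute may hold several space-separated URIs.
            for pointer in el.get(attr, "").split():
                # Drop the leading '#' (this also tolerates a bare "ID" pointer).
                counts[pointer.lstrip("#")] += 1
    return counts


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    people = load_participants(root)
    for pid, n in count_pointers(root).most_common():
        print(pid, people.get(pid, {}).get("ko", "?"), n)
```

On the sample, this reports three pointers to ZH (진화) and one to EP (모든 사람). The search is limited to <text> so that header pointers such as <change who="..."> are not counted as character mentions.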

XML Structure Example (Snippet)

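Each work is encoded as a single TEI document. The <teiHeader> holds bibliographic, encoding, participant, and revision metadata, and <text> holds the annotated story. The example below is an excerpt from 이익상's 낙오자 (The Straggler, 1919):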

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>낙오자</title>
        <author>이익상</author>
        <respStmt>
          <resp>TEI 인코딩</resp>
          <name>지해인<idno type="ISNI">0000 0005 2802 5223</idno></name>
          <email>cihayin [at] gmail.com</email>
        </respStmt>
        <respStmt>
          <resp>TEI 검수</resp>
          <name>박선영<idno type="ORCID">0009-0001-1340-0455</idno></name>
          <email>sun09125 [at] gmail.com</email>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>한국학중앙연구원 인문정보학과</publisher>
      </publicationStmt>
      <sourceDesc>
        <bibl type="digitalSource" xml:lang="ko">
          <title level="a">낙오자</title>
          <author>이익상</author>
          <publisher>Wikisource(한국어)</publisher>
          <idno type="wikisource">https://ko.wikisource.org/wiki/낙오자</idno>
          <idno type="wikisource-info">https://ko.wikisource.org/w/index.php?title=낙오자&amp;action=info</idno>
          <note type="acquisition">작업자 지해인이 위키문헌 항목에서 raw data를 취득함.</note>
        </bibl>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc>
        <p>본 전자본은 TEI P5 지침(TEI Lite)에 따라 구조화함.</p>
      </projectDesc>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="ko">Korean</language>
      </langUsage>
      <textClass>
        <keywords scheme="local">
          <term>근현대 한국문학</term>
          <term>단편소설</term>
        </keywords>
      </textClass>
      <particDesc>
        <listPerson>
          <person xml:id="ZH">
            <persName xml:lang="ko">진화</persName>
            <persName xml:lang="zh">鎭華</persName>
          </person>
          <person xml:id="M">
            <persName xml:lang="ko">M</persName>
          </person>
          <personGrp xml:id="EP">
            <persName xml:lang="ko">모든 사람</persName>
            <note>진화가 본 길가에 지나가는 모든 사람</note>
          </personGrp>
        </listPerson>
      </particDesc>
    </profileDesc>
    <revisionDesc>
      <change when="2025-12-07" who="#지해인">작업자 지해인이 TEI 인코딩을 완료함.</change>
      <change when="2026-02-18" who="#박선영">작업자 박선영이 TEI 검수를 완료함.</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>일 개월을 지나지 못하여 자기 수대(<foreign xml:lang="zh">數代</foreign>) 전래하는 주택을 훼철(<foreign xml:lang="zh">毁撤</foreign>)치 아니 못할 운명에 당한 <persName ref="#ZH">진화</persName>는 책보를 곁에 끼고 <orgName>C사</orgName> 정문을 나왔다. 문 앞에서 한 번 주저하며 뒤에 있는 현관을 돌아다보며, <said aloud="true" direct="false" who="#ZH">이곳에 다시 발을 들여놓으면 <rs ref="#ZH">나</rs>는 사람이 아니</said>라고 중얼거리며 나왔다. <said aloud="false" direct="false" mode="thought" who="#ZH">위선자, 협잡배들이 가면을 쓰고 권력하에서 굽실굽실 아첨하는 것을 차마 볼 수 없다</said>고 <persName ref="#ZH">진화(<foreign xml:lang="zh">鎭華</foreign>)</persName>는 생각했다. <rs ref="#ZH">그</rs>는 머리를 들어 가로에 분주히 다니는 <persName ref="#EP">모든 사람</persName> 얼굴을 의미 있게 쳐다보았다. 다 평화로운 듯하다. <said who="#EP" aloud="false" direct="true" mode="thought" agent="#ZH"><rs ref="#ZH">너</rs>는 <rs ref="#ZH" type="epithet">낙오자</rs>이다······.</said> 조소하는 것 같다. <rs ref="#ZH">그</rs>의 머리에서는 한 달 지나면 집을 헐어야 하는 것이 간단없이 울리어 온다.</p>
      </div>
    </body>
  </text>
</TEI>
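The Hangul(Hanja) pattern in the excerpt can be harvested into a simple word list. The following is a minimal sketch. It assumes the glossed Hangul word is plain text that ends immediately before the "(" in the same parent element. That holds for 수대, 훼철, and 진화 in the excerpt, but it is only an assumption for the rest of the corpus. The names hanja_glosses, preceding_text, and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample abridged from the 낙오자 excerpt.
SAMPLE = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div><p>'
          '자기 수대(<foreign xml:lang="zh">數代</foreign>) 전래하는 주택을 '
          '훼철(<foreign xml:lang="zh">毁撤</foreign>)치 아니 못할 운명에 당한 '
          '<persName ref="#ZH">진화(<foreign xml:lang="zh">鎭華</foreign>)</persName>는 생각했다.'
          '</p></div></body></text></TEI>')


def preceding_text(el, parent_map):
    """Return the character data immediately before `el` inside its parent."""
    parent = parent_map[el]
    siblings = list(parent)
    idx = siblings.index(el)
    # The first child is preceded by parent.text; later children by the previous sibling's tail.
    return (parent.text if idx == 0 else siblings[idx - 1].tail) or ""


def hanja_glosses(root):
    """Pair each <foreign xml:lang="zh"> with the Hangul word written just before its '('."""
    # ElementTree keeps no parent pointers, so build a child -> parent map once.
    parent_map = {child: parent for parent in root.iter() for child in parent}
    pairs = []
    for el in root.iter(TEI + "foreign"):
        if el.get(XML_LANG) != "zh" or el not in parent_map:
            continue
        before = preceding_text(el, parent_map).rstrip()
        if not before.endswith("("):
            continue  # not in the assumed "Hangul(Hanja)" layout
        words = before[:-1].split()
        if words:  # empty if the Hangul word sits in a preceding element instead
            pairs.append((words[-1], (el.text or "").strip()))
    return pairs


if __name__ == "__main__":
    for hangul, hanja in hanja_glosses(ET.fromstring(SAMPLE)):
        print(hangul, hanja)
```

On the sample, this yields the pairs (수대, 數代), (훼철, 毁撤), and (진화, 鎭華) in document order. Glosses whose Hangul word is wrapped in its own element are skipped rather than guessed.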

List of Works

No. | Author (English) | Author (Korean) | Title (English / Transliteration) | Title (Korean) | Year
1 | Yi In-jik | 이인직 | Tears of Blood (Hyeol-ui Nu) | 혈의 누 | 1906
2 | Yi Hae-jo | 이해조 | The Iron World (Cheol-segye) | 철세계 | 1908
3 | Yi Kwang-su | 이광수 | The Heartless (Mujeong - Short Story) | 무정(단편) | 1910
4 | Yi Hae-jo | 이해조 | Blood of Flowers (Hwa-ui Hyeol) | 화의 혈 | 1911
5 | Kim Myeong-sun | 김명순 | The Girl of Mystery (Uisim-ui Sonyeo) | 의심의 소녀 | 1917
6 | Na Hye-seok | 나혜석 | Kyung-hee | 경희 | 1918
7 | Na Hye-seok | 나혜석 | To the Revived Granddaughter | 회생한 손녀에게 | 1918
8 | Kim Dong-in | 김동인 | The Sorrows of the Weak | 약한 자의 슬픔 | 1919
9 | Yi Ik-sang | 이익상 | The Straggler (Nagoja) | 낙오자 | 1919
10 | Hyun Jin-geon | 현진건 | A Poor Wife (Bincheo) | 빈처 | 1921
11 | Na Hye-seok | 나혜석 | Gyu-won | 규원 | 1921
12 | Hyun Jin-geon | 현진건 | A Society That Drives You to Drink | 술 권하는 사회 | 1921
13 | Choi Seo-hae | 최서해 | Nostalgia (Hyangsu) | 향수 | 1924
14 | Hyun Jin-geon | 현진건 | A Lucky Day (Unsu Joeun Nal) | 운수 좋은 날 | 1924
15 | Kim Dong-in | 김동인 | Potato (Gamja) | 감자 | 1925
16 | Hyun Jin-geon | 현진건 | Director B and the Love Letters | B사감과 러브레터 | 1925
17 | Na Do-hyang | 나도향 | The Watermill (Mullebang-a) | 물레방아 | 1925
18 | Bang Jeong-hwan | 방정환 | For Our Friends | 동무를 위하여 | 1927
19 | Bang Jeong-hwan | 방정환 | The Eternal Shirt (Mannyeon Shirt) | 만년 셔츠 | 1927
20 | Bang Jeong-hwan | 방정환 | The Gold Watch | 금시계 | 1929
21 | Kim Dong-in | 김동인 | Dr. K's Research | K박사의 연구 | 1929
22 | Kim Nam-cheon | 김남천 | Water (Mul) |  | 1933
23 | Chae Man-sik | 채만식 | Ready-made Life | 레디메이드 인생 | 1934
24 | Kang Kyeong-ae | 강경애 | Salt (Sogeum) | 소금 | 1934
25 | Gye Yong-mook | 계용묵 | Adada the Idiot (Baekchi Adada) | 백치 아다다 | 1935
26 | Kim Yu-jeong | 김유정 | The Camellias (Dongbaek-kkot) | 동백꽃 | 1936
27 | Yi Sang | 이상 | The Wings (Nalgae) | 날개 | 1936
28 | Yi Hyo-seok | 이효석 | When Buckwheat Flowers Bloom | 메밀꽃 필 무렵 | 1936
29 | Chae Man-sik | 채만식 | Uncle Chi-suk | 치숙 | 1938
30 | Jeong In-taek | 정인택 | Melancholy (Uuljeung) | 우울증 | 1940
31 | Kim Sa-ryang | 김사량 | The Man Met in the Detention Center | 유치장에서 만난 사나이 | 1941
32 | Ji Ha-ryeon | 지하련 | The Journey (Dojeong) | 도정 | 1946
33 | Kang So-cheon | 강소천 | The Photo Studio That Takes Pictures of Dreams | 꿈을 찍는 사진관 | 1954

Files

33 files, 2.9 MB in total (e.g., 01_Yi In-jik_Tears of Blood (Hyeol-ui Nu).xml)

MD5 checksum | Size
333015b2e754709150942f3d73346ef0 | 353.3 kB
7ffa8acfc76f56bf0986500340c957b3 | 337.7 kB
3a312431ca64a0db9f32b8d34af9ab10 | 32.4 kB
b515babfc79b404dd49afb6ebec04b82 | 420.4 kB
0df6343132c56f60f3988b8f0e327841 | 31.1 kB
8be177c37dd404a0fb8bd651b6ab9bb4 | 110.9 kB
58f5822c4495ca153d12a1adcac10be1 | 24.2 kB
d0a37a52f9381963640fc38e978c25b0 | 223.5 kB
7681283cef229631e8f1bc051a7ae9e0 | 9.9 kB
0fb20080b38cb197614bf33ff8f8ff50 | 63.8 kB
4ebb6e2168642cf6c1f70f2f3c29e47b | 52.5 kB
128aa0470e3aa00989471d959cfc230e | 44.0 kB
1efc316d3d4eba31cec1c5e8041877f1 | 26.0 kB
962612dfef6bf4fedb94513b42218480 | 44.4 kB
8f6976f6548a53f47217b53f10f15843 | 42.2 kB
be9fe83b86bc21715a5b9e326127d2e7 | 25.7 kB
ba8b0b5ab56212220876184059bf499f | 60.5 kB
240a4e18b0dac1e459636f9290c8d962 | 32.6 kB
ca38d091757397313d18c3ff19f1372e | 30.3 kB
d372124d9844f307cd5a51425a14be3f | 44.1 kB
376f56109f1d396ad79ca52352b4d3bd | 71.4 kB
a65c10ac31eeba8cba706569b3e4004e | 32.8 kB
f04835230220f9cac9dc07b5144b4e33 | 112.8 kB
b699f481acbc333e07930f10ee2f6d82 | 178.0 kB
eccda4e26323b7089b2583bd03bc158b | 52.6 kB
aa303e2ee6c05e13fb0bf97539188136 | 32.8 kB
5684d077fbb0ae7f2d3ab032ac2085b5 | 94.0 kB
4af120204b240d96e6c4c2987ba67704 | 37.7 kB
cae054015ef8d1f32a9076652384158e | 69.8 kB
01c421c3ede1e9ceee2221e887b60f72 | 59.3 kB
74cf74fc3690329f0f0b3bd17e4470a6 | 68.1 kB
05e074240bb6c4aedf2a5ad5cc69ed88 | 86.7 kB
10c0ff6afba89854f264e5a7c65cc654 | 27.4 kB
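The checksum listing does not tie each checksum to a file name, so a downloaded copy is most easily verified as a set: every expected checksum should match some local file. The following is a minimal sketch using Python's hashlib. It assumes the files were saved into one directory with a .xml extension. The names md5sum, unmatched_checksums, and EXPECTED are hypothetical, and EXPECTED lists only the first two checksums, so it should be extended with the rest.

```python
import hashlib
import sys
from pathlib import Path

# Hypothetical: the first two checksums from the listing; extend with the remaining 31.
EXPECTED = [
    "333015b2e754709150942f3d73346ef0",
    "7ffa8acfc76f56bf0986500340c957b3",
]


def md5sum(path, chunk_size=1 << 16):
    """Return the MD5 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unmatched_checksums(directory, expected):
    """Return the expected checksums that no *.xml file in `directory` matches."""
    found = {md5sum(p) for p in Path(directory).glob("*.xml")}
    return sorted(set(expected) - found)


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = unmatched_checksums(folder, EXPECTED)
    print("all checksums matched" if not missing else f"unmatched: {missing}")
```

Comparing sets rather than name-to-checksum pairs avoids depending on local file names, which may differ from the repository's names after download.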

Additional details

Funding

Academy of Korean Studies: TEI/XML Construction Methodology for Korean Modern Literature in Humanities Data Design Education (AKSR2025-RE10)

Dates

Issued: 2026-02-18 (date of the v1 release)

Software

Repository URL: https://github.com/AKS-DHLAB/KNoTE
Programming language: Python, XML
Development Status: Active