Published August 1, 2022 | Version v6
Dataset Open

Digitisation of Weather Records of Seungjeongwon Ilgi: A Historical Weather Dynamics Dataset of the Korean Peninsula (1623-1910)

  • 1. Institute of Social Science, The University of Tokyo
  • 2. Frontier Research Institute for Interdisciplinary Sciences, Tohoku University
  • 3. Center for Northeast Asian Studies, Tohoku University
  • 4. Graduate School of Humanities, Department of Classics / Institute for Advanced Research, Nagoya University
  • 5. Graduate School of Humanities Department of Humanities, Nagoya University

Description

Introduction

This dataset draws on the daily weather records of Seungjeongwon Ilgi, retrieved from the NIKH database (http://sjw.history.go.kr/main.do). Seungjeongwon Ilgi is the daily record of the Seungjeongwon, the Royal Secretariat of the Joseon Dynasty of Korea. The diaries span 1623 to 1910 and generally include a daily weather record in the entry header. The observational site is considered to have been located in Seoul (N37°35′, E126°59′). We extracted the weather records from the NIKH database and classified the daily weather using text-mining methods. We also converted the report dates from the traditional lunisolar calendar to the Gregorian calendar, so that the data can be compared more readily with contemporary daily measurements.

Data

We provide the data in several formats (csv, xlsx, json) to facilitate their use. The main fields are listed below.

  • ID: The unique identifier of a specific record in the metadata, which can also serve as a key for merging with external data from the NIKH digital database.
  • Traditional calendar: The original lunar dates in the NIKH digital database, given in the format "YYYY-MM-DD". More specifically, "L0" indicates a leap year and "L1" indicates a common year.
  • Leap: The identifier of a leap year.
  • Gregorian calendar: The Gregorian calendar date converted from the traditional calendar date.
  • Weather Text: The text that describes the weather conditions. Multiple weather descriptions for the same day have been combined into one entry.
  • Flag: The computed value that indicates different combinations of weather conditions.
  • Volume: The volume of text in the original record.
  • Herbal Volume: The volume of text in the herbal record.
  • Sunny: A dummy variable indicating whether the weather description contains an expression for sunny weather.
  • Cloudy: A dummy variable indicating whether the weather description contains an expression for cloudy weather.
  • Rainy: A dummy variable indicating whether the weather description contains an expression for rain.
  • Snow: A dummy variable indicating whether the weather description contains an expression for snow.
  • Wind: A dummy variable indicating whether the weather description contains an expression for wind.
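
The five dummy variables above amount to keyword containment tests on the Weather Text field. The following is only a minimal sketch of that idea, not the authors' classification procedure: it assumes the weather text is written in Classical Chinese, and the single-character keyword map, the function name `add_weather_dummies`, and the sample rows are all hypothetical.

```python
import re
import pandas as pd

# Hypothetical keyword map; the dataset's real classification rules may differ.
KEYWORDS = {
    "Sunny": ["晴"],
    "Cloudy": ["陰"],
    "Rainy": ["雨"],
    "Snow": ["雪"],
    "Wind": ["風"],
}

def add_weather_dummies(df, text_col="Weather Text"):
    """Return a copy of df with one 0/1 column per weather category."""
    out = df.copy()
    text = out[text_col].fillna("")  # treat missing text as an empty description
    for column, words in KEYWORDS.items():
        pattern = "|".join(re.escape(word) for word in words)
        out[column] = text.str.contains(pattern, regex=True).astype(int)
    return out

# Made-up sample rows for illustration only (not taken from the dataset).
sample = pd.DataFrame({"Weather Text": ["晴", "朝雨夕晴", "陰", None]})
flagged = add_weather_dummies(sample)
```

Because each category is tested independently, a day whose description mentions both rain and clear skies is flagged in both columns, which matches the way several descriptions for one day are combined in Weather Text.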

Import Data

# Python
import pandas as pd

# CSV file
data = pd.read_csv('~/SJWilgi_Seoul_Weather_YR1623_1910.csv', encoding="utf-8")
# JSON file
data = pd.read_json('~/SJWilgi_Seoul_Weather_YR1623_1910.json', encoding="utf-8")
# Excel file (reading .xlsx requires the openpyxl package)
data = pd.read_excel('~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx')

# R
# CSV file
library(readr)
data <- read_csv("~/SJWilgi_Seoul_Weather_YR1623_1910.csv")
# Excel file
library(readxl)
data <- read_excel("~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx")
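
Once the data are loaded, one practical point is that the records start in 1623, which is earlier than the minimum date (1677-09-21) that pandas can represent with nanosecond-resolution timestamps, so `pd.to_datetime` may fail on the early rows in pandas versions that default to that resolution. The sketch below sidesteps this by parsing dates with the Python standard library. It assumes the columns are named "Gregorian calendar" and "Rainy" as in the field list and that the Gregorian dates are stored as "YYYY-MM-DD" strings; the function name and the sample rows are hypothetical.

```python
import datetime as dt
import pandas as pd

def rainy_days_per_year(df, date_col="Gregorian calendar", rain_col="Rainy"):
    """Count days flagged as rainy in each Gregorian year."""
    # datetime.date covers years 1-9999, so 17th-century dates parse safely.
    dates = df[date_col].map(dt.date.fromisoformat)
    years = dates.map(lambda d: d.year)
    return df[rain_col].groupby(years).sum()

# Made-up sample rows for illustration only (not taken from the dataset).
sample = pd.DataFrame({
    "Gregorian calendar": ["1623-04-11", "1623-04-12", "1750-07-01", "1910-08-29"],
    "Rainy": [1, 0, 1, 1],
})
counts = rainy_days_per_year(sample)
```

With the real file, the same function could be called on the `data` object returned by `pd.read_csv`, provided the actual column names match the assumed ones.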

 
