Digitisation of Weather Records of Seungjeongwon Ilgi: A Historical Weather Dynamics Dataset of the Korean Peninsula (1623-1910)
- 1. Institute of Social Science, The University of Tokyo
- 2. Frontier Research Institute for Interdisciplinary Sciences, Tohoku University
- 3. Center for Northeast Asian Studies, Tohoku University
- 4. Graduate School of Humanities, Department of Classics / Institute for Advanced Research, Nagoya University
- 5. Graduate School of Humanities, Department of Humanities, Nagoya University
Description
Introduction
This study draws on the daily weather records of Seungjeongwon Ilgi, obtained from the NIKH database (http://sjw.history.go.kr/main.do). Seungjeongwon Ilgi is the daily record kept by the Seungjeongwon, the Royal Secretariat of the Joseon Dynasty of Korea. The diaries span 1623 to 1910 and generally include a daily weather record in the entry header. The observation site is presumed to have been in Seoul (N37°35′, E126°59′). We extracted the weather records from the NIKH database and classified the daily weather using text-mining methods. We also converted the report dates from the traditional lunisolar calendar to the Gregorian calendar, so that the data can be set in the context of contemporary daily measurements.
Data
We provide the data in several formats (csv, xlsx, json) for ease of use. The main fields are listed below.
- ID: The unique identifier of a specific record in the metadata, which can also serve as the identifier to merge with external data in the NIKH digital database.
- Traditional calendar: The original lunar dates from the NIKH digital database, given in the format "YYYY-MM-DD". In these dates, "L0" indicates a leap year and "L1" indicates a common year.
- Leap: The identifier of a leap year.
- Gregorian calendar: The Gregorian calendar date converted from the traditional calendar date.
- Weather Text: The text that describes the weather conditions. Multiple weather descriptions for the same day are combined into one entry.
- Flag: The computed value that indicates different combinations of weather conditions.
- Volume: The volume of text in the original record.
- Herbal Volume: The volume of text in the herbal record.
- Sunny: A dummy variable indicating whether the weather description contains an expression for sunny weather.
- Cloudy: A dummy variable indicating whether the weather description contains an expression for cloudy weather.
- Rainy: A dummy variable indicating whether the weather description contains an expression for rain.
- Snow: A dummy variable indicating whether the weather description contains an expression for snow.
- Wind: A dummy variable indicating whether the weather description contains an expression for wind.
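The five dummy variables above can be thought of as keyword flags computed from the Weather Text field. The following is a minimal sketch of that idea, not the authors' actual procedure: the keyword lists, the function name `add_weather_dummies`, the column name `"Weather Text"`, and the sample strings are all illustrative assumptions.

```python
import re

import pandas as pd

# Hypothetical keyword lists (an assumption for illustration only);
# the dictionary actually used to build the dataset may differ.
KEYWORDS = {
    "Sunny": ["晴"],
    "Cloudy": ["陰"],
    "Rainy": ["雨"],
    "Snow": ["雪"],
    "Wind": ["風"],
}


def add_weather_dummies(df, text_col="Weather Text"):
    """Return a copy of df with one 0/1 column per weather category."""
    out = df.copy()
    text = out[text_col].fillna("")  # treat missing text as empty
    for col, words in KEYWORDS.items():
        pattern = "|".join(re.escape(w) for w in words)
        out[col] = text.str.contains(pattern, regex=True).astype(int)
    return out


# Made-up sample strings, not taken from the dataset.
sample = pd.DataFrame({"Weather Text": ["晴", "朝陰夕雨", "雪", None]})
result = add_weather_dummies(sample)
print(result)
```

Because a single day can match several categories at once, the columns are not mutually exclusive, which is consistent with the note that multiple descriptions of the same day are combined.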
Import Data
# Python
import pandas as pd
# CSV file
data = pd.read_csv('~/SJWilgi_Seoul_Weather_YR1623_1910.csv', encoding="utf-8")
# JSON file
data = pd.read_json('~/SJWilgi_Seoul_Weather_YR1623_1910.json', encoding="utf-8")
# Excel file (reading .xlsx requires the openpyxl package)
data = pd.read_excel('~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx')
# R
# CSV file
library(readr)
data <- read_csv("~/SJWilgi_Seoul_Weather_YR1623_1910.csv")
# Excel file
library(readxl)
data <- read_excel("~/SJWilgi_Seoul_Weather_YR1623_1910.xlsx")
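Once the table is loaded in Python, a common next step is to aggregate by Gregorian year. The sketch below uses a small made-up table in place of the real file. It assumes that the Gregorian dates are stored as ISO "YYYY-MM-DD" strings in a column named `"Gregorian calendar"` and that the rain dummy is named `"Rainy"`; check both against the actual file. One caveat applies to this date range: pandas' nanosecond-resolution timestamps begin at 1677-09-21, so, depending on the pandas version, `pd.to_datetime` may fail on the earliest records. Parsing with the standard library avoids the problem.

```python
from datetime import date

import pandas as pd

# Toy rows standing in for the loaded file; column names are assumptions.
data = pd.DataFrame({
    "Gregorian calendar": ["1623-05-01", "1623-05-02", "1800-07-10", "1800-07-11"],
    "Rainy": [1, 0, 1, 1],
})

# datetime.date covers years 1 to 9999, so 1623 is handled without overflow.
gregorian = data["Gregorian calendar"].map(date.fromisoformat)
data["year"] = gregorian.map(lambda d: d.year)

# Number of days per year whose description mentions rain.
rainy_days_per_year = data.groupby("year")["Rainy"].sum()
print(rainy_days_per_year)
```

The same pattern applies to the other dummy columns, and replacing `sum()` with `mean()` gives the share of recorded days with rain rather than the count.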
Files (32.6 MB)
MD5 | Size |
---|---|
2af7820833ecafc04ab79cc46dc474ab | 3.0 kB |
8e7fb345d14174dcf747af10a43dbecd | 65.4 kB |
a19307532127f64bec13f79f09cae50d | 1.6 kB |
a64cd492a640e402ce4b2073b7a7c182 | 6.7 MB |
8f796abd0a4be18626211c9d95f79fe6 | 19.3 MB |
0532fd0b7d12373c0b72b70851a55440 | 6.5 MB |