Financial News dataset for text mining
Description
Please cite this dataset as:
Nicolas Turenne, Ziwei Chen, Guitao Fan, Jianlong Li, Yiwen Li, Siyuan Wang, Jiaqi Zhou (2021) Mining an English-Chinese parallel Corpus of Financial News, BNU HKBU UIC, technical report
The dataset was collected from the Financial Times news website (https://www.ft.com/).
Each news article is available in both Chinese and English.
FTIE.zip contains the documents as individual files, one document per file.
FT-en-zh.rar contains all documents in a single file.
Each document is a record made of the following semicolon-separated fields; the field list is followed by a sample document:
id;time;english_title;chinese_title;integer;english_body;chinese_body
1021892;2008-09-10T00:00:00Z;FLAW IN TWIN TOWERS REVEALED;科学家发现纽约双子塔倒塌的根本原因;1;Scientists have discovered the fundamental reason the Twin Towers collapsed on September 11 2001. The steel used in the buildings softened fatally at 500?C – far below its melting point – as a result of a magnetic change in the metal. @ The finding, announced at the BA Festival of Science in Liverpool yesterday, should lead to a new generation of steels capable of retaining strength at much higher temperatures.;科学家发现了纽约世贸双子大厦(Twin Towers)在2001年9月11日倒塌的根本原因。由于磁性变化,大厦使用的钢在500摄氏度——远远低于其熔点——时变软,从而产生致命后果。 @ 这一发现在昨日利物浦举行的BA科学节(BA Festival of Science)上公布。这应会推动能够在更高温度下保持强度的新一代钢铁的问世。
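A record in this layout can be split into its seven fields with a few lines of Python. The sketch below is illustrative only and rests on assumptions that the description does not confirm: that no field contains a literal `;`, that ` @ ` (with surrounding spaces) marks a paragraph break inside the two body fields, and that the timestamp always follows the pattern seen in the sample. The names `Record`, `parse_record` and `split_paragraphs` are hypothetical, and the meaning of the `integer` field is not documented, so it is kept under that name.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Field order taken from the dataset description:
# id;time;english_title;chinese_title;integer;english_body;chinese_body
FIELD_COUNT = 7


@dataclass
class Record:
    doc_id: int
    time: datetime
    english_title: str
    chinese_title: str
    integer: int  # meaning not documented in the description
    english_paragraphs: List[str]
    chinese_paragraphs: List[str]


def split_paragraphs(body: str) -> List[str]:
    # Assumption: " @ " separates paragraphs, as in the sample document.
    return [p.strip() for p in body.split(" @ ") if p.strip()]


def parse_record(line: str) -> Record:
    parts = line.rstrip("\n").split(";")
    if len(parts) != FIELD_COUNT:
        # Assumption violated: a field probably contains a ";" of its own.
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(parts)}")
    doc_id, time, en_title, zh_title, integer, en_body, zh_body = parts
    return Record(
        doc_id=int(doc_id),
        time=datetime.strptime(time, "%Y-%m-%dT%H:%M:%SZ"),
        english_title=en_title,
        chinese_title=zh_title,
        integer=int(integer),
        english_paragraphs=split_paragraphs(en_body),
        chinese_paragraphs=split_paragraphs(zh_body),
    )
```

If the paragraph-separator assumption holds, the English and Chinese paragraph lists of a document have the same length and can be paired with `zip` to obtain paragraph-level alignments. Records that raise `ValueError` are worth inspecting by hand rather than discarding silently.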
The dataset contains 60,473 bilingual documents.
The documents span the years 2007 to 2020.
This dataset has been used for mining parallel bilingual news in the finance domain.
Files
Total size: 155.3 MB
MD5 checksum | Size |
---|---|
9f24842bcef907dd5f85c7f9aa260998 | 50.7 MB |
25479a51e5749a2f462461c29e850844 | 104.6 MB |
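The MD5 checksums listed for the files can be used to check that a download is complete. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_checksum` are hypothetical, and since the listing does not state which checksum belongs to which archive, the expected value is passed in by the caller.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with an expected value, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch usually indicates a truncated or corrupted download, in which case the archive should be fetched again before extraction.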
Additional details
References
- Turenne N et al (2021) Mining an English-Chinese parallel Corpus of Financial News