Published October 14, 2021 | Version v2
Resource type: Dataset | Access: Open

Financial News dataset for text mining

  • 1. INRAE

Description

Please cite this dataset as:

Nicolas Turenne, Ziwei Chen, Guitao Fan, Jianlong Li, Yiwen Li, Siyuan Wang, Jiaqi Zhou (2021). Mining an English-Chinese Parallel Corpus of Financial News. Technical report, BNU HKBU UIC.


The dataset was collected from the Financial Times news website (https://www.ft.com/).

Each news article is provided in both Chinese and English.

FTIE.zip contains each document as a separate file.

FT-en-zh.rar contains all documents in a single file.
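A minimal Python sketch for iterating over the per-document files in FTIE.zip, using only the standard-library `zipfile` module. The function name `iter_zip_documents` is hypothetical, and it assumes each archive member is UTF-8 text; the internal layout of the individual files is not specified in this description.

```python
import zipfile
from typing import BinaryIO, Iterator, Tuple, Union


def iter_zip_documents(
    source: Union[str, BinaryIO], encoding: str = "utf-8"
) -> Iterator[Tuple[str, str]]:
    """Yield (member_name, text) for every regular file in the archive.

    Assumption: every member is a text file encoded in `encoding`.
    Undecodable bytes are replaced rather than raising an error.
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only files
            raw = archive.read(info)
            yield info.filename, raw.decode(encoding, errors="replace")
```

For example, `for name, text in iter_zip_documents("FTIE.zip"): ...` walks the archive without extracting it to disk. Python's standard library cannot read RAR archives, so FT-en-zh.rar would need to be extracted with an external tool first.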

Each document is a record made of the following semicolon-separated fields; a sample record is shown below.

id;time;english_title;chinese_title;integer;english_body;chinese_body
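The record syntax can be parsed with a short Python sketch. It assumes that each record is one line, that no field before the bodies contains a semicolon, and that `chinese_body` contains no ASCII `;` (so a semicolon inside `english_body` is still handled by splitting the last separator from the right). `FTRecord` and `parse_record` are hypothetical names; the meaning of the `integer` field is not documented here, so it is simply converted to an `int`.

```python
from dataclasses import dataclass


@dataclass
class FTRecord:
    id: str
    time: str
    english_title: str
    chinese_title: str
    integer: int  # meaning not documented in this description
    english_body: str
    chinese_body: str


def parse_record(line: str) -> FTRecord:
    """Parse one record of the form
    id;time;english_title;chinese_title;integer;english_body;chinese_body

    Assumptions: the first five fields contain no ';' and chinese_body
    contains no ASCII ';'. english_body may contain ';'.
    """
    parts = line.rstrip("\n").split(";", 5)  # five leading fields + the rest
    if len(parts) != 6:
        raise ValueError(f"expected at least 7 fields, got {len(parts)}")
    rec_id, time, en_title, zh_title, integer, bodies = parts
    # Split the two bodies at the LAST semicolon.
    en_body, sep, zh_body = bodies.rpartition(";")
    if not sep:
        raise ValueError("missing separator between english_body and chinese_body")
    return FTRecord(rec_id, time, en_title, zh_title, int(integer), en_body, zh_body)
```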


1021892;2008-09-10T00:00:00Z;FLAW IN TWIN TOWERS REVEALED;科学家发现纽约双子塔倒塌的根本原因;1;Scientists have discovered the fundamental reason the Twin Towers collapsed on September 11 2001. The steel used in the buildings softened fatally at 500?C – far below its melting point – as a result of a magnetic change in the metal. @ The finding, announced at the BA Festival of Science in Liverpool yesterday, should lead to a new generation of steels capable of retaining strength at much higher temperatures.;科学家发现了纽约世贸双子大厦(Twin Towers)在2001年9月11日倒塌的根本原因。由于磁性变化,大厦使用的钢在500摄氏度——远远低于其熔点——时变软,从而产生致命后果。 @ 这一发现在昨日利物浦举行的BA科学节(BA Festival of Science)上公布。这应会推动能够在更高温度下保持强度的新一代钢铁的问世。
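In the sample, both bodies contain an `@` at matching positions, which suggests it marks paragraph boundaries. Under that assumption (not stated explicitly in this description), the following hypothetical `align_paragraphs` helper pairs English and Chinese paragraphs and refuses to align bodies whose paragraph counts differ. Note that it would also split on any literal `@` in the text, such as an e-mail address.

```python
from typing import List, Tuple


def align_paragraphs(
    english_body: str, chinese_body: str, marker: str = "@"
) -> List[Tuple[str, str]]:
    """Pair English and Chinese paragraphs split on `marker`.

    Assumption: `marker` separates paragraphs and the i-th English
    paragraph corresponds to the i-th Chinese paragraph.
    """
    en = [p.strip() for p in english_body.split(marker) if p.strip()]
    zh = [p.strip() for p in chinese_body.split(marker) if p.strip()]
    if len(en) != len(zh):
        raise ValueError(
            f"paragraph count mismatch: {len(en)} English vs {len(zh)} Chinese"
        )
    return list(zip(en, zh))
```

Applied to the sample record, this would yield two English-Chinese paragraph pairs, which is the kind of sentence- or paragraph-level alignment that parallel corpus mining typically starts from.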

The dataset contains 60,473 bilingual documents.

The documents span the years 2007 to 2020.
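To check a local copy against these figures, a sketch like the following counts records and reports the earliest and latest publication year. It assumes one record per line and the `time` format seen in the sample (`2008-09-10T00:00:00Z`); `summarize` is a hypothetical name.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def summarize(lines: Iterable[str]) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (record_count, first_year, last_year), skipping blank lines.

    Assumption: `time` is the second ';'-separated field, formatted
    like 2008-09-10T00:00:00Z.
    """
    count, first, last = 0, None, None
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines
        time_field = line.split(";", 2)[1]
        year = datetime.strptime(time_field, "%Y-%m-%dT%H:%M:%SZ").year
        count += 1
        first = year if first is None else min(first, year)
        last = year if last is None else max(last, year)
    return count, first, last
```

The result for the full single-file corpus can then be compared with the document count and year range stated above.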

This dataset has been used for mining parallel bilingual news in the finance domain.

Notes

Turenne N, et al. (2021). Mining an English-Chinese Parallel Corpus of Financial News. Technical report, BNU HKBU UIC.

Files

Total size: 155.3 MB

  • FT-en-zh.rar: 50.7 MB, md5:9f24842bcef907dd5f85c7f9aa260998
  • FTIE.zip: 104.6 MB, md5:25479a51e5749a2f462461c29e850844
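The downloaded archives can be checked against the listed MD5 checksums. Below is a minimal sketch using Python's `hashlib`, reading the file in chunks so that a large archive need not fit in memory; `md5_of_file` and `verify_md5` are hypothetical helper names.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with an expected value.

    Accepts values written as 'md5:<hex>' (as in the listing) or bare hex,
    in either letter case.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Pass the path of a downloaded archive together with the md5 value listed for it; a `False` result indicates a corrupted or incomplete download.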

Additional details

References

  • Turenne N, et al. (2021). Mining an English-Chinese Parallel Corpus of Financial News. Technical report, BNU HKBU UIC.