Trend Dataset: Evaluation Datasets for Trend Forecasting Studies

Overview

This dataset was created to evaluate trend forecasting methods. It contains 400 entities in 21 domains grouped into 5 categories (geography, organization, person, product, and content). Each entity is annotated with three types of trend attributes: trending status (trending or non-trending), degree of trending (how much its recognition rate increased), and trend period (when the trend started and ended).

The dataset has three main features. First, a questionnaire-based recognition rate serves as the gold standard for annotating trend attributes. Second, the entities are collected from a wide range of domains and cover both trending and non-trending cases without significant imbalance. Third, trend periods are annotated at weekly resolution through interpolation using Internet search volume data.

See our paper "Construction of Evaluation Datasets for Trend Forecasting Studies" *1 (hereafter "the paper") for details.

*1 This paper is currently under single-blind review.

Conditions of Dataset Construction

Target Trending Phenomenon

| attribute | value |
| -------------- | ------------------------------------------------------ |
| Target Country | Japan |
| Survey Period | from 2015 to 2019 |
| # of entities | 400 |
| Target Domains | 21 domains in 5 categories (see the Target Domain section) |

Target Domain

| Category | Domain |
| ------------------ | ------------------------------------------------------------ |
| Location/Geography | City/region/landmark |
| Organization | Restaurant/facility, company/brand |
| Person/Group | Politician/political party, researcher, athlete, actor/actress, celebrity/entertainer/comedian, music band/music group |
| Products | Cosmetics, daily necessities, clothing, beverage, foodstuff, others |
| Art/Content | Game, publication, comic/animation, movie, broadcast program, music |

Dataset Files

The dataset consists of two TSV files and one Markdown file; the Markdown file provides the description and metadata of the dataset. The files are listed in the table below.

| data label | file name | file format | explanation |
| ------------------ | ---------------------- | ----------- | ------------------------------------------------------------ |
| Metadata | README.md | Markdown | Overview of this dataset and the schema of each file in English and Japanese |
| Trend Dataset | trend_dataset.tsv | TSV | Main body of this dataset |
| Master Entity list | master_entity_list.tsv | TSV | Information on the entities included in Trend Dataset |

TSV File format

| attribute | value |
| ---------------- | -------------------------- |
| header row | exists (first row) |
| index column | "ID" column (first column) |
| encoding | UTF-8 |
| delimiter | \t (tab) |
| quoting | None *2 |
| escape character | \ (backslash) |
| line terminator | \n |

*2: csv.QUOTE_NONE in Python
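
For readers who do not use pandas, the format parameters above map onto Python's standard `csv` module roughly as follows. This is a minimal sketch rather than part of the dataset's tooling: `read_tsv_rows` is a hypothetical helper, and the two-row `sample` is invented for illustration and not taken from the real files.

```python
import csv
import io


def read_tsv_rows(stream):
    """Parse a TSV stream that follows the format table above.

    Returns a dict keyed by the "ID" column; each value is a dict of the
    remaining columns (all values stay strings).
    """
    reader = csv.DictReader(
        stream,
        delimiter="\t",          # tab-separated fields
        quoting=csv.QUOTE_NONE,  # quote characters are ordinary data
        escapechar="\\",         # backslash escapes the next character
    )
    rows = {}
    for row in reader:
        entity_id = row.pop("ID")
        rows[entity_id] = row
    return rows


# Hypothetical two-row sample (not real dataset content).
sample = (
    "ID\tentity_en\ttrend_status\n"
    'entity00001\tFoo "Bar"\tTRUE\n'
    "entity00002\tBaz\tFALSE\n"
)
rows = read_tsv_rows(io.StringIO(sample, newline=""))
print(rows["entity00001"]["entity_en"])
```

For a real file, the stream would be opened with `open(path, encoding="utf-8", newline="")` so that the `csv` module sees the line terminators unchanged.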

Description of the dataset body

Trend Dataset

This file is the main body of the dataset: 400 entities annotated with trend attributes. The following fields correspond to the terms used in the paper.

  • entity name: entity_en, entity_ja
  • domain name: domain_en, domain_ja
  • trending status: trend_status
  • degree of trending: degree_of_trending
  • trend period {start, end}: trend_period_{start, end}

Other fields contain the raw / intermediate information that was used to annotate trend attributes. Please refer to the paper for detailed calculation procedures.

| field name | type | explanation | example | notes |
| -------------------- | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| ID | string | entity ID | entity00001 | used to link entities between files |
| domain_en | string | domain name in English | Person/Group-Athelete | {category}-{subcategory} |
| domain_ja | string | domain name in Japanese | 人物・グループ-スポーツ選手 | |
| interest_pattern | string | interest pattern of an entity | positive | positive: recognized-surge; negative: non-recognized-unknown; negative-popular: recognized-widespread |
| title_en | string | Wikipedia article title in English | Bohemian Rhapsody (film) | |
| title_ja | string | Wikipedia article title in Japanese | ボヘミアン・ラプソディ (映画) | |
| entity_en | string | entity name in English | Bohemian Rhapsody | Wikipedia article title without parentheses |
| entity_ja | string | entity name in Japanese | ボヘミアン・ラプソディ | |
| lead_sentence | string | first sentence of Wikipedia article | 『ボヘミアン・ラプソディ』（）は、2018年のイギリス・アメリカ合衆国製作の伝記映画。伝説的ロックバンド「クイーン」のボーカリスト・フレディ・マーキュリーを主人公とする。第76回ゴールデングローブ賞では作品賞（ドラマ部門）と主演男優賞（ドラマ部門）を獲得。第91回アカデミー賞では、作品賞を含む5部門にノミネートされ、主演男優賞、編集賞、録音賞、音響編集賞の最多4冠を獲得した。興行収入は音楽伝記映画のジャンルで史上1位、日本では2018年公開の映画として最高となった。 | |
| n_response | int | number of crowdworkers who responded to the recognition rate questionnaire survey | 485 | excluding dishonest workers; minimum 444, maximum 485 |
| n_unrecognized | int | number of workers who answered "I don't know" | 87 | |
| n_recognized | int | number of workers who answered "I knew" | 398 | total over all of the "I knew" answer options |
| recognition_rate_bos | float | questionnaire-based recognition rate as of the year-end of 2015 | 0.151 | the year-end of 2015 corresponds to the start of the survey period |
| recognition_rate_eos | float | questionnaire-based recognition rate as of August 31st, 2019 | 0.821 | the end of August 2019 corresponds to the end of the survey period |
| degree_of_trending | float | degree of trending of an entity | 0.670 | defined as the increase in recognition rate during the survey period |
| trend_status | bool | flag indicating whether an entity is trending or non-trending | TRUE | TRUE if the recognition rate at the end of 2015 is less than 0.3 and the degree of trending is greater than or equal to 0.25; there are 80 positive examples |
| trend_period_start | date | start date of the trend period | 2016-12-11 | the point in time that corresponds to the 25th percentile of the interpolated recognition rate |
| trend_period_peak | date | peak date of the trend period | 2017-08-20 | the point in time that corresponds to the 50th percentile of the interpolated recognition rate |
| trend_period_end | date | end date of the trend period | 2018-02-18 | the point in time that corresponds to the 75th percentile of the interpolated recognition rate |
| trend_ranking | int | ranking of the degree of trending | 22 | rank of entities in descending order of degree_of_trending; ties are ranked in the order they appear |
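
The notes for `degree_of_trending`, `trend_status`, and `trend_ranking` describe simple rules that can be recomputed from the two recognition rates as a sanity check. The sketch below is one reading of those notes, not the authors' procedure (the paper remains the reference). It assumes the degree is `recognition_rate_eos - recognition_rate_bos`, applies the 0.3 and 0.25 thresholds, and ranks in descending order with ties broken by order of appearance. `derive_trend_attributes` is a hypothetical helper. In `sample`, only the first row reuses the example values from the field table; the other two rows are invented.

```python
import pandas as pd


def derive_trend_attributes(df):
    """Recompute trend attributes from the two recognition rates.

    Assumed rules (taken from the notes column, not from the paper itself):
      degree  = recognition_rate_eos - recognition_rate_bos
      status  = recognition_rate_bos < 0.3 and degree >= 0.25
      ranking = descending degree, ties in order of appearance
    """
    out = pd.DataFrame(index=df.index)
    # Round to 3 decimals to match the precision of the example values.
    out["degree"] = (df["recognition_rate_eos"] - df["recognition_rate_bos"]).round(3)
    out["status"] = (df["recognition_rate_bos"] < 0.3) & (out["degree"] >= 0.25)
    # method="first" breaks ties by the order in which rows appear.
    out["ranking"] = out["degree"].rank(ascending=False, method="first").astype(int)
    return out


sample = pd.DataFrame(
    {
        "recognition_rate_bos": [0.151, 0.900, 0.050],
        "recognition_rate_eos": [0.821, 0.950, 0.100],
    },
    index=["entity00001", "hypothetical_a", "hypothetical_b"],
)
derived = derive_trend_attributes(sample)
print(derived)
```

For the first row this gives a degree of 0.670 and a status of `True`, matching the example values in the table.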


Master Entity List

This file contains detailed information about the entities included in the dataset. It is useful for the following purposes:

  • Reproduce the dataset construction procedure.
  • Obtain synonyms and abbreviations of entity names in order to query Internet activity data, e.g., the volume of search keywords.
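
Because both files share the `ID` column as their index, the two tables can be combined with an index join. The following is a minimal sketch under that assumption; `combine_on_id` is a hypothetical helper, and the two small frames are invented stand-ins for the real files.

```python
import pandas as pd


def combine_on_id(df_trend, df_master):
    """Left-join master-list columns onto the trend table via the shared ID index."""
    # Skip columns present in both files (e.g. domain_en) to avoid name clashes.
    extra_columns = [c for c in df_master.columns if c not in df_trend.columns]
    return df_trend.join(df_master[extra_columns], how="left")


# Hypothetical stand-ins for the two files, each indexed by ID.
trend_sample = pd.DataFrame(
    {"entity_en": ["Alpha", "Beta"], "title_en": ["Alpha (film)", "Beta"]},
    index=pd.Index(["entity00001", "entity00002"], name="ID"),
)
master_sample = pd.DataFrame(
    {"title_en": ["Beta", "Alpha (film)"], "synonyms": ["b", "a\u2581aa"]},
    index=pd.Index(["entity00002", "entity00001"], name="ID"),
)
combined = combine_on_id(trend_sample, master_sample)
print(combined)
```

The join aligns rows by `ID`, so the row order of the two files does not need to match.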

| field name | type | explanation | example | notes |
| ------------------------ | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| ID | string | entity ID | entity00388 | used to link entities between files |
| domain_en | string | domain name in English | Art/Content-Game | {Category}-{Subcategory} |
| domain_ja | string | domain name in Japanese | 創作物-ゲーム | {Category}-{Subcategory} |
| class | string | interest pattern of entity | positive | positive: recognized-surge; negative: non-recognized-unknown; negative-popular: recognized-widespread |
| title_en | string | Wikipedia article title in English | Dragon Quest XI | |
| title_ja | string | Wikipedia article title in Japanese | ドラゴンクエストXI 過ぎ去りし時を求めて | |
| lead_sentence | string | first sentence of Wikipedia article | 『ドラゴンクエストXI 過ぎ去りし時を求めて』（ドラゴンクエスト イレブン すぎさりしときをもとめて）は、2017年7月29日にスクウェア・エニックスよりPlayStation 4（日本、台湾）・ニンテンドー3DS（日本のみ）で同時発売されたコンピュータRPG。略称は『DQXI』など。 | |
| n_unique_editor | float | number of editors of the Wikipedia article | 193 | the number of unique editors from the article registration date to the year-end of 2018; the value is empty for entities of the negative-popular class |
| n_daily_view | float | average daily page views of the Wikipedia article | 879.018 | the average daily page views of the article from the article registration date to the year-end of 2018; the value is empty for entities of the negative-popular class |
| interest_popensity_score | float | interest propensity score calculated using Wikipedia article text | 0.080 | the similarity between an entity and a word that suggests a fad (e.g., boom, trend), calculated using Wikipedia2Vec; the value is empty for entities of the negative-popular class |
| interest_score | float | interest score | 0.217 | the score that combines the above three values; higher scores indicate a surge in interest; the value is empty for entities of the negative-popular class |
| synonyms | string | synonyms of the entity name (Japanese only) | dq11▁ドラクエxi▁ドラゴンクエストxi 過ぎ去りし時を求めて▁ドラゴンクエストxi過ぎ去りし時を求めて▁ドラゴンクエスト11▁ドラゴンクエストxi 過ぎ去りし時を求めて s▁ドラクエ11▁ドラゴンクエストxi | synonyms assigned automatically using structured information from Wikipedia; the delimiter is "▁" (U+2581) |
| abbreviations | string | abbreviations of the entity name (Japanese only) | dq11▁ドラクエxi▁ドラゴンクエスト11▁ドラクエ11▁ドラゴンクエストxi | abbreviations assigned automatically using structured information from Wikipedia; the delimiter is "▁" (U+2581) |
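
The `synonyms` and `abbreviations` cells pack several strings into one field, separated by U+2581. The sketch below splits such a cell into a list of search keywords. `split_keywords` is a hypothetical helper, and treating an empty or missing cell as "no keywords" is an assumption rather than documented behavior.

```python
import pandas as pd

DELIMITER = "\u2581"  # "▁", the separator named in the notes column


def split_keywords(cell):
    """Split one synonyms/abbreviations cell into a list of keywords."""
    if pd.isna(cell) or cell == "":
        return []  # assumption: empty cells mean no keywords
    return cell.split(DELIMITER)


# The abbreviations example from the field table, rebuilt with the delimiter.
example = DELIMITER.join(
    ["dq11", "ドラクエxi", "ドラゴンクエスト11", "ドラクエ11", "ドラゴンクエストxi"]
)
keywords = split_keywords(example)
print(len(keywords), keywords[0])
```

On a loaded table, the same helper could be applied column-wise, for example `df_master["abbreviations"].map(split_keywords)`.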

How to Use

The following Python snippets read the dataset files with the pandas package.

read trend_dataset.tsv

snippet

import pandas as pd
import csv

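# Read the main dataset, using the "ID" column (first column) as the index.
df_trend = pd.read_table(
    './trend_dataset.tsv',
    header=0,
    index_col=0,
    encoding='utf-8',
    quoting=csv.QUOTE_NONE,  # fields are not quoted
    escapechar='\\',
    lineterminator='\n',
    # Parse the three trend period columns as datetimes.
    parse_dates=['trend_period_start', 'trend_period_peak', 'trend_period_end']
)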

print(df_trend.dtypes)

Output

domain_en                            object
domain_ja                            object
interest_pattern                     object
title_en                             object
title_ja                             object
entity_en                            object
entity_ja                            object
lead_sentence                        object
n_response                            int64
n_unrecognized                        int64
n_recognized                          int64
recognition_rate_bos                float64
recognition_rate_eos                float64
degree_of_trending                  float64
trend_status                           bool
trend_period_start      datetime64[ns, UTC]
trend_period_peak       datetime64[ns, UTC]
trend_period_end        datetime64[ns, UTC]
trend_ranking                         int64
dtype: object
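
Once the date columns are parsed, the length of a trend period follows from plain date arithmetic. The following is a minimal sketch using the example dates from the field table (start 2016-12-11, end 2018-02-18). `trend_period_weeks` is a hypothetical helper, and the one-row frame stands in for `df_trend`.

```python
import pandas as pd


def trend_period_weeks(df):
    """Return the trend period length in weeks (end minus start)."""
    duration = df["trend_period_end"] - df["trend_period_start"]
    return duration.dt.days / 7


# One-row stand-in for df_trend, built from the example dates.
sample = pd.DataFrame(
    {
        "trend_period_start": pd.to_datetime(["2016-12-11"], utc=True),
        "trend_period_end": pd.to_datetime(["2018-02-18"], utc=True),
    }
)
weeks = trend_period_weeks(sample)
print(weeks.iloc[0])  # 434 days / 7 = 62.0 weeks
```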

read master_entity_list.tsv

snippet

import pandas as pd
import csv

# Read the master entity list, using the "ID" column (first column) as the index.
df_master = pd.read_table(
    './master_entity_list.tsv',
    header=0,
    index_col=0,
    encoding='utf-8',
    quoting=csv.QUOTE_NONE,  # fields are not quoted
    escapechar='\\',
    lineterminator='\n',
)
print(df_master.dtypes)

Output

domain_en                    object
domain_ja                    object
class                        object
title_en                     object
title_ja                     object
lead_sentence                object
n_unique_editor             float64
n_daily_view                float64
interest_popensity_score    float64
interest_score              float64
synonyms                     object
abbreviations                object
dtype: object
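
`n_unique_editor` is a count, yet it appears as `float64` above. The most likely reason is that the cells for negative-popular entities are empty, and a NumPy integer column cannot hold missing values. If integer semantics are preferred, pandas' nullable `Int64` dtype is one option. This is a sketch on a hypothetical three-row stand-in for `df_master`; the values in the second and third rows are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_master; the last row mimics an empty cell.
sample = pd.DataFrame(
    {
        "class": ["positive", "negative", "negative-popular"],
        "n_unique_editor": [193.0, 12.0, np.nan],
    }
)

# Convert to the nullable integer dtype; missing cells become <NA>.
sample["n_unique_editor"] = sample["n_unique_editor"].astype("Int64")

# Boolean mask of rows that carry Wikipedia statistics.
has_wiki_stats = sample["n_unique_editor"].notna()
print(sample.dtypes)
```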