Published August 8, 2018 | Version v1.1.0
Software Open

pmyteh/RISJbot: New scrapers and readability fallback extractor

Authors/Creators

  • 1. Reuters Institute for the Study of Journalism, University of Oxford

Description

The highlights of this release are a number of new news source scrapers and, in particular, a fallback headline and text extractor based on the readability library. This means that even if the site being scraped serves up an unknown page format, we can still attempt to extract the headline and body text using heuristics.

A number of bug fixes are also included.
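
The fallback idea described above can be illustrated with a minimal sketch. This is not RISJbot's actual code: the names `extract_article`, `site_extractor`, and `_FallbackParser` are hypothetical, and a crude standard-library heuristic (first `<h1>` or `<title>` as headline, all `<p>` text as body) stands in for the much richer heuristics of the readability extractor.

```python
from html.parser import HTMLParser


class _FallbackParser(HTMLParser):
    """Hypothetical helper: collects the first <h1>, the <title>, and <p> text."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag currently being captured, if any
        self._buf = []
        self.h1 = None
        self.title = None
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only when not already inside a captured tag.
        if tag in ("h1", "title", "p") and self._current is None:
            self._current = tag
            self._buf = []

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # e.g. </a> inside a <p>: keep capturing
        text = " ".join("".join(self._buf).split())  # normalise whitespace
        if tag == "h1" and self.h1 is None and text:
            self.h1 = text
        elif tag == "title" and self.title is None and text:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._current = None


def extract_article(html, site_extractor=None):
    """Try a site-specific extractor first; fall back to generic heuristics."""
    if site_extractor is not None:
        result = site_extractor(html)
        if result and result.get("headline") and result.get("bodytext"):
            return dict(result, fallback=False)
    # Unknown page format (or the site extractor failed): use the heuristic.
    parser = _FallbackParser()
    parser.feed(html)
    parser.close()
    return {
        "headline": parser.h1 or parser.title,
        "bodytext": "\n".join(parser.paragraphs),
        "fallback": True,
    }
```

Calling `extract_article(html)` with no site-specific extractor always takes the heuristic path. The `fallback` flag in the returned dictionary is an assumption of this sketch, added so that downstream code can tell heuristic extractions apart from site-specific ones.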

Files

pmyteh/RISJbot-v1.1.0.zip

Size: 108.8 kB
md5:e9ee3cc4a7b784b5fb29dbb305958168

Additional details

Related works