Published April 16, 2018 | Version 1.0.0
Dataset Open

115th U.S. Congress Member Website (Full JavaScript-enabled Scrape) Collection

Creators

Description

This data set is a point-in-time, full JavaScript-enabled scrape of all available 115th U.S. Congress member websites. Data collection began and finished on 2018-04-13, and the results are stored in ndjson (JSON Lines, streaming JSON) format. File format details are in the enclosed README.md file.
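Because ndjson stores one JSON document per line, a file of this size can be processed as a stream rather than loaded whole. The following is a minimal Python sketch of that approach, not the author's code. The function names are hypothetical, and no field names are assumed, since the actual record layout is documented only in README.md.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[object]:
    """Yield one parsed JSON value per non-blank line (ndjson / JSON Lines)."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number} is not valid JSON: {err}") from err


def summarize_keys(lines: Iterable[str]) -> Counter:
    """Count how often each top-level key appears across all object records."""
    counts: Counter = Counter()
    for record in iter_records(lines):
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts
```

An open text file handle is itself an iterable of lines, so passing one to `summarize_keys` reads the data one record at a time and gives a quick view of which fields are present before consulting README.md for their meaning.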

The data was used to evaluate the privacy profile of each U.S. Congress member's official (.gov-hosted) website for the discussion in <https://rud.is/b/2018/04/13/does-congress-really-care-about-your-privacy/>.

ScrapingHub's "Splash" platform (<https://github.com/scrapinghub/splash>) was used along with the "splashr" R package (<https://github.com/hrbrmstr/splashr>) to retrieve the content.
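For readers unfamiliar with Splash: it runs as a rendering service that is driven over HTTP, and splashr is an R client for that HTTP API. The Python sketch below only builds a request URL for Splash's `render.json` endpoint. The endpoint choice, the `html`, `har`, and `wait` settings, the local address, and the function name are all assumptions for illustration; the record does not state which Splash calls or options were used for this collection.

```python
from urllib.parse import urlencode


def splash_render_url(
    target_url: str,
    splash_base: str = "http://localhost:8050",  # assumed local Splash instance
    wait_seconds: float = 2.0,                   # assumed pause to let scripts run
) -> str:
    """Build a GET URL asking Splash to render a page with JavaScript enabled.

    html=1 requests the rendered HTML and har=1 requests the HAR log of
    network activity; both settings are illustrative assumptions.
    """
    params = {"url": target_url, "html": 1, "har": 1, "wait": wait_seconds}
    return f"{splash_base.rstrip('/')}/render.json?{urlencode(params)}"
```

Fetching the resulting URL from a running Splash instance returns a JSON document, which is one plausible reason a scrape like this is stored as one JSON record per line.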

Files

LICENSE.txt

Files (1.9 GB)

md5:fa06420a2a38d74c3d7b79fff917217d  1.9 GB
md5:63f9a9d76e5388688597d204cecb9d5b  33 Bytes
md5:8c6db81cfc046a341fa04ecd306a69ca  1.0 kB
md5:6c26a5d59079b9056fdb869a5b69dda3  1.4 kB
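The md5 values listed above can be used to confirm that a download is intact. The following is a minimal Python sketch of such a check; the function names are hypothetical, and the file is hashed in chunks so that the 1.9 GB file does not need to fit in memory.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file on disk, reading it in chunk_size pieces."""
    with open(path, "rb") as fh:
        return md5_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def normalize_listed_checksum(listed: str) -> str:
    """Turn a listed value such as 'md5:ABC...' into a bare lowercase hex string."""
    return listed.strip().split(":", 1)[-1].lower()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a local file's MD5 against a checksum copied from the file list."""
    return md5_of_file(path) == normalize_listed_checksum(listed)
```

For example, calling `matches_listing` with the local path of the downloaded data file and the string `md5:fa06420a2a38d74c3d7b79fff917217d` should return `True` if the 1.9 GB download is complete and unmodified.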

Additional details