There is a newer version of the record available.

Published July 4, 2022 | Version v2
Dataset Open

Web requests analysis of Italy websites which use Google Analytics

Description

List of 504,038 Italy-related domains found to use Google Analytics.

The front page of each Italy-related domain name was accessed over HTTPS or HTTP and analysed with webbkoll and jq to gather data about third-party requests, cookies and other privacy-invasive features. Together with the URL actually visited, the user/property ID is provided for 495,663 domains (extracted either from the cookies set or from the URL of requests to Google Analytics). MX and TXT records for the domains are also provided.
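As an illustration of the two extraction routes mentioned above (cookies and request URLs), here is a minimal Python sketch. It is not the pipeline used to build this dataset, which relied on webbkoll and jq. It assumes that Google Analytics 4 sets a cookie named `_ga_<ID>` and that collection requests to google-analytics.com carry the ID in a `tid` query parameter. The function names and the ID pattern are hypothetical.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: GA4 cookies are named "_ga_<ID>", with an upper-case alphanumeric ID.
GA4_COOKIE = re.compile(r"^_ga_([A-Z0-9]+)$")


def id_from_cookie_name(name):
    """Return the ID embedded in a "_ga_<ID>" cookie name, or None."""
    match = GA4_COOKIE.match(name)
    return match.group(1) if match else None


def id_from_request_url(url):
    """Return the "tid" parameter of a request to google-analytics.com, or None."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Only accept the Google Analytics host itself or one of its subdomains.
    if host != "google-analytics.com" and not host.endswith(".google-analytics.com"):
        return None
    tids = parse_qs(parsed.query).get("tid")
    return tids[0] if tids else None
```

For example, `id_from_cookie_name("_ga_23LNSPS7Q6")` yields the bare ID, while a request URL yields the value as sent in `tid`, which may include a prefix such as `G-` or `UA-`.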

The most common ID found was 23LNSPS7Q6, called by over 35k domains (seemingly associated with italiaonline.it). The most common responding IP addresses were three AWS IPv4 addresses (over 40k domains) and two Cloudflare IPv6 addresses (over 12k domains).
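A ranking like the one above can be reproduced with a simple frequency count. The sketch below assumes the data has already been loaded as (domain, ID) pairs. The actual file layout is not described here, so the input shape and the function name are hypothetical.

```python
from collections import Counter


def most_common_ids(pairs, n=1):
    """Count distinct domains per ID and return the n most frequent IDs.

    `pairs` is an iterable of (domain, ga_id) tuples (assumed input shape).
    """
    unique_pairs = set(pairs)  # a domain listed twice with the same ID counts once
    counts = Counter(ga_id for _, ga_id in unique_pairs if ga_id)
    return counts.most_common(n)
```

The same approach would apply to responding IP addresses by passing (domain, IP) pairs instead.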

Files

2022-07_GA_domains.txt

Files (39.5 MB)

Checksum                                Size
md5:0cfcc294560b95c8bd8b20ddb48fd58a    5.7 MB
md5:2ebcc8b03c98da0222e1d0fc1025abf4    8.9 MB
md5:97fed334c24089f31af7e171a0c2f3a6    9.8 MB
md5:e0fe110f64284cc7a84d249f55af534a    10.2 MB
md5:2a224ae49df4d1054a62c3dd502f9d8a    4.7 MB
md5:3b867bc8d3077eae281c37a62dd4bd88    30.4 kB
md5:e18485e79c47a95cfce539f4cf12172b    4.4 kB
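The md5 values listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names are hypothetical, and the local file path is whatever the file was saved as.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as "md5:0cfc...d58a"."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

MD5 is adequate here for detecting accidental corruption during download; it is not a safeguard against deliberate tampering.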

Additional details

Related works

Cites
Journal article: 10.14722/ndss.2019.23386 (DOI)

References

  • Victor Le Pochat, Tom Van Goethem, Samaneh Tajalizadehkhoob, Maciej Korczyński, and Wouter Joosen. 2019. "Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation," Proceedings of the 26th Annual Network and Distributed System Security Symposium (NDSS 2019). https://doi.org/10.14722/ndss.2019.23386
  • Anders Jensen-Urstad (2022). webbkoll. https://github.com/andersju/webbkoll