Published December 19, 2022 | Version 1.0.0
Dataset Restricted

Attribution Signals from Five Android Markets

  • 1. Aarhus University
  • 2. IMDEA Networks Institute
  • 3. TU Wien
  • 4. Universidad Carlos III de Madrid

Description

Attribution signals extracted from five major Android markets (Google Play, Baidu, APKMonk, Tencent, APKMirror). The data were collected by a crawler deployed at different instances between December 2019 and November 2021.

The data set consists of 4,978,937 records and is provided as a single JSON file compressed using bzip2. An individual "market entry" (i.e., a unique set of signals associated with a single publication version of an app on a market) can have multiple records, because the data set is provided in 'long' form: each signing certificate associated with a market entry is contained in its own record.
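Because the data set is in long form, analyses at the market-entry level need the records regrouped. The following is a minimal sketch of that step, not the authors' tooling. It assumes the decompressed file holds a single top-level JSON array of record objects (the on-disk layout is not specified in this description) and that a missing certificate appears as null or an empty string. The function names are hypothetical.

```python
import bz2
import json
from collections import defaultdict


def load_records(path):
    """Read a bzip2-compressed JSON file into a list of record dicts.

    Assumption: the file contains one top-level JSON array of objects.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


def certs_by_market_entry(records):
    """Map each market_entry_id to the set of cert_sha256 values seen for it."""
    grouped = defaultdict(set)
    for record in records:
        certs = grouped[record["market_entry_id"]]
        cert = record.get("cert_sha256")
        if cert:  # skip records without a certificate hash (assumed null/empty)
            certs.add(cert)
    return dict(grouped)
```

With close to five million records, loading the whole file at once may need several gigabytes of memory; if the file turns out to be line-delimited JSON instead, reading it line by line with `json.loads` would be the more natural approach.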

Each record contains the following properties:

| Signal | Description |
| --- | --- |
| market_entry_id | Unique identifier for the market entry (shared by all records belonging to that entry) |
| pkg_name | Package name (prioritizes the package name from the APK file over the package name reported on the web page) |
| internal_market_id | The internal identifier used by the market (as part of the URL) |
| market | The market this record was collected from |
| timestamp | Timestamp of the collection of the market entry |
| file_sha256 | The SHA256 hash of the APK file from which the signals are extracted |
| meta_version | The app version reported by the metadata on the web page description |
| app_name | The name of the app extracted from the metadata |
| developer_name | The name of the publishing developer |
| developer_website | The website of the publishing developer |
| developer_email | The email address of the publishing developer |
| developer_address | The physical address of the publishing developer |
| meta_pkg_name | The package name extracted from the app web page |
| privacy_policy_url | The privacy policy URL |
| cert_sha256 | The SHA256 hash of the certificate used to sign the APK |
| subject_c | The “country” field extracted from the subject of the signing certificate |
| subject_s | The “state” field extracted from the subject of the signing certificate |
| subject_o | The “organisation” field extracted from the subject of the signing certificate |
| subject_l | The “locality” field extracted from the subject of the signing certificate |
| subject_ou | The “organisation unit” field extracted from the subject of the signing certificate |
| subject_cn | The “common name” field extracted from the subject of the signing certificate |
| self_signed | True if the signing certificate is self-signed, False otherwise |
| cert_version | The Android signing version of the certificate |
| apk_version_code | The version code of the APK |
| apk_pkg_name | The package name extracted from the APK file |
| apk_app_name | The app name extracted from the APK file |
| market_id | Identifier that combines the "pkg_name" and "internal_market_id". Deprecated field (use 'market_entry_id' instead) |
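
The precedence described for pkg_name (APK-derived package name first, web page package name as fallback) can be expressed as a small helper. This is a sketch only: it assumes that missing values appear as null or empty strings, which this description does not state, and both function names are hypothetical.

```python
def resolve_pkg_name(record):
    """Prefer the package name from the APK, falling back to the web page one."""
    return record.get("apk_pkg_name") or record.get("meta_pkg_name")


def pkg_name_mismatch(record):
    """Return True when both package names are present and they differ."""
    apk_name = record.get("apk_pkg_name")
    meta_name = record.get("meta_pkg_name")
    return bool(apk_name) and bool(meta_name) and apk_name != meta_name
```

A check like `pkg_name_mismatch` may be useful for flagging entries where the market listing and the distributed APK disagree about the package identity.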

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.
