Dataset Open Access

Hathi Trust Library Vectorized features

Benjamin M. Schmidt


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/chi.bin"
      }, 
      "checksum": "md5:9d42e87889548753c87a0789b2974eb7", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "chi.bin", 
      "type": "bin", 
      "size": 672517424
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/eng_1922_or_before.bin"
      }, 
      "checksum": "md5:3f568d8da7a8ea91cce543c97c2c6548", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "eng_1922_or_before.bin", 
      "type": "bin", 
      "size": 3083208261
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/eng_1923-1979.bin"
      }, 
      "checksum": "md5:1c0cf6c2f28a460f06819d70972f9158", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "eng_1923-1979.bin", 
      "type": "bin", 
      "size": 3362350702
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/eng_1980_or_after.bin"
      }, 
      "checksum": "md5:e2e6849025ed6ed9bcf4a1a66fefb6d4", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "eng_1980_or_after.bin", 
      "type": "bin", 
      "size": 2992494559
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/fre.bin"
      }, 
      "checksum": "md5:9ffcd0bc224808f4eb26a1405ad7ef6c", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "fre.bin", 
      "type": "bin", 
      "size": 1334038267
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/ger.bin"
      }, 
      "checksum": "md5:901a65063a63db2d61c72ccfec9817fc", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "ger.bin", 
      "type": "bin", 
      "size": 1665531761
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/ht-640d-complete-half-precision.bin"
      }, 
      "checksum": "md5:e3752fd49b778674a321fc619c9f81a2", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "ht-640d-complete-half-precision.bin", 
      "type": "bin", 
      "size": 17721951909
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/ita.bin"
      }, 
      "checksum": "md5:efadff54374de2e5fab31457ace0d8fc", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "ita.bin", 
      "type": "bin", 
      "size": 417971058
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/jpn.bin"
      }, 
      "checksum": "md5:d1b710006321ae4b99fd7227be4f0bcf", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "jpn.bin", 
      "type": "bin", 
      "size": 632023904
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/Other.bin"
      }, 
      "checksum": "md5:fc0b487e8633f95f38bb93ffb229e7bf", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "Other.bin", 
      "type": "bin", 
      "size": 2380959253
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/rus.bin"
      }, 
      "checksum": "md5:1466aa25e5440176d41f3e23870bf84b", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "rus.bin", 
      "type": "bin", 
      "size": 530610491
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a/spa.bin"
      }, 
      "checksum": "md5:11997b2761a6fb8ea3a409e92d8678c8", 
      "bucket": "7d530f84-6cc3-4dcd-8488-873fab51283a", 
      "key": "spa.bin", 
      "type": "bin", 
      "size": 650245089
    }
  ], 
  "owners": [
    52726
  ], 
  "doi": "10.5281/zenodo.1424831", 
  "stats": {
    "version_unique_downloads": 12.0, 
    "unique_views": 208.0, 
    "views": 219.0, 
    "version_views": 217.0, 
    "unique_downloads": 12.0, 
    "version_unique_views": 206.0, 
    "volume": 393672505994.0, 
    "version_downloads": 58.0, 
    "downloads": 58.0, 
    "version_volume": 393672505994.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.1424831", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.1424830", 
    "bucket": "https://zenodo.org/api/files/7d530f84-6cc3-4dcd-8488-873fab51283a", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.1424830.svg", 
    "html": "https://zenodo.org/record/1424831", 
    "latest_html": "https://zenodo.org/record/1424831", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.1424831.svg", 
    "latest": "https://zenodo.org/api/records/1424831"
  }, 
  "conceptdoi": "10.5281/zenodo.1424830", 
  "created": "2018-10-15T22:00:03.363256+00:00", 
  "updated": "2020-01-24T19:25:33.067717+00:00", 
  "conceptrecid": "1424830", 
  "revision": 24, 
  "id": 1424831, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.1424831", 
    "description": "<p>A smaller-resolution (and therefore more portable) version of the Stable Random Projection Hathi Trust features described in my forthcoming article. The Northeastern repository is many individual files with 1280 random dimensions; this is just 640 random dimensions. The numbers are also experimentally encoded as half-precision floats, which cuts the file size by half at the cost of only being supported by my Python module. The net result is a file 1/4 the size of the full resolution ones for the paper that has, probably, something like 60-80% of the information content.</p>\n\n<p>The full file is &#39;<a href=\"https://www.zenodo.org/api/files/6d615dbd-65de-4391-93ac-91b302bb57e4/ht-640d-complete-half-precision.bin?versionId=0e9f5551-0888-454c-8bb8-0dd3f3d6d949\">ht-640d-complete-half-precision.bin</a>&#39;. You can also download 11 smaller files organized by language.</p>\n\n<p>Code to read these files is at&nbsp;https://github.com/bmschmidt/pySRP.&nbsp;</p>", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "title": "Hathi Trust Library Vectorized features", 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "1424830"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "1424831"
          }
        }
      ]
    }, 
    "version": "1.0", 
    "keywords": [
      "Dimensionality reduction", 
      "Digital Libraries", 
      "Digital humanities"
    ], 
    "publication_date": "2018-09-21", 
    "creators": [
      {
        "orcid": "0000-0002-1142-5720", 
        "affiliation": "Northeastern University", 
        "name": "Benjamin M. Schmidt"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.1424830", 
        "relation": "isVersionOf"
      }
    ]
  }
}
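
Each entry in "files" above carries an MD5 checksum and a byte size, which makes it possible to confirm that a multi-gigabyte download arrived intact before trying to read it. The following is a minimal sketch of that check, not part of the record or of the pySRP module. It assumes the JSON export has been parsed into a Python dict and that the downloaded file keeps the name given in its "key" field; the function names (expected_files, md5_of, verify) are hypothetical.

```python
import hashlib
from pathlib import Path


def expected_files(record):
    """Map each file key to (md5 hex digest, size in bytes) from the record JSON."""
    out = {}
    for entry in record["files"]:
        # Checksums in the export look like "md5:<hex digest>".
        algo, _, digest = entry["checksum"].partition(":")
        if algo != "md5":
            raise ValueError(f"unexpected checksum algorithm: {algo}")
        out[entry["key"]] = (digest, entry["size"])
    return out


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large .bin files never sit fully in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, record):
    """Return True if the local file matches the size and MD5 listed in the record."""
    path = Path(path)
    digest, size = expected_files(record)[path.name]
    # Compare sizes first: it is cheap and catches truncated downloads.
    if path.stat().st_size != size:
        return False
    return md5_of(path) == digest
```

With the export saved locally (for example as a hypothetical record.json), `verify("ita.bin", json.load(open("record.json")))` would return True only when both the size and the checksum agree with the listing.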
                  All versions   This version
Views             217            219
Downloads         58             58
Data volume       393.7 GB       393.7 GB
Unique views      206            208
Unique downloads  12             12
