Published April 1, 2018 | Version 1.0.0
Dataset Open

Project Tycho Level 1 data: Counts of multiple diseases reported in UNITED STATES OF AMERICA, 1916-2011

  • 1. University of Pittsburgh Graduate School of Public Health
  • 2. University of California, Los Angeles
  • 3. Johns Hopkins Bloomberg School of Public Health
  • 4. University of Pittsburgh School of Library and Information Science
  • 5. Pittsburgh Supercomputing Center
  • 6. University of Florida Department of Biology

Description

Project Tycho data include counts of infectious disease cases or deaths per time interval. A count is equivalent to a data point. Project Tycho level 1 data include counts that have been standardized for a specific, published analysis. Standardization of level 1 data included converting various types of counts into a common format and excluding counts that were not required for the intended analysis. In addition, external data such as population data may have been integrated with disease data to derive rates or for other applications.

Version 1.0.0 of level 1 data includes counts at the state level for smallpox, polio, measles, mumps, rubella, hepatitis A, and whooping cough (pertussis), and at the city level for diphtheria. The time period covered varies by disease, within the range 1916 to 2011. This version includes case counts as well as incidence rates per 100,000 population based on historical population estimates. These data have been used by investigators at the University of Pittsburgh to estimate the impact of vaccination programs in the United States, published in the New England Journal of Medicine: http://www.nejm.org/doi/full/10.1056/NEJMms1215400. See this paper for additional methods and detail about the origin of level 1 version 1.0.0 data.
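As a rough illustration of what an incidence rate per 100,000 population means, the arithmetic can be sketched in Python as below. The function name and the example numbers are hypothetical, and the sketch assumes a plain ratio of cases to population; the exact population denominators and any rounding used to build the dataset are described in the cited paper, not here.

```python
def incidence_per_100000(cases: int, population: int) -> float:
    """Return cases per 100,000 people for one location and time interval.

    Assumption: a plain ratio of cases to population, with no rounding
    or other adjustment.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for small inputs.
    return cases * 100_000 / population


# Hypothetical numbers, not taken from the dataset:
# 25 cases in a population of 2,500,000 people.
example_rate = incidence_per_100000(25, 2_500_000)
print(example_rate)  # 1.0
```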

Level 1 version 1.0.0 data are represented in a CSV file with 7 columns:

  • epi_week: a six-digit number that represents the year and epidemiological week for which disease cases or deaths were reported (yyyyww)
  • state: the two-letter postal abbreviation of the state for which a count has been reported
  • loc: the name of a state or city for which a count has been reported, capitalized
  • loc_type: the type of location (STATE or CITY) for which a count has been reported
  • disease: the disease for which a count has been reported: HEPATITIS A, MEASLES, MUMPS, PERTUSSIS, POLIO, RUBELLA, SMALLPOX, or DIPHTHERIA
  • cases: the number of cases reported for the specified disease, epidemiological week, and location
  • incidence_per_100000: the number of cases per 100,000 people, computed using historical population counts for cities and states as reported by the US Census Bureau
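The column layout above could be read with Python's standard csv module roughly as follows. This is a minimal sketch under several assumptions: that the file has a header row using exactly the column names listed, that cases parses as an integer and incidence_per_100000 as a float, and that no values are missing. The Count type, the read_counts function, and the sample rows are made up for illustration and are not real Project Tycho values.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Count(NamedTuple):
    """One data point; a hypothetical container, not part of the dataset."""
    year: int
    week: int
    state: str
    loc: str
    loc_type: str
    disease: str
    cases: int
    incidence_per_100000: float


def read_counts(handle: TextIO) -> Iterator[Count]:
    """Yield one Count per CSV row, splitting epi_week (yyyyww) into year and week."""
    for row in csv.DictReader(handle):
        # yyyyww: the last two digits are the week, the rest is the year.
        year, week = divmod(int(row["epi_week"]), 100)
        yield Count(
            year=year,
            week=week,
            state=row["state"],
            loc=row["loc"],
            loc_type=row["loc_type"],
            disease=row["disease"],
            cases=int(row["cases"]),
            incidence_per_100000=float(row["incidence_per_100000"]),
        )


# Made-up sample rows in the documented column order (illustration only).
SAMPLE = """epi_week,state,loc,loc_type,disease,cases,incidence_per_100000
192801,PA,PENNSYLVANIA,STATE,MEASLES,10,0.1
192802,PA,PITTSBURGH,CITY,DIPHTHERIA,3,0.45
"""

counts = list(read_counts(io.StringIO(SAMPLE)))
for count in counts:
    print(count.year, count.week, count.loc_type, count.disease, count.cases)
```

To read the real file, an open file handle would take the place of the io.StringIO sample; rows with empty or non-numeric values would need extra handling that this sketch leaves out.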

Files

ProjectTycho_Level1_v1.0.0.json

Files (3.7 MB)

  • md5:31913bcc56ff83f19127b6225c4bb467 (83.6 kB)
  • md5:e75c5ac2f97a17cd056fb53f9bf38117 (10.5 kB)
  • md5:4c3991594d8dbad117e69ef49808eb5b (3.6 MB)
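The md5 values listed for the files can be used to check that a download is intact. A minimal sketch using Python's standard hashlib module follows; the function names are hypothetical, and because the listing does not show which file name goes with which checksum, the path and checksum passed in are left to the reader.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed checksum such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_checksum(local_path, "md5:4c3991594d8dbad117e69ef49808eb5b") would return True only if the file at local_path (a placeholder for wherever the download was saved) has that digest.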

Additional details

Dates

Collected
1916/2011
Time interval of the counts in the dataset