Published December 8, 2021 | Version 1.0.0
Dataset Open

MACIE scores for human genome assembly GRCh37 Part 1 (Chr1 - Chr3)

  • 1. Department of Biostatistics, Harvard T.H. Chan School of Public Health
  • 2. Genentech/Roche
  • 3. Department of Biostatistics, University of Texas MD Anderson Cancer Center
  • 4. Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles
  • 5. Department of Epidemiology, Harvard T.H. Chan School of Public Health
  • 6. School of Statistics, Southwestern University of Finance and Economics
  • 7. Department of Biostatistics, Columbia University Mailman School of Public Health

Description

MACIE (Multi-dimensional Annotation Class Integrative Estimation) is an unsupervised multivariate mixed model framework to assess multi-dimensional functional impacts for both coding and non-coding variants in the human genome. MACIE integrates a variety of functional annotations, including protein function scores, evolutionary conservation scores, and epigenetic annotations from ENCODE and Roadmap Epigenomics, and estimates the joint posterior probabilities of each genetic variant being functional.

For each non-synonymous coding variant, the MACIE score is a vector of length 4, representing the estimated joint posterior probabilities of “not damaging protein functional and evolutionarily conserved” (MACIE01); “damaging protein functional and not evolutionarily conserved” (MACIE10); “not damaging protein functional and not evolutionarily conserved” (MACIE00); “both damaging protein functional and evolutionarily conserved” (MACIE11). MACIE_protein is the estimated posterior probability of “damaging protein functional”, which is the sum of MACIE10 and MACIE11; MACIE_conserved is the estimated posterior probability of “evolutionarily conserved”, which is the sum of MACIE01 and MACIE11; MACIE_anyclass is the estimated posterior probability of “damaging protein functional” or “evolutionarily conserved”, which is the sum of MACIE01, MACIE10, and MACIE11.

For each non-coding and synonymous coding variant, the MACIE score is a vector of length 4, representing the estimated joint posterior probabilities of “not evolutionarily conserved and regulatory functional” (MACIE01); “evolutionarily conserved and not regulatory functional” (MACIE10); “not evolutionarily conserved and not regulatory functional” (MACIE00); “both evolutionarily conserved and regulatory functional” (MACIE11). MACIE_conserved is the estimated posterior probability of “evolutionarily conserved”, which is the sum of MACIE10 and MACIE11; MACIE_regulatory is the estimated posterior probability of “regulatory functional”, which is the sum of MACIE01 and MACIE11; MACIE_anyclass is the estimated posterior probability of “evolutionarily conserved” or “regulatory functional”, which is the sum of MACIE01, MACIE10, and MACIE11.
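The sums described above can be illustrated with a minimal Python sketch. The function name `macie_marginals`, the variant-type labels, and the example probabilities are hypothetical and chosen only for illustration; they are not part of the dataset or of the MACIE software.

```python
# Illustrative sketch only: the names and example values below are assumptions.
# The first digit of MACIExy refers to the first class and the second digit to
# the second class, in the order given in the description for each variant type.
CLASS_NAMES = {
    "nonsynonymous_coding": ("MACIE_protein", "MACIE_conserved"),
    "noncoding_or_synonymous": ("MACIE_conserved", "MACIE_regulatory"),
}


def macie_marginals(p00, p01, p10, p11, variant_type):
    """Derive the marginal and any-class scores from the joint vector."""
    first_name, second_name = CLASS_NAMES[variant_type]
    return {
        first_name: p10 + p11,             # first class is present
        second_name: p01 + p11,            # second class is present
        "MACIE_anyclass": p01 + p10 + p11, # at least one class is present
    }


# Made-up joint probabilities for a single variant (they sum to 1).
example = macie_marginals(0.70, 0.10, 0.15, 0.05, "nonsynonymous_coding")
print(example)
```

When the four joint probabilities sum to one, MACIE_anyclass is equivalently 1 minus MACIE00.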

Files (45.3 GB)

MD5 checksum                            Size
md5:374292a27ed2fb89d4415956581b58bc    3.9 GB
md5:dae53a5b33b53d9b6b2beaaa09dbe274    768.7 kB
md5:7af825caa21941973a11dfb454f61a1c    14.1 GB
md5:dd9c16cf70d4fe11158b1215525ff2cb    216.2 kB
md5:6f9371196a03ab745c54acaca636f89d    15.0 GB
md5:ef7001acf298b38fe772401dced03f01    228.3 kB
md5:35263fe07bb145f93752ffc72111da9f    12.3 GB
md5:5ee7605e2947a4036a80febecbde2862    186.7 kB
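
Because several of the files are larger than 10 GB, it may be worth checking a finished download against its listed MD5 checksum. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local file path in the usage comment are hypothetical, since the file names are not shown in the listing above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's digest with a listed value such as 'md5:374292a2...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Hypothetical usage; replace the path with the file you actually downloaded:
# matches_listed_md5("downloaded_file", "md5:374292a27ed2fb89d4415956581b58bc")
```

Reading in chunks keeps memory use constant regardless of file size.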