Published July 5, 2021 | Version v2
Dataset Open

1000 Genomes Project Cleaned Dataset

Description

The first four authors performed standard quality control on the 1000 Genomes (1KG) Project genotypes that were generated on the Illumina Omni2.5M chip at the Broad and Sanger Institutes. The datasets were then posted on the website of The Centre for Applied Genomics at Sick Kids Hospital at https://www.tcag.ca/tools/1000genomes.html. The last two authors then looked for overlap between those datasets and the HapMap3 datasets that had gene expression data for Endoplasmic Reticulum Aminopeptidase 2 (ERAP2), and chose the Yoruba in Ibadan, Nigeria (YRI) and Utah residents with Northern and Western European ancestry (CEU) subpopulations. These two subpopulations had the largest overlap between the 1KG and HapMap3 datasets, with 91 YRI and 104 CEU samples. The text files provided in this repository contain the IDs of all individuals and the phenotypes for the labelled populations; for example, ERAP2_CEU_YRI_phenotypes.txt has phenotypes for both populations. The two *_pc_outliers.txt files contain the IDs of the individuals excluded from analysis because they were outliers in the principal component analysis. In summary, 88 YRI and 102 CEU individuals were included in the analysis.
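The exclusion step described above (dropping the individuals listed in a *_pc_outliers.txt file from a population's ID list) could be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each file is plain text with the individual ID as the first whitespace-delimited token on each line, which the description does not confirm, and the function names and sample IDs are hypothetical.

```python
def read_ids(lines):
    """Return the first whitespace-delimited token of each non-empty line.

    Assumption: the ID is the first column; the real file layout may differ.
    """
    ids = []
    for line in lines:
        tokens = line.split()
        if tokens:
            ids.append(tokens[0])
    return ids


def drop_outliers(sample_ids, outlier_ids):
    """Keep the samples whose ID is not in the outlier list, preserving order."""
    outliers = set(outlier_ids)
    return [sample for sample in sample_ids if sample not in outliers]


# Made-up IDs for illustration only; they are not taken from the dataset.
samples = read_ids(["S1 0.5", "S2 1.2", "", "S3 0.9"])
kept = drop_outliers(samples, read_ids(["S2"]))
print(kept)
```

With the real files, the same two functions could be applied to an open file handle in place of the in-memory lists. The counts in the description imply that 3 YRI (91 to 88) and 2 CEU (104 to 102) individuals were removed.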

Files

Files (935.0 MB)

md5:fc630aac6d4531f548b41928307453ef (25 Bytes)
md5:9f75eb9d0e5e7c07f4d9639da2b633cd (42.1 kB)
md5:76bc779772f847e3445f1df815f49d64 (43.4 kB)
md5:f13e64bb2c6668126ed88bb46aebc7a3 (42.0 kB)
md5:641ffa37909a33fffef0e2469b96a2a0 (873.3 MB)
md5:3f33c1e34d794d1ce50cd9425fe35b27 (61.6 MB)
md5:e7fe24cfaf57c641e28849ea986d1774 (41.7 kB)
md5:bd8510bec6c4c8a77a9d5364a25d26a6 (38 Bytes)
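
The MD5 checksums in the file listing can be used to confirm that a download is intact. The sketch below uses Python's standard hashlib module; the function names are hypothetical, and the local path passed in is whatever name the downloaded file was saved under, since the listing here does not pair checksums with file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:fc63...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low for the larger files in the listing, such as the 873.3 MB one.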
