Published June 12, 2022 | Version GRCh38.p13
Dataset
Open
GRCh38.p13 Reference FASTA (bgzip'd with faidx)
Description
This is derived from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GRCh38_major_release_seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna.gz.
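All non-primary sequences have been removed: any sequence whose name has an underscore after the chr prefix is dropped by the grep step below.
The filtered FASTA has then been recompressed with bgzip and indexed with samtools faidx: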
```shell
curl -#fSL https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GRCh38_major_release_seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna.gz -o genomic.fna.gz
gunzip genomic.fna.gz
awk '{ if ((NR>1)&&($0~/^>/)) { printf("\n%s", $0); } else if (NR==1) { printf("%s", $0); } else { printf("\t%s", $0); } }' genomic.fna | grep -v "^>chr\S*_" - | tr "\t" "\n" > genomic.short.fna
bgzip -c genomic.short.fna > reference.fna.bgz
samtools faidx reference.fna.bgz
tar -czvf GRCh38_reference_fasta.tar reference.fna.bgz reference.fna.bgz.fai reference.fna.bgz.gzi
```
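To make the filtering step easier to follow, here is a minimal sketch that runs the same awk, grep, and tr pipeline on a toy FASTA. The file names `toy.fna` and `toy.short.fna`, the sequence names, and the bases are all invented for illustration. It assumes GNU grep, since `\S` is a GNU extension.

```shell
# Toy FASTA: two primary-style names and two names with an underscore after "chr".
# All names and bases here are made up for illustration.
printf '%s\n' \
  '>chr1  AC:TOY1' 'ACGT' 'TTGA' \
  '>chr1_TOY2v1_alt  AC:TOY2' 'GGCC' \
  '>chrUn_TOY3v1  AC:TOY3' 'AATT' 'CCGG' \
  '>chrM  AC:TOY4' 'GATC' > toy.fna

# Step 1 (awk): join each record onto one line, header first,
#               with the sequence lines appended and separated by tabs.
# Step 2 (grep): drop every record whose name has an underscore after "chr".
# Step 3 (tr): turn the tabs back into newlines to restore the line layout.
awk '{ if ((NR>1)&&($0~/^>/)) { printf("\n%s", $0); } else if (NR==1) { printf("%s", $0); } else { printf("\t%s", $0); } }' toy.fna \
  | grep -v "^>chr\S*_" - \
  | tr "\t" "\n" > toy.short.fna

# Show which records survived the filter.
grep '^>' toy.short.fna
```

Only the `chr1` and `chrM` records remain in `toy.short.fna`. Because whole records are joined onto one line before filtering, removing a header also removes its sequence lines.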
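The archive GRCh38_reference_fasta.tar (gzip-compressed despite its .tar extension, because it was created with tar -z) contains: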
- reference.fna.bgz
- reference.fna.bgz.fai
- reference.fna.bgz.gzi
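As a rough sketch of how the archive might be checked before use, the helper below lists the archive members and compares them with the three file names above. The function name `check_archive` is hypothetical, and the local archive name in the usage comment assumes the download was saved under the name used in the tar command.

```shell
# List the members of a gzip-compressed tar archive and compare them
# with the three file names given in the description.
check_archive() {
  archive="$1"
  expected="reference.fna.bgz reference.fna.bgz.fai reference.fna.bgz.gzi"
  # Sort the member names and join them with single spaces.
  actual="$(tar -tzf "$archive" | sort | tr '\n' ' ' | sed 's/ $//')"
  [ "$actual" = "$expected" ]
}

# Hypothetical usage (assumes the archive name below and an installed samtools):
#   check_archive GRCh38_reference_fasta.tar && tar -xzf GRCh38_reference_fasta.tar
#   samtools faidx reference.fna.bgz chr1:10001-10060
```

With the `.fai` and `.gzi` files next to `reference.fna.bgz`, `samtools faidx` can fetch a region without decompressing the whole file.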
Files (882.8 MB)
Name | Size |
---|---|
md5:a4047175ae90e2df36900f039f1cf260 | 882.8 MB |
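The listed MD5 digest can be used to check a download. The sketch below assumes GNU coreutils `md5sum` is available. The function name `verify_md5` is hypothetical, and the file name in the usage comment is an assumption taken from the tar command in the description.

```shell
# Compare a file's MD5 digest with an expected value.
# Returns 0 when they match, non-zero otherwise.
verify_md5() {
  file="$1"
  expected="$2"
  # md5sum prints "<digest>  <file name>"; keep only the digest.
  actual="$(md5sum "$file" | cut -d' ' -f1)"
  [ "$actual" = "$expected" ]
}

# Hypothetical usage (the local file name is an assumption):
#   verify_md5 GRCh38_reference_fasta.tar a4047175ae90e2df36900f039f1cf260 && echo "checksum OK"
```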