Published June 12, 2022 | Version GRCh38.p13
Dataset Open

GRCh38.p13 Reference FASTA (bgzip-compressed, with faidx index)

Description

This dataset is derived from the following NCBI file:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GRCh38_major_release_seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna.gz

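All non-primary sequences have been removed. Specifically, every record whose sequence name starts with "chr" and contains an underscore was dropped. This covers the *_alt, *_random and chrUn_* contigs, including the hs38d1 *_decoy contigs. The chromosome-level sequences (chr1 to chr22, chrX, chrY and chrM) remain, as does any other underscore-free name in the source, such as chrEBV.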

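The following minimal sketch shows which names the header filter removes by running the same pattern over a few hypothetical header lines. The names follow GRCh38 naming conventions but are only examples. The text after the space on the chrX line is made up to show that only the sequence name is tested. The pattern is written with the POSIX class `[^[:space:]]` instead of the GNU `\S` extension used in the commands.

```shell
#!/bin/sh
# Hypothetical header lines (illustrative names, not the real header set).
# The description on the chrX line is made up; its "_" comes after a space.
headers='>chr1
>chr1_KI270706v1_random
>chr6_GL000250v2_alt
>chrUn_KI270302v1
>chrUn_JTFH01000001v1_decoy
>chrX  made_up_description
>chrM'

# Drop a header when the sequence name (the run of non-space characters
# after ">chr") contains "_". Underscores after the first space are ignored.
kept=$(printf '%s\n' "$headers" | grep -v '^>chr[^[:space:]]*_')
printf '%s\n' "$kept"
```

Only `>chr1`, `>chrX  made_up_description` and `>chrM` survive. Restricting the match to non-space characters is what keeps an underscore in a header's description from removing a primary chromosome.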
The file was filtered, recompressed with bgzip and indexed with samtools faidx using the following commands:

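# Download the NCBI analysis set and decompress it
curl -#fSL https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GRCh38_major_release_seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna.gz -o genomic.fna.gz
gunzip genomic.fna.gz
# Join each record onto one tab-separated line, drop records whose name contains "_", then restore the newlines
awk '{ if ((NR>1)&&($0~/^>/)) { printf("\n%s", $0); } else if (NR==1) { printf("%s", $0); } else { printf("\t%s", $0); } }' genomic.fna | grep -v "^>chr\S*_" - | tr "\t" "\n" > genomic.short.fna
# Recompress with bgzip (BGZF); samtools faidx then writes reference.fna.bgz.fai and reference.fna.bgz.gzi
bgzip -c genomic.short.fna > reference.fna.bgz
samtools faidx reference.fna.bgz
# Note: -z gzip-compresses the archive even though the file name ends in .tar
tar -czvf GRCh38_reference_fasta.tar reference.fna.bgz reference.fna.bgz.fai reference.fna.bgz.gzi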

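The awk | grep | tr step can be checked on a toy file before it is run on the full genome. This sketch assumes a POSIX shell with awk, grep and tr. The toy records and temporary paths are made up, and the grep pattern uses the portable `[^[:space:]]` spelling of `\S`. The approach relies on the input containing no tab characters, because tr turns every tab back into a newline.

```shell
#!/bin/sh
set -eu
# Tiny multi-line FASTA with made-up sequences.
tmp=$(mktemp -d)
cat > "$tmp/toy.fna" <<'EOF'
>chr1 toy
ACGTACGT
ACGT
>chr1_KI270706v1_random toy
TTTTTTTT
GG
>chrM toy
CCCC
EOF

# 1) awk: one line per record (header, then its sequence lines joined by tabs)
# 2) grep: drop whole records whose sequence name contains "_"
# 3) tr: turn the tabs back into newlines, restoring the original line wrapping
awk '{ if ((NR>1)&&($0~/^>/)) { printf("\n%s", $0); } else if (NR==1) { printf("%s", $0); } else { printf("\t%s", $0); } }' "$tmp/toy.fna" \
  | grep -v '^>chr[^[:space:]]*_' \
  | tr '\t' '\n' > "$tmp/toy.short.fna"

out=$(cat "$tmp/toy.short.fna")
printf '%s\n' "$out"
rm -rf "$tmp"
```

The chr1 and chrM records come back with their original line breaks, and the `_random` record is removed in full, including both of its sequence lines.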
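The archive contains:

  • reference.fna.bgz: the filtered FASTA, compressed with bgzip (BGZF)
  • reference.fna.bgz.fai: the samtools faidx index
  • reference.fna.bgz.gzi: the BGZF block index that lets samtools seek within the compressed file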

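Random access works because each .fai line records five tab-separated columns: NAME, LENGTH, OFFSET, LINEBASES and LINEWIDTH. For a bgzip-compressed FASTA these offsets refer to the uncompressed stream, and the .gzi file maps them onto compressed blocks. The sketch below illustrates only the .fai arithmetic, on a made-up uncompressed toy file. `fai_offset` is a hypothetical helper, not part of samtools.

```shell
#!/bin/sh
# Hypothetical helper: 0-based byte offset (in the uncompressed FASTA) of a
# 1-based position, computed from the .fai columns NAME LENGTH OFFSET LINEBASES LINEWIDTH.
fai_offset() { # usage: fai_offset FAI_FILE NAME POS
  awk -F '\t' -v name="$2" -v pos="$3" '
    $1 == name {
      if (pos < 1 || pos > $2) exit 1               # position outside the sequence
      i = pos - 1                                   # 0-based index into the sequence
      printf "%.0f\n", $3 + int(i / $4) * $5 + i % $4  # skip full lines, then bases
      found = 1
      exit
    }
    END { if (!found) exit 1 }' "$1"
}

# Toy FASTA: 12 bases wrapped at 5 bases per line (6 bytes with the newline).
tmp=$(mktemp -d)
printf '>s1\nACGTA\nCGTAC\nGG\n' > "$tmp/toy.fa"
# The index line samtools faidx would produce for this file, written by hand here.
printf 's1\t12\t4\t5\t6\n' > "$tmp/toy.fa.fai"

off=$(fai_offset "$tmp/toy.fa.fai" s1 8)
base=$(dd if="$tmp/toy.fa" bs=1 skip="$off" count=1 2>/dev/null)
echo "offset=$off base=$base"
rm -rf "$tmp"
```

This prints `offset=12 base=T`, the 8th base of `ACGTACGTACGG`. With the real files, samtools performs this lookup plus the .gzi translation itself, for example `samtools faidx reference.fna.bgz chrM:1-100`.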
Files (882.8 MB)

Size: 882.8 MB
md5:a4047175ae90e2df36900f039f1cf260
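The MD5 checksum above can be used to confirm a download is intact. The minimal sketch below assumes GNU coreutils `md5sum` is available; on macOS, `md5 -q FILE` prints just the digest instead. `verify_md5` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's MD5 digest with an expected value.
verify_md5() { # usage: verify_md5 FILE EXPECTED_MD5
  # Read via stdin so md5sum's output is just "<digest>  -".
  actual=$(md5sum < "$1" | cut -d ' ' -f 1)
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual, expected $2)" >&2
    return 1
  fi
}

# Self-check on a known test vector: MD5("abc") = 900150983cd24fb0d6963f7d28e17f72
tmp=$(mktemp)
printf 'abc' > "$tmp"
verify_md5 "$tmp" 900150983cd24fb0d6963f7d28e17f72
rm -f "$tmp"
```

The file name is not shown on this page, so the following path is a placeholder. To check the downloaded archive, run `verify_md5 path/to/downloaded_archive a4047175ae90e2df36900f039f1cf260`.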