Pre-processed IgH repertoire sequencing data from BioProject PRJNA748239
Creators
- 1. Division of Immunology, University Children's Hospital, University of Zurich, Zurich, Switzerland
Description
Data Processing
Samples were demultiplexed via their Illumina indices and processed using the Immcantation toolkit (1, 2). Raw FASTQ files were filtered with a quality score threshold of 20. Paired reads were joined if they had a minimum overlap length of 10 nt, a maximum error rate of 0.3 and a significance threshold of 0.0001. Reads with an identical UMI were collapsed to a consensus sequence. Reads with an identical full-length sequence and identical constant primer but differing UMIs were further collapsed. Sequences were then submitted to IgBlast (3) for VDJ assignment and sequence annotation. Constant region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated using the shazam R package (2).
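The two collapsing steps described above, and the way they could give rise to the CONSCOUNT and DUPCOUNT annotations, can be sketched in a few lines of Python. This is an illustration only, not the pRESTO implementation: it uses a per-position majority vote on equal-length reads, ignores quality scores and the maxerror and maxgap cutoffs, and assumes that reads sharing a UMI also share a constant primer. The function names and the tuple layout are hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Majority-vote consensus per position (assumes equal-length reads)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def collapse(reads_with_umi):
    """reads_with_umi: iterable of (umi, constant_primer, sequence) tuples."""
    # Step 1: group raw reads by UMI (and primer, by assumption).
    by_umi = defaultdict(list)
    for umi, primer, seq in reads_with_umi:
        by_umi[(umi, primer)].append(seq)

    # Step 2: merge UMI consensus sequences that are identical and share a primer.
    collapsed = defaultdict(lambda: {"CONSCOUNT": 0, "DUPCOUNT": 0})
    for (umi, primer), seqs in by_umi.items():
        key = (umi_consensus(seqs), primer)
        collapsed[key]["CONSCOUNT"] += len(seqs)  # raw reads behind the sequence
        collapsed[key]["DUPCOUNT"] += 1           # distinct UMIs behind the sequence
    return dict(collapsed)


# Toy input: three UMIs, one of which contains a read with a sequencing error.
toy_reads = [
    ("AAAT", "IgG", "ACGT"),
    ("AAAT", "IgG", "ACGT"),
    ("AAAT", "IgG", "ACGA"),
    ("CCCG", "IgG", "ACGT"),
    ("GGGA", "IgM", "ACGT"),
]
result = collapse(toy_reads)
```

In the toy input, the two IgG UMIs yield the same consensus and are merged into one record, while the IgM read stays separate because its constant primer differs.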
software_versions pRESTO 0.5.3, Change-O 0.3.4, IgBlast 1.6.1, Stampy 1.0.21, shazam 0.1.8
quality_thresholds FilterSeq.py pRESTO Q>20
paired_reads_assembly AssemblePairs.py pRESTO minlen 10 maxerror 0.3 alpha 0.0001
primer_match_cutoffs MaskPrimers.py pRESTO C primer & V primer maxerror 0.2
consensus_building BuildConsensus.py pRESTO maxerror 0.1 maxgap 0.5
collapsing_method CollapseSeq.py pRESTO
germline_database IMGT
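To make the quality threshold above concrete, the following minimal sketch filters reads by their mean Phred score. It is not the FilterSeq.py code. It assumes Phred+33 encoded quality strings and treats a mean of exactly 20 as passing; both are assumptions, as are the function names and the record layout.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_reads(records, min_quality=20):
    """Keep (read_id, sequence, quality_string) records at or above the threshold.

    Whether the boundary value itself passes is an assumption of this sketch.
    """
    return [rec for rec in records if mean_phred(rec[2]) >= min_quality]


# Toy records: 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10.
toy_records = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "5555"),
    ("read3", "ACGT", "++++"),
]
kept = filter_reads(toy_records)
```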
Format
Processed sequences are provided in a tab delimited file format, including the following annotations:
C_CALL Isotype subclass
SEQUENCE_ID Sequence identifier
V_CALL V segment gene and allele
D_CALL D segment gene and allele
J_CALL J segment gene and allele
JUNCTION_LENGTH Junction length
CONSCOUNT Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.
DUPCOUNT UMI count for the given unique sequence
ISOTYPE Constant region primer (isotype)
MU_COUNT_CDR_R Number of replacement mutations in CDR region
MU_COUNT_CDR_S Number of silent mutations in CDR region
MU_COUNT_FWR_R Number of replacement mutations in FWR region
MU_COUNT_FWR_S Number of silent mutations in FWR region
MUT_TOTAL Total number of mutations in V gene
NP_LENGTH Total number of N and P additions
SEQUENCE_INPUT Full length sequence
SEQUENCE_IMGT Gapped IMGT sequence
V_GERM_START_VDJ Position of the first nucleotide in ungapped V germline sequence alignment
JUNCTION Junction nucleotide sequence
GERMLINE_IMGT_D_MASK IMGT-gapped germline nucleotide sequence with Ns masking the NP1-D-NP2 regions
CDR3_AA_GRAVY CDR3 hydrophobicity
CDR3_AA_BULK CDR3 bulkiness
CDR3_AA_ALIPHATIC Normalized aliphatic index
CDR3_AA_POLARITY CDR3 polarity
CDR3_AA_CHARGE Normalized net charge
CDR3_AA_BASIC Basic side chain residue content
CDR3_AA_ACIDIC Acidic side chain residue content
CDR3_AA_AROMATIC Aromatic side chain content
Subset Defined B cell subset
Repertoire Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)
R_SCDR R/S ratio in CDR region
R_SFWR R/S ratio in FWR region
V_GENE V segment gene
D_GENE D segment gene
J_GENE J segment gene
V_FAM V gene family
Run ID of sequencing run
Sex Sex of the subject
Age Age of the subject
UNIQUE_ID Subject identifier
SAMPLE Sample identifier, linking back to raw data
Bcellno Number of input B cells
Cells Cell type
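As a hedged illustration of how the tab-delimited file can be read and how a few of the derived annotations relate to the primary ones, the sketch below parses rows with Python's csv module, strips the allele from V_CALL to obtain a gene and family name, and computes a CDR R/S ratio from the mutation counts. The exact rules used to produce V_GENE, V_FAM and R_SCDR in the deposited file are not stated in this record, so the first-call choice, the family parsing and the handling of zero silent mutations are all assumptions; the function names and output keys are hypothetical.

```python
import csv
import io


def gene_from_call(call):
    """'IGHV3-23*01,IGHV3-23D*01' -> 'IGHV3-23' (first call, allele removed)."""
    return call.split(",")[0].split("*")[0]


def family_from_call(call):
    """'IGHV3-23*01' -> 'IGHV3' (simple split; unusual gene names may differ)."""
    return gene_from_call(call).split("-")[0]


def rs_ratio(replacement, silent):
    """R/S ratio; None when there are no silent mutations (assumed convention)."""
    return replacement / silent if silent else None


def summarise(handle):
    """Read tab-delimited rows and return a few derived values per sequence."""
    summary = []
    for row in csv.DictReader(handle, delimiter="\t"):
        summary.append({
            "SEQUENCE_ID": row["SEQUENCE_ID"],
            "v_gene": gene_from_call(row["V_CALL"]),
            "v_family": family_from_call(row["V_CALL"]),
            "rs_cdr": rs_ratio(float(row["MU_COUNT_CDR_R"]),
                               float(row["MU_COUNT_CDR_S"])),
        })
    return summary


# Small in-memory stand-in for the real file, using a subset of its columns.
example = io.StringIO(
    "SEQUENCE_ID\tV_CALL\tMU_COUNT_CDR_R\tMU_COUNT_CDR_S\n"
    "s1\tIGHV3-23*01,IGHV3-23D*01\t6\t2\n"
    "s2\tIGHV1-69*01\t1\t0\n"
)
rows = summarise(example)
```

For the downloaded file, the same function could be called on a handle opened with `open(path, newline="")`, where `path` points to the local copy.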
References
1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O’Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.
2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.
3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: An immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41.
4. Lunter, G., and M. Goodson. 2011. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939.
Files
(627.2 MB)
md5:23fa5acc9e4c8da3f7301308dbf2eddd
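After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the path to the local copy must be supplied by the user since the file name is not shown in this record.

```python
import hashlib

# Checksum as listed in the Files section (without the "md5:" prefix).
EXPECTED_MD5 = "23fa5acc9e4c8da3f7301308dbf2eddd"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low for a file of this size.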