Published September 16, 2021 | Version v1
Dataset Open

Pre-processed IgH receptor repertoire data from MS patients after aHSCT from BioProject PRJNA763367

  • 1. Division of Immunology, University Children's Hospital, University of Zurich, Zurich, Switzerland

Description

Data Processing

Samples were demultiplexed via their Illumina indices and processed using the Immcantation toolkit (1, 2). Raw FASTQ files were filtered using a quality score threshold of 20. Paired reads were joined if they had a minimum overlap length of 10 nt, a maximum error rate of 0.3 and a significance threshold of 0.0001. Reads with identical UMIs were collapsed to a consensus sequence. Sequences with identical full-length sequence and identical constant primer but differing UMIs were further collapsed. Sequences were then submitted to IgBLAST (3) for VDJ assignment and sequence annotation. Constant region sequences were mapped to germline using Stampy (4). The number and type of V gene mutations were calculated using the shazam R package (2).
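The second collapsing step is what produces the per-sequence read and UMI counts reported in the processed file (CONSCOUNT and DUPCOUNT). The following is a minimal Python sketch of that bookkeeping, not the pRESTO implementation. It assumes the UMI consensus sequences have already been built; the record layout (`umi`, `sequence`, `primer`, `reads`), the function name and the toy records are hypothetical.

```python
from collections import defaultdict


def collapse_by_sequence(umi_consensus):
    """Collapse UMI consensus records sharing sequence and constant primer.

    Each input record is a dict with the (hypothetical) keys
    'umi', 'sequence', 'primer' and 'reads', where 'reads' is the number
    of raw reads behind that UMI consensus.
    """
    collapsed = defaultdict(lambda: {"CONSCOUNT": 0, "DUPCOUNT": 0})
    for rec in umi_consensus:
        key = (rec["sequence"], rec["primer"])
        # CONSCOUNT: raw reads summed over all UMIs of this unique sequence
        collapsed[key]["CONSCOUNT"] += rec["reads"]
        # DUPCOUNT: number of distinct UMIs carrying this unique sequence
        collapsed[key]["DUPCOUNT"] += 1
    return dict(collapsed)


# Toy input: three UMIs, two of which share sequence and primer.
records = [
    {"umi": "AAAC", "sequence": "CAGGTG", "primer": "IgM", "reads": 5},
    {"umi": "GGTA", "sequence": "CAGGTG", "primer": "IgM", "reads": 3},
    {"umi": "TTCG", "sequence": "CAGGTG", "primer": "IgG", "reads": 2},
]
result = collapse_by_sequence(records)
for (sequence, primer), counts in result.items():
    print(sequence, primer, counts)
```

In this toy case the same sequence with a different constant primer is kept as a separate entry, which mirrors the requirement that both the full-length sequence and the constant primer match.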

 

software_versions            pRESTO 0.5.3, Change-O 0.3.4, IgBLAST 1.6.1, Stampy 1.0.21, shazam 0.1.8

quality_thresholds            FilterSeq.py pRESTO Q>20

paired_reads_assembly        AssemblePairs.py pRESTO minlen 10 maxerror 0.3 alpha 0.0001

primer_match_cutoffs        MaskPrimers.py pRESTO C primer & V primer maxerror 0.2

consensus_building        BuildConsensus.py pRESTO maxerror 0.1 maxgap 0.5

collapsing_method        CollapseSeq.py pRESTO

germline_database        IMGT
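As a rough illustration of how the thresholds above map onto pRESTO command lines, the Python sketch below assembles argument lists without running anything. Only the tool names and threshold values come from the table; the subcommands (`quality`, `align`, `score`) are assumptions, and input files, primer files, UMI field names and all other options are deliberately left out, so these are fragments rather than the commands actually used.

```python
# Threshold values copied from the parameter table; kept as strings so
# they appear on the command line exactly as written there.
PARAMS = {
    "min_quality": "20",
    "assemble_minlen": "10",
    "assemble_maxerror": "0.3",
    "assemble_alpha": "0.0001",
    "primer_maxerror": "0.2",
    "consensus_maxerror": "0.1",
    "consensus_maxgap": "0.5",
}


def build_commands(params=PARAMS):
    """Return partial pRESTO argument lists keyed by tool name.

    Subcommands are assumptions; input/output arguments are omitted.
    """
    return {
        "FilterSeq": ["FilterSeq.py", "quality", "-q", params["min_quality"]],
        "AssemblePairs": [
            "AssemblePairs.py", "align",
            "--minlen", params["assemble_minlen"],
            "--maxerror", params["assemble_maxerror"],
            "--alpha", params["assemble_alpha"],
        ],
        "MaskPrimers": [
            "MaskPrimers.py", "score",
            "--maxerror", params["primer_maxerror"],
        ],
        "BuildConsensus": [
            "BuildConsensus.py",
            "--maxerror", params["consensus_maxerror"],
            "--maxgap", params["consensus_maxgap"],
        ],
        "CollapseSeq": ["CollapseSeq.py"],
    }


commands = build_commands()
for tool, args in commands.items():
    print(" ".join(args))
```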

 

Format

Processed sequences are provided in a tab delimited file format, including the following annotations:

 

ISOTYPE_SUBCLASS                    Isotype subclass

SEQUENCE_ID                Sequence identifier

JUNCTION_LENGTH            Junction length

CONSCOUNT                Raw read count from which UMI consensus sequences were generated, summed over all UMIs for the given unique sequence.

DUPCOUNT                    UMI count for the given unique sequence

ISOTYPE                    Constant region primer (isotype)

MUT_TOTAL                    Total number of mutations in V gene 

SAMPLE                    Sample identifier, linking back to raw data

JUNCTION                    Junction nucleotide sequence

Protein_seq                    Amino acid sequence

CDR3_AA_GRAVY                CDR3 hydrophobicity index

CDR3_AA_BULK                CDR3 bulkiness

CDR3_AA_ALIPHATIC                CDR3 aliphatic index

CDR3_AA_POLARITY                CDR3 polarity

CDR3_AA_CHARGE                CDR3 normalized net charge

CDR3_AA_BASIC                CDR3 basic side chain residue content

CDR3_AA_ACIDIC            CDR3 acidic side chain residue content

CDR3_AA_AROMATIC                CDR3 aromatic side chain content

Subset                    Defined B cell subset 

Repertoire                    Defined B cell repertoire (Naive, Memory IgM/IgD, IgA, IgG)

R_SCDR                    Replacement-to-silent (R/S) mutation ratio in the CDRs

R_SFWR                    Replacement-to-silent (R/S) mutation ratio in the FWRs

V_GENE                    V segment gene

D_GENE                    D segment gene

J_GENE                    J segment gene

V_FAM                    V gene family

Clust_REPRES                Cluster representative

Clust_SIZE                    Cluster size

Sex                        Sex of the subject

UNIQUE_ID                    Sample identifier 

Bcellno                    Input B cell number

Days_posttx                        Sampling time point relative to transplantation

Age_at_tx                        Age of the subject (at aHSCT)

Disease                        MS subtype

Last_therapy                        Last therapy prior to aHSCT

Disease_duration                        Disease duration

CMV_reactivation                        Cytomegalovirus reactivation

Month_label                        Month post-aHSCT interval bin

Patient_label                        Subject identifier
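A file with these annotations can be read with any tab-aware parser. The sketch below uses only the Python standard library to average MUT_TOTAL per Repertoire. It assumes the file starts with a header row carrying the column names listed above; the function name and the toy rows are made up for illustration, and the Repertoire labels in the real file may be spelled differently.

```python
import csv
import io
from collections import defaultdict


def mean_mutations_by_repertoire(handle):
    """Average MUT_TOTAL per Repertoire from a tab-delimited handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        repertoire = row["Repertoire"]
        totals[repertoire] += float(row["MUT_TOTAL"])
        counts[repertoire] += 1
    return {rep: totals[rep] / counts[rep] for rep in totals}


# Toy stand-in for the real file (invented rows, three columns only).
example = io.StringIO(
    "SEQUENCE_ID\tRepertoire\tMUT_TOTAL\n"
    "s1\tNaive\t0\n"
    "s2\tIgG\t12\n"
    "s3\tIgG\t18\n"
)
means = mean_mutations_by_repertoire(example)
print(means)
```

For the real data, the same function can be given a handle from `open(path, newline="")`, where `path` points to the downloaded file. Each row is one unique sequence, so this is an unweighted mean; weighting by DUPCOUNT would be a different analysis choice.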

 

References

1. Vander Heiden, J. A., G. Yaari, M. Uduman, J. N. H. Stern, K. C. O’Connor, D. A. Hafler, F. Vigneault, and S. H. Kleinstein. 2014. pRESTO: A toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30: 1930–1932.

2. Gupta, N. T., J. A. Vander Heiden, M. Uduman, D. Gadala-Maria, G. Yaari, and S. H. Kleinstein. 2015. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31: 3356–3358.

3. Ye, J., N. Ma, T. L. Madden, and J. M. Ostell. 2013. IgBLAST: An immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41.

4. Lunter, G., and M. Goodson. 2011. Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21: 936–939.

Files

Files (930.0 MB)

md5:6acfba5efab4fccac9d7e24739169297