PRJNA638224 - BCR repertoire sequencing from COVID-19 patients

Jacob D Galson

doi:10.5281/zenodo.3886395

Published June 9, 2020 | Version 1.0

Dataset Open

Jacob D Galson¹

1. Alchemab Therapeutics Ltd

Description

These are the processed BCR repertoire sequence data that accompany the manuscript “Deep sequencing of B cell receptor repertoires from COVID-19 patients reveals strong convergent immune signatures”. The manuscript preprint is available at https://doi.org/10.1101/2020.05.20.106294. The raw sequence data are available on SRA under BioProject PRJNA638224.

Sequence processing

The Immcantation framework (docker container v3.0.0) was used for sequence processing. Briefly, paired-end reads were joined based on a minimum overlap of 20 nt and a maximum error of 0.2, and reads with a mean Phred score below 20 were removed. Primer regions, including UMIs and sample barcodes, were then identified within each read and trimmed. Together, the sample barcode, UMI, and constant region primer were used to assign a molecular grouping to each read. Within each grouping, usearch was used to subdivide the reads at a cutoff of 80% nucleotide identity, to account for randomly overlapping UMIs. Each of the resulting groupings is assumed to represent reads arising from a single RNA molecule. Reads within each grouping were then aligned, and a consensus sequence was determined. For each processed sequence, IgBlast was used to determine the V, D, and J gene segments and the locations of the CDRs and FWRs. Isotype was determined by comparison to germline constant region sequences. Sequences annotated as unproductive by IgBlast were removed.
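The grouping and consensus steps described above can be illustrated with a small Python sketch. This is not the Immcantation code: the read dictionaries, their keys (`barcode`, `umi`, `primer`, `sequence`, `quality`), and all function names are hypothetical, Phred+33 quality encoding is assumed, and the usearch subdivision at 80% identity is deliberately left out. The consensus here is a simple per-column majority vote over reads that are already aligned to equal length.

```python
from collections import Counter, defaultdict


def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def group_reads(reads, min_mean_quality=20):
    """Drop reads with mean Phred below the threshold, then group the rest
    by (sample barcode, UMI, constant region primer)."""
    groups = defaultdict(list)
    for read in reads:
        if mean_phred(read["quality"]) < min_mean_quality:
            continue  # low-quality read, removed before grouping
        key = (read["barcode"], read["umi"], read["primer"])
        groups[key].append(read["sequence"])
    return groups


def consensus(aligned_sequences):
    """Per-column majority vote over equal-length, already aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_sequences)
    )
```

Each key of the dictionary returned by `group_reads` stands for one presumed RNA molecule, and `consensus` collapses the reads of that group into a single sequence.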

Sequence data column description

Column	Description
sample_id	Unique identifier for each sequencing library
sequence_id	Unique identifier for a sequence within a sample_id
sequence_alignment	IMGT gapped nucleotide sequence
germline_alignment	IMGT gapped germline sequence
v_call	IGHV gene segment(s) and allele
d_call	IGHD gene segment(s) and allele
j_call	IGHJ gene segment(s) and allele
c_call	Isotype subclass
junction	Junction nucleotide sequence
junction_aa	Junction amino acid sequence
duplicate_count	UMI count for the given unique sequence
consensus_count	Raw read count for the given unique sequence
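As an illustration of how these columns might be used, the following sketch streams the processed data file and sums `duplicate_count` (UMI count) per `c_call` value. It assumes, based only on the `.csv.gz` extension, that the file is gzip-compressed, comma-delimited, and has a header row with the column names listed above; the function names are hypothetical.

```python
import csv
import gzip
from collections import Counter


def read_processed(path):
    """Yield one dict per sequence row.

    Assumes a gzip-compressed, comma-delimited file with a header row.
    """
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle)


def umi_counts_by_isotype(rows):
    """Sum duplicate_count (UMI count) for each c_call value."""
    totals = Counter()
    for row in rows:
        totals[row["c_call"]] += int(row["duplicate_count"])
    return totals
```

For example, `umi_counts_by_isotype(read_processed("alchemab-covid19-processed-data-200609.csv.gz"))` would return the totals for the whole dataset without loading the file into memory at once.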

Sequence metadata column description

Column	Description
sample_id	Unique identifier for each sequencing library
bioproject_accession	NCBI BioProject accession number
biosample_accession	NCBI BioSample accession number
sra_accession	NCBI SRA accession number
sex	Sex of patient
age	Age of patient at time of sampling
ethnicity	Ethnicity of patient
health_state	One of worsening, stable, or improving
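Because `sample_id` appears in both the sequence data and the metadata, the two can be joined on it. The sketch below assumes the metadata file is tab-delimited with a header row (inferred from the `.tsv` extension) and uses hypothetical function names; it attaches the patient-level fields to each sequence row.

```python
import csv


def load_metadata(handle):
    """Index metadata rows by sample_id (tab-delimited with a header row assumed)."""
    return {row["sample_id"]: row for row in csv.DictReader(handle, delimiter="\t")}


def annotate(sequence_rows, metadata,
             fields=("sex", "age", "ethnicity", "health_state")):
    """Yield each sequence row extended with the chosen metadata fields."""
    for row in sequence_rows:
        sample = metadata[row["sample_id"]]  # KeyError if the sample is unknown
        yield {**row, **{field: sample[field] for field in fields}}
```

Raising `KeyError` on an unknown `sample_id` is a deliberate choice in this sketch, so that a mismatch between the two files is noticed rather than silently skipped.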

Files

Files (425.5 MB)

Name	MD5	Size
alchemab-covid19-metadata-200609.tsv	f25c20340746a5ad076569013a018ecc	1.7 kB
alchemab-covid19-processed-data-200609.csv.gz	5aa7bf432270bb0d3069b947ead736dc	425.5 MB
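The MD5 checksums listed with the files can be used to check a download for corruption. A minimal sketch using Python's standard `hashlib` follows; the function names are hypothetical, and the local paths are assumed to match the file names listed above.

```python
import hashlib

# Checksums as listed for the two files in this record.
EXPECTED_MD5 = {
    "alchemab-covid19-metadata-200609.tsv": "f25c20340746a5ad076569013a018ecc",
    "alchemab-covid19-processed-data-200609.csv.gz": "5aa7bf432270bb0d3069b947ead736dc",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()
```

Reading in chunks keeps memory use flat, which matters for the 425.5 MB processed data file.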

Additional details

Is cited by: Preprint: 10.1101/2020.05.20.106294 (DOI)
