Jacob D Galson
2020-06-09
<p><strong>Description</strong></p>
<p>These are the processed BCR repertoire sequence data that accompany the manuscript “Deep sequencing of B cell receptor repertoires from COVID-19 patients reveals strong convergent immune signatures”. The manuscript preprint is available at <a href="https://doi.org/10.1101/2020.05.20.106294">https://doi.org/10.1101/2020.05.20.106294</a>. The raw sequence data are available on SRA under BioProject PRJNA638224.</p>
<p> </p>
<p><strong>Sequence processing</strong></p>
<p>The Immcantation framework (Docker container v3.0.0) was used for sequence processing. Briefly, paired-end reads were joined based on a minimum overlap of 20 nt and a maximum error of 0.2, and reads with a mean Phred score below 20 were removed. Primer regions, including UMIs and sample barcodes, were then identified within each read and trimmed. Together, the sample barcode, UMI, and constant region primer were used to assign each read to a molecular grouping. Within each grouping, usearch was used to subdivide the reads with a cutoff of 80% nucleotide identity, to account for randomly overlapping UMIs. Each of the resulting groupings is assumed to represent reads arising from a single RNA molecule. Reads within each grouping were then aligned, and a consensus sequence was determined. For each processed sequence, IgBlast was used to determine the V, D and J gene segments and the locations of the CDRs and FWRs. Isotype was determined by comparison to germline constant region sequences. Sequences annotated as unproductive by IgBlast were removed.</p>
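<p>The quality filter, molecular grouping, and consensus steps described above can be illustrated with a minimal Python sketch. It is not the Immcantation implementation: the function names, the toy reads, the Phred+33 quality encoding, and the per-position majority vote over equal-length reads are all assumptions made for illustration, and the 80% identity subdivision step is omitted.</p>

```python
from collections import Counter, defaultdict

def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def consensus(seqs):
    """Per-position majority vote; assumes reads are already aligned
    and of equal length (a simplification of the real alignment step)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

# Hypothetical reads: (sample barcode, UMI, constant region primer, sequence, quality)
reads = [
    ("BC01", "ACGT", "IGHG", "GATTACA", "IIIIIII"),  # mean Phred 40
    ("BC01", "ACGT", "IGHG", "GATTACA", "IIIIIII"),  # mean Phred 40
    ("BC01", "ACGT", "IGHG", "GATCACA", "5555555"),  # mean Phred 20, kept
    ("BC01", "TTGA", "IGHM", "CCGGTTA", "+++++++"),  # mean Phred 10, removed
    ("BC01", "TTGA", "IGHM", "CCGGTTA", "IIIIIII"),  # mean Phred 40
]

# Step 1: remove reads with a mean Phred score below 20.
kept = [r for r in reads if mean_phred(r[4]) >= 20]

# Step 2: group reads by (sample barcode, UMI, constant region primer).
groups = defaultdict(list)
for barcode, umi, primer, seq, _quality in kept:
    groups[(barcode, umi, primer)].append(seq)

# Step 3: one consensus sequence per molecular grouping.
consensus_by_group = {key: consensus(seqs) for key, seqs in groups.items()}

for key, seq in consensus_by_group.items():
    print(key, len(groups[key]), seq)
```

<p>In this sketch, the number of groupings that share a final sequence would correspond to a UMI count, and the number of reads contributing to it would correspond to a raw read count.</p>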
<p> </p>
<p><strong>Sequence data column description</strong></p>
<ul>
<li><strong>sample_id </strong>Unique identifier for each sequencing library</li>
<li><strong>sequence_id </strong>Unique identifier for a sequence within a sample_id</li>
<li><strong>sequence_alignment </strong>IMGT gapped nucleotide sequence</li>
<li><strong>germline_alignment </strong>IMGT gapped germline sequence</li>
<li><strong>v_call </strong>IGHV gene segment(s) and allele</li>
<li><strong>d_call </strong>IGHD gene segment(s) and allele</li>
<li><strong>j_call </strong>IGHJ gene segment(s) and allele</li>
<li><strong>c_call </strong>Isotype subclass</li>
<li><strong>junction </strong>Junction nucleotide sequence</li>
<li><strong>junction_aa </strong>Junction amino acid sequence</li>
<li><strong>duplicate_count </strong>UMI count for the given unique sequence</li>
<li><strong>consensus_count </strong>Raw read count for the given unique sequence</li>
</ul>
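<p>As a rough illustration of how these columns might be used, the sketch below sums UMI counts per isotype subclass. It assumes a tab-separated layout with a header row and comma-separated ambiguous gene calls, neither of which is stated in this record; the rows are invented toy values, not data from this dataset, and <code>first_gene</code> is a hypothetical helper.</p>

```python
import csv
import io
from collections import Counter

# Toy table using a subset of the columns listed above (values are invented).
toy_table = io.StringIO(
    "sample_id\tsequence_id\tv_call\tc_call\tduplicate_count\tconsensus_count\n"
    "S1\tseq1\tIGHV3-30*01,IGHV3-30-5*01\tIGHG1\t3\t12\n"
    "S1\tseq2\tIGHV1-69*01\tIGHM\t1\t4\n"
    "S1\tseq3\tIGHV3-30*01\tIGHG1\t2\t7\n"
)

def first_gene(call):
    """Return the first gene name from a call, dropping the allele suffix.
    Assumes multiple calls are comma-separated and alleles follow '*'."""
    return call.split(",")[0].split("*")[0]

umis_per_isotype = Counter()
umis_per_v_gene = Counter()
for row in csv.DictReader(toy_table, delimiter="\t"):
    umis = int(row["duplicate_count"])  # UMI count for this unique sequence
    umis_per_isotype[row["c_call"]] += umis
    umis_per_v_gene[first_gene(row["v_call"])] += umis

print(dict(umis_per_isotype))
print(dict(umis_per_v_gene))
```

<p>Weighting by <code>duplicate_count</code> rather than <code>consensus_count</code> counts molecules instead of raw reads, which is usually the less amplification-biased choice.</p>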
<p> </p>
<p><strong>Sequence metadata column description</strong></p>
<ul>
<li><strong>sample_id </strong>Unique identifier for each sequencing library</li>
<li><strong>bioproject_accession </strong>NCBI BioProject accession number</li>
<li><strong>biosample_accession </strong>NCBI BioSample accession number</li>
<li><strong>sra_accession </strong>NCBI SRA accession number</li>
<li><strong>sex </strong>Sex of patient</li>
<li><strong>age </strong>Age of patient at time of sampling</li>
<li><strong>ethnicity </strong>Ethnicity of patient</li>
<li><strong>health_state </strong>One of worsening, stable, or improving</li>
</ul>
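<p>Because both tables share <code>sample_id</code>, sequence rows can be annotated with patient metadata through a simple lookup. The following sketch uses invented toy records and in-memory dictionaries rather than the actual files; the variable names are hypothetical.</p>

```python
from collections import Counter

# Toy metadata keyed by sample_id (values are invented for illustration).
metadata = {
    "S1": {"sex": "F", "age": "54", "health_state": "improving"},
    "S2": {"sex": "M", "age": "61", "health_state": "stable"},
}

# Toy sequence rows carrying only the columns needed for the join.
sequences = [
    {"sample_id": "S1", "sequence_id": "seq1", "duplicate_count": 3},
    {"sample_id": "S1", "sequence_id": "seq2", "duplicate_count": 1},
    {"sample_id": "S2", "sequence_id": "seq1", "duplicate_count": 2},
]

# Join on sample_id and total the UMI counts per health_state.
umis_per_state = Counter()
for row in sequences:
    state = metadata[row["sample_id"]]["health_state"]
    umis_per_state[state] += row["duplicate_count"]

print(dict(umis_per_state))
```

<p>Note that <code>sequence_id</code> is only unique within a <code>sample_id</code>, so the pair of both columns is needed to identify a sequence across samples.</p>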
https://doi.org/10.5281/zenodo.3886395
oai:zenodo.org:3886395
eng
Zenodo
https://doi.org/10.1101/2020.05.20.106294
https://zenodo.org/communities/covid-19
https://zenodo.org/communities/airr
https://doi.org/10.5281/zenodo.3886394
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
COVID-19
B cell repertoire
Antibody
SARS-CoV-2
PRJNA638224 - BCR repertoire sequencing from COVID-19 patients
info:eu-repo/semantics/other