Published June 13, 2017 | Version v1
Dataset Open

Pre-processed B-cell receptor amplicon sequencing data from SRR1842411

  • 1. Masaryk University

Description

An example dataset containing B-cell receptor (BCR) gene sequences. This dataset is intended to be used for testing software tools developed to annotate (i.e. map Variable, Diversity and Joining segments) and perform clonal analysis of BCR sequencing data.

Sequencing:

Libraries were prepared using 5' RACE from PBMCs of a healthy donor. Input molecules were tagged with unique molecular identifiers (UMIs). Sequencing was run on a MiSeq with 300+300 bp reads.

Contents:

The dataset contains both raw sequencing reads and high-quality consensus sequences assembled using the unique molecular identifier (UMI) approach. Consensus assembly corrects sequencing errors and eliminates sequencing artifacts.

  • age_ig_s7_R1.fastq.gz and age_ig_s7_R2.fastq.gz contain raw reads
  • age_ig_s7_R1.t10.cf.fastq.gz and age_ig_s7_R2.t10.cf.fastq.gz contain consensus sequences

All files contain a UMI tag sequence in their read headers, in the form UMI:NNNN:QQQQ, where N is a base character and Q is a quality character (for assembled consensus sequences, the total number of reads is given instead of the Q string).
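The header convention above can be read with a short regular expression. The following is a minimal sketch, not part of the dataset's own tooling: the function name `parse_umi` and the `UmiTag` type are hypothetical, and it assumes the tag is followed by whitespace or the end of the header line. Because quality characters may themselves include `:`, the last field is taken as everything up to the next whitespace.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the tag appears somewhere in the FASTQ header line and the
# last field runs until whitespace or the end of the line.
UMI_PATTERN = re.compile(r"UMI:([ACGTN]+):(\S+)")


class UmiTag(NamedTuple):
    bases: str  # the NNNN part of the tag
    info: str   # quality string (raw reads) or read count (consensuses)


def parse_umi(header: str) -> Optional[UmiTag]:
    """Return the UMI tag found in a FASTQ header, or None if absent."""
    match = UMI_PATTERN.search(header)
    if match is None:
        return None
    return UmiTag(bases=match.group(1), info=match.group(2))
```

For a consensus record, `int(tag.info)` would give the number of reads supporting the UMI; for a raw read, `tag.info` is the quality string of the UMI bases.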

Note that consensus sequences were assembled using only raw sequences that correspond to UMI tags supported by at least 10 sequencing reads. Consequently, the consensus sequence files contain only a subset of the UMI tags found in the raw sequences. To assess software performance on raw sequencing reads using the assembled consensus sequences as a high-quality reference standard, the raw reads should first be filtered to keep only those UMI tags that are present in the consensus sequence files.
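The filtering step described above could be sketched as follows. This is an illustrative example under stated assumptions, not the procedure used by the dataset authors: the function names are hypothetical, records are assumed to be plain four-line FASTQ entries, and the UMI is assumed to appear in the header as UMI:NNNN: followed by the quality string or read count.

```python
import gzip
import re
from typing import Iterator, Optional, Set, TextIO, Tuple

# Assumption: only the base part of the tag (NNNN) identifies the UMI.
UMI_PATTERN = re.compile(r"UMI:([ACGTN]+):")


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) lines, four at a time."""
    while True:
        header = handle.readline()
        if not header:
            return
        yield header, handle.readline(), handle.readline(), handle.readline()


def umi_of(header: str) -> Optional[str]:
    """Extract the UMI bases from a header line, or None if no tag is found."""
    match = UMI_PATTERN.search(header)
    return match.group(1) if match else None


def collect_umis(consensus_path: str) -> Set[str]:
    """Collect every UMI present in a gzipped consensus FASTQ file."""
    umis: Set[str] = set()
    with gzip.open(consensus_path, "rt") as handle:
        for header, _, _, _ in read_fastq(handle):
            umi = umi_of(header)
            if umi is not None:
                umis.add(umi)
    return umis


def filter_raw_reads(raw_path: str, consensus_path: str, out_path: str) -> int:
    """Write raw reads whose UMI occurs in the consensus file; return the count kept."""
    keep = collect_umis(consensus_path)
    kept = 0
    with gzip.open(raw_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for record in read_fastq(src):
            if umi_of(record[0]) in keep:
                dst.writelines(record)
                kept += 1
    return kept
```

With the files in this dataset, one might call `filter_raw_reads("age_ig_s7_R1.fastq.gz", "age_ig_s7_R1.t10.cf.fastq.gz", "age_ig_s7_R1.filtered.fastq.gz")`, where the output file name is only a placeholder, and repeat the call for the R2 files.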

Citations:

The whole dataset was used to benchmark the MiXCR software and was originally referenced in Bolotin DA, et al. MiXCR: software for comprehensive adaptive immunity profiling. Nature Methods 12(5):380-381, 2015.

Data pre-processing was carried out using the MIGEC software: Shugay M, et al. Towards error-free profiling of immune repertoires. Nature Methods 11(6):653-655, 2014.

Contributors:

The dataset was generated in Prof. Chudakov's lab (Adaptive Immunity Group at Masaryk University, Brno, and Genomics of Adaptive Immunity Lab at the Institute of Bioorganic Chemistry, Moscow). Sample preparation and sequencing were performed by Dr. Olga Britanova and Dr. Maria Turchaninova. Raw sequencing reads were pre-processed and uploaded by Dr. Mikhail Shugay.

Files (948.0 MB)

  • md5:40d7445eac26646e62016904fb27617b (416.5 MB)
  • md5:616370fe0edc2a4777d74bd5d7dcd431 (5.1 MB)
  • md5:f09b9582464e44d3037be31138918579 (518.3 MB)
  • md5:22f2b0548edb05effe7fd424c277239f (8.1 MB)
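The MD5 checksums listed above can be used to verify a download. Below is a minimal sketch using Python's standard `hashlib` module; the function names are hypothetical, and because the listing does not show which checksum belongs to which file, the expected value has to be supplied by the user.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected value such as 'md5:40d7...'."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use low for the larger raw read files.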