A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report generated on 2021-02-10, 17:03, based on data in: /Users/cerebis/git/qc3C/Feb7_2020/drr
qc3C
qc3C provides reference-free and BAM-based quality control for Hi-C data.
BAM mode analysis details
This table details various alignment features which are potentially of interest to researchers attempting to assess the quality of a Hi-C library.
| Sample | Digest | Accepted pairs | Read length | Insert length | Unobserved | Read-thru |
|---|---|---|---|---|---|---|
| DRR177157 | HindIII | 200000 | 127bp | 217bp | 0.0% | 35.8% |
| DRR177158 | HindIII | 200000 | 127bp | 233bp | 0.0% | 39.1% |
| DRR177159 | DpnII | 200000 | 127bp | 219bp | 0.0% | 49.1% |
| DRR177160 | DpnII | 200000 | 127bp | 218bp | 0.0% | 46.0% |
| DRR177161 | Sau3AI MluCI | 200000 | 127bp | 511bp | 53.2% | 5.1% |
| DRR177162 | Sau3AI MluCI | 200000 | 127bp | 550bp | 56.6% | 3.6% |
| DRR177163 | DpnII HinfI | 200000 | 127bp | 297bp | 19.3% | 30.8% |
| DRR177164 | DpnII HinfI | 200000 | 127bp | 319bp | 24.4% | 29.9% |
| DRR177165 | HindIII | 200000 | 127bp | 248bp | 4.1% | 41.4% |
| DRR177166 | HindIII | 200000 | 127bp | 248bp | 4.1% | 36.7% |
| DRR177167 | HindIII | 200000 | 151bp | 219bp | 0.0% | 42.3% |
| DRR177168 | HindIII | 200000 | 151bp | 227bp | 0.0% | 46.0% |
| DRR177169 | DpnII | 200000 | 151bp | 218bp | 0.0% | 57.7% |
| DRR177170 | DpnII | 200000 | 151bp | 213bp | 0.0% | 53.7% |
| DRR177171 | Sau3AI MluCI | 200000 | 151bp | 485bp | 41.2% | 5.2% |
| DRR177172 | DpnII HinfI | 200000 | 151bp | 314bp | 8.0% | 37.1% |
| DRR177173 | HindIII | 200000 | 151bp | 237bp | 0.0% | 49.6% |
BAM mode read parsing
This figure displays a breakdown of the proportion of parsed reads that were rejected under each criterion and the proportion that were accepted.
BAM mode HiC-Pro validation
A visualisation of the read-pair categories devised by HiC-Pro.
As the field has moved from 6-cutter to 4-cutter enzymes, and subsequently dual-enzyme digests, the higher density of sites has made this framework less useful, since it has become increasingly easy to satisfy the intervening site criteria.
BAM mode long-range pairs
This plot visualises the breakdown of read-pairs based on separation distance.
The breakdown of separation distance is only calculated for cis-mapping pairs.
Ideally, Hi-C proximity ligation should produce many pairs which are greater than 1000 bp apart. However, these statistics are strongly influenced by the state of the reference. For draft assemblies the distance at which pairs can map is limited by the degree of fragmentation and length of contigs. As a result, many more pairs will be categorised as trans-mapping and pairs which are truly inter-molecular cannot be distinguished from those which are merely inter-contig.
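The categorisation described above can be illustrated with a minimal sketch. It assumes each pair is given as a tuple of reference name and position for both mates; the bin edges and labels are hypothetical and are not taken from qc3C itself.

```python
from collections import Counter

# Hypothetical separation bins in bp (lower bound inclusive, upper exclusive).
# qc3C's own thresholds may differ.
BINS = [
    (0, 1_000, "<1 kb"),
    (1_000, 5_000, "1-5 kb"),
    (5_000, 10_000, "5-10 kb"),
    (10_000, None, ">=10 kb"),
]


def categorise_pairs(pairs):
    """Count trans-mapping pairs and bin cis-mapping pairs by separation.

    pairs: iterable of (ref1, pos1, ref2, pos2) tuples.
    """
    counts = Counter()
    for ref1, pos1, ref2, pos2 in pairs:
        if ref1 != ref2:
            # On a draft assembly this may be inter-contig rather than
            # truly inter-molecular; the two cannot be told apart here.
            counts["trans"] += 1
            continue
        sep = abs(pos2 - pos1)
        for lo, hi, label in BINS:
            if sep >= lo and (hi is None or sep < hi):
                counts[label] += 1
                break
    return counts


example = [
    ("ctg1", 100, "ctg1", 350),
    ("ctg1", 100, "ctg1", 7_500),
    ("ctg1", 100, "ctg2", 40),
    ("ctg2", 10, "ctg2", 250_010),
]
counts = categorise_pairs(example)
```

With a highly fragmented reference, more pairs fall into the `trans` category simply because their mates land on different contigs.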
BAM mode distribution of fragment separation
This figure displays a normalised histogram of read-pair separation, binned uniformly in log-space.
Due to the binning strategy, the x-axis is log-scaled and visually accommodates pair separations up to 1 million bp. The inferred insert size for each library is represented by a dashed, grey vertical line. The y-axis is log-scaled by default, allowing the density attributed to long-range pairs to be more easily seen.
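As a rough illustration of uniform binning in log-space, the sketch below builds such a histogram with the standard library only. The function name, the default bin count, and the choice to normalise by the number of retained pairs are assumptions made for this example, not a description of qc3C's implementation.

```python
import math


def log_binned_histogram(separations, n_bins=10, max_sep=1_000_000):
    """Histogram with bins of equal width in log10 space, from 1 bp to max_sep.

    Returns (edges, proportions), where proportions sum to 1 over the
    separations that fall inside the plotted range.
    """
    width = math.log10(max_sep) / n_bins
    counts = [0] * n_bins
    kept = 0
    for sep in separations:
        if sep < 1 or sep > max_sep:
            continue  # outside the plotted range
        # Clamp so that sep == max_sep lands in the last bin.
        idx = min(int(math.log10(sep) / width), n_bins - 1)
        counts[idx] += 1
        kept += 1
    edges = [10 ** (i * width) for i in range(n_bins + 1)]
    proportions = [c / kept if kept else 0.0 for c in counts]
    return edges, proportions


edges, proportions = log_binned_histogram(
    [200, 250, 300, 5_000, 120_000], n_bins=6
)
```

With six bins, each bin spans one decade, so the three short-range separations share a single bin while the two long-range pairs occupy bins of their own.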
A characteristic of Hi-C libraries is the presence of a large peak below 1000 bp. qc3C attributes this to regular (and undesirable) shotgun pairs creeping through the Hi-C protocol. The peak is used by qc3C to infer the insert size, which is later employed to estimate the unobservable extent of inserts.
Note: the inferred insert size can be significantly smaller than what a sequencing facility might report the experimentally determined insert size to be. This discrepancy can be explained by the failure to account for the additional adapter sequence when fragments are assessed during library preparation.
BAM mode junction breakdown
This figure displays the frequency at which a library's possible junction sequences are actually observed in the reads. Trivial single-enzyme digests are ignored.
For trivial single-enzyme digests, there is only one possible junction sequence and so the results for these experiments are not plotted. For dual-enzyme digests (such as Phase Genomics) there are four potential junctions, while for dual-enzyme digests with one ambiguous site (such as Arima Genomics) there are 16 possible junction sequences.
How efficient the more complicated library protocols are at producing hybrid junctions is possibly just a point of interest.
Junctions are named for the enzymes that were responsible for creating the 5' and 3' ends.
E.g. Sau3AI/MluCI would involve two different enzymes, while Sau3AI/Sau3AI involves only one, as
would be the case in a single-enzyme digest. Each name is accompanied by the actual junction
sequence.
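The naming scheme and the count of four junctions for a dual-enzyme digest can be sketched as follows. The example assumes enzymes that leave 5' overhangs, which are filled in before ligation, so a junction is the 5' site up to the end of its overhang followed by the 3' site from its cut position. The `ENZYMES` table and function names are illustrative only, and ambiguous sites (such as the N in HinfI) are deliberately not handled.

```python
from itertools import product

# Recognition site and top-strand cut offset for a few enzymes that
# leave 5' overhangs: ^GATC, ^AATT and A^AGCTT.
ENZYMES = {
    "Sau3AI": ("GATC", 0),
    "MluCI": ("AATT", 0),
    "HindIII": ("AAGCTT", 1),
}


def junction_sequence(enz_5p, enz_3p):
    """Junction formed when filled-in ends from two cut sites are ligated."""
    site5, cut5 = ENZYMES[enz_5p]
    site3, cut3 = ENZYMES[enz_3p]
    # 5' fragment keeps its site up to the end of the filled-in overhang;
    # 3' fragment contributes its site from the cut position onwards.
    return site5[: len(site5) - cut5] + site3[cut3:]


def enumerate_junctions(enzyme_names):
    """Map '5p-enzyme/3p-enzyme' names to junction sequences."""
    return {
        f"{a}/{b}": junction_sequence(a, b)
        for a, b in product(enzyme_names, repeat=2)
    }


dual = enumerate_junctions(["Sau3AI", "MluCI"])
single = enumerate_junctions(["HindIII"])
```

Here `dual` holds four entries, two of them hybrids, while `single` holds only the one junction expected of a single-enzyme digest.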
The junctions are grouped by their 5' and then 3' enzyme, while the color spectrum used across each bar aims to emphasise these enzymatic sources.
Note: in BAM mode, the counts are controlled for false positives, in the sense that a read alignment must terminate at a cut site while the read sequence continues and contains the observed junction.
K-mer mode runtime details
This table includes user-specified input options, the observed read length and the unobservable fraction.
| Sample | Digest | Accepted reads | Insert length | Read length | Unobservable extent |
|---|---|---|---|---|---|
| DRR177157 | HindIII | 200000 | 214bp | 127bp | 3.1% |
| DRR177158 | HindIII | 200000 | 231bp | 127bp | 3.6% |
| DRR177159 | DpnII | 200000 | 216bp | 127bp | 4.7% |
| DRR177160 | DpnII | 200000 | 215bp | 127bp | 4.5% |
| DRR177161 | Sau3AI MluCI | 200000 | 509bp | 127bp | 54.4% |
| DRR177162 | Sau3AI MluCI | 200000 | 549bp | 127bp | 57.9% |
| DRR177163 | DpnII HinfI | 200000 | 295bp | 127bp | 22.8% |
| DRR177164 | DpnII HinfI | 200000 | 315bp | 127bp | 27.7% |
| DRR177165 | HindIII | 200000 | 244bp | 127bp | 7.9% |
| DRR177166 | HindIII | 200000 | 244bp | 127bp | 10.0% |
| DRR177167 | HindIII | 200000 | 216bp | 151bp | 1.6% |
| DRR177168 | HindIII | 200000 | 225bp | 151bp | 2.0% |
| DRR177169 | DpnII | 200000 | 215bp | 151bp | 2.5% |
| DRR177170 | DpnII | 200000 | 210bp | 151bp | 2.2% |
| DRR177171 | Sau3AI MluCI | 200000 | 486bp | 151bp | 41.8% |
| DRR177172 | DpnII HinfI | 200000 | 311bp | 151bp | 11.2% |
| DRR177173 | HindIII | 200000 | 234bp | 151bp | 2.3% |
K-mer mode Hi-C fraction
This table lists the inferred proportion of Hi-C proximity ligation fragments.
Here, Mean adjusted Hi-C fraction represents the best estimate of the proportion of a library's read-pairs which are a product of proximity ligation. This figure is arrived at by correcting the raw estimate for the fraction of insert extent which was not observable.
The observable extent is limited by the length of reads relative to the supplied insert size, as well as a further constraint on flanking sequence around any suspected junction sequence.
| Sample | Mean raw Hi-C fraction | Mean adjusted Hi-C fraction |
|---|---|---|
| DRR177157 | 40.4% | 43.1% |
| DRR177158 | 45.5% | 49.0% |
| DRR177159 | 51.0% | 56.5% |
| DRR177160 | 49.3% | 54.4% |
| DRR177161 | 6.8% | 15.4% |
| DRR177162 | 4.8% | 11.8% |
| DRR177163 | 41.5% | 56.2% |
| DRR177164 | 41.0% | 59.4% |
| DRR177165 | 49.4% | 55.8% |
| DRR177166 | 43.8% | 51.9% |
| DRR177167 | 46.0% | 47.9% |
| DRR177168 | 51.3% | 53.8% |
| DRR177169 | 55.4% | 59.0% |
| DRR177170 | 52.7% | 55.8% |
| DRR177171 | 6.7% | 11.6% |
| DRR177172 | 47.1% | 54.9% |
| DRR177173 | 53.4% | 56.3% |
K-mer mode read parsing
This figure displays a breakdown of the proportion of parsed reads that were rejected under each criterion and the proportion that were accepted.
K-mer mode junction breakdown
This figure displays the frequency at which a library's possible junction sequences are actually observed in the reads. Trivial single-enzyme digests are ignored.
For trivial single-enzyme digests, there is only one possible junction sequence and so the
results for these experiments are not plotted. For dual-enzyme digests (such as Phase Genomics) there are
four potential junctions, while for dual-enzyme digests with one ambiguous site (such as Arima
Genomics) there are 16 possible junction sequences.
How efficient the more complicated library protocols are at producing hybrid junctions is
possibly just a point of interest.
Junctions are named for the enzymes that were responsible for creating the 5' and 3' ends.
E.g. `Sau3AI/MluCI` would involve two different enzymes, while `Sau3AI/Sau3AI` involves only one, as
would be the case in a single-enzyme digest. Each name is accompanied by the actual junction
sequence.
The junctions are grouped by their 5' and then 3' enzyme, while the color spectrum used across
each bar aims to emphasise these enzymatic sources.
Note: in k-mer mode, the counts are not controlled for false positives.