ewels/MultiQC: MultiQC Version 1.0
Creators
- Phil Ewels (1)
- Tim Booth (2)
- Tobias Neumann (3)
- Måns Magnusson (4)
- alexanderscholz
- Vlad Saveliev (5)
- Guillermo Carrasco (6)
- Rickard Hammarén
- Lorena Pantano (7)
- Robin Andeer (8)
- Brad Chapman (9)
- Heather L. Wiencko
- Fredrik Boulund (10)
- Elmar Pruesse
- Roman Valls Guimerà (11)
- Albert Vilella (12)
- Mike Lusignan
- Rory Kirchner (13)
- Tomás Di Domenico
- koelling
- Marcel Martin
- Leif Väremo
- Kamil Slowikowski (14)
- Kaisa (15)
- Johan Viklund
- Arthur Vigil
- 1. Science for Life Laboratory
- 2. Edinburgh Genomics
- 3. @IMPIMBA
- 4. Science for Life Labs
- 5. Center for Algorithmic Biotechnology, St. Petersburg State University
- 6. @Fanzone
- 7. Harvard Chan School of Public Health
- 8. @Clinical-Genomics
- 9. Harvard Chan Bioinformatics Core
- 10. Karolinska Institute
- 11. UMCCR
- 12. @cegx
- 13. Harvard School of Public Health
- 14. Harvard University
- 15. @ctmrbio, Karolinska Institutet/Science for Life Laboratory
Description
Version 1.0! This release has been a long time coming and brings with it some fairly major improvements in speed, report filesize and report performance. There's also a bunch of new modules, more options, features and a whole lot of bug fixes.
The version number is being bumped up to 1.0 for a couple of reasons:
- MultiQC is now (hopefully) relatively stable. A number of facilities and users are now using it in a production setting and it's published. It feels like it probably deserves v1 status now somehow.
- This update brings some fairly major changes which will break backwards compatibility for plugins. As such, semantic versioning suggests a change in major version number.
For most people, you shouldn't have any problems upgrading. There are two scenarios where you may need to make changes with this update:
1. You have custom file search patterns. Search patterns have been flattened and may no longer have arbitrary depth. For example, you may need to change the following:
fastqc:
  data:
    fn: 'fastqc_data.txt'
  zip:
    fn: '*_fastqc.zip'
to this:
fastqc/data:
  fn: 'fastqc_data.txt'
fastqc/zip:
  fn: '*_fastqc.zip'
See the documentation for instructions on how to write the new file search syntax. See search_patterns.yaml for the new module search keys and more examples.
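If you have many nested patterns to migrate, the flattening can be sketched in a few lines of Python. This is only an illustration: `flatten_search_patterns` is a hypothetical helper, not part of MultiQC, and it assumes a leaf pattern is any mapping whose values are not themselves mappings.

```python
def flatten_search_patterns(patterns, prefix=""):
    """Flatten nested search patterns into 'module/key' style keys.

    Hypothetical migration helper, not part of MultiQC.
    """
    flat = {}
    for key, value in patterns.items():
        name = f"{prefix}/{key}" if prefix else key
        # A mapping that still contains mappings is an intermediate level
        if isinstance(value, dict) and any(isinstance(v, dict) for v in value.values()):
            flat.update(flatten_search_patterns(value, name))
        else:
            # Leaf pattern, e.g. {'fn': 'fastqc_data.txt'}
            flat[name] = value
    return flat


old_style = {
    "fastqc": {
        "data": {"fn": "fastqc_data.txt"},
        "zip": {"fn": "*_fastqc.zip"},
    }
}
new_style = flatten_search_patterns(old_style)
```

Here `new_style` holds the keys `fastqc/data` and `fastqc/zip`, matching the example above.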
2. You have custom plugin code. To see what changes need to be applied to it, please see the MultiQC docs.
Module updates:
- Adapter Removal - new module!
- AdapterRemoval v2 - rapid adapter trimming, identification, and read merging
- BUSCO - new module!
- New module for the BUSCO v2 tool, used for assessing genome assembly and annotation completeness.
- Cluster Flow - new module!
- Cluster Flow is a workflow tool for bioinformatics pipelines. The new module parses executed tool commands.
- RNA-SeQC - new module!
- New module to parse output from RNA-SeQC, a java program which computes a series of quality control metrics for RNA-seq data.
- goleft indexcov - new module! Thanks to @chapmanb and @brentp
- goleft indexcov uses the PED and ROC data files to create diagnostic plots of coverage per sample, helping to identify sample gender and coverage issues.
- SortMeRNA - new module! Written by @bschiffthaler
- New module for SortMeRNA, commonly used for removing rRNA contamination from datasets.
- Bcftools
- Fixed bug with display of indels when only one sample
- Cutadapt
- Now takes the filename if the sample name is - (stdin). Thanks to @tdido
- FastQC
- Data for the Sequence content plot can now be downloaded from reports as a JSON file.
- FastQ Screen
- Rewritten plotting method for high sample numbers plot (~ > 20 samples)
- Now shows counts for single-species hits and bins all multi-species hits
- Allows plot to show proper percentage view for each sample, much easier to interpret.
- HTSeq
- Fix bug where header lines caused module to crash
- Picard
- New RrbsSummaryMetrics Submodule!
- New WgsMetrics Submodule!
- CollectGcBiasMetrics module now prints summary statistics to multiqc_data if found. Thanks to @ahvigil
- Preseq
- Now trims the x axis to the point that meets 90% of min(unique molecules). Hopefully prevents ridiculous x axes without sacrificing too much useful information.
- Can now show estimated depth of coverage instead of less informative molecule counts (see details).
- Plots dots with externally calculated real read counts (see details).
- Qualimap
- RNASeq Transcript Profile now has correct axis units. Thanks to @roryk
- BamQC module now doesn't crash if reports don't have genome gc distributions
- RSeQC
- Fixed Python3 error in Junction Saturation code
- Fixed JS error for Junction Saturation that made the single-sample combined plot only show All Junctions
Core MultiQC updates:
- Change in module structure and import statements (see details).
- Module file search has been rewritten (see above changes to configs)
- Significant improvement in search speed (test dataset runs in approximately half the time)
- More options for modules to find their logs, eg. filename and contents matching regexes (see the docs)
- Report plot data is now compressed, significantly reducing report filesizes.
- New --ignore-samples option to skip samples based on parsed sample name
- Alternative to filtering by input filename, which doesn't always work
- Can also use config vars sample_names_ignore (glob patterns) and sample_names_ignore_re (regex patterns).
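The difference between the glob and regex variants can be sketched as follows. This is not MultiQC's own code: `keep_sample` is a hypothetical helper, and it assumes that glob patterns must match the whole sample name while regex patterns may match anywhere in it.

```python
import fnmatch
import re


def keep_sample(name, ignore_globs=(), ignore_res=()):
    """Return False if a parsed sample name matches any ignore pattern.

    Hypothetical helper illustrating glob vs regex filtering.
    """
    # Glob patterns: shell-style wildcards, matched against the full name
    if any(fnmatch.fnmatchcase(name, pattern) for pattern in ignore_globs):
        return False
    # Regex patterns: assumed here to match anywhere in the name
    if any(re.search(pattern, name) for pattern in ignore_res):
        return False
    return True


samples = ["sample_1_R1", "sample_1_R2", "ctrl_01"]
kept = [s for s in samples if keep_sample(s, ["*_R2"], [r"^ctrl_\d+$"])]
```

With these patterns only `sample_1_R1` is kept.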
- New --sample-names command line option to give file with alternative sample names
- Allows one-click batch renaming in reports
- New --cl_config option to supply MultiQC config YAML directly on the command line.
- New config option to change numeric multiplier in General Stats
- For example, if reports have few reads, can show Thousands of Reads instead of Millions of Reads
- Set config options read_count_multiplier, read_count_prefix and read_count_desc
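As a rough illustration of what the multiplier and prefix do, the sketch below scales a raw read count and appends the prefix. `format_read_count` is a hypothetical function, and the default values shown (a multiplier of 0.000001 with prefix M) are assumptions rather than values taken from these release notes.

```python
def format_read_count(reads, multiplier=0.000001, prefix="M"):
    """Scale a raw read count for display, e.g. millions or thousands.

    Hypothetical helper; the defaults are assumed, not confirmed.
    """
    return f"{reads * multiplier:.1f} {prefix}"


millions = format_read_count(2_500_000)           # scaled to millions
thousands = format_read_count(12_345, 0.001, "K")  # scaled to thousands
```

For low-depth runs, the second form keeps small counts readable instead of rounding them towards zero.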
- Config options decimalPoint_format and thousandsSep_format now apply to tables as well as plots
- By default, thousands will now be separated with a space and . used for decimal places.
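The effect of the two separators can be sketched in Python. `format_number` and its parameter names are hypothetical and only mirror the two config options; the real formatting happens inside the report.

```python
def format_number(value, decimals=1, decimal_point=".", thousands_sep=" "):
    """Format a number with configurable decimal point and thousands separator.

    Hypothetical helper mirroring the two config options.
    """
    text = f"{value:,.{decimals}f}"  # e.g. '1,234,567.9'
    # Swap separators via a placeholder so they cannot clobber each other
    text = text.replace(",", "\0").replace(".", decimal_point)
    return text.replace("\0", thousands_sep)


default_style = format_number(1234567.891, decimals=2)
continental_style = format_number(
    1234567.891, decimals=2, decimal_point=",", thousands_sep="."
)
```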
- Tables now have a maximum-height by default and scroll within this.
- Speeds up report rendering in the web browser and makes report less stupidly long with lots of samples
- Button beneath table toggles full length if you want a zoomed-out view
- Refactored and removed previous code to make the table header "float"
- Set config.collapse_tables to False to disable table maximum-heights
- Bar graphs and heatmaps can now be zoomed in on
- Interactive plots sometimes hide labels due to lack of space. These can now be zoomed in on to see specific samples in more detail.
- Report plots now load sequentially instead of all at once
- Prevents the browser from locking up when large reports load
- Report plot and section HTML IDs are now sanitised and checked for duplicates
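One way to sanitise IDs and avoid duplicates is sketched below. It is an assumption about the general approach, not MultiQC's implementation: `unique_html_id` is a hypothetical helper that replaces unsafe characters and appends a counter when an ID is already taken.

```python
import re


def unique_html_id(raw, used):
    """Return a sanitised HTML ID that is not already in `used`.

    Hypothetical helper; `used` is a set that is updated in place.
    """
    # Keep letters, digits, underscores and hyphens; collapse the rest
    clean = re.sub(r"[^A-Za-z0-9_-]+", "_", raw.strip()).strip("_") or "section"
    candidate, counter = clean, 1
    while candidate in used:
        candidate = f"{clean}-{counter}"
        counter += 1
    used.add(candidate)
    return candidate


seen = set()
first = unique_html_id("FastQC: Per Base!", seen)
second = unique_html_id("FastQC: Per Base!", seen)
```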
- New template available (called sections) which has faster loading
- Only shows results from one module at a time
- Makes big reports load in the browser much more quickly, but requires more clicking
- Try it out by specifying -t sections
- Module sections tidied and refactored
- New helper function self.add_section()
- Sections hidden in nav if no title (no more need for the hacky self.intro +=)
- Content broken into description, help and plot, with automatic formatting
- Empty module sections are now skipped in reports. No need to check if a plot function returns None!
- Changes should be backwards-compatible
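The behaviour described for sections can be mimicked with a small stand-in class. This is not the real MultiQC base class: `SketchModule`, its `nav_titles` method and the keyword names (`name`, `description`, `helptext`, `plot`) are assumptions made for illustration only.

```python
class SketchModule:
    """Hypothetical stand-in for a MultiQC module, for illustration only."""

    def __init__(self):
        self.sections = []

    def add_section(self, name=None, description="", helptext="", plot=None):
        # Skip sections with nothing to show, so callers need not
        # check whether a plot function returned None
        if not (description or helptext or plot):
            return
        self.sections.append({
            "name": name,
            "description": description,
            "helptext": helptext,
            "plot": plot,
            "in_nav": name is not None,  # untitled sections stay out of the nav
        })

    def nav_titles(self):
        return [s["name"] for s in self.sections if s["in_nav"]]


module = SketchModule()
module.add_section(name="Empty plot", plot=None)             # skipped
module.add_section(name="Counts", description="Read counts", plot="<div></div>")
module.add_section(plot="<div></div>")                        # shown, not in nav
```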
- Report plot data export code refactored
- Now doesn't export hidden samples (uses HighCharts export-csv plugin)
- Handle error when git isn't installed on the system.
- Refactored colouring of table cells
- Docs updates (thanks to @varemo)
- Previously hidden log file .multiqc.log renamed to multiqc.log in multiqc_data
- Added option to load MultiQC config file from a path specified in the environment variable MULTIQC_CONFIG_PATH
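Looking up a config path from the environment might look like the sketch below. Only the variable name MULTIQC_CONFIG_PATH comes from these notes; `config_path_from_env` is a hypothetical helper and the choice to ignore missing files is an assumption.

```python
import os


def config_path_from_env(environ=None):
    """Return the config path named in MULTIQC_CONFIG_PATH, if it exists.

    Hypothetical helper; returns None when the variable is unset
    or does not point at a readable file.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("MULTIQC_CONFIG_PATH")
    if path and os.path.isfile(path):
        return path
    return None


config_path = config_path_from_env()
```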
- New table configuration options
- sortRows: False prevents table rows from being sorted alphabetically
- col1_header allows the default first column header to be changed from "Sample Name"
- Tables no longer show Configure Columns and Plot buttons if they only have a single column
- Custom content updates
- New custom_content/order config option to specify order of Custom Content sections
- Tables now use the header for the first column instead of always having Sample Name
- JSON + YAML tables now remember order of table columns
- Many minor bugfixes
- Line graphs and scatter graphs axis limits
- If limits are specified, data exceeding this is no longer saved in report
- Visually identical, but can make report file sizes considerably smaller in some cases
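The saving comes from dropping points that would never be drawn. A minimal sketch of the idea, assuming data is held as (x, y) pairs and only x limits are set; `clip_to_limits` is a hypothetical helper, not the function MultiQC uses.

```python
def clip_to_limits(points, xmin=None, xmax=None):
    """Drop (x, y) points that fall outside the given x axis limits.

    Hypothetical helper; a limit of None means that side is unbounded.
    """
    return [
        (x, y)
        for x, y in points
        if (xmin is None or x >= xmin) and (xmax is None or x <= xmax)
    ]


series = [(0, 1.0), (50, 2.5), (100, 3.0), (5000, 3.1)]
saved = clip_to_limits(series, xmin=0, xmax=100)
```

Only the three points inside the limits are written to the report.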
- Creating multiple plots without a config dict now works (previously just gave grey boxes in report)
- All changes are now tested on a Windows system, using AppVeyor
- Fixed rare error where some reports could get empty General Statistics tables when no data present.
- Fixed minor bug where config option force: true didn't work. Now you don't have to always specify -f!
Files
ewels/MultiQC-v1.0.zip (1.8 MB)
md5:9d9c1b8794dc495d27fc40847fe8bbfd
Additional details
Related works
- Is supplement to: https://github.com/ewels/MultiQC/tree/v1.0 (URL)