The Bioconductor project - analysis and comprehension of high-throughput proteomics data

Laurent Gatto

Laurent Gatto                      Computational Proteomics Unit
https://lgatto.github.io           University of Cambridge
lg390@cam.ac.uk                    @lgatt0

Link to slides: http://bit.ly/20170623pmfDOI

Licence

These slides are available under a Creative Commons CC-BY license. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially.

Table of contents

Data analysis ● Proteomics ● R/Bioconductor ● Conclusions

CSAMA 2017, Plose

(Bioconductor CSAMA 2017 workshop, Mount Plose)

Data analysis

What is data analysis

"Data analysis is the process by which data becomes understanding, knowledge and insight." (Hadley Wickham)

The ability to prepare and explore data, identify patterns (good and pathological ones) and convincingly demonstrate that the patterns are genuine (rather than random).

It’s not analysing data, it’s investigating data - requires flexibility.

And also

Data programming, but:

To analyse data, you need

Visualisation

To understand and communicate data:

Graphics reveal data.

"Visualization can surprise you, but it doesn’t scale well. Modeling scales well, but it can’t surprise you." (Hadley Wickham)

Proteomics

Quantitative proteomics data analysis

Example …

Proteomics data analysis schematics

It is not for the tool/software to tell me what plotting/analysis to perform; it is for me to apply the most appropriate analysis or visualisation.

It is not for the tool/software to tell me what plotting/analysis to perform; it is for me to ask the most appropriate question.

Software: R/Bioconductor

Data analysis tools should enable you to manipulate your data, give some guarantees about the integrity of the data, support effective extraction and subsetting of components of the data, let you visualise and transform them, give access to infrastructure for statistical analysis, and enable annotation of the data.

Bioconductor provides tools for the analysis and comprehension of high-throughput biology data. It uses the R statistical programming language.

Collaborative project: open source and open development, involving biologists, statisticians, programmers, …

Huber W et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015 Jan 29;12(2):115-21.

Bioconductor stickers

~ 1400 packages ● 62 for mass spectrometry ● 92 for proteomics

Bioc proteomics dependency graph

MSnbase collaborative development

MSnbase contributions

The mzR package

mzR: Efficient access to raw (netCDF, mzData, mzXML, mzML) and identification (mzIdentML) data.

Chambers et al. A cross-platform toolkit for mass spectrometry and proteomics. Nature Biotechnology (2012).
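As a rough illustration of the kind of access mzR gives, the sketch below opens a raw file and an identification file and pulls out spectrum metadata, one peak list and the peptide-spectrum matches. The two file names are an assumption: they are small test files that ship with the MSnbase package and are used here only so that the example runs without external data; any mzML/mzXML and mzIdentML files could be substituted.

```r
library("mzR")

## Assumption: small test files shipped with the MSnbase package,
## used here only so that the example is self-contained.
rawf <- system.file("extdata", "dummyiTRAQ.mzXML", package = "MSnbase")
idf  <- system.file("extdata", "dummyiTRAQ.mzid", package = "MSnbase")

## Raw data: open a file handle, then query it.
ms <- openMSfile(rawf)
hd <- header(ms)    # data.frame with one row of metadata per spectrum
pk <- peaks(ms, 1)  # two-column matrix (m/z, intensity) for spectrum 1
close(ms)

## Identification data: peptide-spectrum matches as a data.frame.
id <- openIDfile(idf)
ps <- psms(id)

head(hd[, c("msLevel", "retentionTime")])
head(pk)
```

The objects returned by header(), peaks() and psms() are ordinary data frames and matrices, so they can be filtered and plotted with standard R tools.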

The MSnbase package

MSnbase: Convenient infrastructure for mass spectrometry and proteomics data analysis.

Laurent Gatto and Kathryn S. Lilley. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics 28, 288-289 (2012).

The MSnSet class for quantitative data

Can be subsetted, transformed, visualised, annotated, analysed statistically, …
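To make the structure of an MSnSet more concrete, here is a minimal sketch that builds one from toy data and then subsets and transforms it. All values, the protein and sample names, and the "loc" and "group" variables are made up for illustration; they are not taken from any real experiment.

```r
library("MSnbase")

## Toy quantitative data (hypothetical values):
## 4 proteins (rows) by 3 samples (columns).
e <- matrix(c(10, 20, 30, 40,
              12, 18, 33, 41,
              11, 25, 29, 39),
            nrow = 4,
            dimnames = list(paste0("prot", 1:4), paste0("S", 1:3)))

## Feature (protein) annotation and sample annotation.
fd <- data.frame(loc = c("ER", "ER", "Golgi", "unknown"),
                 row.names = rownames(e))
pd <- data.frame(group = c("ctrl", "ctrl", "treated"),
                 row.names = colnames(e))

x <- MSnSet(exprs = e, fData = fd, pData = pd)

## Subsetting keeps quantitative data and annotation in sync.
er <- x[fData(x)$loc == "ER", ]

## Transformation: log2 of the quantitative values.
xlog <- x
exprs(xlog) <- log2(exprs(x))

## Annotation: add a new feature variable.
fData(x)$is_marker <- fData(x)$loc != "unknown"
```

Keeping the quantitative matrix, the feature annotation and the sample annotation in one object is what provides the integrity guarantee: a subset of rows or columns applies to all three parts at once.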

The pRoloc package

pRoloc: A unifying analysis framework for spatial proteomics: visualisation, classification, novelty detection, transfer learning, Bayesian learning (coming soon).

Gatto L, Breckels LM, Wieczorek S, Burger T, Lilley KS. Mass-spectrometry-based spatial proteomics data analysis using pRoloc and pRolocdata. Bioinformatics. 2014 May 1;30(9):1322-4.
Breckels LM, Mulvey CM, Lilley KS and Gatto L. A Bioconductor workflow for processing and analysing spatial proteomics data. F1000Research 2016, 5:2926 (doi: 10.12688/f1000research.10411.1).

Transform, annotate, visualise

library("pRoloc")  ## provides plot2D()
## msnset: an MSnSet with a feature variable named "loc"
plot2D(msnset, fcol = "loc",
       method = "PCA")
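The call above assumes an MSnSet named msnset is already in the workspace. A self-contained variant might look like the sketch below. It relies on two assumptions that are not part of the slides: the tan2009r1 example dataset from the pRolocdata package, and that dataset's "markers" feature variable, which stands in for "loc".

```r
library("pRoloc")
library("pRolocdata")

## Assumption: example spatial proteomics dataset from pRolocdata.
data(tan2009r1)

## Check which feature variables are available for colouring.
fvarLabels(tan2009r1)

## PCA plot, coloured by the annotated organelle markers.
## plot2D() invisibly returns the plotted coordinates.
pcs <- plot2D(tan2009r1, fcol = "markers", method = "PCA")
addLegend(tan2009r1, fcol = "markers", where = "bottomright", cex = 0.7)
```

Capturing the return value is optional; it is useful when the coordinates are needed for further annotation, such as labelling individual proteins.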

References, resources

Thank you for your attention