class: center, middle, inverse

# emzed

# LCMS workflows the easy way

---

class: center, middle, inverse

# e**mz**ed

---

class: center, middle, inverse

# e **m/z** ed

---

class: center, middle, inverse

# **m/z** is the unit
# of the x-axis of mass spectra

---

class: center, middle, inverse

# Germans spell "**mz**" as "emzed"

---

class: center, middle, inverse

# no idea how native speakers pronounce
# **emzed**

---

class: center, middle, inverse

# Why emzed ?

---

# Setting the stage

- LCMS = Liquid Chromatography Mass Spectrometry
- Mathematicians have coffee machines ...
- ... modern biologists and chemists have LCMS machines
- LCMS changed a lot in these sciences

--

# Software

- LCMS devices are progressing very fast
- but vendor software is (always) behind or very simple
- There is a big demand for software
- Every researcher has special needs
- "one for all" software is unfeasible

---

# Software landscape in year 1 BE (before emzed):

- Applications: closed-box solutions with a GUI, easy to use but rigid
- C++ libraries: flexible but hard to use
- R libraries: quite flexible but no specific GUI
- In the lab: mixture of tools, self-written scripts (Matlab, R, Perl), Excel sheets

--

# Consequences:

- Error-prone semiautomatic "workflows"
- Results are hard to reproduce
- Good ideas from users never get implemented

---

# Software wishlist:

- flexible
- easy to use
- integrative
- interactive graphical data inspection tools

--

# emzed concepts:

- flexible: workflows are Python scripts
- easy to use: workflows compose emzed functions
- easy to use: Matlab-like workbench
- integrative: bridges to R and OpenMS
- good tools for interactive data analysis

---

# How did it start ?

- Julia Vorholt and Patrick Kiefer asked for assistance in 2012.
- Strategic IT fund of DBIOL paid for 4 months of work on emzed 1
- Since then incremental improvements
- Today: emzed 2.7.0

---

class: center, middle, inverse

# About emzed

---

class: center, middle

# If you start emzed:

This is the emzed workbench.

---

# emzed functions overview

- I/O: LCMS data formats, CSV
- Cherry-picked LCMS algorithms
- SQL-like relational tables
- Interactive data inspection
- Access to chemical data(bases)
- Easy GUI creation for minimalistic workflow frontends
- Packaging system for distributing workflows (aka emzed modules)

---

class: center, middle, inverse

# API Examples

---

# Example: I/O + peak picking

````python
>>> import emzed
>>> # load the raw measurement (a "peakmap") from an mzML file
>>> data = emzed.io.loadPeakmap("abc.mzML")
>>> print len(data)
2332
>>> # detect peaks with the standard configuration
>>> peaks = emzed.ff.runMetaboFeatureFinder(data, config="std")
>>> print len(peaks)
122
````

---

# Example: Table handling

````python
>>> targets = emzed.io.loadCSV("targets.csv")
>>> print targets
name        mf
str         str
------      ------
water       H2O
sodium      NaCl
fullerene   C60
cryptonite  Kr

>>> print emzed.mass.of("H2O")
18.0105650638

>>> # compute the mass for every molecular formula in column "mf"
>>> targets.addColumn("m0", targets.mf.apply(emzed.mass.of))
>>> print targets
name        mf      m0
str         str     float
------      ------  ------
water       H2O     18.01057
sodium      NaCl    57.95862
fullerene   C60     720.00000
cryptonite  Kr      -
````

---

# Example: Table handling continued

````python
>>> # keep only rows for which a mass could be computed
>>> on_earth = targets.filter(targets.m0.isNotNone())
>>> print on_earth
name        mf      m0
str         str     float
------      ------  ------
water       H2O     18.01057
sodium      NaCl    57.95862
fullerene   C60     720.00000

>>> # match peaks to targets whose mass agrees within 1e-3
>>> hits = peaks.join(on_earth, peaks.mz.approxEqual(on_earth.m0, 1e-3))
>>> print hits
mz       rt     name__0  mf__0   m0__0
float    float  str      str     float
------   -----  ------   ------  ------
18.0105  2.21m  water    H2O     18.01057

>>> emzed.gui.inspect(hits)
````

---

class: center, middle

# Result from last command is similar to

---

# Impact

- Increased trust of scientists in analysis results.
- emzed as a playground for testing new analysis strategies
- Several publications from DBIOL using emzed

---

class: center, middle, inverse

# emzed online

# http://emzed.ethz.ch

---

# emzed internals:

- Python 2.7
- some C (Cython) extensions for speed
- some emzed functions support multicore
- GUI based on **PyQt** + **guiqwt**
- Workbench is a patched **Spyder**
- R bridge uses **rpype** (stdin/stdout pipes to an R subprocess)
- Bridge to **OpenMS** uses Cython (pyOpenMS)

---

# Plans

- Algorithms for merging several measurement modes
- Algorithms for MSMS measurements
- Refactor wrapped EAWAG algorithms as emzed extensions
- Backporting some modules to emzed: pacer, presettr
- Faster R bridge (based on pyRserve)
- Rewrite table data structures using pandas.

---

class: center, middle, inverse

# Questions ?

---

class: center, middle, inverse

# Thanks !
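
---

# Backup: the approximate join in plain Python

A minimal sketch of what the `peaks.join(...)` call with `approxEqual` from the table handling example amounts to. It is not emzed's implementation. It assumes that the tolerance is an absolute difference, and the helper `approx_join`, the list-of-dicts layout and the second peak are made up for illustration.

```python
def approx_join(peaks, targets, tol=1e-3):
    """Pair each peak with each target whose mass m0 lies within tol of the peak's mz."""
    hits = []
    for peak in peaks:
        for target in targets:
            m0 = target["m0"]
            if m0 is None:
                # no mass available for this target, so it cannot match
                continue
            if abs(peak["mz"] - m0) <= tol:
                row = dict(peak)    # copy the peak columns
                row.update(target)  # append the target columns
                hits.append(row)
    return hits


# hypothetical example rows; rt is given in seconds (2.21 min = 132.6 s)
peaks = [
    {"mz": 18.0105, "rt": 132.6},
    {"mz": 250.1234, "rt": 300.0},
]
targets = [
    {"name": "water", "mf": "H2O", "m0": 18.01057},
    {"name": "cryptonite", "mf": "Kr", "m0": None},
]

hits = approx_join(peaks, targets)
```

The nested loop compares every peak with every target, which is fine for a sketch but is not meant to say anything about how emzed evaluates joins internally.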