As pointed out by D. Donoho,
"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." This observation applies directly to high-throughput biology data analysis, and I strongly believe that being able to reproduce one's results and to replicate an analysis with new data are essential aspects of the process of doing research.
As such, I regard the development of scientific software and analysis pipelines (including good software and process documentation) that facilitate reproducible research as an important aspect of my scientific activity.
Mass-spectrometry data analysis
My work on the design and implementation of reproducible low-level mass-spectrometry data analysis pipelines is materialised by the development of MSnbase (see the R/Bioconductor packages page for a short description and the poster and talk sections in publications). A too often neglected aspect of data analysis is the storage and handling of metadata, which is an essential part of the data structures implemented in MSnbase. An essential goal of this work is to enable the definition of rigorous and reproducible data analysis pipelines and best practices. The synapter package and the associated publications (see Bond et al. 2013 and Shliaha et al. 2013) address MSE label-free quantitation, optionally including ion mobility separation.
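As a minimal sketch of what such a pipeline looks like in practice, the following R session loads the example iTRAQ data shipped with MSnbase and performs reporter-ion quantitation; it follows the package vignette, and exact arguments may differ across Bioconductor releases. Note in particular how metadata and the processing history travel inside the data object itself:

```r
library("MSnbase")

## Example raw data shipped with the package: an MSnExp of iTRAQ 4-plex spectra
data(itraqdata)
itraqdata                                 ## raw spectra plus experiment metadata

## Reporter-ion quantitation; each processing step is logged in the object
qnt <- quantify(itraqdata, method = "trap", reporters = iTRAQ4)

processingData(qnt)                       ## the recorded processing history
head(fData(qnt))                          ## feature (spectrum-level) metadata
pData(qnt)                                ## sample metadata
```

Because the quantified object carries its own provenance, re-running or auditing an analysis amounts to inspecting and replaying the logged processing steps rather than reconstructing them from a manuscript.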
MS-based organelle proteomics
In biology, localisation is function: knowledge of the localisation of proteins is of paramount importance to assess and study their function (Gatto et al., 2010). My work in organelle proteomics is focused on the analysis of quantitative mass-spectrometry data (mainly gradient approaches) to infer the sub-cellular localisation of proteins. This work is implemented in the pRoloc package (see the R/Bioconductor packages page for a short description and the poster and talk sections in publications). This framework allows researchers to manage data and metadata (using data structures from MSnbase) as well as organelle marker sets, apply contemporary linear and non-linear machine learning techniques to predict protein-organelle association based on quantitative proteomics data, and easily incorporate data from other organelle proteomics initiatives, such as the Human Protein Atlas, as well as GO annotation terms. As advances in technology enable more sophisticated organelle proteomics experiments to be performed, it is essential that tool kits are created to support scientists in the analysis of large and complex data sets.
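A typical pRoloc workflow can be sketched as follows, using the dunkley2006 example data set from the pRolocdata companion package; this follows the pRoloc tutorial vignette, and argument defaults may vary between versions. Curated organelle markers drive a supervised classifier that predicts the localisation of the unlabelled proteins:

```r
library("pRoloc")
library("pRolocdata")

## Example spatial proteomics data (Dunkley et al. 2006, Arabidopsis)
data(dunkley2006)

getMarkers(dunkley2006, fcol = "markers")   ## curated organelle marker proteins
plot2D(dunkley2006, fcol = "markers")       ## PCA view of the quantitative profiles

## Supervised prediction of protein-organelle association with an SVM:
## first optimise the hyper-parameters, then classify the unknowns
params <- svmOptimisation(dunkley2006, fcol = "markers", times = 10)
res    <- svmClassification(dunkley2006, params)
head(fData(res)$svm)                        ## predicted sub-cellular localisations
```

The predictions are stored as an additional feature-metadata column, so the classified object remains a single, self-describing unit that can be shared, visualised or fed into downstream analyses.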