
This page is being deprecated. You will be redirected to my new research group page.


msGUI is a simple graphical user interface to visualise and browse raw mass spectrometry data. It supports data in the mzData, mzXML and mzML formats via the mzR infrastructure. It allows users to navigate MS1 and/or higher-level spectra, and to display (extracted) ion chromatograms.

RforProteomics illustrates the use of several R/Bioconductor packages to analyse proteomics data. It provides all the code needed to reproduce the examples and figures of the "Using R and Bioconductor for proteomics data analysis" manuscript (submitted). The package is available from the Bioconductor project.

MSnbase is a Bioconductor package (release, or devel, which includes many more features) that aims to provide a reproducible research framework for the proteomics community. It should allow researchers to easily mine MS(MS) data and its statistical properties, and to display these visually. Some of the underlying structures and use cases are presented in this poster (BSPR/EBI conference, July 2010) and these slides (Bioconductor Developer Meeting, November 2010).
MSnbase includes a demo vignette that illustrates its usage with dummy data shipped with the package. The vignette is a good starting point for learning how to use MSnbase and can be opened with vignette("MSnbase-demo", package = "MSnbase"). There is also a document that describes the underlying classes. See the package documentation (?MSnbase) and the references therein for more details.

synapter is a Bioconductor package to analyse MSE data. It optimises identification and quantitation accuracy by transferring identifications between multiple runs. An example HTML report is available here. Several experimental designs, combining runs with or without ion mobility separation, and a complete workflow up to the statistical analysis of a full dataset are described in the vignette.

mzR is a Bioconductor package that provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a wrapper for the ISB random access parser for mzXML, mzData and mzML files. The package contains the original code written by the ISB, and a subset of the ProteoWizard library for mzML. The netCDF reading code has previously been used in XCMS. mzR is a joint effort between Bernd Fischer (EMBL), Steffen Neumann (IPB Halle) and Laurent Gatto (University of Cambridge).

mzID is a Bioconductor package developed by Thomas Dybdal Pedersen and Vladislav A Petyuk with support from Laurent Gatto that parses mzIdentML files using the XML package. It supports versions 1.0 and 1.1 of the mzIdentML standard.

RpepXML is an R package that allows users to import and manipulate generic MS/MS identification data saved as pepXML files (see also here for a browsable schema). It is hosted on R-Forge. Installation and getting-started information can be found on the project home page.

hpar is a small R package that distributes data from the Human Protein Atlas and allows users to write simple queries against it.

rpx is a Bioconductor package to access and download proteomics data from the ProteomeXchange repository.

More details

See my GitHub page and the CPU GitHub page for a complete overview of my programming activities.

Organelle proteomics

pRoloc is a Bioconductor package for the analysis and interpretation of quantitative organelle MS data from a typical gradient approach (Gatto et al., Proteomics 2010), using the methods described in the literature (protein correlation profiling, partial least squares discriminant analysis, support vector machines) as well as novel approaches to assess sub-cellular localisation. It relies on MSnbase's data structures to store quantitative proteomics data and the associated metadata. Published data sets are available in the pRolocdata experiment package. An interactive visualisation graphical user interface, pRolocGUI, is currently under development.
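To give a flavour of the first of these methods, here is a minimal base-R sketch of a simplified protein correlation profiling assignment: each protein is assigned to the organelle whose mean marker profile across gradient fractions it correlates with best. The toy profiles, the marker names and the pcpClassify() helper are hypothetical illustrations, not part of pRoloc's API, and published correlation profiling approaches may use other similarity or distance measures.

```r
# Hypothetical toy data: relative abundance of each marker protein
# across four density-gradient fractions.
markers <- rbind(
  er1   = c(0.60, 0.25, 0.10, 0.05),
  er2   = c(0.55, 0.30, 0.10, 0.05),
  mito1 = c(0.05, 0.10, 0.30, 0.55),
  mito2 = c(0.05, 0.15, 0.25, 0.55)
)
# Known organelle of each (hypothetical) marker protein
markerClass <- c(er1 = "ER", er2 = "ER",
                 mito1 = "mitochondrion", mito2 = "mitochondrion")

# Reference profile per organelle: the mean profile of its markers
# (rows = organelles, columns = fractions)
refs <- t(sapply(split(rownames(markers), markerClass[rownames(markers)]),
                 function(p) colMeans(markers[p, , drop = FALSE])))

# Hypothetical helper: return the organelle whose reference profile has
# the highest Pearson correlation with the given protein profile
pcpClassify <- function(profile, refs) {
  r <- apply(refs, 1, function(ref) cor(profile, ref))
  names(which.max(r))
}

# An unlabelled protein peaking in the first fractions, like the ER markers
unknown <- c(0.50, 0.30, 0.15, 0.05)
pcpClassify(unknown, refs)  # returns "ER"
```

A real analysis would operate on the MSnbase-based data structures mentioned above rather than on a bare matrix, and would also need to handle proteins that match no organelle well.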


rols is a Bioconductor package providing an interface to EBI's Ontology Lookup Service (OLS). It allows users to programmatically query 88 ontologies through the OLS SOAP web service.


sequences is an R package developed for the Advanced R programming and development course. It implements generic sequences as a virtual top-level class, with specific classes for biological (DNA and RNA) sequences.

Tips'n'tricks page, fancy blog and package, with many links to external resources.

Cambridge R user group

CambR: As R enthusiasts, we are eager to bring useRs in and around Cambridge together to share their experiences using R. We aim to organise meetings in Cambridge, including talks, mini workshops and data analysis sessions, with an emphasis on interaction among participants. We have set up a CambR Google group to discuss any matter related to the CambR group and meetings, or any other aspect of R that you may want to share. Please feel free to register or to get in contact with one of the organisers.


yaqcaffy is a Bioconductor package for the quality analysis of Affymetrix Expression GeneChips. The package does two things: (1) quality control analyses and (2) comparison of in-house MAQC reference RNAs with (a subset of) the MAQC reference datasets (this depends on the MAQCsubsetAFX package). See the vignette for more details. This package was written and tested during my position as a bioinformatician at DNAVision.

MAQCsubset: MAQCsubsetAFX and MAQCsubsetILM are Bioconductor data packages. They include one MAQC data file out of the 5 replicates per test site (raw CEL files in the case of MAQCsubsetAFX). These two packages were written and tested during my position as a bioinformatician at DNAVision.