Organelle Proteomics Data Analysis Workshop

"Prediction is very difficult, especially about the future." (Niels Bohr)

When: Mon 12 - Tue 13 November 2012
Where: University Centre, Meade Room (1st floor), Cambridge, UK

1 Introduction

We would like to announce a 2-day workshop dedicated to the analysis of organelle proteomics data, in which we will demonstrate the tools we have been developing at the Cambridge Centre for Proteomics (CCP) over the years.

The programme will include a short introduction to R, instructions on how to import organelle proteomics data into R and how to apply contemporary machine learning algorithms to these data, as well as a few lectures. It will mostly be hands-on, with direct support and many opportunities for questions and interaction. The goal of the workshop is that, at the end of the 2 days, you will be able to repeat the pipelines on your own. We also encourage participants to bring their own organelle proteomics data, including gradient-based approaches (LOPIT, PCP), subtractive proteomics, … or any data including label-free and labelled quantitation of multiple distinctly enriched fractions.

2 Tentative programme

2.1 Monday 12 November 2012

  • 9:00 Warming up: registration, first cup of coffee/tea, unpacking the laptops.
  • 9:30 [Hands on - 45min] R help desk - R and specialised R packages installation (LG and LMS).
  • 10:15 [Lecture - 30 min] Introduction to machine learning (ML) (1/3) (TB)
    • Philosophy of ML, how to represent a data set, difference between (un)/(semi)-supervised, intuition of kernel matrix, …
  • 10:50 [Lecture] Organelle proteomics
    • [30 min] Experimental designs (KSL)
    • [30 min] Relevant points from a data analysis point of view (LG, KSL)
  • 12:15 Lunch in the University Centre main dining hall
  • 13:30 [Hands on - 30min] Why R and mini crash course to R. (LG)
  • 14:05 [Lecture] Introduction to machine learning (2/3 and 3/3) (TB)
    • [30 min] Various algorithms (kNN, SVM, …) and dimensionality reduction (PCA)
    • [30 min] Methods and global pipeline analysis (accuracy metrics, CV, (hyper)-parameter selection, …)
  • 15:15 [Hands on - 60min] Introduction to R (LG)
  • 16:30 [Hands on - 60min] Relevant R data structure for organelle proteomics data analysis. (LG)
  • 17:30 End of Day 1.
  • 19:30 Social dinner at Trinity Hall College. More details about the venue.

2.2 Tuesday 13 November 2012

  • 9:00 Warming up: questions/clarifications about Monday's material.
  • 9:30 [Hands on - 60min] Data visualisation and quality control. (LG)
  • 10:35 [Hands on - 90min] Unsupervised and supervised ML on organelle proteomics data. (LG)
  • 12:15 Lunch in the University Centre main dining hall
  • 13:30 [Lecture - 60min] Semi-supervised machine learning and the PhenoDisco algorithm. (LMS)
  • 14:30 [Hands on - 45min] The PhenoDisco algorithm in pRoloc. (LG and LMS)
  • 15:15 Wrap up (all)
  • 15:30 End of the workshop

Q&A sessions throughout the workshop.

Note on coffee/tea: I hope we can have coffee available throughout the workshop so that people can get a drink whenever they feel like it. In addition, the informal format and extra time between the sessions should leave room for short breaks/Q&As/coffee/chats.

Data: For the hands-on sessions, real example data sets will be provided with the dedicated software packages themselves. Participants will also have the opportunity to apply the methods to their own data sets. You will need a spreadsheet with quantitation data for proteins, peptides or individual spectra in multiple fractions. In addition, it is essential to prepare a set of labelled marker proteins that represent well-defined feature clusters and define the organelles of interest. Please get in touch with us in advance to prepare the data and relevant meta-data.

The only prerequisite for the workshop is knowledge of organelle proteomics; no R or machine learning knowledge or experience is required. Participants are expected to bring their own laptop with R installed. If you have difficulties installing R, help will be provided at the beginning of the workshop.

Instructors: Kathryn Lilley (KSL), Laurent Gatto (LG), Lisa Simpson (LMS), Thomas Burger (TB).

3 Useful information

  • The main R web page
  • Although you can run R in a native DOS console or Unix terminal, it is often convenient to have some integration between R and a source code editor. Unless you already have an editor of choice, we recommend RStudio.
  • For those who would like to do a bit of reading before the workshop (totally optional), you can familiarise yourself with R by reading some of the introductory material available on the R manual page, especially the Introduction to R (html, pdf) or R for beginners (pdf) from the Contributed Documentation.
  • One of the most important things in Bioinformatics and Computational Biology is a proper data container that stores the data itself and the associated meta-data across analysis iterations. We will make extensive use of such a dedicated container, called MSnSet, defined in the MSnbase package. You may have a look at section 6 of the package vignette (pdf) (again optional, as this will be covered in the workshop).
  • You will need a recent version of R. The latest stable version is 2.15.2. You may also install the development version (2.16.0). If you have problems, do not hesitate to get in touch. The R help desk session will make sure that everyone has all the software installed.
  • You can also install the dependencies in advance by running an installation script. Open R and run the line below, and all dependencies will be installed/updated accordingly. Note that it is essential to have a recent R installed (2.15.2 or 2.16.0). Do not hesitate to get in touch in case of issues.

4 Accommodations

Here is a short list of hotels/B&Bs. An (x) indicates that I (Laurent) have been there once, in 2009.

The Travelodge is the furthest, about 15 minutes on foot. All the others are close to very close (as in down the road).

5 Getting to Cambridge

  • Stansted airport + train to Cambridge.
  • Eurostar to St Pancras + train from King's Cross (across the road) to Cambridge.
  • Getting to Cambridge - advice from the University page.

6 Contact

If you are interested in participating, would prefer not to be contacted any more about this event, or have any other comments, please contact Laurent <>.

Laurent Gatto, Lisa Simpson and Kathryn Lilley
Cambridge Centre for Proteomics (CCP)

Date: 2012-11-13 13:22:26 GMT

Author: Laurent Gatto
