Part III Systems Biology Proteomics Practical

 

 

 

Aims:

i)                    Identification of a protein separated by 1D polyacrylamide gel electrophoresis (PAGE) by peptide mass fingerprinting

ii)                   Identification of a protein from another 1D PAGE band by peptide mass fingerprinting

iii)                 De novo sequencing of a peptide from data generated by LC-MS/MS

iv)                 Identification of proteins using tandem mass spectrometry data

v)                  Analysis of modification of peptides from mass spectrometry data

vi)                 If time – quantification of in vivo stable isotope labelled proteins

 

Requirements:

 

i)                    Access to MASCOT search engine

ii)                   Access to BLAST sequence search engine

iii)                 Data files

a.      Peptide mass list

b.      MS1 spectrum

c.      .mgf files

d.      MS/MS spectrum plus table of amino acid residue masses

iv)                 Patience!!!

 

You are collaborating with a researcher who works on circadian rhythm using Drosophila melanogaster as a model organism. Your local proteomics laboratory has been provided with protein extracts from multiple sets of Drosophila harvested at different time points during the day. The researcher wants to know how protein expression and post-translational modification change between the time points. Some of the protein extracts come from Drosophila fed on yeast that have been stable isotope labelled in vivo. One set is 15N labelled (the yeast were grown with 15N rather than 14N as the sole nitrogen source). Another set was fed yeast which are SILAC labelled (the arginine and lysine residues within the yeast proteins are stable isotope labelled, each amino acid being 6 Da heavier than the naturally occurring version).

 

The proteomics lab collected data from the samples using mass spectrometry methods and promptly went away to a conference in Hawaii leaving you in charge of analysing the data. It had been very cold in the lab and the technical staff were looking forward to switching their woollen jumpers for shorts and a tee shirt!! In their hurry to pack their bathing suits and get to the airport on time, they failed to adequately label the files they collected, and not only do you have to identify the proteins, you also have to work out which data set corresponds to each sample.

 

 

Figure 1:

Image of 1D polyacrylamide gel that the samples were taken from.

 

 

 

 

Methods used to in-gel digest proteins to peptides

 

1.       homogenise 10 flies from each set (6 sets in total) in lysis buffer (10mM Tris/HCl pH 8.5, 8M urea, 2% CHAPS detergent, 5mM magnesium acetate).

2.       spin to remove debris

3.       measure protein concentration of supernatant.

4.       apply 100 µl of each sample to a 12% SDS PAGE mini gel in sample loading buffer.

5.       after electrophoresis is complete stain gel with colloidal coomassie total protein stain and destain with distilled water.

6.       wash sample with 50mM ammonium hydrogen carbonate /50% acetonitrile

7.       reduce with DTT

8.       alkylate cysteine residues with iodoacetamide to produce carbamidomethyl cysteine

9.       wash with 50mM ammonium hydrogen carbonate /50% acetonitrile

10.   dehydrate with acetonitrile

11.   add trypsin (125 ng/40 µl)

12.   leave on ice for 30 min.

13.   incubate overnight at 37 °C

14.   remove supernatant which contains peptides

 

NB: upon electrophoresis certain amino acids tend to become oxidised. Methionine is particularly susceptible.

 

Unfortunately the notes left by the proteomics lab are sketchy. You know that the three samples are a control sample, which is just homogenised Drosophila, and two other samples, which are mixtures of in vivo stable isotope labelled flies and control flies collected at different time points. However, you don’t know which track corresponds to which set of samples.

 

 

 

Experiment #1

 

The peptides digested from track 1 band A have been subjected to MALDI-ToF mass spectrometry and you have a list of peptide masses. To identify the protein take the major masses (monoisotopic) listed below and search using the following program:

 

 

www.matrixscience.co.uk

 

Ask the demonstrator about monoisotopic and average masses

 

Select MASCOT from the menu at the top of the page and then Peptide Mass Fingerprint. Fill in the submission details, thinking carefully about which modifications were introduced into your sample during the gel running and peptide digestion protocols. Use a peptide tolerance value of 150 ppm, select Drosophila (fruit fly) as the taxonomy and NCBInr as the database.
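To get a feel for what a 150 ppm peptide tolerance means in practice, the short sketch below converts a ppm tolerance into an m/z window. The function names are illustrative only and are not part of MASCOT; the example mass is taken from the Band A list.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def within_tolerance(observed, theoretical, tol_ppm):
    """True if the observed value lies within tol_ppm of the theoretical value."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


low, high = ppm_window(1461.74, 150.0)
print(f"{low:.3f} to {high:.3f}")  # 1461.521 to 1461.959
```

Note that a ppm tolerance scales with mass, so the window is wider in absolute terms for the heavier peptides in the list.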

 

 

Band A

 

m/z values

 

984.47

1016.50

1181.56

1266.63

1461.74

1687.93

1950.88

2083.01

2367.17

2534.24

 

 

What is the protein?

 

What are the predicted sequences of the tryptic peptides you have submitted to the search?
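As a rough illustration of how a search engine predicts tryptic peptides and their masses, the sketch below digests a sequence in silico and calculates monoisotopic [M+H]+ values from the residue masses in Table 1. The cleavage rule (after K or R, but not before P) and the toy sequence are assumptions made for illustration; this is not MASCOT's own code, and missed cleavages and oxidation are ignored.

```python
# Monoisotopic residue masses (Da), taken from Table 1 of this practical.
RESIDUE_MASS = {
    "A": 71.037, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.022, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.041, "F": 147.068, "P": 97.053,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.068,
}
WATER = 18.011            # monoisotopic mass of H2O
PROTON = 1.007            # mass of a proton
CARBAMIDOMETHYL = 57.021  # added to each cysteine by iodoacetamide


def tryptic_digest(protein):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of a peptide with alkylated cysteines."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    return mass + CARBAMIDOMETHYL * peptide.count("C")


TOY_PROTEIN = "MKTAYIAKQRPLSR"  # hypothetical sequence, for illustration only
for pep in tryptic_digest(TOY_PROTEIN):
    print(pep, round(mh_plus(pep), 3))
```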

 

What % of the total protein do these peptides represent?
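Sequence coverage is simply the fraction of residues in the protein that fall inside at least one matched peptide. A minimal sketch is given below; the protein and peptide shown are hypothetical placeholders, so substitute the sequence and matched peptides reported by your own search.

```python
def sequence_coverage(protein, peptides):
    """Percentage of residues in `protein` covered by at least one peptide."""
    if not protein:
        raise ValueError("protein sequence is empty")
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical example: one 6-residue peptide matched in a 14-residue protein.
coverage = sequence_coverage("MKTAYIAKQRPLSR", ["TAYIAK"])
print(f"{coverage:.1f}% coverage")
```

Overlapping peptides are counted once per residue, so the value can never exceed 100%.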

 

 

 

Experiment #2

 

The peptides digested from track 1 band B have been subjected to MALDI-ToF mass spectrometry and you have a list of peptide masses. To identify the protein take the major masses (monoisotopic) listed below and search using MASCOT:

 

Band B

 

m/z values 

 

984.47

1016.50

1181.56

1266.63

1461.74

1687.93

1950.88

2083.01

2367.17

2534.24

3253.59

1514.74

1186.56

1131.52

975.44

799.52

1170.56

1160.61

2214.06

1789.88

1052.60

1129.59

1129.59

1134.69

1695.82

1856.95

2797.34

 

 

What do the resulting ‘hits’ represent?

 

 

 

Experiment #3

 

You have tried to identify the proteins in bands B and C using MASCOT, submitting the .mgf file from an LC-MS/MS experiment. The results have been very poor, so in desperation you have tried to sequence a peptide from this run de novo.

 

Figure 2 – possible fragmentation ions of peptides.

 

 

Deduce the sequence of the peptide from the spectra in Figure 3 and Figure 4:

 

 

 

 

 

 

The easiest way to do this is to assign the y ion series.

 

The spectrum shown was collected using a QTof instrument. For tryptic peptides on this type of instrument, the y-ion series is usually the most prominent series at larger m/z values.

 

The y1 ion from a tryptic peptide should correspond to either arginine or lysine.

 

Determine the masses of the two ions you would expect to see for arginine and lysine y1 ions (use Table 1)

 

(C = 12, O = 16, N = 14, H = 1)
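If you want to check your hand calculation, the sketch below uses the usual relationship for a y ion: the sum of the residue masses, plus one water, plus one proton per charge. The residue masses are the monoisotopic values from Table 1; the function name is illustrative only.

```python
# Monoisotopic residue masses (Da) for the two tryptic C-terminal residues (Table 1).
RESIDUE_MASS = {"R": 156.101, "K": 128.095}
WATER = 18.011   # monoisotopic mass of H2O
PROTON = 1.007   # mass of a proton


def y_ion_mz(residues, charge=1):
    """m/z of the y ion made up of the given C-terminal residues."""
    neutral = sum(RESIDUE_MASS[aa] for aa in residues) + WATER
    return (neutral + charge * PROTON) / charge


for aa in "RK":
    print(aa, round(y_ion_mz(aa), 3))
```

Work the values out by hand first and only then run the code to compare.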

 

The charge (z) is most often 2+, therefore the mass of the precursor peptide is obtained by multiplying the m/z value by two and subtracting two (roughly the mass of two protons).
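The same calculation can be written for any charge state: multiply the m/z by z and subtract z proton masses. The sketch below uses a hypothetical precursor m/z, not a value read from the practical's spectra.

```python
PROTON = 1.00728  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral (uncharged) peptide mass from a precursor m/z and charge."""
    return charge * mz - charge * PROTON


# Hypothetical doubly charged precursor at m/z 650.35.
mass = neutral_mass(650.35, 2)
print(round(mass, 3))
```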

 

What is the C-terminal residue of this peptide?

 

Now assign as many y ions as possible. (hint: start with the m/z value at 773.43 and work in both directions)
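Assigning a y-ion ladder amounts to taking the mass difference between neighbouring y ions and looking it up in Table 1. The sketch below illustrates the idea on a made-up ladder (the peak values are hypothetical, generated from the toy peptide AGSVK, and are not taken from Figure 3 or Figure 4). The 0.02 Da tolerance is also an assumption.

```python
# Monoisotopic residue masses (Da), taken from Table 1 of this practical.
RESIDUE_MASS = {
    "A": 71.037, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.022, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.041, "F": 147.068, "P": 97.053,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.068,
}


def residues_from_ladder(y_ions, tol=0.02):
    """Call a residue for each gap between consecutive singly charged y ions.

    Returns one entry per gap, ordered from low to high m/z, which corresponds
    to reading the sequence from the C-terminus towards the N-terminus.
    Ambiguous gaps are reported as e.g. "I/L"; unexplained gaps as "?".
    """
    peaks = sorted(y_ions)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        diff = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(diff - m) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


# Hypothetical y1..y5 ions of the toy peptide AGSVK.
TOY_Y_IONS = [147.113, 246.181, 333.213, 390.235, 461.272]
calls = residues_from_ladder(TOY_Y_IONS)
# The lowest peak is y1 (here lysine); reverse the calls to read N- to C-terminus.
sequence = "".join(reversed(calls)) + "K"
print(calls, sequence)
```

With real data some gaps will span two residues or match nothing, which is why the function reports "?" rather than guessing.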

 

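NB: the following pairs of amino acids are isobaric or nearly isobaric (same nominal mass)

 

Isoleucine and leucine (identical mass, 113.084)

Glutamine and lysine (128.059 and 128.095; these differ by only 0.036 and may not be resolved)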

 

NB: assume there are no missed tryptic cleavage sites in the peptide sequences

 

Once you have deduced the sequence (or parts of the sequence) perform a BLAST search on the sequence.

 

http://blast.ncbi.nlm.nih.gov/Blast.cgi

 

 

What is the protein from which this peptide has been generated by trypsinolysis?

 

Why is this protein present in the Drosophila samples?

 

 

 

Table 1.

Amino Acid Information and Mass Prediction (residue masses in Da)

Amino acid                3-letter  1-letter  Codons                        Composition  Monoisotopic  Average
Alanine                   Ala       A         GCT, GCC, GCA, GCG            C3H5NO       71.037        71.080
Arginine                  Arg       R         CGT, CGC, CGA, CGG, AGA, AGG  C6H12N4O     156.101       156.190
Asparagine                Asn       N         AAT, AAC                      C4H6N2O2     114.043       114.104
Aspartic acid             Asp       D         GAT, GAC                      C4H5NO3      115.027       115.090
Cysteine                  Cys       C         TGT, TGC                      C3H5NOS      103.009       103.144
Glutamic acid             Glu       E         GAA, GAG                      C5H7NO3      129.043       129.117
Glutamine                 Gln       Q         CAA, CAG                      C5H8N2O2     128.059       128.131
Glycine                   Gly       G         GGT, GGC, GGA, GGG            C2H3NO       57.022        57.053
Histidine                 His       H         CAT, CAC                      C6H7N3O      137.059       137.141
Isoleucine                Ile       I         ATT, ATC, ATA                 C6H11NO      113.084       113.159
Leucine                   Leu       L         TTA, TTG, CTT, CTC, CTA, CTG  C6H11NO      113.084       113.159
Lysine                    Lys       K         AAA, AAG                      C6H12N2O     128.095       128.174
Methionine                Met       M         ATG                           C5H9NOS      131.041       131.193
Phenylalanine             Phe       F         TTT, TTC                      C9H9NO       147.068       147.180
Proline                   Pro       P         CCT, CCC, CCA, CCG            C5H7NO       97.053        97.117
Serine                    Ser       S         TCT, TCC, TCA, TCG, AGT, AGC  C3H5NO2      87.032        87.079
Threonine                 Thr       T         ACT, ACC, ACA, ACG            C4H7NO2      101.048       101.105
Tryptophan                Trp       W         TGG                           C11H10N2O    186.079       186.213
Tyrosine                  Tyr       Y         TAT, TAC                      C9H9NO2      163.063       163.176
Valine                    Val       V         GTT, GTC, GTA, GTG            C5H9NO       99.068        99.132
Aspartate or asparagine             B
Glutamate or glutamine              Z

 

 

 

 

 

 

 

Experiment #4

 

You have found a .mgf file called bandD_track2.mgf.

 

Using MS/MS Ions Search within MASCOT, can you identify the proteins within the sample?
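It can help to look inside the .mgf file before submitting it. An MGF file is plain text: each spectrum sits between BEGIN IONS and END IONS lines, with KEY=value headers (for example TITLE, PEPMASS and CHARGE) followed by m/z and intensity pairs. The minimal reader below is a sketch only; the example spectrum is hypothetical and is not taken from bandD_track2.mgf.

```python
def parse_mgf(lines):
    """Parse MGF text (an iterable of lines) into a list of spectra.

    Each spectrum is a dict with "params" (header key/value strings)
    and "peaks" (a list of (m/z, intensity) tuples).
    """
    spectra, current = [], None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None or not line:
            continue  # ignore anything outside a spectrum block
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            current["params"][key] = value
        else:
            parts = line.split()
            if len(parts) >= 2:
                current["peaks"].append((float(parts[0]), float(parts[1])))
    return spectra


# Hypothetical spectrum, for illustration only.
EXAMPLE = """BEGIN IONS
TITLE=hypothetical spectrum 1
PEPMASS=650.35
CHARGE=2+
175.119 1200.0
246.156 300.5
END IONS
"""
spectra = parse_mgf(EXAMPLE.splitlines())
print(len(spectra), spectra[0]["params"]["PEPMASS"], len(spectra[0]["peaks"]))
```

To read the real file, pass an open file handle instead, for example `with open("bandD_track2.mgf") as handle: spectra = parse_mgf(handle)`.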

 

Match the MS spectra

 

The person who ran your samples on the mass spectrometer sent you some pictures of the mass spectra of your data for you to use in a presentation.

 

The pictures are from both your 15N and SILAC labelling experiments, as well as an earlier experiment which used iTRAQ tagging.

 

iTRAQ tags are isobaric peptide labels which are ‘balanced’ to have the same mass, which means peptides with different tags will have the same peptide ion m/z. During collision induced dissociation, differently weighted ‘reporter’ ions are released by each tag, giving masses of 114.1, 115.1, 116.1 or 117.1 depending on the tag. This allows multiple samples to be analysed in one mass spec run, with the quantitation information in the MS/MS spectra, rather than in the MS spectra as with 15N/SILAC.
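To make the idea of reporter-ion quantitation concrete, the sketch below pulls the intensities at the four reporter masses out of an MS/MS peak list and expresses them as fractions of the total reporter signal. The peak list and the 0.05 m/z matching tolerance are assumptions for illustration; real iTRAQ quantitation also applies isotope purity corrections, which are omitted here.

```python
REPORTERS = (114.1, 115.1, 116.1, 117.1)  # 4-plex iTRAQ reporter ion m/z values


def reporter_intensities(peaks, tol=0.05):
    """Sum the intensity found within `tol` of each reporter m/z."""
    return {
        target: sum(inten for mz, inten in peaks if abs(mz - target) <= tol)
        for target in REPORTERS
    }


def relative_abundance(intensities):
    """Express each reporter intensity as a fraction of the total."""
    total = sum(intensities.values())
    return {k: (v / total if total else 0.0) for k, v in intensities.items()}


# Hypothetical MS/MS peak list: (m/z, intensity).
toy_peaks = [(114.11, 200.0), (115.11, 100.0), (116.11, 50.0),
             (117.11, 50.0), (175.12, 900.0)]
fractions = relative_abundance(reporter_intensities(toy_peaks))
for mz, frac in fractions.items():
    print(mz, frac)
```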


Unfortunately, in a pattern that has become depressingly familiar, your contact seems to have been in a great hurry sending the email and has unhelpfully labelled the pictures 1 to 6. The email does list which pictures were sent, but the order doesn’t seem to match up with the file names of the pictures.

 

Can you match each mass spectrum to the corresponding picture title?

 

·         MS of an unlabelled, unmodified peptide

·         MS of an unmodified peptide and the same peptide labelled with the amino acid Lysine-6 (i.e. SILAC-labelled)

·         MS of an unmodified peptide and the same peptide labelled with 15N, at an unknown level of incorporation (less than 100%)

·         MS/MS of an unlabelled peptide

·         MS/MS of a peptide labelled with 4-plex iTRAQ tags (114.1,115.1,116.1,117.1)

·         A close up MS/MS view of the 4 iTRAQ tags.
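When looking for the SILAC pair, remember that the spacing between the light and heavy peaks on the m/z axis depends on the charge state. The sketch below assumes the Lysine-6 label is 13C6-lysine (a shift of about 6.0201 Da per lysine); the practical itself only states "6 Da", so treat the exact value as an assumption. The peptide shown is hypothetical.

```python
HEAVY_SHIFT = 6.0201  # Da per labelled residue, assuming a 13C6 label


def silac_spacing(peptide, charge, labelled="K"):
    """Expected m/z gap between the light and heavy forms of a peptide."""
    n_labels = sum(peptide.count(aa) for aa in labelled)
    return n_labels * HEAVY_SHIFT / charge


# Hypothetical peptide with one lysine, observed as a 2+ ion.
spacing = silac_spacing("TAYIAK", 2)
print(round(spacing, 3))
```

A 15N-labelled pair, in contrast, has a spacing that depends on the number of nitrogen atoms in the peptide and on the level of incorporation, which is one way to tell the two kinds of spectrum apart.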

 

Picture 1:

 

Picture 2:

 

 

            Picture 3:

 

Picture 4:

 

            Picture 5:

 

 

            Picture 6:

 

 

 

15N incorporation estimation

By sifting through previous emails, you were able to work out that the peptide shown in the 15N-labelling spectrum you have just identified was TGAIVDVPVGDELLGR, from a Drosophila ATP synthase subunit protein called 'bellwether'.

 

Using an ion distribution calculation program, you have generated predicted ion distributions for TGAIVDVPVGDELLGR with a 2+ charge at every 15N incorporation percentage from 1 to 100%. These are available here.

Using the predicted ion distributions, estimate the 15N incorporation rate in the labelled partner in the 15N spectrum above.
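As a rough cross-check on the estimate you obtain from the predicted distributions, the sketch below counts the nitrogen atoms in the peptide and works out the m/z shift expected at full incorporation. Dividing an observed shift by this value gives a crude incorporation estimate. This is a simplification made here, not the method used by the ion distribution program: it ignores the natural abundance of 15N and the shape of the isotope envelope. The observed shift in the example is hypothetical.

```python
# Nitrogen atoms per residue; any residue not listed has one (backbone) nitrogen.
N_PER_RESIDUE = {"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2}
DELTA_15N = 0.997035  # mass difference between 15N and 14N, in Da


def nitrogen_count(peptide):
    """Total number of nitrogen atoms in an unmodified peptide."""
    return sum(N_PER_RESIDUE.get(aa, 1) for aa in peptide)


def full_shift_mz(peptide, charge):
    """m/z shift expected if every nitrogen were 15N."""
    return nitrogen_count(peptide) * DELTA_15N / charge


def estimate_incorporation(observed_shift_mz, peptide, charge):
    """Crude incorporation estimate (0 to 1) from an observed m/z shift."""
    return observed_shift_mz / full_shift_mz(peptide, charge)


PEPTIDE = "TGAIVDVPVGDELLGR"
print(nitrogen_count(PEPTIDE), round(full_shift_mz(PEPTIDE, 2), 4))
# Hypothetical observed shift of 9.0 m/z units between light and heavy peaks.
print(round(estimate_incorporation(9.0, PEPTIDE, 2), 3))
```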

 

 

 

(suppl. figures)