University of Dundee

Dr David Martin FRSB FHEA

David explores large biological data sets from next generation sequencing and proteomics, developing new tools to unlock the secrets of life within
Programme Lead (Core Curriculum Levels 1 and 2)
D'Arcy Thompson Unit, School of Life Sciences, University of Dundee, Dundee
Telephone: +44 (0) 1382 388704, internal extension 88704


David Martin's research focused on high throughput analysis of large data sets. One major challenge of modern biology is dealing with the increasing rate at which data is produced, and extracting biological meaning from it. This data needs to be captured, managed, interpreted and represented in ways which allow inferences to be drawn by experimentalists.

David spends most of his time teaching but former projects include the following:

Automated interpretation of post-translational modification by MS-MS

Protein phosphorylation (the post-translational modification of a protein by addition of a phosphate group at a specific point in a polypeptide chain) is a key mechanism that is used by the cell to control processes such as growth and cell death. In collaboration with Professor Mike Ferguson we are performing a global phosphoproteome analysis of the Trypanosoma brucei parasite.

Mass spectrometry can identify proteins through accurate measurement of the mass of peptide fragments. Subsequent fragmentation of specific peptide ions provides a tandem MS (MS-MS) spectrum, allowing the peptide sequence to be determined following database searching with appropriate software such as MASCOT. This software provides a list of matches in the database and highlights the putative position of any post-translational modification. In order to verify the database match, the MS-MS spectrum must be inspected by an expert as the database search algorithms are not designed for post-translational modification identification. However, complex protein mixtures such as whole cell lysates can provide many hundreds of thousands of spectra, manual validation of which is extremely time consuming.

To automate this process, a post-search validation method has been developed. The database match is re-evaluated against a set of criteria which model the expected chemical behaviour of the peptide fragments. The criteria are stringent, so the method provides high-confidence hits, freeing the expert to concentrate on the harder spectra which at present cannot be assessed automatically. With this methodology we can at present automatically identify approximately half the phosphorylation sites with negligible false positives. There is plenty of potential for further improvement, giving the possibility of rapid, high-throughput phosphoproteome scans.
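As an illustration of what one such criterion might look like, the sketch below checks a spectrum for a neutral-loss peak. Phosphoserine and phosphothreonine peptides commonly lose phosphoric acid (H3PO4, monoisotopic mass about 97.977 Da) on fragmentation, so a phosphopeptide match would be expected to show a peak at the precursor m/z minus 97.977 divided by the charge. This is a minimal sketch only: the function names, the tolerance and the intensity threshold are assumptions chosen for illustration, and it is not the set of criteria used in the method described above.

```python
# Illustrative rule only: names, tolerance and threshold are hypothetical.
H3PO4 = 97.976896  # monoisotopic mass of phosphoric acid in Da


def neutral_loss_mz(precursor_mz, charge):
    """Return the m/z expected after the precursor ion loses one H3PO4."""
    return precursor_mz - H3PO4 / charge


def has_neutral_loss_peak(peaks, precursor_mz, charge,
                          tol=0.5, min_rel_intensity=0.1):
    """Check a spectrum for a phosphoric acid neutral-loss peak.

    peaks is a list of (mz, intensity) pairs. The peak must lie within
    tol of the expected m/z and reach min_rel_intensity of the most
    intense peak in the spectrum.
    """
    if not peaks:
        return False
    base = max(intensity for _, intensity in peaks)
    target = neutral_loss_mz(precursor_mz, charge)
    return any(
        abs(mz - target) <= tol and intensity >= min_rel_intensity * base
        for mz, intensity in peaks
    )
```

A real validation step would combine several rules of this kind and would need thresholds tuned to the instrument and fragmentation method.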

Sequencing Chromosome 4 of the Potato genome

Genome sequencing raises considerable challenges. Integration of genetic and physical map data, mapping of clones and assembly of sequence data all require substantial bioinformatic support. In addition, next generation sequencing technologies such as Solexa and 454 have the potential to radically change the approach taken to obtaining novel genomes.

In collaboration with colleagues at the Scottish Crop Research Institute (Dr Glenn Bryan), Teagasc Ireland (Dr Dan Milbourne) and Imperial College London (Dr Gerard Bishop), we will be determining the genomic sequence of potato (Solanum tuberosum) chromosome 4.

The project provides many bioinformatic challenges and opportunities. The strain being sequenced is heterozygous, adding complexity to construction of tiling paths and data integration. There are many strains with different traits which can be mapped to regions of interest, and potato is closely related to tomato, allowing for direct interspecific comparison. Data to date have been obtained using classical Sanger sequencing in a BAC by BAC strategy. We will be exploring the utility of NGS data, both for BAC by BAC approaches and whole genome shotgun assemblies. In addition, new methods and techniques are being developed.

This project is funded by BBSRC grant BB/F012640/1.

GOtcha - providing qualified functional assignments.

GOtcha [4] is a methodology for rapid functional annotation of gene products. Gene Ontology is a hierarchical description of the function of a gene product. Specific terms are linked to more general terms, providing a tree-like structure. GOtcha makes use of this hierarchy to examine the functional assignment of similar gene products and provide a likelihood for the sequence in question to have that particular function.
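The idea of using the hierarchy can be sketched as follows: each similar gene product contributes a weight to the terms it is annotated with, that weight is passed up to every more general term, and the totals are normalised so that the most general term scores 1. This is a minimal sketch under stated assumptions and not the published GOtcha scoring scheme. The term names, the weights and the function names are hypothetical, and the parent links are supplied by hand rather than read from the Gene Ontology.

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return the term together with every more general term above it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate_scores(hits, parents):
    """Score each term by the share of hit weight that supports it.

    hits is a list of (terms, weight) pairs, one per similar gene
    product, where terms is the set of terms annotated to that hit.
    parents maps each term to its more general terms (hypothetical data).
    """
    totals = defaultdict(float)
    total_weight = 0.0
    for terms, weight in hits:
        covered = set()
        for term in terms:
            covered |= ancestors(term, parents)
        # Count each hit once per term, even if reached by several paths.
        for term in covered:
            totals[term] += weight
        total_weight += weight
    if total_weight == 0:
        return {}
    return {term: score / total_weight for term, score in totals.items()}
```

With this arrangement a general term is never scored lower than any of its more specific descendants, which is what allows a qualified assignment at whatever level of detail the evidence supports.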

GOtcha assignments for an entire genome can be mapped to metabolic pathways, indicating biological processes which may be present or absent in that organism. GOtcha was used in the analysis of the genome data for Plasmodium falciparum (Malaria), Trypanosoma brucei (African Sleeping Sickness), Trypanosoma cruzi (Chagas disease), Leishmania major (Leishmaniasis), and Brugia malayi (elephantiasis).

Kinomer - Detailed classification of protein kinases

In collaboration with Dr Diego Miranda-Saavedra (now at Cambridge University). Application of subgroup specific Hidden Markov Models for classification and identification of protein kinases.

Detailed classification of SNF2-like helicases

In collaboration with Dr Andrew Flaus (now at University of Galway, Ireland). Calculation and application of subgroup specific Hidden Markov Models for classification and identification of SNF2-like [1] chromatin remodellers.


David is part of the core level 1 & 2 teaching team, delivering a core curriculum across all Life Sciences degrees. His particular focus is on numeracy and data literacy, bringing quantitative methodologies into the approach that students take as part of their developing roles as active scientists. Students are encouraged to build up their data analysis skills as a key tool in enabling them to question and evaluate the world around them.

Modules taught:

Core level 1 and 2 modules

BS11005 Introduction to Maths, Physics and Chemistry

BS31003 Molecular Structure and Interactions

BS32010 Applied Bioinformatics

BS32011/2 Bioinformatics practical project

BS42003 Advanced Bioinformatics

Modules managed:

BS21002 The Cell and the Gene

BS22002 Biological Sciences

BS32010 Applied Bioinformatics

BS42003 Advanced Bioinformatics

Key Teaching Achievements:

Nominated for Student Led Teaching Awards 2014

Produced 'Lost in Translation', a board game for 1st year/A2 students to reinforce understanding of DNA transcription and translation

Produced Gigsaw - a teaching tool for understanding Next Generation Sequence analysis

Other roles:

D'Arcy Thompson Unit Divisional Representative for the Division of Plant Sciences.



34. Urbaniak MD, Martin DMA, Ferguson MAJ. Global Quantitative SILAC Phosphoproteomics Reveals Differential Phosphorylation Is Widespread between the Procyclic and Bloodstream Form Lifecycle Stages of Trypanosoma brucei. Journal of Proteome Research 2013 12 2233-2244

33. Nelson SA, Li Z, Newton IP, Fraser D, Milne RE, Martin DMA, Schiffmann D, Yang X, Dormann D, Weijer CJ, Appleton PL, Naethke IS. Tumorigenic fragments of APC cause dominant defects in directional cell migration in multiple model systems. Disease Models & Mechanisms 2012 5 940-947

32. Martin DMA. Gigsaw - physical simulation of next-generation sequencing for education and outreach. EMBnet Journal 2012 18(1) 28-32

31. Potato Genome Sequencing Consortium. Genome sequence and analysis of the tuber crop potato. Nature 2011 475 189-194

30. van Koningsbruggen S, Gierlinski M, Schofield P, Martin DMA,  Barton GJ, Ariyurek Y, den Dunnen JT,  Lamond AI High-resolution whole-genome sequencing reveals that specific chromatin domains from most human chromosomes associate with nucleoli Molecular Biology of the Cell 2010  21(21) 3735-48.

29. Martin DMA, Nett IRE, Vandermoere F, Barber JD, Morrice N and Ferguson MAJ Prophossi: Automating Expert Validation of  Phosphopeptides from Tandem Mass Spectrometry Bioinformatics 2010 26 (17), 2153-2159

28. Nett IRE, Martin DMA, Miranda-Saavedra D, Lamont D, Barber JD, Mehlert A and Ferguson MAJ The phosphoproteome of bloodstream form Trypanosoma brucei, causative agent of African Sleeping Sickness Molecular and Cellular Proteomics 2008 8(7) 1527-38

27.  Waterhouse AM, Procter JB,  Martin DMA, Clamp M and Barton GJ Jalview Version 2 - a multiple sequence alignment editor and analysis workbench Bioinformatics 2009 25 (9) 1189-1191; doi:10.1093/bioinformatics/btp033

26. Martin DMA, Miranda-Saavedra D and Barton GJ Kinomer v. 1.0:  A database of systematically classified eukaryotic protein kinases  Nucleic Acids Research 2009 37: D244-D250; doi:10.1093/nar/gkn834

25. Towler MC, Fogarty S, Hawley SA, Pan DA, Martin DMA, Morrice NA, McCarthy A, Galardo MN, Meroni SB, Cigorraga SB, Ashworth A, Sakamoto K and Hardie DG A novel short splice variant of the tumour suppressor LKB1 is required for spermiogenesis Biochem. J. 2008 416 1–14; doi:10.1042/BJ20081447

24. Overton IM, van Niekerk CA, Carter LG, Dawson A, Martin DM, Cameron S, McMahon SA, White MF, Hunter WN, Naismith JH, Barton GJ. TarO: a target optimisation system for structural biology. Nucleic Acids Res. 2008 36 W190-W196; doi:10.1093/nar/gkn141

23. Ghedin E, Wang S, Spiro D, Caler E, Zhao Q, Crabtree J, Allen JE, Delcher AL, Guiliano DB, Miranda-Saavedra D, Angiuoli SV, Creasy T, Amedeo P, Haas B, El-Sayed NM, Wortman JR, Feldblyum T, Tallon L, Schatz M, Shumway M, Koo H, Salzberg SL, Schobel S, Pertea M, Pop M, White O, Barton GJ, Carlow CK, Crawford MJ, Daub J, Dimmic MW, Estes CF, Foster JM, Ganatra M, Gregory WF, Johnson NM, Jin J, Komuniecki R, Korf I, Kumar S, Laney S, Li BW, Li W, Lindblom TH, Lustigman S, Ma D, Maina CV, Martin DMA, McCarter JP, McReynolds L, Mitreva M, Nutman TB, Parkinson J, Peregrín-Alvarez JM, Poole C, Ren Q, Saunders L, Sluder AE, Smith K, Stanke M, Unnasch TR, Ware J, Wei AD, Weil G, Williams DJ, Zhang Y, Williams SA, Fraser-Liggett C, Slatko B, Blaxter ML, Scott AL. Draft genome of the filarial nematode parasite Brugia malayi. Science. 2007, 317, 1756-60

22. Flaus A, Martin DMA, Barton GJ, Owen-Hughes T Identification of multiple distinct Snf2 subfamilies with conserved structural motifs. Nucleic Acids Res. 2006, 34, 2887-905.

21: Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, Bartholomeu DC, Lennard NJ, Caler E, Hamlin NE, Haas B, Bohme U, Hannick L, Aslett MA, Shallom J, Marcello L, Hou L, Wickstead B, Alsmark UC, Arrowsmith C, Atkin RJ, Barron AJ, Bringaud F, Brooks K, Carrington M, Cherevach I, Chillingworth TJ, Churcher C, Clark LN, Corton CH, Cronin A, Davies RM, Doggett J, Djikeng A, Feldblyum T, Field MC, Fraser A, Goodhead I, Hance Z, Harper D, Harris BR, Hauser H, Hostetler J, Ivens A, Jagels K, Johnson D, Johnson J, Jones K, Kerhornou AX, Koo H, Larke N, Landfear S, Larkin C, Leech V, Line A, Lord A, Macleod A, Mooney PJ, Moule S, Martin DMA, Morgan GW, Mungall K, Norbertczak H, Ormond D, Pai G, Peacock CS, Peterson J, Quail MA, Rabbinowitsch E, Rajandream MA, Reitter C, Salzberg SL, Sanders M, Schobel S, Sharp S, Simmonds M, Simpson AJ, Tallon L, Turner CM, Tait A, Tivey AR, Van Aken S, Walker D, Wanless D, Wang S, White B, White O, Whitehead S, Woodward J, Wortman J, Adams MD, Embley TM, Gull K, Ullu E, Barry JD, Fairlamb AH, Opperdoes F, Barrell BG, Donelson JE, Hall N, Fraser CM, Melville SE, El-Sayed NM. The genome of the African trypanosome Trypanosoma brucei.  Science 2005, 309 416-22

20:  Byres E, Martin DMA, Hunter WN. A preliminary crystallographic analysis of the putative mevalonate diphosphate decarboxylase from Trypanosoma brucei  Acta Crystallographica 2005, F61 581-584

19:  Hill P, Burford D, Martin DMA, Flavell AJ.  Retrotransposon populations of Vicia species with varying genome size. Molecular Genetics and Genomics 2005, 273 371-81

18: Martin DM, Berriman M, Barton GJ. GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics. 2004, 5 178

17: Puntervoll P, Linding R, Gemund C, Chabanis-Davidson S, Mattingsdal M, Cameron S, Martin DM, Ausiello G, Brannetti B, Costantini A, Ferre F, Maselli V, Via A, Cesareni G, Diella F, Superti-Furga G, Wyrwicz L, Ramu C, McGuigan C, Gudavalli R, Letunic I, Bork P, Rychlewski L, Kuster B, Helmer-Citterich M, Hunter WN, Aasland R, Gibson TJ. ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 2003, 31 3625-30.

16: Martin DM, Hill P, Barton GJ, Flavell AJ. Visual representation of database search results: the RHIMS Plot. Bioinformatics. 2003, 19 1037-8.

15: Iliopoulos I, Tsoka S, Andrade MA, Enright AJ, Carroll M, Poullet P, Promponas V, Liakopoulos T, Palaios G, Pasquier C, Hamodrakas S, Tamames J, Yagnik AT, Tramontano A, Devos D, Blaschke C, Valencia A, Brett D, Martin D, Leroy C, Rigoutsos I, Sander C, Ouzounis CA. Evaluation of annotation strategies using an entire genome sequence. Bioinformatics. 2003, 19 717-26.

14: Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419 498-511.

13: Martin DM, Wiiger MT, Prydz H. Tissue factor and biotechnology. Thromb Res. 1998, 90 1-25.

12: Ashton AW, Kemball-Cook G, Johnson DJ, Martin DM, O'Brien DP, Tuddenham EG, Perkins SJ. Factor VIIa and the extracellular domains of human tissue factor form a compact complex: a study by X-ray and neutron solution scattering. FEBS Lett. 1995, 374 141-6.

11: Martin DM, Boys CW, Ruf W. Tissue factor: molecular recognition and cofactor function. FASEB J. 1995, 9 852-9

10: Ruf W, Kelly CR, Schullek JR, Martin DM, Polikarpov I, Boys CW, Tuddenham EG, Edgington TS. Energetic contributions and topographical organization of ligand binding residues of tissue factor. Biochemistry. 1995, 34 6310-5.

9: O'Brien DP, Kemball-Cook G, Hutchinson AM, Martin DM, Johnson DJ, Byfield PG, Takamiya O, Tuddenham EG, McVey JH. Surface plasmon resonance studies of the interaction between factor VII and tissue factor. Demonstration of defective tissue factor binding in a variant FVII molecule (FVII-R79Q). Biochemistry. 1994, 33 14162-9.

8: Harlos K, Martin DM, O'Brien DP, Jones EY, Stuart DI, Polikarpov I, Miller A, Tuddenham EG, Boys CW. Crystal structure of the extracellular region of human tissue factor. Nature. 1994, 370 662-6.

7: Martin DM, Tuddenham EG. Activation of factor X by factor VIIa on monocyte cell surfaces. Blood. 1994, 83 3828-9.

 6:  Martin DM, O'Brien DP, Tuddenham EG, Byfield PG.  Synthesis and characterization of wild-type and variant gamma-carboxyglutamic acid-containing domains of factor VII. Biochemistry. 1993, 32 13949-55.

5:  Boys CW, Miller A, Harlos K, Martin DM, Tuddenham EG, O'Brien DP. Crystallization and preliminary X-ray analysis of human tissue factor extracellular domain. J Mol Biol. 1993, 234 1263-5.

4: Takamiya O, Kemball-Cook G, Martin DM, Cooper DN, von Felten A, Meili E, Hann I, Prangnell DR, Lumley H, Tuddenham EG, et al. Detection of missense mutations by single-strand conformational polymorphism (SSCP) analysis in five dysfunctional variants of coagulation factor VII. Hum Mol Genet. 1993, 2 1355-9.

3: O'Brien DP, Anderson JS, Martin DM, Byfield PG, Tuddenham EG. Structural requirements for the interaction between tissue factor and factor VII: characterization of chymotrypsin-derived tissue factor polypeptides. Biochem J. 1993, 292 7-12.

2:  Cooke RM, Carter BG, Martin DM, Murray-Rust P, Weir MP.  Nuclear magnetic resonance studies of the snake toxin echistatin. 1H resonance assignments and secondary structure. Eur J Biochem. 1991, 202 323-8.

1:  Gould H, Sutton B, Beavil A, Edmeades R, Martin D.  Immunoglobulin E receptors. Clin Exp Allergy. 1991, 21 138-47




As well as the research tools listed above, my key impacts have been:

Developing novel teaching tools that have been very well received.

Writing the user manual for key bioinformatics software


Why I Teach

The best thing in life is discovering things that you never knew before. Every day is a learning day. As a researcher it was and is my privilege to discover things that nobody has ever known about how the world around us is made and works. But that is magnified if you can share it. Teaching is a key component of research, and it is a privilege to be able to open the eyes of the next generation to see the world around them in new ways. In Dundee we have great students and it is a pleasure to teach and train them to change the world.

What I Teach

Statistics/Data analysis:

We use RStudio as the core technology for all our data analysis and statistical analysis. Reproducibility is key to effective science, so students are encouraged to include their R scripts in project reports and analyses so that errors or misunderstandings can be cleared up, or so we can learn new ways of doing things when they have found something really cool.

Techniques and Tools

Wherever possible I like students to perform real research instead of 'toy' practicals. We should contribute to knowledge as we learn.

We use Jalview extensively as a workbench for sequence analysis and comparison from the beginning of our course. In addition, Chimera is used for protein visualisation. Where possible, all the data analysis and visualisation software is cross-platform and freely available: tools and knowledge the students can take with them.

I introduce students to a taster of Python programming in year 2. For those keen to pursue this further, the third-year bioinformatics modules extend skills and abilities in Python and R applied to current activities, including next generation sequencing to analyse gene expression. In fourth year we broaden knowledge by looking at the application of bioinformatics algorithms across different subject areas.


Student Projects:

I supervise a number of students in their honours year project. This can be a lab project, where we apply bioinformatics to a real problem and maybe test our hypotheses in the lab, or a science communication project, where students develop and deliver educational activities to an appropriate audience.

Selected recent student projects: 

  • Identification and validation of novel microsatellite markers in the Soprano Pipistrelle Pipistrellus pygmaeus
  • A card game to educate on crop development and GMO technology

Where I Come From

I grew up in London and studied Chemistry with Biochemistry at King's College London.

During my degree I took an industrial year with Glaxo where I worked on HIV protease and snake venom proteins. This led to an interest in protein structure and function and a PhD at the MRC Clinical Research Centre in the group of Professor Ted Tuddenham working on blood coagulation proteins.

After completing my PhD I moved to the Biotechnology Centre at the University of Oslo, discovered that practical molecular biology was probably not my strong point and transferred to bioinformatics, working on the EU GeneQuiz project. This was my first experience of a multinational EU project.

After a short period as manager of the Norwegian EMBnet node I moved to Dundee in 2001, working on a number of different projects. In 2013 I moved across campus to take up a full-time position in Learning and Teaching with the aim of developing the data literacy (numeracy/stats/bioinformatics) aspect of the curriculum to the level required for a modern life scientist.