Test Case Title

Pathway analysis of a poorly annotated but sequenced plant genome

Test Case Acronym

PlantPathX

Test Case Class

Plants

Contact person

Erik Andreasson - SLU Sweden

Contact

nd

Test Case Description

To analyze large-scale ‘-omics’ data (e.g. from microarrays, RNA-seq or quantitative proteomics) efficiently and put them in a biological context, it helps to divide genes into functional categories such as pathways. For non-model organisms, however, little functional information exists for individual genes or proteins. Instead, researchers have to rely on information inferred from sequence identity to better-studied model organisms. As sequencing costs drop, the genome sequences of more and more ‘non-model’ organisms become known, and ‘-omics’ data can subsequently be generated efficiently as well. These large-scale ‘-omics’ data can be very informative and should ultimately be used together with sequence identity in the annotation effort for the sequenced non-model species. Efficient methods/workflows that derive information from sequence identity to model organisms, combined with efficient use of existing ‘-omics’ data for the species studied, would be desirable for visualisation and analysis. We are studying potato, whose genome was sequenced last year. Closely related to this species is tomato, whose genome is expected to be released in 2012.

Background knowledge

When exploring a non-model organism with a sequenced genome, comparison of gene families (e.g. presence or absence and putative number of homologs) is an initial but crucial step that gives a first overview. For plant species, the PLAZA database provides a good platform for this. Tools for visualisation, incorporation of ‘-omics’ data and network analysis exist in Cytoscape. However, for a biologist, a best-practice concept and/or training in using these tools would be helpful. An integrated approach could also be useful.
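As an illustration of the gene family comparison described above, the following minimal sketch counts gene copies per species in each family and reports 0 where a family is absent. It assumes the families are available as OrthoMCL-style group lines of the form "group_id: taxon|gene taxon|gene ...". The function names, taxon codes and gene identifiers are hypothetical and chosen only for this example.

```python
from collections import Counter


def family_sizes(group_lines):
    """Return {group_id: Counter({taxon: n_genes})} from OrthoMCL-style lines."""
    sizes = {}
    for line in group_lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        # Each member is assumed to be written as "taxon|gene_id".
        sizes[group_id.strip()] = Counter(
            member.split("|", 1)[0] for member in members.split()
        )
    return sizes


def presence_absence(sizes, taxa):
    """Return {group_id: {taxon: copy number}}, with 0 for absent taxa."""
    return {
        group_id: {taxon: counts.get(taxon, 0) for taxon in taxa}
        for group_id, counts in sizes.items()
    }


# Hypothetical input: family IDs, taxon codes and gene IDs are made up.
EXAMPLE_GROUPS = [
    "FAM0001: potato|geneA potato|geneB tomato|geneC",
    "FAM0002: tomato|geneD arabidopsis|geneE",
]

table = presence_absence(
    family_sizes(EXAMPLE_GROUPS), ["potato", "tomato", "arabidopsis"]
)
for group_id, row in table.items():
    print(group_id, row)
```

A table of this kind only gives the first overview mentioned above; putative homolog numbers would still need checking, for example with BLAST.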

Initial state of the Test case

GSII Illumina reads exist for 3 different potato cultivars. In addition, microarray data and quantitative proteomics data from various states exist. Samples are from leaves. Currently, the OrthoMCL clusters generated in conjunction with the publication of the potato genome (Nature 475, 189–195) are used together with BLAST for gene family analysis. A MapMan binning file has been generated based on the potato genome and is publicly available. Visualisation of ‘-omics’ data has been done in commercial software, QluCore, but it handles multiple simultaneous data types poorly and does not visualise functional pathways.

Desired final state of the Test Case

An easy way to compare similarities and differences in putative pathways between species and cultivars while visualising and exploring multiple data types that can be linked to genes and gene products (transcripts and proteins), i.e. “multi-omics”.
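To make the idea of linking several data types to the same genes concrete, here is a minimal sketch that groups per-gene values from different ‘-omics’ layers by functional category and averages them per layer. It assumes a simple gene-to-category mapping (as could be derived from a binning file) and per-gene log2 fold changes. The function name, gene identifiers, category names and values are all hypothetical and do not come from the potato data.

```python
from collections import defaultdict
from statistics import mean


def summarise_by_bin(gene_to_bins, layers):
    """Average each data layer per functional category.

    gene_to_bins: {gene_id: [category, ...]}
    layers: {layer_name: {gene_id: value}}
    Returns {category: {layer_name: mean value, or None if no data}}.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for layer, values in layers.items():
        for gene, value in values.items():
            # Genes without a category assignment are skipped.
            for category in gene_to_bins.get(gene, ()):
                collected[category][layer].append(value)
    return {
        category: {
            layer: (mean(per_layer[layer]) if per_layer[layer] else None)
            for layer in layers
        }
        for category, per_layer in collected.items()
    }


# Hypothetical example data (gene IDs, categories and values are made up).
gene_to_bins = {
    "g1": ["photosynthesis"],
    "g2": ["photosynthesis"],
    "g3": ["stress"],
}
layers = {
    "transcript": {"g1": 1.0, "g2": 3.0, "g3": -1.0},
    "protein": {"g1": 0.5},
}

summary = summarise_by_bin(gene_to_bins, layers)
for category, row in summary.items():
    print(category, row)
```

Keeping a None entry where a layer has no measurements for a category makes gaps between data types visible rather than hiding them, which matters when transcript and protein coverage differ.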

Test Case Work Plan

nd

Discussion

This type of analysis would be useful for all non-model organisms with a sequenced genome – a category that is rapidly expanding due to lower sequencing costs.

LF: A test case similar to TC12, but for larger plant genomes, with the ability to add and cross-validate *omics information. This requires a huge amount of work and is still a research field. I would not try to do everything; maybe ask them to specify the most important tool they need.