public:hack4 – AllBio


Hack-a-thon session #4 for Test Cases #9 and #12

19 - 21 September 2013 in Amsterdam, the Netherlands

In September we will organise the fourth hack-a-thon. This hack-a-thon will take place in Amsterdam at SARA.

Go to Google maps. Find Watergraafsmeer in Amsterdam and in Watergraafsmeer the Science Park. In the top right corner of the triangular half of the science park you find a boomerang-shaped building. That's it.

SARA and the eScience Centre share the building. Google places SARA at the location where we will get a room for the hack-a-thon, and if you search for eScience Centre Amsterdam you find the building where the eScience centre (who will help us too) and SARA (who will officially host us) are housed.

Hotel CASA 400, Eerste Ringdijkstraat 4, 1097 BC Amsterdam

The hack-a-thon will start on Thursday 19 September at 12:00 with lunch at the hotel. Oren will take everybody to SARA after lunch.

We will end on Saturday after lunch (around 13:00).

Provisional program

Wednesday 18 September
arrival of 1st participants

Thursday 19 September
morning: arrival of participants
~12:00: lunch at CASA hotel
~13:00: walk to SARA
~13:30: hackathon part 1 (details to be completed)
~19:00: return to CASA hotel
20:00: dinner at CASA hotel

Friday 20 September
early morning: breakfast at CASA hotel
8:45: walk to SARA
9:15: hackathon part 2 (details to be completed)
~13:00: lunch at SARA
~14:00: hackathon part 3 (details to be completed)
~19:30: return to CASA hotel
~20:45: dinner at Restaurant Merkelbach (Middenweg 72)

Saturday 21 September
early morning: breakfast at CASA hotel
8:45: check out
9:00: hackathon summary and perspectives at CASA hotel (meeting room)
~12:00: lunch at CASA hotel
~13:00: end of the meeting

TC Leaders: Oren Tzfadia and Erik Alexandersson

Participants:
Monika Brandt
Agnieszka Danek (Silesian University of Technology, Gliwice, Poland)
Estelle Wera (SLU)
Itziar Frades (SLU)
Didi Amar (Weizmann Institute)
Tatyana Goldberg (Technische Universitaet Muenchen)
Erik Alexandersson (SLU)
Oren Tzfadia (Weizmann Institute)
Sanjeev Kumar Sharma (James Hutton Institute)
Gregoire Rossier (SIB)

Overview

For non-model organisms, genes predicted in the sequenced genome are relatively poorly functionally annotated. Instead, researchers have to rely on information derived from sequence identity to model organisms.

We aim to gather information from several completely sequenced plant genomes and to combine or modify existing tools and pipelines for efficiently analyzing large-scale ‘-omics’ data. In this way, we seek to generate a robust and automated framework that assigns genes to functional categories and classifies them in a biological context such as biological pathways.

Another challenge is to compare, link and annotate transcript sequences derived from RNA-seq of not-yet-sequenced genomes with already sequenced genomes. The potato genome was sequenced last year, and the time is ripe for genome comparisons, functional assignment of genes, and transcriptome and proteomics analyses. During the sequencing project, the potato genome consortium ran into several problems due to sequence heterogeneity, and eventually the genome could only be assembled successfully from a homozygous doubled-monoploid potato clone (S. tuberosum group Phureja). The genome structure of this clone differs greatly from that of the cultivars that are commonly studied, i.e. crop potato cultivars grown for food or as starch for industrial use.

Currently, OrthoMCL clusters are used together with BLAST for gene family analysis. Visualisation of ‘-omics’ data has been done in the commercial software QluCore, but it does not handle multiple data types simultaneously well and does not visualise functional pathways. Gene predictions were done ab initio with parameters trained for A. thaliana and also based on sequence similarity with four other plant genomes. Functional annotation of predicted genes was done by identifying orthologous and paralogous gene families in 12 sequenced plant species with OrthoMCL.

Available data: RNA-seq data (GSII Illumina paired-end reads) exist for 3 different potato cultivars and 3 wild potato species. In addition, gene expression data (Agilent microarray based on version 3.4 of the potato genome) and quantitative secretome proteomics data from various states exist (all samples are from leaves).

Rough sketch of the plan before the hack-a-thon:

Share and review of the project “blueprint” by all hack-a-thoners.
Data collection.
Building virtual “warehouse” of relevant existing tools.
Comprehensive related literature survey.
Data pre-processing and protocols set up.
Simulations of “dry” (practice) runs for optimizing “wet” (real) runs to be performed at the time of the hack-a-thon meeting in Amsterdam.
Define computational needs (computer clusters, memory usage, and required package installations).

During the hack-a-thon, we would like to test and evaluate ~5-6 annotation pipelines:
- Trinotate
- Blast2GO
- PotatoCyc
- OrthoMCL / InParanoid
- Phytozome
- KEGG

Two things are CRUCIAL for us to complete BEFORE going to Amsterdam:
1) Collect the outputs from each pipeline.
2) Parse the output of each pipeline into a tab-delimited file with the following columns, in this order: GeneID (potato ID), Term (GO term ID or E.C. ID), Score (p-value/e-value, if available).
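The common tab-delimited format could be produced with a small helper like the one below. This is only a sketch: the function names, the header row, and the choice to leave the Score field empty when a pipeline reports no p-value/e-value are assumptions, not something agreed on this page. The step that extracts (GeneID, Term, Score) records from each pipeline's native output is pipeline-specific and is not shown.

```python
import csv
import io

# Column names follow the format described above; writing them as a
# header row is an assumption of this sketch.
COLUMNS = ("GeneID", "Term", "Score")


def write_annotations(records, handle):
    """Write (gene_id, term, score) records as tab-delimited rows.

    `score` may be None when the pipeline gives no p-value/e-value;
    the field is then left empty. Returns the number of rows written.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    count = 0
    for gene_id, term, score in records:
        writer.writerow([gene_id, term, "" if score is None else score])
        count += 1
    return count


def read_annotations(handle):
    """Read a file written by write_annotations back into tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["GeneID"], row["Term"],
         float(row["Score"]) if row["Score"] else None)
        for row in reader
    ]


# Illustration with placeholder gene IDs (not real potato identifiers).
example = [("gene_A", "GO:0008150", 1e-5), ("gene_B", "EC:1.1.1.1", None)]
buffer = io.StringIO()
write_annotations(example, buffer)
```

If every pipeline is parsed into this same three-column layout, the files can be read back with `read_annotations` and compared directly during the hack-a-thon.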

public/hack4.1379402121.txt.gz · Last modified: 2019/02/12 09:04 (external edit)