Differences

This shows you the differences between two versions of the page.

--- public:hack4 [2013/08/28 14:11] – andreas
+++ public:hack4 [2019/02/12 09:04] (current) – external edit 127.0.0.1
@@ Line 20: / Line 20: @@
 BC Amsterdam
-It will start Thursday Sept 19 at 12:00 with lunch in the hotel. Oren will take everybody to SARA after lunch.
+TC Leaders: Oren Tzfadia and Erik Alexandersson
+**Participants:**\\
+Monika Brandt\\
+Agnieszka Danek (Silesian University of Technology, Gliwice, Poland)\\
+Estelle Wera (SLU)\\
+Itziar Frades (SLU)\\
+Didi Amar (Weizmann Institute)\\
+Tatyana Goldberg (Technische Universitaet Muenchen)\\
+Erik Alexandersson (SLU)\\
+Oren Tzfadia (Weizmann Institute)\\
+Sanjeev Kumar Sharma (James Hutton Institute)\\
+Gregoire Rossier (SIB)
+It will start on Thursday 19 September at 12:00 at SARA.
 We will end on Saturday with lunch (around 13:00).
-TC Leader: Oren Tzfadia and Erik Alexandersson
-Participants:
-Monika Brandt
-Agnieszka Danek (Jagiellonian University)
-Estelle Wera (SLU)
-Itziar Frades (SLU)
-Didi Amar (Weizmann Institute)
-Tatyana Goldberg (Technische Universitaet Muenchen)
-Erik Alexandersson (SLU)
-Oren Tzfadia (Weizmann Institute)
-Sanjeev Kumar Sharma (James Hutton Institut)
-Gregoire Rossier (SIB)
+**Provisional program, schedule is flexible**
+**//Wednesday 18 September//**\\
+Arrival of first participants\\
+**//Thursday 19 September//**\\
+morning:                 Arrival of participants\\
+:00:                   Lunch at SARA\\
+:45:                   Welcome and AllBio project overview (Greg)\\
+**13:00 - 18:00:**            **Hackathon part 1**\\
+:00:                   Introductory slides (Oren)\\
+:30:                   Overview of the test case (Erik)\\
+:45:                   Overview of the data sets, pre-processing and validation scheme (Didi, Oren)\\
+:20:                   Coffee break\\
+:35:                   Organize all data sets on the computers and the SARA server. 1st round of pre-processing runs. In parallel, preparation of the ‘gold standard’ discussion (Sanjeev, Erik)\\
+:15:                   Round table discussion on potato metabolic genes as a ‘gold standard’ for the annotation validation schemes (Sanjeev, Erik)\\
+:45:                   1st round of the validation scheme scripts runs\\
+:45:                   Return to CASA hotel\\
+:15:                   Day 1 summary, write a wrap-up mini report (at the hotel)\\
+:00:                   Dinner at CASA hotel\\
+**//Friday 20 September//**\\
+<8:30:                   Breakfast at CASA hotel\\
+:30:                    Taxi to SARA\\
+**9:00 - 12:45:**            **Hackathon part 2**\\
+:00:                    Refine scripts after analysing the 1st-round results. Re-run the validation scoring. Define ‘stable’ scores and scoring schemes\\
+:45:                   Coffee break\\
+:00:                   Round table discussion – preparing for writing the summary report and delineating bullet points for a manuscript road map\\
+:45:                   Lunch at SARA\\
+**13:30 - 18:00:**            **Hackathon part 3**\\
+:30:                   Split to task sub-groups and run specific needed computational/biological analyses\\
+:00:                   Video conference (Skype) with Kate Dreher (USA). Coffee provided\\
+:00:                   Continue running computational tasks and summarizing results\\
+:00:                   Taxi to CASA hotel\\
+:30 - 20:30            Free slot if needed\\
+:45:                   Walk for dinner at [[http://www.huizefrankendael.nl/en/home/|Restaurant Merkelbach]] (Middenweg 72)\\
+**//Saturday 21 September//**\\
+<8:45:                   Breakfast at CASA hotel\\
+:45:                    Check out\\
+**9:00 - 12:00:**            **Hackathon part 4** (hotel meeting room)\\
+:00:                    Recap video conference with Kate\\
+:45:                    Hackathon summary and perspectives\\
+:00:                   Lunch at CASA hotel\\
+:00:                   End of the event\\
+**Overview**
+For non-model organisms, the genes predicted in a sequenced genome are usually poorly annotated functionally, so researchers have to rely on information transferred from model organisms on the basis of sequence similarity.
+We aim to gather information from several completely sequenced plant genomes and to link and adapt existing tools and pipelines for efficiently analyzing large-scale ‘-omics’ data. In this way, we seek to build a robust, automated framework that assigns genes to functional categories and places them in a biological context, such as biological pathways.
+Another challenge is to compare, link and annotate transcript sequences derived from RNA-seq of not-yet-sequenced genomes against already sequenced genomes. The potato genome was sequenced last year, and the time is ripe for genome comparisons, assignment of gene function, and transcriptome and proteome analyses. During the sequencing project, the potato genome consortium ran into several problems due to sequence heterogeneity, and in the end the genome could only be assembled from a homozygous doubled-monoploid potato clone (//S. tuberosum// group Phureja). The genome structure of this clone differs greatly from that of the commonly studied cultivars, i.e. crop potatoes grown for food or for industrial starch.
+Currently, OrthoMCL clusters are used together with BLAST for gene family analysis. ‘-omics’ data have been visualised with the commercial software QluCore, but it does not handle several data types at once well and does not visualise functional pathways. Gene predictions were made ab initio with parameters trained for //A. thaliana//, and also on the basis of sequence similarity to four other plant genomes. Predicted genes were functionally annotated by identifying orthologous and paralogous gene families across 12 sequenced plant species with OrthoMCL.
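For readers new to cluster-based annotation, here is a minimal Python sketch of the idea: each potato gene inherits the GO terms of the already-annotated reference genes in its cluster. It is an illustration under assumptions, not the procedure actually used for the potato annotation. The cluster dictionary, the ''taxon|gene'' ID convention, the ''stu|'' prefix and the function name ''transfer_go_terms'' are all hypothetical, and real OrthoMCL output has to be parsed into this shape first.

```python
from collections import defaultdict
from typing import Dict, List, Set


def transfer_go_terms(
    clusters: Dict[str, List[str]],
    reference_go: Dict[str, Set[str]],
    target_prefix: str,
) -> Dict[str, Set[str]]:
    """Give each target-species gene the GO terms of the annotated
    reference genes that share its ortholog/paralog cluster.

    Hypothetical input format: cluster ID -> member gene IDs,
    with the target species recognised by an ID prefix.
    """
    transferred: Dict[str, Set[str]] = defaultdict(set)
    for members in clusters.values():
        targets = [g for g in members if g.startswith(target_prefix)]
        # Pool the GO terms of every non-target (reference) member of the cluster.
        pooled: Set[str] = set()
        for gene in members:
            if not gene.startswith(target_prefix):
                pooled |= reference_go.get(gene, set())
        # Clusters without any annotated reference gene transfer nothing.
        if not pooled:
            continue
        for gene in targets:
            transferred[gene] |= pooled
    return dict(transferred)


# Toy input with made-up IDs ("stu" = potato, "ath" = Arabidopsis).
clusters = {
    "cluster1": ["stu|g1", "ath|a1"],
    "cluster2": ["stu|g2", "ath|a2", "ath|a3"],
    "cluster3": ["stu|g3"],
}
reference_go = {
    "ath|a1": {"GO:0000001"},
    "ath|a2": {"GO:0000002"},
    "ath|a3": {"GO:0000003"},
}
print(transfer_go_terms(clusters, reference_go, "stu|"))
```

Pooling every term in a cluster is the simplest possible transfer rule; it tends to over-annotate large, functionally diverse families, which is one reason the annotation pipelines need a validation scheme.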
+Available data: RNA-seq data (GSII Illumina paired-end reads) exist for 3 potato cultivars and 3 wild potato species. In addition, gene expression data (Agilent microarray based on version 3.4 of the potato genome) and quantitative secretome proteomics data from various states are available (all samples are from leaves).
+**Rough sketch of the plan before the hack-a-thon:**
+  * Sharing and review of the project “blueprint” by all hack-a-thon participants.
+  * Data collection.
+  * Building a virtual “warehouse” of relevant existing tools.
+  * Comprehensive survey of the related literature.
+  * Data pre-processing and protocol set-up.
+  * Simulations of “dry” (practice) runs for optimizing the “wet” (real) runs to be performed at the hack-a-thon meeting in Amsterdam.
+  * Definition of computational needs (computer clusters, memory usage and installation of required packages).
+During the hack-a-thon, we would like to test and evaluate ~5-6 annotation pipelines:
+  * Trinotate
+  * Blast2GO
+  * PotatoCyc
+  * OrthoMCL / InParanoid
+  * Phytozome
+  * KEGG
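One possible way to score each of the pipelines above, assuming a gold-standard set of curated (gene, term) pairs such as potato metabolic genes with EC numbers, is precision and recall over (gene, term) pairs. This is only a sketch: the function name ''evaluate_pipeline'', the toy IDs and the choice to score only genes covered by the gold standard are assumptions, not the agreed validation scheme.

```python
from typing import Dict, Set, Tuple


def evaluate_pipeline(
    predicted: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted (gene, term) pairs
    against a gold-standard annotation (hypothetical scheme)."""
    gold_pairs = {(g, t) for g, terms in gold.items() for t in terms}
    # Only genes covered by the gold standard are scored, so a pipeline
    # is not penalised for annotating genes the gold set says nothing about.
    pred_pairs = {
        (g, t) for g, terms in predicted.items() if g in gold for t in terms
    }
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up gene IDs.
gold = {"g1": {"EC:1.1.1.1"}, "g2": {"EC:2.7.1.1"}, "g3": {"EC:3.2.1.4"}}
predicted = {"g1": {"EC:1.1.1.1"}, "g2": {"EC:2.7.1.2"}, "g4": {"EC:4.1.1.1"}}
print(evaluate_pipeline(predicted, gold))  # precision 0.5, recall ~0.333, F1 ~0.4
```

Exact-match scoring treats a near miss (e.g. the wrong fourth EC digit) as a plain error; a partial-credit scheme over the GO or EC hierarchy would be a natural refinement.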
+Two things are CRUCIAL for us to complete BEFORE going to Amsterdam: 1) collecting the outputs of each pipeline; 2) parsing the output of each pipeline into a tab-delimited file with the columns GeneID (potato gene ID), Term (GO term ID or EC number) and Score (p-value/e-value, if available).
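A minimal Python sketch of a reader and writer for that three-column format. It assumes no header line, ''#'' for comment lines, and an empty (or ''NA'') third column when a pipeline reports no score. The ''Annotation'' type and the function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    gene_id: str            # potato gene ID
    term: str               # GO term ID (e.g. "GO:0008152") or EC number
    score: Optional[float]  # p-value / e-value, or None if not available


def parse_annotation_table(text: str) -> List[Annotation]:
    """Parse 'GeneID<TAB>Term<TAB>Score' lines; the Score column may be empty."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and '#' comment lines (assumed convention).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected at least GeneID and Term")
        gene_id, term = fields[0].strip(), fields[1].strip()
        raw = fields[2].strip() if len(fields) > 2 else ""
        score = float(raw) if raw and raw.upper() != "NA" else None
        records.append(Annotation(gene_id, term, score))
    return records


def format_annotation_table(records: List[Annotation]) -> str:
    """Write records back out, leaving the Score column empty when unknown."""
    lines = []
    for r in records:
        score = "" if r.score is None else f"{r.score:g}"
        lines.append(f"{r.gene_id}\t{r.term}\t{score}")
    return "\n".join(lines) + "\n"


# Made-up example: one scored GO annotation, one unscored EC annotation.
example = "g1\tGO:0008152\t1e-30\ng2\tEC:2.7.1.1\t\n"
records = parse_annotation_table(example)
print(records)
```

Keeping missing scores as ''None'' rather than 0 avoids confusing "no score reported" with a perfect p-value when pipelines are compared.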

Last modified: 2019/02/12 09:04 (external edit)