--- public:hack4 [2013/08/26 12:07] – andreas
+++ public:hack4 [2019/02/12 09:04] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
-====Hackathon session #4 for Test Cases #9 and #12====
+====Hack-a-thon session #4 for Test Cases #9 and #12====
+==19 - 21 September 2013 in Amsterdam, the Netherlands==
+\\
 [[public:loadedtestcases:tc9|Test Case #9]]
 [[public:loadedtestcases:tc12|Test Case #12]]
-TC Leader: Oren Tzfadia and Erik Alexandersson
+{{:public:datacenter.jpeg |}}In September we will organise the fourth hack-a-thon. This hack-a-thon will take place in Amsterdam at SARA.
+Go to Google Maps. Find Watergraafsmeer in Amsterdam and, within Watergraafsmeer, the Science Park. In the top right corner of the triangular half of the Science Park you will find a boomerang-shaped building. That's it.
+SARA and the eScience Centre share the building. Google places SARA at the location where we will get a room for the hack-a-thon, and if you search for eScience Centre Amsterdam you will find the building where the eScience Centre (who will help us too) and SARA (who will officially host us) are housed.
+[[http://www.hotelcasa400.nl|Hotel CASA 400]]
+Eerste Ringdijkstraat 4
+BC Amsterdam
+TC Leaders: Oren Tzfadia and Erik Alexandersson
+**Participants:**\\
+Monika Brandt\\
+Agnieszka Danek (Silesian University of Technology, Gliwice, Poland)\\
+Estelle Wera (SLU)\\
+Itziar Frades (SLU)\\
+Didi Amar (Weizmann Institute)\\
+Tatyana Goldberg (Technische Universitaet Muenchen)\\
+Erik Alexandersson (SLU)\\
+Oren Tzfadia (Weizmann Institute)\\
+Sanjeev Kumar Sharma (James Hutton Institute)\\
+Gregoire Rossier (SIB)
+The hack-a-thon will start on Thursday 19 September at 12:00 at SARA.
+We will end on Saturday with lunch (around 13:00).
+**Provisional program, schedule is flexible**
+**//Wednesday 18 September//**\\
+Arrival of first participants\\
+**//Thursday 19 September//**\\
+morning:                 Arrival of participants\\
+:00:                   Lunch at SARA\\
+:45:                   Welcome and AllBio project overview (Greg)\\
+**13:00 - 18:00**            **Hackathon part 1**\\
+:00:                   Introductory slides (Oren)\\
+:30:                   Overview of the test case (Erik)\\
+:45:                   Overview of the data sets, pre-processing and validation scheme (Didi, Oren)\\
+:20:                   Coffee break\\
+:35:                   Organize all data sets on computers and the SARA server. 1st round of pre-processing runs. In parallel, preparation of the ‘gold standard’ discussion (Sanjeev, Erik)\\
+:15:                   Round table discussion on Potato metabolic genes as ‘gold standard’ for the annotation validation schemes (Sanjeev, Erik)\\
+:45:                   1st round of the validation scheme scripts runs\\
+:45:                   Return to CASA hotel\\
+:15:                   Day 1 summary, write a wrap-up mini report (at the hotel)\\
+:00:                   Dinner at CASA hotel\\
+**//Friday 20 September//**\\
+<8:30:                   Breakfast at CASA hotel\\
+:30:                    Taxi to SARA\\
+**9:00 - 12:45:**            **Hackathon part 2**\\
+:00:                    Refine scripts after analysis of 1st round results. Re-running the validation score. Define ‘stable’ scores and scoring schemes\\
+:45:                   Coffee break\\
+:00:                   Round table discussion – preparing for writing the summary report and delineating bullet points for a manuscript road map\\
+:45:                   Lunch at SARA\\
+**13:30 - 18:00:**            **Hackathon part 3**\\
+:30:                   Split into task sub-groups and run the specific computational/biological analyses needed\\
+:00:                   Video conference (Skype) with Kate Dreher from the USA. Coffee provided\\
+:00:                   Continue running computational tasks and summarizing results\\
+:00:                   Taxi to CASA hotel\\
+:30 - 20:30            Free slot if needed\\
+:45:                   Walk for dinner at [[http://www.huizefrankendael.nl/en/home/|Restaurant Merkelbach]] (Middenweg 72)\\
+//**Saturday 21 September**//\\
+<8:45:                   Breakfast at CASA hotel\\
+:45:                    Check out\\
+**9:00 - 12:00:**            **Hackathon part 4** (hotel meeting room)\\
+:00:                    Recap video conference with Kate\\
+:45:                    Hackathon summary and perspectives\\
+:00:                   Lunch at CASA hotel\\
+:00:                   End of the event\\
+**Overview**
+For non-model organisms, genes predicted in the sequenced genome are relatively poorly functionally annotated. Instead, researchers have to rely on information derived from sequence identity to model organisms.
+We aim to gather information from several completely sequenced plant genomes and to combine/modify existing tools and pipelines for efficiently analyzing large-scale ‘-omics’ data. In doing so, we seek to generate a robust and automated framework to assign genes to functional categories and to classify them in a biological context such as biological pathways.
+Another challenge is to compare, link and annotate transcript sequences derived from RNA-seq of not yet sequenced genomes with already sequenced genomes. The potato genome was sequenced last year and the time is ripe for genome comparisons, function assignment of genes, and transcriptome and proteomics analysis. During the sequencing project, the potato genome consortium ran into several problems due to sequence heterogeneity, and eventually the genome assembly could only be done successfully based on a homozygous doubled-monoploid potato clone (S. tuberosum group Phureja). The genome structure of this clone differs greatly from that of the cultivars that are commonly studied, i.e. crop potato cultivars grown for food or as starch for industrial use.
+Currently, OrthoMCL clusters are used together with BLAST for gene family analysis. Visualisation of ‘-omics’ data has been done in a commercial software package, QluCore, but it does not handle multiple data types well simultaneously and does not visualise functional pathways. Gene predictions were done ab initio with parameters trained for A. thaliana and also based on sequence similarity with four other plant genomes. Functional annotation of predicted genes was done by identifying orthologous and paralogous gene families in 12 sequenced plant species with OrthoMCL.
+Available data: RNA-seq data (GSII Illumina paired-end reads) exist for 3 different potato cultivars and 3 wild potato species. In addition, gene expression data (Agilent microarray based on the 3.4 version of the potato genome) and secretome quantitative proteomics data from various states exist (all samples are from leaves).
+**Rough Sketch of the Plan before hack-a-thon:**
+  * All hack-a-thoners share and review the project “blueprint”.
+  * Data collection.
+  * Building virtual “warehouse” of relevant existing tools.
+  * Comprehensive related literature survey.
+  * Data pre-processing and protocols set up.
+  * Simulations of “dry” (practice) runs for optimizing “wet” (real) runs to be performed at the time of the hack-a-thon meeting in Amsterdam.
+  * Define computational needs (computer clusters, memory usage, and required package installations).
+During the hack-a-thon, we would like to test and evaluate ~5-6 annotation pipelines:
+- Trinotate
+- Blast2GO
+- PotatoCyc
+- OrthoMCL / InParanoid
+- Phytozome
+- KEGG
+Two things are CRUCIAL for us to complete BEFORE going to Amsterdam: 1) Collect the outputs from each pipeline. 2) Parse the output of each pipeline into a tab-delimited file with three columns: GeneID (Potato ID), Term (GO term ID OR E.C. ID), Score (p-value/e-value, if available).
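As a rough illustration of the three-column target format described above, the following Python sketch writes and re-reads such a file. It is only a sketch under assumptions: the function names, the placeholder gene IDs and the choice to leave the Score column empty when no score is available are all hypothetical and not part of the agreed plan. The term check assumes GO IDs of the form `GO:` plus seven digits and four-field E.C. numbers.

```python
import csv
import io
import re

# Assumed term formats: GO IDs such as "GO:0008150" and four-field E.C. numbers
# such as "2.7.1.1" (optionally prefixed with "EC:", "-" allowed for open fields).
GO_ID = re.compile(r"^GO:\d{7}$")
EC_ID = re.compile(r"^(?:EC[: ]?)?\d+(?:\.(?:\d+|-)){3}$")


def is_valid_term(term):
    """Return True if the term looks like a GO term ID or an E.C. number."""
    return bool(GO_ID.match(term) or EC_ID.match(term))


def write_annotations(records, handle):
    """Write (gene_id, term, score) tuples as tab-delimited rows.

    A score of None is written as an empty third column (an assumption).
    Rows with an empty gene ID or an unrecognised term are skipped.
    Returns the number of rows written.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    written = 0
    for gene_id, term, score in records:
        if not gene_id or not is_valid_term(term):
            continue
        writer.writerow([gene_id, term, "" if score is None else repr(float(score))])
        written += 1
    return written


def read_annotations(handle):
    """Read the three-column file back into (gene_id, term, score) tuples."""
    rows = []
    for gene_id, term, score in csv.reader(handle, delimiter="\t"):
        rows.append((gene_id, term, float(score) if score else None))
    return rows


if __name__ == "__main__":
    # Placeholder records; the gene IDs are made up for illustration only.
    demo = [("geneA", "GO:0008150", 1e-30), ("geneB", "2.7.1.1", None)]
    buffer = io.StringIO()
    write_annotations(demo, buffer)
    print(buffer.getvalue(), end="")
```

Each pipeline would need its own small parser that produces the `(gene_id, term, score)` tuples; using one shared writer keeps the files comparable across pipelines.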