Day 1 | Day 2 | Day 3 | Short Courses
Companion Meeting: Evolution of Next-Generation Sequencing
WEDNESDAY, SEPTEMBER 29, 2010
Sponsored by
7:30 am Breakfast Presentation
Next Gen Data Management for Next Gen Life Sciences
Will McGrath, Business Development Manager, Life Sciences, Quantum
The advent of next-generation sequencers is contributing orders of magnitude more data to sift through, analyze, and share, increasing the complexity of genomic sequencing workflows. Added complexity means added risk, jeopardizing time to discovery. A tightly integrated, scalable, high-performance computing platform with intelligent data management options could make all the difference. Learn how to deploy an end-to-end data management infrastructure that handles the most demanding next-generation sequencing workflows, so your scientists’ pursuit of the next major medical breakthrough or discovery goes unimpeded.
8:15 Chairperson’s Remarks
8:20 Cloud Computing: A New Business Paradigm for Biomedical Information Sharing
Arnon Rosenthal, Ph.D., Principal Scientist, Cognitive Tools and Data Management, The MITRE Corporation
We aim to help biomedical informaticists decide whether, where, and how to employ cloud technology. Because the Cloud literature has become extensive, we emphasize considerations for biomedical laboratories, and especially data sharing consortia. Methodologically, we formulate our analyses in terms of the component technologies that typically constitute a cloud, bypassing the cacophony of competing definitions; this formulation also allows us to analyze alternatives (e.g., an institution’s data center, or grid computing) that employ some of the component technologies. Our comparisons are based on two primary criteria: flexibility in establishing or extending collaborations, and the protection accorded to shared data and an organization’s other systems. In view of the many studies showing cost savings (in some cases), we conclude that cloud technology is an attractive option for data sharing consortia because a) the cost of ownership is low, b) new members can easily be added to the consortium and c) security concerns are no worse than the alternatives.
8:50 Very Large Scale Metagenome Analysis with MG-RAST
Folker Meyer, Ph.D., Computational Biologist, Mathematics and Computer Science Division, Argonne National Laboratory
With metagenomic data sets growing both in size and abundance, resources for processing, analysis and comparison of metagenomic data sets are a critical component supporting many groups around the planet in adopting shotgun metagenomics. This talk will present MG-RAST, a platform used by hundreds of groups to upload, process and analyze over 170 gigabases of metagenomic data. A key aspect of the presentation will be data mining in metagenomic data sets and tools for the comparison of metagenomes.
9:20 Addressing the Challenges Faced in Providing Analysis Services for Next-Generation Sequencing Data
Chris Hemmerich, M.S., Biological Database Unit Leader, Center for Genomics and Bioinformatics, Indiana University
Analyses of next-generation sequencing data require bioinformatics expertise and computational resources beyond what a small biology lab may possess. Sequencing centers have these resources; however, customizing in-house pipelines to meet the requirements of individual biologists is exceptionally time consuming. We are addressing this problem through intuitive web tools that allow collaborating biologists to customize and run pipelines across distributed grid or cloud computing resources.
Sponsored by
9:50 Practical Considerations for Sequencing Analysis and Bioinformatics in the Cloud
David Powers, Expert - Life Sciences (formerly of Eli Lilly), Cycle Computing
Jason A. Stowe, CEO, Cycle Computing
Next-generation sequencers give scientists access to larger amounts of data at lower cost. With the $1000 genome around the corner, the cloud offers easy, inexpensive access to computing and storage resources for sequence alignment, SNP detection, and tertiary analysis or re-analysis pipelines. However, having access to processing power and scalable resources is only the first step to running analysis and bioinformatics on the cloud. This talk will discuss the practical considerations, including storage, required bandwidth, and people costs, for running compute nodes internally or in the cloud. Specific, first-hand use cases for running analysis and bioinformatics in various environments will be discussed, covering the benefits of different approaches. Based upon the experience presented, we'll also discuss larger trends in this area.
10:05 Morning Coffee, Poster and Exhibit Viewing
10:45 Cloud BioLinux: Pre-Configured and On-Demand High Performance Computing for the Genomics Community
Konstantinos Krampis, Ph.D., Bioinformatics Engineer, J. Craig Venter Institute
Cloud BioLinux is a publicly available virtual machine that runs on cloud computing platforms, including Amazon EC2 and the open-source Eucalyptus cloud. During this talk, we will first give an overview of cloud platforms, and how they can provide small labs with access to high performance bioinformatics computing, required to work with next generation sequencing data. We will then demonstrate how users can access the bioinformatics tools included with Cloud BioLinux, by starting virtual machines on Amazon’s EC2 cloud computing platform, authenticating, and interacting with the virtual servers.
11:15 Speaker to be Announced
11:45 Discovering Genes and Alternative Transcripts in Deep RNA Sequencing
Jean Thierry-Mieg, Ph.D., Director of Research, CNRS; Staff Scientist, NLM/NCBI, NIH
Danielle Thierry-Mieg, D.Sc., Research Fellow, CNRS; Staff Scientist, NLM/NCBI, NIH
Massively parallel sequencing of RNA samples allows us to characterize with great accuracy, in specific tissues or single individuals, the structure of the expressed genes, and to discover and quantify new genes and new alternative transcripts. Yet there are many pitfalls and sources of errors. We will describe how the NCBI AceView program exploits this massive amount of short cDNA sequences, determines recursively optimal discontinuous alignments to the genome, discovers new exons and introns, and identifies SNPs. The new data are deeper yet remarkably consistent with previous cDNA sequences in GenBank and dbEST. They refine our understanding of the wonderful complexity of the human transcriptome and show that transcription is not pervasive, but highly regulated and tamed.
12:30 Close of Morning Session
Sponsored by
12:40 Luncheon Presentation
State-of-the-Art in Whole- and Multi-Genome Analysis: A Discussion and Demonstration of Critical Requirements
Don Gregory, Ph.D., Director of the Field Application Scientist Group, GenomeQuest
Dr. Gregory will outline use cases for multi-genome analysis (MGA), including: Disease/Normal Study, Population Genetics, Pharmacogenomics, and Propensity. He will also explore science questions enabled by MGA, such as “How prevalent is this variation?” and “How can I prioritize the observed variations?”. Lastly, he will review and demonstrate key requirements of MGA solutions, including: scalability to whole-genome reads, interactive querying of sequence comparison results, and integrated access to public datasets.
1:55 Chairperson’s Remarks
2:00 Analysis of Expression, Splicing and Gene Fusions in Human Prostate Tumors by Deep Sequencing
Serban Nacu, Ph.D., Postdoctoral Researcher, Genentech, Inc.
In a pilot study, we sequenced the transcriptomes of three prostate tumors and their matched normals, and performed a comprehensive analysis of expression, splicing, SNPs and mutations. We developed several computational tools and techniques, including an alignment program with the ability to directly detect gene fusions. The biological findings include novel differentially expressed transcripts and multiple novel gene fusions.
2:30 Sequencing of Human Cancer Transcriptomes to Discover Novel Gene Fusions
Christopher Maher, Ph.D., Research Investigator, Michigan Center for Translational Pathology, Center for Computational Medicine and Biology, Department of Pathology, University of Michigan
Characterization of specific genomic aberrations in cancers has led to the identification of several successful therapeutic targets. Therefore, we have employed next-generation transcriptome sequencing to elucidate putative “driver” gene fusions that may be hidden by non-specific aberrations. This talk will focus on our bioinformatics approach, the identification and characterization of novel gene fusions, and the implications of transcriptome sequencing for improved cancer therapeutics.
3:00 Exploring Bacterial Transcriptomes in the Age of High Throughput Sequencing
Jonathan Livny, Ph.D., Research Scientist, Broad Institute of MIT and Harvard; Instructor, Brigham and Women’s Hospital/Harvard Medical School
HTS-based bacterial transcriptomic approaches present significant technical and analytical challenges that have limited their utilization and hindered the extraction of biological insights from the large and complex datasets they produce. To address these challenges, we are developing improved protocols for constructing and sequencing bacterial cDNA libraries and more effective and accessible computational tools and infrastructures for visualizing and analyzing HTS transcriptomics data. I will present a summary of these improved experimental and analytical approaches as well as some examples of how HTS transcriptomics is being utilized to explore various aspects of bacterial physiology and evolution.
3:30 Networking Refreshment Break, Exhibit & Poster Viewing
(Shared session with Evolution of Next-Generation Sequencing Conference)
4:00 Poster Award Sponsored by
4:00 Detecting Rare Genetic Variants in the Large-Scale 1000 Genomes Exome Resequencing Project
Fuli Yu, Ph.D., Assistant Professor, Human Genome Sequencing Center, Baylor College of Medicine
The 1000 Genomes Pilot 3 Project aims to generate high coverage data primarily in the coding regions of approximately 1,000 selected genes from ~900 individuals. From this sequencing program we expect to identify essentially all variants present in the targeted exons, using the exome capture technologies combined with different next-generation sequencing platforms. The key challenge in SNP discovery is to distinguish true individual variants from sequencing errors. We have developed Atlas-SNP2 at BCM-HGSC, a computational tool that detects and accounts for systematic sequencing errors caused by context-related variables in a logistic regression model learned from training data sets.
4:30 Impact of the 1000 Genomes Project on the Next Wave of Pharmacogenomic Discovery
M. Eileen Dolan, Ph.D., Professor, Medicine, University of Chicago
The 1000 Genomes Project aims to provide detailed genetic variation data on more than 1,000 genomes from worldwide populations using next-generation sequencing technologies. Some of the samples utilized for the 1000 Genomes Project are the International HapMap samples, which are composed of lymphoblastoid cell lines (LCLs) derived from individuals of different world populations. The detailed map of human genetic variation promised by the 1000 Genomes Project will allow a more in-depth analysis of the contribution of genetic variation to drug response. Future studies utilizing this new resource can greatly enhance our understanding of the genetic basis of drug response and other complex traits.
5:00 Integrated Analysis of Human Resequencing Data from Multiple Sequencing Platforms
David Craig, Ph.D., Associate Director, Neurogenomics; Investigator, Neurobehavioral Research Unit, TGen
I will present integrated analysis pipelines and software tools for analyzing next-generation sequencing data using multiple sequencing technologies. Specific focus will be on integrating SOLiD and Illumina datatypes, leveraging complementary strengths of both platforms. We will present whole-genome sequence analysis both in the context of 1000 Genomes and within our own whole-genome sequencing studies.
5:30 Close of Conference