Sequencing Data Analysis and Storage 2010


 

TUESDAY, MARCH 16, 2010

7:00 am Conference Registration

7:30 Breakfast Presentation Sponsored by DataDirect Networks
Storage Solutions for High Throughput Sequencing, Mass Spectrometry, and Genomic Archives
Jeff Denworth, Vice President Marketing, DataDirect Networks
Next-generation gene sequencing and analysis methods have introduced data management challenges into the life sciences research center. Sequence data models, long-term data archiving requirements, and support for scalable sequence alignment systems drive the need for new data management and storage technologies. DataDirect Networks, the leading storage provider to genomics researchers and to the most content- and data-intensive environments in the world, will discuss concepts to simplify and accelerate scalable life sciences data archives. During his talk, Mr. Denworth will discuss storage challenges and approaches for tackling common problems.

 

ENABLING PRODUCTIVITY

8:30 Chairperson’s Remarks
Ting Chen, Ph.D., Associate Professor, Program in Computational Biology, University of Southern California

8:35 Mapping the Next Generation Sequencing Data

Ting Chen, Ph.D., Associate Professor, Program in Computational Biology, University of Southern California

We developed PerM (Periodic Seed Mapping), mapping software that uses periodic spaced seeds to efficiently map millions of reads to large reference genomes. The data structure in PerM allows the entire genome to be loaded into memory, while multiple processors simultaneously map reads to the reference. The weight-maximized periodic seeds offer full sensitivity for up to three mismatches and high sensitivity for four and five mismatches while minimizing the number of random hits per query, significantly reducing the running time for analyzing both SOLiD and Solexa reads.
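As a rough illustration of the spaced-seed idea described in this abstract, the following Python sketch indexes a reference with a single periodic seed pattern and verifies each candidate position by counting mismatches. It is a toy example under stated assumptions, not PerM's implementation: the pattern `110110110`, the function names, and the sample sequences are all hypothetical, and this pattern makes no claim to the sensitivity guarantees quoted above.

```python
from collections import defaultdict


def seed_key(seq, start, pattern):
    """Concatenate the bases of seq that fall under the '1' positions of pattern."""
    return "".join(seq[start + i] for i, bit in enumerate(pattern) if bit == "1")


def build_index(reference, pattern):
    """Map every spaced-seed key in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(pattern) + 1):
        index[seed_key(reference, pos, pattern)].append(pos)
    return index


def map_read(read, reference, index, pattern, max_mismatches):
    """Return sorted (start, mismatches) pairs for substitution-only alignments."""
    hits = {}
    checked = set()
    # Slide the seed along the read; each seed hit proposes one alignment start.
    for offset in range(len(read) - len(pattern) + 1):
        for ref_pos in index.get(seed_key(read, offset, pattern), ()):
            start = ref_pos - offset
            if start < 0 or start + len(read) > len(reference) or start in checked:
                continue
            checked.add(start)
            # Verify the candidate by comparing the full read to the reference window.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


# Hypothetical toy data: a 26-base reference and a 12-base read taken from
# position 5 with one substitution at a "don't care" position of the seed.
PATTERN = "110" * 3  # periodic pattern: care, care, skip, repeated
REFERENCE = "TTGACCATGCAAGGCTTACGGATCCA"
INDEX = build_index(REFERENCE, PATTERN)

print(map_read("CAAGCAAGGCTT", REFERENCE, INDEX, PATTERN, max_mismatches=1))
# [(5, 1)]
```

Because the substitution falls on a skipped position of the seed at offset 0, the seed still matches and the read is recovered. A substitution under a "care" position is recovered only if the seed matches at some other offset along the read, which is why the choice of pattern determines sensitivity.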

9:05 JBrowse: A Next-Generation Genome Browser

Ian H. Holmes, Ph.D., Assistant Professor, Department of Bioengineering, University of California, Berkeley

 

9:35 Population Surveys of Genetic Variation Using Next-Generation Sequencing: Comprehensive Identification of Variants and Association with Phenotype
Vikas Bansal, Ph.D., Research Scientist, Genomic Medicine, Scripps Translational Science Institute
Apart from sequencing of individual human genomes, the massive throughput of next-generation sequencing technologies is also enabling the next generation of disease association studies through the sequencing of specific genomic regions in large populations. Post-sequencing, it is important to accurately identify all forms of sequence variants before these can be used for association mapping and population-genetic analysis. We describe novel methods that leverage sequence data from a population of individuals and utilize existing alignment/assembly methods for short-read sequence data to enable the comprehensive identification and genotyping of both SNPs and indels. We also describe statistical methods for association mapping with multiple rare variants. We present applications of these methods to sequence data from an obesity candidate gene sequencing study and population sequencing of pathogen genomes.

10:05 Sponsored Presentation by Aspera

Efficient Data Transfers for Life Sciences, Using Next-Generation File Transport Technology

Diego Dugatkin, Ph.D., VP, Product Management, Aspera, Inc.

10:20 Networking Coffee Break, Exhibit & Poster Viewing

11:00 Orthogonal Data and Computational Models of Biological Systems in the Sequence Space

Andrew Kasarskis, Ph.D., Sage Bionetworks

 

11:30 Here Comes the Flood: Making Data from Thousands of Genomes Useful
Steve Lincoln, Ph.D., VP Scientific Applications, Complete Genomics, Inc.
Complete Genomics has established a large-scale commercial genome center to sequence thousands of normal and cancer human genomes in 2010. Given the characteristics of the underlying data and through the use of novel algorithms, highly accurate calls of germ-line and somatic variations are made over the vast majority of the human genome. While raising new questions and opportunities in downstream analyses of these data sets, the accuracy, scale, and cost-effectiveness of these technologies undoubtedly enable new paradigms in biomedical research.

12:00 pm Close of Morning Session

12:15 Luncheon Presentation Sponsored by Cycle Computing
CycleCloud: Boosting Productivity with HPC as a Service on the Cloud
Jason Stowe, CEO, Cycle Computing, LLC
Now/next-generation sequencing gives scientists access to larger amounts of genome data at lower cost. The cloud offers easy, inexpensive access to computing for sequence analysis. For researchers who rely on computation to analyze these data sets, cloud computing can change the way science gets done. But successfully deploying applications to use the cloud efficiently involves a number of technical challenges. These include making applications scale, securing the clusters, and focusing on ease of use. CycleCloud makes this easy by pre-engineering clusters/pipelines to work optimally and securely on the cloud. With a focus on improving researcher productivity, CycleCloud creates fully managed HPC clusters as a service with key applications pre-installed, the ability to quickly develop new analysis workflows, shared file systems, and auto-scaling to size clusters to the multi-user workloads you place on them.

This session will focus on:
- Bandwidth/data considerations for using the Cloud
- Productivity, ease of use, and security requirements
- Example walkthroughs of pre-engineered pipelines for genome analysis and searching use cases

 

 

ANALYSIS

2:00 Chairperson’s Remarks

Callum Bell, Ph.D., Program Lead, National Center for Genome Resources

2:05 FEATURED PRESENTATION

De novo Sequence Assembly

Pavel Pevzner, Ph.D., Professor, Computer Science & Engineering, University of California, San Diego

 

 

 

 

2:35 Whole Genome Sequencing: Global Alignment, Local Alignment, and Variant Calling in Cancers

Stanley Nelson, M.D., Professor, Human Genetics, University of California, Los Angeles

This frames our work on local sequence alignment and highly sensitive global sequence alignment (BFAST, in press), which aims to optimize variant calling within cancer genomes, including point mutations, indels, larger insertions and deletions, and chromosomal translocations.

3:05 The First Time is the Hardest: Lessons Learned Analyzing Whole Cancer Genomes

David Dooling, Ph.D., Assistant Director, Genome Center, Washington University

The traditional paradigm in bioinformatics has been an analyst writing scripts and running analyses in an interactive manner. The Genome Center has developed a high-throughput computing infrastructure and software framework that allows these ad hoc analyses to be quickly integrated into fault-tolerant analysis pipelines. Applications of this infrastructure to new analysis methods and whole cancer genomes will be presented.

3:35 Networking Refreshment Break, Poster & Exhibit Viewing

4:15 Mobile Elements Create Structural Variation: Analysis of a Complete Human Genome

Lynn B. Jorde, Ph.D., Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah

4:45 Carrier Screening for Rare Recessive Genetic Disorders by Second Generation Sequencing

Callum Bell, Ph.D., Program Lead, National Center for Genome Resources

5:15 Close of Day