2014 Archived Content

Cambridge Healthtech Institute’s Third Annual

Bioinformatics for Big Data

How Applications of Big Data will Drive Research Forward

February 10-12, 2014 | Moscone North Convention Center | San Francisco, CA



Tuesday, February 11

7:00 am Registration and Morning Coffee

8:00 Plenary Keynote Session

9:15 Refreshment Break in the Exhibit Hall with Poster Viewing


10:25 Chairperson’s Remarks
Martin Gollery, CEO, Tahoe Informatics 

10:30 Breaking Down the Wave: A Look at the Data Sources that are Transforming Research

Martin Gollery, CEO, Tahoe Informatics

For well over a decade, the volume of biological data has grown faster than Moore’s Law, a phenomenon commonly compared to a ‘tsunami’ of data. That acceleration continues today, compounded by a wide range of disparate data sources that researchers must integrate and filter to build a coherent picture of the system being studied. This talk will give a high-level overview of the technologies that generate big data in the biomedical arena, and will then look ahead to upcoming technologies and the challenges and opportunities they will present.

11:00 Cancer Genomics

David Haussler, Ph.D., Distinguished Professor and Director, Center for Biomolecular Science & Engineering, University of California Santa Cruz

UCSC has built the Cancer Genomics Hub (CGHub) for the US National Cancer Institute, designed to hold up to 5 petabytes of research genomics data (up to 50,000 whole genomes), including data for all major NCI projects. To date it has served more than 8.3 petabytes of data to more than 300 research labs. Cancer is exceedingly complex, with thousands of subtypes involving an immense number of different combinations of mutations. The only way we will understand it is to gather DNA data from many thousands of cancer genomes, so that we have the statistical power to distinguish the recurring combinations of mutations that drive cancer progression from "passenger" mutations that occur by random chance. Currently, with the exception of a few projects such as ICGC and TCGA, most cancer genomics research takes place in research silos, with little opportunity for data sharing. If this trend continues, we will lose an incredible opportunity. Cancer genome sequencing will soon be widespread in clinical practice, making it possible in principle to study as many as a million cancer genomes. For these data to also have an impact on our understanding of cancer, we must soon begin moving data into a global cloud storage and computing system, and design mechanisms that allow clinical data to be used in research with appropriate patient consent. A global alliance for sharing genomic and clinical data is emerging to address this problem. This is an opportunity we cannot turn away from, but it involves both social and technical challenges. Reference: http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-211.html 

11:30 A Test for Predicting Cardiovascular Death in Coronary Artery Disease Patients

Reijo Laaksonen, M.D., Ph.D., FESC, CMO, Zora Biosciences Oy

LDL-cholesterol (LDL-C) has traditionally been used to gauge cardiovascular risk. However, LDL-C provides only limited predictive information on fatal cardiovascular disease (CVD) complications in patients with established coronary artery disease (CAD). Patients at the highest risk of myocardial infarction or cardiovascular death represent an unmet diagnostic need. Identifying these individuals would allow more focused and timely treatment, preventing premature deaths and hospitalizations. To address this unmet diagnostic need, we applied the Zora lipidomic technology to patient samples from well-defined CAD patient cohorts. We successfully developed markers that identify high-risk CAD patients with an accuracy that cannot be reached with currently used routine clinical measurements. Importantly, these markers are actionable and can also be used to monitor treatment success in patients. The Zora high-risk CAD test will lead to significant improvements in clinical diagnostics, with concomitant health care savings. 

12:00 pm Exploring the Microbiome in Metabolic Diseases

Deepak K. Rajpal, D.V.M., Ph.D., Director, Computational Biology, GlaxoSmithKline

Metabolic diseases, especially type 2 diabetes and obesity, are growing global healthcare concerns. Various studies have highlighted the role of gastrointestinal microbial communities in metabolic health and disease. We will provide a brief overview of the gut microbiome, its putative role in metabolic diseases and the emerging data in this space.

12:30 Session Break

12:40 Luncheon Presentation I: Semantics for Rapid Development of Informatics Solutions 

Ben Szekely, Director & Founding Engineer, Cambridge Semantics

R&D informatics demands huge quantities of dispersed and diverse data, coupled with a constantly shifting regulatory and competitive landscape. In this talk, we will discuss how semantics enables flexible conceptual information modeling; integration of structured and unstructured data at the conceptual level; varied data access pathways, including blended structured, semantic, keyword, and chemical search; and sophisticated text analytics for literature, patents, and other unstructured data sources.

1:10 Luncheon Presentation II (Sponsorship Opportunity Available)

1:40 Refreshment Break in the Exhibit Hall with Poster Viewing


2:15 Chairperson’s Remarks
Dave Anstey, Global Head, Life Sciences, YarcData 

2:20 Implementing Big Data Analysis and Archival Solutions for NGS Data

Zhiyan Fu, Ph.D., Chief Scientific Computing Officer, Genome Institute of Singapore (A*STAR)

This presentation shows the latest developments in big data analysis, compression, and storage management. It presents a practical case study of implementing big data technologies at a mid-size genome center. Attendees will understand the challenges of big data life-cycle management in a genome center, see how the latest big data technologies are implemented, and learn the pros and cons of several techniques evaluated by GIS, including Hadoop, HDF5, and different NGS compression algorithms.

2:50 Annotation of a Massive Dataset of Whole Genome Sequences Using a Hybrid Approach

Gerry Higgins, M.D., Ph.D., Vice President, Pharmacogenomic Science, AssureRx Health, Inc.

A collection of 17,131 whole genome sequences, generated by second-generation sequencing on Illumina, Complete Genomics, Inc., and SOLiD platforms, has been analyzed using a hybrid approach. Annotation was performed using comparative genomics modeling; sequence context; functional predictions using bioinformatics tools; epigenomic alterations; gene modeling; coding disruptions; and a supervised learning machine trained on surrogate phenotypes. The hybrid annotation approach led to the discovery of novel pharmacogenomic variants that could be of value for clinical pharmacogenomics.

3:20 Semantic Technologies Offer Great Promise – with Constraints. We will Discuss a New Approach for Accelerating Research: Graph Analytics at Scale

Dave Anstey, Global Head, Life Sciences, YarcData

Our platform for real-time data discovery enables leading research hospitals and life sciences organizations to analyze ALL of their diverse data sets together, without sampling, in order to rapidly validate more hypotheses, identify unknown relationships, and get more value from their data.


3:50 Genome-Wide Protein Structure and Function Prediction

Andrzej Kloczkowski, Ph.D., Professor, Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital and Department of Pediatrics, The Ohio State University College of Medicine

Knowledge of protein structure is critical for understanding protein function, for understanding the molecular mechanisms of disease, and for developing new generations of medicines through computer-aided drug design. There is therefore an urgent need to improve existing computational methods of structure prediction, ultimately reaching an accuracy comparable to the resolution of crystallographic or NMR structure determination. We discuss these important problems and propose new methods for genome-wide protein structure and function prediction.

4:20 Valentine’s Day Celebration in the Exhibit Hall with Poster Viewing


5:20 Breakout Discussions in the Exhibit Hall 

These interactive discussion groups are open to all attendees, speakers, sponsors, & exhibitors. Participants choose a specific breakout discussion group to join. Each group has a moderator to ensure focused discussions around key issues within the topic. This format allows participants to meet potential collaborators, share examples from their work, vet ideas with peers, and be part of a group problem-solving endeavor. The discussions provide an informal exchange of ideas and are not meant to be a corporate or specific product discussion. 

Big Data's Big Role in Understanding Complex Diseases 

Andreas Kogelnik, M.D., Ph.D., Founder and Director, Open Medicine Institute 

• High performance platform for integrating molecular/genomic and clinical data
• Applying science and medicine to crowd-sourced data
• Enabling longitudinal outcomes studies with genomics and informatics

Big Data: Should it be Top-down or Bottom-up? 

Michael Liebman, Ph.D., Managing Director, Strategic Medicine, Inc. 

Sabrina Molinaro, Ph.D., Institute for Clinical Physiology, National Research Council, Italy 


6:30 Close of Day



Premier Sponsors

Beckman Coulter Life Sciences 

Boston Healthcare

Charles River

Cofactor Genomics

Menarini Silicon Biosystems