Held concurrently with CHI's Data Integration for the Pharmaceutical Industry
and immediately following CHI's Third Annual Microarray Data Analysis.

Biological and pharmaceutical research is moving to ever higher throughput. The data generated, however, lose most of their potential value unless important conclusions can be extracted from large data sets quickly enough to influence next steps. Graphical representation provides an extremely effective means of overcoming the complexity and sheer volume of the data that need to be analyzed. Obtaining maximum value from experimental data involves a team effort that includes biologists, chemists, pharmacologists, statisticians and software engineers. This program has been designed to provide perspectives from each specialty, with an emphasis on how they can be integrated into a cohesive, comprehensive team. Gain new perspectives on ways to look at your data and on how to use visualization tools to create added value.

Scientific Advisors:
Dr. Joanna L. Batstone, IBM Corporation
Dr. Georges G. Grinstein, University of Massachusetts
Dr. Bruno W.S. Sobral, Virginia Tech

Wednesday, September 24

8:00am Pre-conference Short Course Tutorial Registration and Coffee

8:30-11:30 Pre-conference Short Course Tutorials (*Separate Registration Required)

Integrating Visualization and Data Mining for Microarray Analysis
Dr. Georges Grinstein, Professor, Computer Science Department; Director, Institute for Visualization and Perception Research; Director, Center for Biomolecular and Medical Informatics, University of Massachusetts Lowell; and Founder and Director, Research & Development, AnVil
This short course will provide an overview of visualization and data mining techniques, discuss how current systems handle their integration, and examine the role of high-dimensional data. We will highlight the exploration process and provide various application examples. We will also discuss where visual analytic systems should be heading. Several demos will be presented.

Course Participants Will:

  • Gain a fundamental background in visualization
  • Understand the role of visualization in discovery
  • Have an overview of the different techniques and how they fit in with analysis
  • Understand the integration issues
  • Gain knowledge on what the current systems provide and how to compare systems

Who should attend?
Biologists, chemists, analysts, software developers, statisticians, bioinformaticians, and managers in discovery biology and drug development laboratories.


Enterprise Database Integration for Researchers
Dr. William J. Pjura, President, Altionics, Inc.
Enterprise Database Integration for Researchers presents a visual modeling approach to identifying entity objects and packaging them into cohesive subject areas. The short course examines several public, web-accessible chemical, pharmaceutical, and biological databases from the perspective of analysis and design criteria, focusing on identifying the characteristics the databases share and those unique to each. These common and unique characteristics are identified and packaged into cohesive subject areas and remodeled into extensible and adaptive enterprise database architectures. Subsequently, the eXtensible Markup Language (XML) is introduced, and existing chemical, biological, and pharmaceutical XML-based vocabularies are reviewed. The importance of XML in the integration of data from disparate sources and the design of XML schema to facilitate the integration of these databases are examined, and the implementation of the XML schema is demonstrated.

Course Prospectus:
Course participants will have the opportunity to apply database analysis and design concepts to the analysis of several public, web-accessible chemical and biological databases. Attendees will learn how to identify the common characteristics shared by the databases and to identify their unique characteristics. They will see how the concept of packaging cohesive objects is applied to designing a database architecture that supports the integration of apparently disparate databases. Subsequently, the eXtensible Markup Language (XML) will be introduced and the existing chemical, biological, and pharmaceutical XML-based vocabularies will be reviewed. The importance of XML in data integration from seemingly disparate sources and the design of XML schema to facilitate integration of these disparate databases will be examined. Finally, the implementation of the XML schema will be demonstrated.

Who should attend?
This course is recommended for researchers who are responsible for experimental design, are familiar with basic database analysis and design concepts, and have an ongoing need to effectively integrate and analyze data from internal and external databases.

*Separate Registration Required

11:00am - 12:00pm Conference Registration and Poster Set-up

11:30am-12:30pm Luncheon Workshop   Sponsored by 
"Data Integration at Amgen"
Presented by Mark Jury, Amgen

1:00pm Chair's Opening Remarks
Dr. Georges Grinstein

Joint Kick-off Keynotes

1:10 Mining the Biomedical Literature using Semantic Analysis and Natural Language Processing Techniques
Dr. Ronen Feldman, Assistant Professor, Mathematics and Computer Science Department, Israel's Bar-Ilan University and Chief Scientist, Clearforest, Ltd.
The information age has made it easy to store large amounts of data electronically. The proliferation of documents available on the web, on corporate intranets, on newswires and elsewhere is overwhelming. Search engines only exacerbate this overload problem by making more and more documents available in a matter of a few keystrokes. This information overload is directly mirrored in the biomedical field, where scientific publications and other forms of text-based data are produced at an unprecedented rate. Text mining is the combined, automated process of analyzing unstructured, natural language text in order to discover information and knowledge that are typically difficult to retrieve. In this paper, we focus on using text mining as it applies to the biomedical literature. In particular, we are interested in finding relationships among genes, proteins, drugs and diseases, to assist in explaining and predicting complex biological processes. We will describe the LitMiner™ system that we have developed for this purpose; in particular, we will focus on the KDD CUP 2002, which serves as a formal evaluation of our system.

1:50 Integromics in Drug Discovery: Practical Tools for Integrating Genomics, Proteomics, Bioinformatics, and Chemoinformatics
Dr. John N. Weinstein, Senior Research Investigator, Laboratory of Molecular Pharmacology, National Cancer Institute, National Institutes of Health
After microarray experiments (or other "omic" studies) have been done in the pharmaceutical context, one's first task is to analyze the data statistically. But that leaves open the Big Question: what do the results mean biologically and pharmacologically? A number of practical approaches and computational tools for addressing that question and integrating different types of data will be discussed. Included is a set of program packages available from our group and collaborators through MedMiner, MatchMiner, GoMiner, CIMminer, and LeadScope/LeadMiner.

2:30 Poster and Exhibit Viewing, Refreshment Break



3:15pm A Life Scientist's Road to Interoperability of Data and Tools
Dr. J. Dana Eckart, Senior Project Associate, The Virginia Bioinformatics Institute, Virginia Tech
Life science researchers deal with large amounts of varied data, from sequencing trace files to mass spectra. These data sets grow faster than the number of transistors that can be placed on a chip (Moore's Law). In addition, scientists and other life sciences data consumers must convert these data into information about genomes, proteins, protein expression patterns, metabolites, and the interaction pathways that enable understanding of biological systems and processes - some contained in the literature. Therefore, life scientists require a framework for data and tool management and interoperability that supports these needs. To meet this challenge, VBI has developed PathPort (i.e. Pathogen Portal), which uses a "bus" architecture called ToolBus. On the server side, ToolBus employs Web-services, while on the client side, it contacts data and tools and views results through a single, consistent user interface. This bioinformatics platform is flexible and extensible, connecting "plug-in" data sources, analysis tools, and visualization components. The project exemplifies how technology can be leveraged to support distributed collaborations among life scientists. This approach should significantly enhance the re-use of the framework for different thematic areas of research. The presentation will outline our process in developing this dynamic software package.

3:45pm Knowledge Extraction in the "Omics" Era: The Meaning is in the Literature
Dr. Damien Chaussabel, Research Investigator, National Institute of Allergy and Infectious Diseases, National Institutes of Health
The rise of high-throughput screening platforms has led to the rapid proliferation of datasets whose functional implications often remain cryptic. This talk will present an original text mining approach that exploits the biomedical literature database to extract critical biological knowledge embedded in the mass of information produced by genome-wide screening strategies. Furthermore, examples will illustrate how the sequential analysis of gene expression and term occurrence patterns generates a visual interface between microarray data and vast literature resources.

4:15pm Analyzing and Managing Affymetrix GeneChip Data
Mr. Gregg Wright, Senior Vice President, Life Sciences, IMC Inc.
Neuroscientists at the Salk Institute for Biological Studies are using thousands of Affymetrix GeneChip microarrays to find genes associated with specific brain functions, behaviors and important phenotypes. In close collaboration with Salk scientists, IMC has developed TeraGenomics, a highly scalable data warehouse to implement Salk analysis methods and to make it easier for bench scientists in many locations to collaborate and rapidly mine the vast amounts of data generated with these microarrays. The metadata are MIAME-compliant and incorporate taxonomies for brain structures and disease types. We have adapted technology long used for very large-scale data warehouses in retail, banking, transportation, and manufacturing, but new to biosciences. This presentation will summarize the computational problems and describe the solution architecture.

4:45pm Mining a Cross-Species Expression Data Repository
Dr. Jordan Stockton, Product Manager, Marketing, Silicon Genetics
The cross-species transcriptome has yet to be fully exploited in the post-genomic era. Using homology indexing tools and datasets from a large public data repository, we have identified meaningful expression patterns and the coordinated activity of orthologous genes across technology types and species. We suggest that cross-species expression profiling has the potential to shed new light on key biological pathways.

5:15pm PathwayAssist: A Tool for the Integration of Gene Expression, Protein-Protein Interaction, Metabolic Pathway and Literature Data
Mr. Jason Goncalves, Chief Scientific Officer, Iobion Informatics
Integrating multiple sources of biological information is an important step on the path to knowledge and interpretation of biological results. The seminar will review the issues involved in integrating diverse information sources, such as gene expression, protein-protein interaction and metabolic pathway data. We will also discuss how natural language processing (NLP) techniques can be used to greatly enhance the current biological databases by extracting information directly from the body of biomedical literature. Application of the NLP and data integration tools in PathwayAssist will be presented through specific use cases.

5:45-6:45 Networking Reception (hosted by Cambridge Healthtech Institute)


Thursday, September 25

7:30am Coffee and Technology Workshop (Sponsorship Available)



8:30am Chair's Remarks
Dr. Bruno W.S. Sobral

8:35am Integrated, Tightly-Coupled, High Dimensional Analysis and Visualization for Microarray Expression Data
Dr. Georges Grinstein, Professor, Computer Science Department; Director, Institute for Visualization and Perception Research; Director, Center for Biomolecular and Medical Informatics, University of Massachusetts Lowell and Founder and Director, Research & Development, AnVil
We will describe a highly integrated tightly-coupled visualization and analysis environment based on a classification of visualization as presentation, confirmatory, or exploratory. These affect the role of analysis. We will provide an application example which follows this approach with its focus on the interplay and requirements for exploration and confirmation. Our example will show how we deal with the identification of subspaces of interest (exploratory) in very high dimensional datasets, and how validation (confirmatory) of the various discovered and proposed hypotheses takes place.

9:05am Visualizing the Genome: Techniques for Presenting Human Genome Data and Annotations
Dr. Ann Loraine, Bioinformatics Scientist, Bioinformatics Department, Affymetrix, Inc.
To get maximum benefit from genomic sequence data and annotations, biologists need visualization tools that present the data in an intuitive, interactive format. This talk will cover genome display techniques we developed at Affymetrix that support rapid and efficient visual inspection of complex genomic scenes. Some of these include one-dimensional zooming to show sequence data alongside gene structures; color-coding exons to indicate translation frame; and display of protein annotations in the context of genomic sequence to show how alternative splicing impacts conserved functionally important motifs in the encoded proteins. Using genome display software we developed, I will demonstrate how these techniques make answering basic questions about human gene structures easy to accomplish.

9:35am Methods for Analysis and Visualization of SNP Genotype Data for Complex Diseases
Dr. Anya Tsalenko, R&D Scientist, Life Sciences Technologies Laboratory (LSTL) at Agilent Labs
SNP markers are becoming central for studying genetic determinants of complex diseases. Large SNP data sets collected in such studies call for the development of specialized analysis tools. We present statistical methods, visualization tools and algorithmic approaches to questions that arise in pursuing correlations between individual SNPs as well as sets of SNPs and sample properties. These methods are based on similar tools for analysis of gene expression data.

10:05am Poster and Exhibit Viewing, Refreshment Break

10:45am Combining Gene Ontology with Sequence Similarity over Multiple Genomes
Dr. Andre Nantel, Research Officer, Biotechnology Research Institute, National Research Council of Canada
Sequence similarity is the main source of information available during the annotation of novel genes. We have been using the E-values from whole genome/proteome Blast comparisons to visualize sequence similarities between a reference organism (S. cerevisiae, C. albicans, C. jejuni) and a large number of other eukaryotic and prokaryotic organisms. These datasets can then be correlated/explored along with additional information related to gene ontologies and transcriptional profiling (see biovis/).

11:15am Using Multiple Visualization Types Together with Multiple Data Types to Support Fast Decision Making in the Pharmaceutical Pipeline
Dr. Gavin Fischer, Application Scientist, OmniViz
The ability to use "abstract" visualizations to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, enables researchers to move from a large amount of data to the one (or few) records they are interested in. To support this, visualizations need to support any type of data so that fundamental judgments about the relationships within the data can be exposed. OmniViz has the ability to use disparate data types both in unique overview visualizations (e.g. Galaxy™ view) that give broad perspectives and in specialized visualizations that address specific questions (e.g. correlation tool).

11:45am Panel Discussion

12:15 Luncheon  


1:45pm Chair's Remarks
Dr. Georges Grinstein

1:50 Information Pathways in Pharmaceutical R&D
Dr. Otto Ritter, Associate Director, Bioinformatics, Enabling Science and Technology, AstraZeneca R&D Boston
Molecular pathways are useful models for representing biological processes at the cellular level. Pathways are usually represented as graphs, where molecules or molecular complexes are the nodes, and molecular interactions are the edges. In one step of generalization, where we take any biomedical entities as nodes and any general relationships as edges, we get an associative network as a representation of biomedical knowledge. If we take one more step in this generalization process and include any information assets and any transformations or associations, we get information pathways as models representing (pharmaceutical) R&D processes. It's useful to know that at all three levels of interpretation (molecular, biomedical, R&D), we can actually re-use the same software components for data management, analysis, and visualization.

2:10 Standards to Enable Information Integration
Dr. David Benton, Director, Knowledge Integration and Discovery Systems, Informatics and Knowledge Management, R&D IT, GlaxoSmithKline
Information integration is widely acknowledged to be both a great need and a major challenge facing pharmaceutical R&D. The principal obstacle to information integration systems is heterogeneity at virtually all levels of the pharma information stack. This talk will address the sources of this heterogeneity, question the premise that any technical solution can solve the problems posed by heterogeneity, and propose that any non-trivial information integration will require shared ontologies and domain models. It will also address whether such shared ontologies and domain models can be: (1) developed entirely in-house; (2) acquired from vendors; or (3) developed as open standards by the R&D community.

2:40 Poster and Exhibit Viewing; Refreshments and Desserts Served

3:30 Microarray Gene Expression Analysis and Data Integration
Dr. Heng Dai, Senior Scientist, Bioinformatics, Drug Discovery, Johnson & Johnson PRD
Data integration and interpretation remain among the major challenges in microarray data analysis. It is essential to integrate data from diverse resources including molecular annotation, expression, pathway, disease and pharmacological databases. We have developed methods and tools to effectively integrate these data into a central database, which can be easily accessed through a web interface. The data can be further analyzed with data mining and visualization tools, such as OmniViz, to identify novel interactions and associations between medical, biological and chemical entities.

4:00 New Challenge for Drug Discovery Informatics: Information and Knowledge Integration
Dr. Abdel Laoui, Head, Chemoinformatics, Aventis Pharmaceuticals
The new issue in the pharmaceutical industry is to develop new drug discovery informatics solutions designed to deal with the challenges emerging in today's data-rich environment - challenges arising from the volume, diversity, and variable quality of data being generated. Data pipelining has emerged as a practical technology for accelerating the discovery process. The companies that will be successful will be those that can bridge the gap between bioinformatics and chemoinformatics quickly. At Aventis we have implemented a new paradigm in drug discovery informatics, which we call Chemical Biology. We will present this integrated approach, which is multidisciplinary and knowledge-based, along with the corresponding new enabling technology.

4:30 Speaker to be announced

5:00 Panel Discussion

5:30 Close of Conference

There are many sponsorship opportunities for your company to maximize its exposure and influence. They include conference-specific sponsorships, technology workshops, networking receptions, delegate bags, etc. We are also ready to work with you in customizing a solution to meet your specific marketing objectives. Make a lasting impression by taking advantage of these marketing tools.

For exhibit and sponsorship information, please contact Carol Dinerstein at 781-972-5471 or

Special Airline Discounts Available
Special Zone and Discount Fares have been established for this conference with United Airlines. Please call United Airlines Meeting Reservation Desk at 800-521-4041 and reference ID#579YS.

Wyndham Baltimore Inner Harbor
101 W. Fayette Street
Baltimore, Maryland 21201
T: 410-752-1100 • F: 410-752-0832
Cut-off date: August 29, 2003
$179 single/$199 double occupancy
Please call the hotel directly to make your room reservation. Identify yourself as a Cambridge Healthtech Institute conference attendee to receive the reduced room rate. Reservations made after the cut-off date or after the group room block has been filled (whichever comes first) will be accepted on a space-and-rate-availability basis. Rooms are limited, so please book early.

Cambridge Healthtech Institute encourages attendees to gain further exposure by presenting their work in the poster sessions. Please fill out the registration form, with the poster title and primary author. To ensure inclusion in the conference CD, a one-page summary must be submitted and registration must be paid in full by August 22, 2003.


Initial Listing of Poster Presentations

Extending MicroArray Explorer with R
Dr. Peter F. Lemkin, National Cancer Institute

Multiresolution Analysis of 2-D Electrophoretic Gel Images
Dr. Nicolas Nafati, Research Engineer, INSERM

Model Centric Data Integration and Visualization
Dr. Christophe Schilling, Chief Technical Officer, Genomatica, Inc.

Data Integration to Enable Drug Discovery: A Microarray Perspective
Dr. Soheil Shams, BioDiscovery, Inc.



Phone: 781-972-5400, Fax:  781-972-5425