
Held concurrently with CHI's Data
Integration for the Pharmaceutical Industry
and immediately following CHI's Third Annual
Microarray Data Analysis.
Biological and pharmaceutical
research is increasingly moving to higher and higher throughput. The generation
of all this data, however, loses most of its potential value unless important
conclusions can be extracted from large data sets quickly enough to influence
next steps. Graphical representation provides an extremely
effective means of overcoming the complexity and sheer volume of data that need
to be analyzed. Obtaining maximum value from experimental data involves a
team effort that includes biologists, chemists, pharmacologists, statisticians
and software engineers. This program has been designed to provide
perspectives from each specialty, with an emphasis on how they can be
integrated into a cohesive, comprehensive team. Gain new perspectives on ways to
look at your data, and how to utilize visualization tools to create added
value.
Scientific Advisors:
Dr. Joanna L. Batstone, IBM Corporation
Dr. Georges G. Grinstein, University of Massachusetts
Dr. Bruno W.S. Sobral, Virginia Tech
8:00am Pre-conference Short
Course Tutorial Registration and Coffee
8:30-11:30 Pre-conference
Short Course Tutorials (*Separate Registration Required)
Pre-conference
Short Course Tutorials
COURSE ONE*
Integrating Visualization and Data Mining for Microarray Analysis
Dr. Georges Grinstein, Professor, Computer Science Department;
Director, Institute for Visualization and Perception Research;
Director, Center for Biomolecular and Medical Informatics,
University of Massachusetts Lowell; and Founder and Director,
Research & Development, AnVil
This short course will provide an overview of visualization and
data mining techniques, discuss how current systems deal with their
integration, and examine the role of high-dimensional data. We will
highlight the exploration process and provide various application
examples. We will also discuss where visual analytic systems
should be heading. Several demos will be presented.
Course Participants Will:
- Gain a fundamental background in visualization
- Understand the role of visualization in discovery
- Have an overview of the different techniques and how they fit in with analysis
- Understand the integration issues
- Gain knowledge on what the current systems provide and how to compare systems
Who should attend?
Biologists, chemists, analysts, software developers,
statisticians, bioinformaticians, and managers in discovery
biology and drug development laboratories.
COURSE TWO*
Enterprise Database Integration for Researchers
Dr. William J. Pjura, President, Altionics, Inc.
Enterprise Database Integration for Researchers presents a visual
modeling approach to identifying entity objects and packaging them
into cohesive subject areas. The short course examines several
public, Web-accessible chemical, pharmaceutical, and biological
databases from the perspective of analysis and design criteria,
focusing on identifying the common characteristics shared by the
databases and their unique characteristics. These common and
unique characteristics are identified and packaged into cohesive
subject areas and remodeled into extensible and adaptive
enterprise database architectures. Subsequently, the eXtensible
Markup Language (XML) is introduced, and existing chemical,
biological, and pharmaceutical XML-based vocabularies are
reviewed. The importance of XML in the integration of data from
disparate sources and the design of XML schema to facilitate the
integration of these databases are examined, and the
implementation of the XML schema is demonstrated.
Course Prospectus:
Course participants will have the opportunity to apply database
analysis and design concepts to the analysis of several public,
Web-accessible chemical and biological databases. Attendees will learn
how to identify the common characteristics shared by the databases
and to identify their unique characteristics. They will see how
the concept of packaging cohesive objects is applied to designing
a database architecture that supports the integration of
apparently disparate databases. Subsequently, the eXtensible
Markup Language (XML) will be introduced and the existing
chemical, biological, and pharmaceutical XML-based vocabularies
will be reviewed. The importance of XML in data integration from
seemingly disparate sources and the design of XML schema to
facilitate integration of these disparate databases will be
examined. Finally, the implementation of the XML schema will be
demonstrated.
Who should attend?
This course is recommended for researchers who are responsible for
experimental design, are familiar with basic database analysis and
design concepts, and have an ongoing need to effectively integrate
and analyze data from internal and external databases.
*Separate Registration Required
11:00am - 12:00pm Conference
Registration and Poster Set-up
11:30am-12:30pm Luncheon Workshop
"Data Integration at Amgen"
Presented by Mark Jury, Amgen
1:00pm Chair's Opening Remarks
Dr. Georges Grinstein
1:10 Mining the Biomedical
Literature using Semantic Analysis and Natural Language Processing
Techniques
Dr. Ronen Feldman, Assistant Professor, Mathematics and Computer
Science Department, Israel's Bar-Ilan University and Chief Scientist,
Clearforest, Ltd.
The information age has made it easy to store large amounts of data
electronically. The proliferation of documents available on the web, on
corporate intranets, on newswires and elsewhere is overwhelming. Search
engines only exacerbate this overload problem by making more and more
documents available in a matter of a few keystrokes. This information
overload is directly mirrored in the bio-medical field, where scientific
publications and other forms of text-based data are produced at an
unprecedented rate. Text mining is the combined, automated process of
analyzing unstructured, natural language text in order to discover
information and knowledge that are typically difficult to retrieve. In
this paper, we focus on using text mining as it applies to the biomedical
literature. In particular, we are interested in finding relationships
among genes, proteins, drugs and diseases, to assist in explaining and
predicting complex biological processes. We will describe the LitMiner™
system that we have developed for this purpose; in particular, we will
focus on the KDD CUP 2002, which serves as a formal evaluation of our
system.
1:50 Integromics in Drug Discovery:
Practical Tools for Integrating Genomics, Proteomics, Bioinformatics, and
Chemoinformatics
Dr. John N. Weinstein, Senior Research Investigator, Laboratory of
Molecular Pharmacology, National Cancer Institute, National Institutes of
Health
After microarray experiments (or other "omic" studies)
have been done in the pharmaceutical context, one's first task is to
analyze the data statistically. But that leaves open the Big Question:
what do the results mean biologically and pharmacologically? A number of
practical approaches and computational tools for addressing that question
and integrating different types of data will be discussed. Included is a
set of program packages available from our group and collaborators through
http://discover.nci.nih.gov: MedMiner, MatchMiner, GoMiner, CIMminer, and
LeadScope/LeadMiner.
2:30 Poster and Exhibit Viewing,
Refreshment Break
DATA MINING
3:15pm A Life Scientist's
Road to Interoperability of Data and Tools
Dr. J. Dana Eckart, Senior Project Associate, The Virginia
Bioinformatics Institute, Virginia Tech
Life science researchers deal with large amounts of varied data, from
sequencing trace files to mass spectra. These data sets grow faster than
the number of transistors that can be placed on a chip (Moore's Law). In
addition, scientists and other life sciences data consumers must convert
these data into information about genomes, proteins, protein expression
patterns, metabolites, and the interaction pathways that enable
understanding of biological systems and processes - some contained in the
literature. Therefore, life scientists require a framework for data and
tool management and interoperability that supports these needs. To meet
this challenge, VBI has developed PathPort (i.e. Pathogen Portal), which
uses a "bus" architecture called ToolBus. On the server side,
ToolBus employs Web-services, while on the client side, it contacts data
and tools and views results through a single, consistent user interface.
This bioinformatics platform is flexible and extensible, connecting
"plug-in" data sources, analysis tools, and visualization
components. The project exemplifies how technology can be leveraged to
support distributed collaborations among life scientists. This approach
should significantly enhance the re-use of the framework for different
thematic areas of research. The presentation will outline our process in
developing this dynamic software package.
3:45pm Knowledge Extraction
in the "Omics" Era: The Meaning is in the Literature
Dr. Damien Chaussabel, Research Investigator, National Institute of
Allergy and Infectious Diseases, National Institutes of Health
The rise of high-throughput screening platforms has led to the rapid
proliferation of datasets whose functional implications often remain
cryptic. This talk will present an original text mining approach that
exploits the biomedical literature database to extract critical biological
knowledge embedded into the mass of information produced by genome-wide
screening strategies. Furthermore, examples will illustrate how the
sequential analysis of gene expression and term occurrence patterns
generates a visual interface between microarray data and vast literature
resources.
4:15pm Analyzing and
Managing Affymetrix GeneChip Data
Mr. Gregg Wright, Senior Vice President, Life Sciences, IMC Inc.
Neuroscientists at the Salk Institute for Biological Studies are using
thousands of Affymetrix GeneChip microarrays to find genes associated with
specific brain functions, behaviors and important phenotypes. In close
collaboration with Salk scientists, IMC has developed TeraGenomics, a
highly scalable data warehouse to implement Salk analysis methods and to
make it easier for bench scientists in many locations to collaborate and
rapidly mine the vast amounts of data generated with these microarrays.
The metadata are MIAME-compliant and incorporate taxonomies for brain
structures and disease types. We have adapted technology long used for
very large-scale data warehouses in retail, banking, transportation, and
manufacturing, but new to biosciences. This presentation will summarize
the computational problems and describe the solution architecture.
4:45pm Mining a
Cross-Species Expression Data Repository
Dr. Jordan Stockton, Product Manager, Marketing, Silicon Genetics
The cross-species transcriptome has yet to be fully exploited in the
post-genomic era. Using homology indexing tools and datasets from a large
public data repository, we have identified meaningful expression patterns
and the coordinated activity of orthologous genes across technology types
and species. We suggest that cross-species expression profiling has the
potential to shed new light on key biological pathways.
5:15pm PathwayAssist: A
Tool for the Integration of Gene Expression, Protein-Protein Interaction,
Metabolic Pathway and Literature Data
Mr. Jason Goncalves, Chief Scientific Officer, Iobion Informatics
Integrating multiple sources of biological information is an important
step on the path to knowledge and interpretation of biological results.
The seminar will review the issues involved in integrating diverse
information sources, such as gene expression, protein-protein interaction
and metabolic pathway data. We will also discuss how natural language
processing (NLP) techniques can be used to greatly enhance the current
biological databases by extracting information directly from the body of
biomedical literature. Application of the NLP and data integration tools
in PathwayAssist will be presented through specific use cases.
5:45-6:45 Networking
Reception (hosted by Cambridge Healthtech Institute)
7:30am Coffee and
Technology Workshop (Sponsorship Available)
DATA VISUALIZATION
8:30am Chair's Remarks
Dr. Bruno W.S. Sobral
8:35am Integrated,
Tightly-Coupled, High Dimensional Analysis and Visualization for
Microarray Expression Data
Dr. Georges Grinstein, Professor, Computer Science Department;
Director, Institute for Visualization and Perception Research; Director,
Center for Biomolecular and Medical Informatics, University of
Massachusetts Lowell and Founder and Director, Research & Development,
AnVil
We will describe a highly integrated, tightly coupled visualization and
analysis environment based on a classification of visualization as
presentation, confirmatory, or exploratory. Each of these classes affects the role of
analysis. We will provide an application example which follows this
approach with its focus on the interplay and requirements for exploration
and confirmation. Our example will show how we deal with the
identification of subspaces of interest (exploratory) in very high
dimensional datasets, and how validation (confirmatory) of the various
discovered and proposed hypotheses takes place.
9:05am Visualizing the
Genome: Techniques for Presenting Human Genome Data and Annotations
Dr. Ann Loraine, Bioinformatics Scientist, Bioinformatics
Department, Affymetrix, Inc.
To get maximum benefit from genomic sequence data and annotations,
biologists need visualization tools that present the data in an intuitive,
interactive format. This talk will cover genome display techniques we
developed at Affymetrix that support rapid and efficient visual inspection
of complex genomic scenes. Some of these include one-dimensional zooming
to show sequence data alongside gene structures; color-coding exons to
indicate translation frame; and display of protein annotations in the
context of genomic sequence to show how alternative splicing impacts
conserved functionally important motifs in the encoded proteins. Using
genome display software we developed, I will demonstrate how these
techniques make answering basic questions about human gene structures easy
to accomplish.
9:35am Methods for Analysis
and Visualization of SNP Genotype Data for Complex Diseases
Dr. Anya Tsalenko, R&D Scientist, Life Sciences Technologies
Laboratory (LSTL) at Agilent Labs
SNP markers are becoming central for studying genetic determinants of
complex diseases. Large SNP data sets collected in such studies call for
the development of specialized analysis tools. We present statistical
methods, visualization tools and algorithmic approaches to questions that
arise in pursuing correlations between individual SNPs as well as sets of
SNPs and sample properties. These methods are based on similar tools for
analysis of gene expression data.
10:05am Poster and Exhibit
Viewing, Refreshment Break
10:45am Combining Gene
Ontology with Sequence Similarity over Multiple Genomes
Dr. Andre Nantel, Research Officer, Biotechnology Research
Institute, National Research Council of Canada
Sequence similarity is the main source of information available during
the annotation of novel genes. We have been using the E-values from whole
genome/proteome Blast comparisons to visualize sequence similarities
between a reference organism (S. cerevisiae, C. albicans, C. jejuni) and a
large number of other eukaryotic and prokaryotic organisms. These datasets
can then be correlated and explored along with additional information related
to gene ontologies and transcriptional profiling
(see http://206.167.190.233/biovis/).
11:15am Using Multiple
Visualization Types Together with Multiple Data Types to Support Fast
Decision Making in the Pharmaceutical Pipeline
Dr. Gavin Fischer, Application Scientist, OmniViz
The ability to use "abstract" visualizations to draw
attention to areas of interest, and more in depth visualizations to answer
focused questions, enables researchers to move from a large amount of data
to the one (or few) records they are interested in. To support this,
visualizations need to support any type of data so that fundamental
judgments about the relationships within the data can be exposed. OmniViz
has the ability to use disparate data types both in unique overview
visualizations (e.g. Galaxy™ view) that give broad perspectives and in
specialized visualizations that address specific questions (e.g.
correlation tool).
11:45am Panel Discussion
12:15 Luncheon
CLOSING PLENARY SESSION:
EFFECTIVE DATA MANAGEMENT FOR DRUG DISCOVERY
1:45pm Chair's Remarks
Dr. Georges Grinstein
1:50 Information Pathways
in Pharmaceutical R&D
Dr. Otto Ritter, Associate Director, Bioinformatics, Enabling
Science and Technology, AstraZeneca R&D Boston
Molecular pathways are useful models for representing biological
processes at the cellular level. Pathways are usually represented as
graphs, where molecules or molecular complexes are the nodes, and
molecular interactions are the edges. In one step of generalization, where
we take any biomedical entities as nodes and any general relationships as
edges, we get an associative network as a representation of biomedical
knowledge. If we take one more step in this generalization process and
include any information assets and any transformations or associations, we
get information pathways as models representing (pharmaceutical) R&D
processes. It's useful to know that at all three levels of interpretation
(molecular, biomedical, R&D), we can actually re-use the same software
components for data management, analysis, and visualization.
2:10 Standards to Enable
Information Integration
Dr. David Benton, Director, Knowledge Integration and Discovery
Systems, Informatics and Knowledge Management, R&D IT, GlaxoSmithKline
Information integration is widely acknowledged to be both a great need
and a major challenge facing pharmaceutical R&D. The principal
obstacle to information integration systems is heterogeneity at virtually
all levels of the pharma information stack. This talk will address the
sources of this heterogeneity, question the premise that any technical
solution can solve the problems posed by heterogeneity, and propose that
any non-trivial information integration will require shared ontologies and
domain models. It will also address whether such shared ontologies and
domain models can be: (1) developed entirely in-house; (2) acquired from
vendors; or (3) developed as open standards by the R&D community.
2:40 Poster and Exhibit
Viewing; Refreshments and Desserts Served
3:30 Microarray Gene
Expression Analysis and Data Integration
Dr. Heng Dai, Senior Scientist, Bioinformatics, Drug Discovery,
Johnson & Johnson PRD
Data integration and interpretation remain among the major
challenges in microarray data analysis. It is essential to integrate data
from diverse resources including molecular annotation, expression,
pathway, disease and pharmacological databases. We have developed methods
and tools to effectively integrate this data into a central database,
which can be easily accessed through a web interface. This data can be
further analyzed with data mining and visualization tools, such as OmniViz,
to identify novel interactions and associations between medical,
biological and chemical entities.
4:00 New Challenge for Drug
Discovery Informatics: Information and Knowledge Integration
Dr. Abdel Laoui, Head, Chemoinformatics, Aventis Pharmaceuticals
The new issue in the pharmaceutical industry is to develop new drug
discovery informatics solutions designed to deal with the challenges
emerging in today's data-rich environment - challenges arising from the
volume, diversity, and variable quality of data being generated. Data
pipelining has emerged as a practical technology for accelerating the
discovery process. The companies that will be successful will be those
that can bridge the gap between bioinformatics and chemoinformatics
quickly. At Aventis we have implemented a new paradigm in drug discovery
informatics which we call Chemical Biology. We will present this
integrated approach, which is multidisciplinary and knowledge-based, along with
the corresponding new enabling technology.
4:30 Speaker to be
announced
5:00 Panel Discussion
5:30 Close of Conference
There
are many sponsorship opportunities for your company to maximize its
exposure and influence. They include conference-specific sponsorships,
technology workshops, networking receptions, delegate bags, etc. We
are also ready to work with you in customizing a solution to meet your
specific marketing objectives. Make a lasting impression by taking
advantage of these marketing tools.
For exhibit and sponsorship
information, please contact Carol Dinerstein at 781-972-5471 or dinerstein@healthtech.com.
TRAVEL INFORMATION
Special Airline Discounts Available
Special Zone and Discount Fares have been established for this conference with
United Airlines. Please call United Airlines Meeting Reservation Desk at
800-521-4041 and reference ID#579YS.
HOTEL INFORMATION
Wyndham Baltimore Inner Harbor
101 W. Fayette Street
Baltimore, Maryland 21201
T: 410-752-1100 • F: 410-752-0832
Cut-off date: August 29, 2003
$179 single/$199 double occupancy
Please call the hotel directly to make your room reservation. Identify yourself
as a Cambridge Healthtech Institute conference attendee to receive the reduced
room rate. Reservations made after the cut-off date or after the group room
block has been filled (whichever comes first) will be accepted on a
space-and-rate-availability basis. Rooms are limited, so please book early.
CALL FOR POSTERS
Cambridge Healthtech Institute encourages attendees to gain further
exposure by presenting their work in the poster sessions. Please fill out the
registration form, with the poster title and primary author. To ensure
inclusion in the conference CD, a one-page summary must be submitted and
registration must be paid in full by August 22, 2003.