THURSDAY, AUGUST 25
7:30 Registration and Morning Coffee
8:30 Chair’s Opening Remarks
Dr. Georges Grinstein, Professor and Director, Center for Biomolecular and Medical Informatics, University of Massachusetts-Lowell
8:40 Standardizing Design, Analysis, and Public Access, and Use of All Exon Arrays
Dr. Eric P. Hoffman, Director, Research Center for Genetic Medicine, Children’s National Medical Center
Appropriate and consistent approaches to experimental design and data interpretation have begun to emerge, including appropriate quality control and standard operating procedures. In addition, public access databases are maturing both in the utility of their user analysis tools and in the integration of different sites. Finally, Affymetrix arrays including probe sets for all known or predicted exons have been developed, and a project utilizing these for a genome-wide survey of alternative splicing will be presented.
9:15 Differential Expression: What a Crock!
Dr. Glenn Stone, Research Statistician, Mathematical and Information Sciences, CSIRO
To date, most of the analysis of microarray data has focused on differential expression, or used differential expression measures to filter genes of interest. The commonly used hypothesis testing approach and the over-emphasis on FWER and FDR have a number of drawbacks, not least that they generally ignore the (massively) multivariate nature of the data. We shall consider these issues and suggest some different approaches.
9:45 Toward Metrology in Gene Expression Microarray Experiments
Dr. Z.Q. John Lu, Mathematical Statistician, Statistical Engineering Division, NIST
NIST’s new gene expression metrology project aims to deliver measurements, data, and standards that will help improve the quality, comparability, and understanding of microarray experimental data. Because of the many facets that contribute to data quality in microarray data, a systematic and holistic metrological approach is called for. Among the several subcomponents of this ongoing effort, I will focus especially on some key statistical issues in signal extraction, assessment of repeatability / reproducibility from high-dimensional data, and the seemingly conflicting sample size / data dimensionality dilemma.
10:15 Coffee Break
10:30 Correlation Between Gene Expression Levels and Empirical Bayes Methodology in Microarray Data Analysis
Dr. Andrei Yakovlev, Chair, Biostatistics and Computational Biology, University of Rochester
Stochastic dependence between gene expression levels in microarray data is of critical importance for the methods of statistical inference that resort to pooling test statistics across genes. By applying resampling techniques to simulated and real biological data sets, we have studied a potential impact of the correlation between gene expression levels on the statistical inference based on the empirical Bayes methodology. We report evidence from these analyses that this impact may be quite strong, leading to a high bias and variance of the number of differentially expressed genes. This study also pinpoints specific components of the empirical Bayes method where the reported effect manifests itself.
11:00 At What Scale Should Microarray Data be Analyzed?
Dr. Hui-Rong Qian, Research Scientist, Genomic and Molecular Informatics, Eli Lilly & Company
Microarray data, for example Affymetrix’s MAS5 signals, are very often transformed before statistical analysis to satisfy model assumptions such as normality and homogeneity of variance. Log-transformation has been widely used to bring microarray data closer to a normal distribution across all the genes on the array. However, it is not well known how far the statistical assumptions are satisfied or violated on a gene-by-gene basis, nor what the impact is of not using an optimum scale in the analysis. In this presentation, we investigate the distributional properties of the Affymetrix GeneChip signal data across all the genes as well as within a particular gene, and explore the impact of several commonly used transformation scales on microarray data analysis.
11:30 Interactive Panel Discussion with Morning Speakers
12:00 Lunch (on your own)
(Technology Workshop Sponsorships Available)
1:30 Chair’s Remarks
Mr. Thomas J. Downey, President, Partek, Inc.
1:35 Robust Classification Modeling on Microarray Data Using Misclassification Penalized Posterior
Dr. Jae Lee, Associate Professor, Public Health Sciences, University of Virginia
Genome-wide microarray data are often used in challenging classification problems of clinically relevant subtypes of human diseases, but the identification of a robust prediction model that performs consistently well on future independent data has not been successful, owing to biased model selection from an extremely large number of candidate models during the classification model search and construction. Furthermore, common criteria of prediction model performance such as classification error rates do not provide a sensitive measure for evaluating the performance of such an astronomical number of competing models, and even though several different classification approaches have been utilized to tackle such classification problems, no direct comparison of these methods has been made. We introduce a novel measure for assessing the performance of a prediction model, the misclassification-penalized posterior (MiPP): the sum of the posterior classification probabilities penalized by the number of incorrectly classified samples. Applying MiPP to several popular classification methods, we implement a forward step-wise cross-validated procedure to find the final robust prediction models and evaluate their objective classification performance based on a completely independent (future) data set as suggested in Ambroise and McLachlan (2002). Our MiPP-based stepwise cross-validated discriminant approach enables us to identify robust prediction models with only a few genes on well-known microarray data sets, which show superior performance to other models in the literature that often have more than 50 to 100 features in their model construction.
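As a rough illustration of the MiPP measure described in this abstract, the sketch below takes one possible reading of the definition: sum each sample's posterior probability for its true class, then subtract the number of misclassified samples. The function name `mipp`, the input layout, and the example numbers are hypothetical and are not taken from the speaker's work.

```python
from typing import Sequence


def mipp(posteriors: Sequence[Sequence[float]], true_labels: Sequence[int]) -> float:
    """Sum of true-class posterior probabilities minus the misclassification count.

    Assumption: posteriors[i][k] is the model's posterior probability that
    sample i belongs to class k, and true_labels[i] is the true class index.
    """
    total_posterior = 0.0
    n_misclassified = 0
    for probs, label in zip(posteriors, true_labels):
        # Predicted class is the one with the largest posterior probability.
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        total_posterior += probs[label]
        if predicted != label:
            n_misclassified += 1
    return total_posterior - n_misclassified


# Hypothetical posteriors for four samples and two classes.
posteriors = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
true_labels = [0, 1, 1, 0]
score = mipp(posteriors, true_labels)
```

Under this reading, a model scores higher when it is both correct and confident, so two models with the same error rate can still be ranked against each other.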
2:05 Evaluating Methods for Classifying Expression Data
Mr. Michael Man, Associate Director, Nonclinical Statistics, Pfizer
Using expression data of biomarkers to predict drug efficacy or safety is an increasingly important application of expression technologies. We evaluated the relative performance of several classification methods for building predictive models by applying these methods to twelve expression datasets in two scenarios of biomarker applications. Partial least squares discriminant analysis and support vector machines perform well on these datasets. A practical approach that takes advantage of multiple methods in biomarker applications is discussed.
2:35 Significance Analysis of Function and Expression (SAFE): Honest Hypothesis Testing for Pathway Involvement in Microarray Studies
Dr. Fred Wright, Associate Professor, Biostatistics, University of North Carolina
Microarray analysis is increasingly moving beyond the study of individual genes to studying pathways and broader biological phenomena. Until recently, pathway analysis of microarray results has been limited to post hoc analyses of significant gene lists. We describe the SAFE procedure, a powerful 2-stage permutation-based method that enables discovery of pathway relationships while maintaining careful control of false positive rates.
3:05 Microarray Analysis of Human Whole Blood Total RNA Following Alcohol Consumption
Dr. Dennis Burian, Team Lead, Functional Genomics Group, Civil Aerospace Medical Institute, FAA
We performed microarray analysis on whole blood total RNA from six human subjects before and at four blood alcohol levels after alcohol ingestion. Amplified RNA was hybridized to Affymetrix hgU133A plus2.0 GeneChips®. Raw CEL file data were imported to S-PLUS® / S+ArrayAnalyzer® for differential expression analysis across blood alcohol levels. Ontology information for differentially expressed genes was obtained, and pathway analysis was performed, with Ingenuity Pathway Analysis. Our analysis has led to initial discoveries that allow us to begin teasing out molecular markers for mild alcohol use.
3:35 Refreshment Break, Poster and Exhibit Viewing
4:30 Deciphering the Data Deluge Discussion Groups
Time has been designated for one-hour focus groups centered around specific themes. These moderated discussions encourage brainstorming and interactive problem solving between scientists who share a common interest in the discussion topic. This unique opportunity allows conference participants to focus on a topic in depth, to exchange ideas, information, and experiences, and to develop future collaborations.
Examples of discussion topics:
• Pathway analysis using microarray gene expression data
• From fishing with rods to fishing with net(work)s: Accuracy, information content and predictive power of microarray measurements
• Designing effective measures of data quality
• Outliers: error or valid result
• The search for consensus in gene annotation
• Follow-up studies after data clustering
5:30 Pizza and “Micro” brews
Networking Reception in the Exhibit Area
6:30 Close of Day One
and Exhibit Opportunities
Contact: Suzanne Caroll, 617-630-1353