CHI's Drug Discovery and Development Map:
A Framework for Applications of Genomics and Proteomics to Drug Discovery

The last decade has been marked by an unprecedented boom in the number and variety of technologies used to discover and develop new drugs. Many of these new technologies arose from work surrounding the Human Genome Project, and we at Cambridge Healthtech Institute have been in the enviable position of having a front-row seat for these remarkable developments. To help us track and analyze the progress of these technologies and their applications, we've developed a dynamic, integrated framework that we call our Drug Discovery and Development Map.

This framework is shaped by a particular view taken by researchers seeking to describe, measure, understand, and ultimately predict biology. These tasks require the generation of data, with the nature of the data depending upon the specific types of questions being asked. Our framework focuses on two key components, genes and proteins, and considers five major types of data for each one, giving ten categories in all. The five types of data are structure, expression, variation, function and integration.
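The two-by-five layout described above can be sketched as a small data structure. This is only an illustration: the component and data-type names come from the text, while the identifiers `COMPONENTS`, `DATA_TYPES` and `build_map` are hypothetical and not part of CHI's map.

```python
from itertools import product

# Names come from the framework description; the identifiers are illustrative.
COMPONENTS = ("gene", "protein")
DATA_TYPES = ("structure", "expression", "variation", "function", "integration")


def build_map():
    """Return every (component, data type) pair in the framework."""
    return list(product(COMPONENTS, DATA_TYPES))


categories = build_map()
for component, data_type in categories:
    # One row per category, e.g. "gene     structure"
    print(f"{component:8s} {data_type}")
```

Pairing each component with each data type is what yields the ten categories referred to in the rest of this overview.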

A wide variety of tools and assays are used to generate this range of data, with several different approaches used for any one type of data, and some tools used for more than one type of data. As shown in the map, the last column under "Biology" represents data interpretation, with examples given for each of the ten categories. In each case there are actually two subsets of interpretation: databases containing that type of data, and bioinformatic algorithms for comparison, analysis, data mining and other purposes. To make the map more readable, this level of detail has not been broken out.

Throughout the last decade, researchers have studied all ten types of data, but there have been clear shifts in emphasis. Some of these shifts have resulted from technological progress that allowed higher throughput or better results to be achieved, while in other cases the maturation of one area has served as the foundation for increased emphasis on others. For example, seven or eight years ago the greatest emphasis was on gene sequences. In fact, since genomic DNA sequencing was considered by many to be too formidable a project at that stage, it was cDNA, or the expressed genome, that drew their attention. As progress was made toward obtaining data for nearly all expressed gene sequences, attention shifted in three key directions. For those involved in the Human Genome Project, the effort of gaining sequence information at the level of the entire genome began to ramp up. For others, the next focus was still in genomics, but beyond the static genome and into the dynamic patterns of gene expression. For still others, the key field for further development was proteomics, where the ultimate actors in health and disease can be studied comprehensively.

The rapid development of microarray technology spurred gene expression profiling's rise in prominence. As sequencing efforts matured, it also became clear that beyond achieving a consensus sequence for the human genome lay the even larger challenge of understanding the basis of human genetic variability (particularly with regard to single nucleotide polymorphisms, or SNPs). This attention to SNP genotyping, which began just a few years ago, was greatly aided by the foundation laid with gene sequence data.

All three of these genomic segments, sequencing, gene expression profiling and genotyping, greatly benefited from technological advances that resulted in dramatically higher throughputs and rapid declines in cost. For some other areas it has proven to be more of a challenge to achieve performance improvements of the same magnitude. Defining the function of each gene, or more specifically, the protein it encodes, has been particularly challenging. It's just not yet possible to study gene function with the type of "massively parallel" approaches used in those other fields.

For proteins, amino acid sequences are comparable to nucleic acid sequences for genes, but the richer data is 3-D conformational structure, since it provides much better clues as to biological activity. While the techniques for determining protein conformational structure have improved, the process remains relatively slow. Other areas of proteomics have also labored under the lack of comparable technical breakthroughs to replace difficult 2-D gel analysis, even though promising new approaches, including protein arrays, are under development. There has been a huge increase in throughput for one type of functional proteomics: protein-protein interaction studies based on yeast 2-hybrid assays. This increase was fueled by high interest, and achieved by industrializing and automating the assays rather than through any significant technical advance.

Ultimately, the new genomic and proteomic technologies are not just about generating reams of disparate bits of data; they aim to provide a unified view of complex biological systems. The first step in this process is generating gene networks from gene sequence and expression data. Such studies do not require new tools as much as sophisticated and comprehensive approaches to data compilation. Correspondingly, protein pathway studies pull together data about how changes in protein expression levels modulate the expression of other proteins in a cascade fashion. In our framework, integration at the protein level has been extended into systems biology, which can be described as the integration of genomic, proteomic and metabolic data.

Beyond discovery-oriented biology lies the actual development of marketable diagnostics and therapeutics. In the commercial realm, most of the value ascribed to genomic and proteomic technology and data is tied directly to the pharmaceutical industry's ability to translate that information into such products. For diagnostics, tests based on genes (mutations, SNPs), gene expression profiles and protein biomarkers are being added to the more standard diagnostics of clinical chemistry or immunoassays. Much of the impact of genomics on drug development thus far has been focused on the identification and validation of biological targets. While much of this research on targets is based only on comparisons of the biology of health and disease, sooner or later it becomes critical to bring in the activity of chemical compounds in the body.

There are two different ways in which chemistry comes into play: in the form of chemical probes, or as compounds being evaluated as potential leads or drugs. The use of chemical probes to elucidate biology is the basis of chemical genomics. A large series of compounds is introduced into cells, one compound at a time, with the aim of identifying a cell that then undergoes a specific phenotypic change. By identifying the compound introduced into that cell, and then finding which gene or protein was bound by the chemical probe, the researcher succeeds in finding both a genetic link to a change in phenotype and a chemical probe that can cause that change to occur.
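The logic of such a screen can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of any particular screening platform: the `WellResult` record, the function names, and the compound and target identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WellResult:
    """One compound tested on cells (all fields are hypothetical)."""
    compound_id: str                     # placeholder probe identifier
    phenotype_changed: bool              # did the phenotype of interest appear?
    bound_target: Optional[str] = None   # gene or protein bound, once identified


def find_hits(results):
    """Keep only compounds whose cells showed the phenotypic change."""
    return [r for r in results if r.phenotype_changed]


def link_targets(hits):
    """Map each hit compound to its bound target, skipping unresolved hits."""
    return {r.compound_id: r.bound_target for r in hits if r.bound_target}


# Placeholder data, invented purely for illustration.
screen = [
    WellResult("cmpd-001", False),
    WellResult("cmpd-002", True, bound_target="protein-X"),
    WellResult("cmpd-003", True),  # a hit whose target is not yet identified
]

hits = find_hits(screen)
links = link_targets(hits)
print(links)
```

The two functions mirror the two steps in the text: first find the compound that produced the phenotype, then record which gene or protein that compound bound.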

Genomics and proteomics can also be used in compound evaluation, by providing molecular details about the effect of a compound on the body. This approach may highlight mechanisms of action or toxicity, both of which can be critical for further compound optimization. The other way in which genomics and proteomics can be employed for drug development is through pharmacogenomics, which focuses on the relationship between drug responses and biological variation. Pharmacogenomics comprises the study of variations in targets or target pathways, variation in metabolizing enzymes (pharmacogenetics) or, in the case of infectious organisms, genetic variations in the pathogen. Finally, just as the databases and analysis tools for biological data constitute bioinformatics, the databases and analysis tools for chemical structures and activities constitute cheminformatics.

While this framework provides a relatively simple view of this complex field, it has proven to be quite useful for visualizing relationships, identifying bottlenecks and predicting trends. Cambridge Healthtech Institute has used this framework to aid in the selection and structuring of conferences and reports, as well as for analysis in reports and consulting projects. Additional levels of detail can be, and have been, added to this framework, such as mapping specific technologies to more detailed steps in the drug development process. Once such technology maps have been created, it is then possible to map technology developers into each space. A more detailed explanation of this framework, and an example of a technology map, are available from Cambridge Healthtech Institute.



Cambridge Healthtech Institute  |  250 First Avenue  |  Suite 300   |   Needham,  MA  02494
Phone: 781-972-5400  |   Fax: 781-972-5425