i2b2: Informatics for Integrating Biology & the Bedside - A National Center for Biomedical Computing
Driving Biology Projects: Airways Diseases

Genetics and Pharmacogenetics of Airways Disease

DBP Investigators

Principal Investigator:  Ross Lazarus MB.BS, MPH, GDipCompSci.

Co-Principal Investigator:  Scott T. Weiss MD, MS
Co-Investigators:  Lynn Bry MD, PhD and Bernard Kinane MD

Current Status:  Completed 2010


1. What questions did we ask in this DBP?

How can we identify genetic variation that affects individual risk of complex diseases such as asthma, and how can we translate that knowledge into better prevention and better disease management? How can we identify genetic variation that influences the safety and effectiveness of the various drugs used to treat asthma, and how can we translate that knowledge into better treatment with less risk of adverse side effects?

2. How did i2b2 help us answer these questions?

i2b2 provided our team with access to a specialized data mart containing data on more than 97,000 patients with asthma, extracted from the Partners Research Patient Data Registry (RPDR). NLP experts from Core 1 extracted important phenotypes that are not otherwise directly available from the RPDR, to help us identify and then recruit appropriate asthmatic patients for study, including obtaining DNA for subsequent genotyping.

3. What tools were developed from our work that will be of value to others?

Core 2 created novel methods for viewing and exploring large collections of clinical data; experts from Core 1 created novel tools for applying NLP methods to large clinical data resources. All of these tools were created specifically in response to our research needs, but they have been generalized into the i2b2 Clinical Research Chart (CRC), where they are available to other researchers.

4. What new clinical discoveries arose from our work?

We are working toward discovering methods to predict the specific medications that will do the most good with the least harm in treating individual patients with asthma, and more generally with other diseases. Our work is also focused on developing tools and methods that clinician researchers can use in their own research on other clinical problems to more quickly translate basic research findings into clinical practice.

Background and Specific Aims:

The biological problem that drives our proposal is the interplay between environmental exposures and genetic variation in determining both individual airways disease risk and individual response to airways disease medications. Asthma and chronic obstructive pulmonary disease (COPD) are common diseases of the airways, directly affecting 1 in 5 Americans and imposing a substantial economic burden, including health care costs of approximately 36 billion dollars per year (1). The high prevalence and serious health effects of these common diseases imply that improvements in treatment and prevention have the potential to make a dramatic impact. The tools and methods currently available have been successful for monogenic disorders, but have not yet yielded major breakthroughs in complex diseases. The work we will perform as part of the Informatics for Integrating Biology and the Bedside (i2b2) project will lead to the development and implementation of methods and tools to improve genetic epidemiological and pharmacogenetic research in complex disease.

1. Develop and implement methods to detect distinct sub-phenotypes in asthma.

We hypothesize that information gleaned from extensive, longitudinal clinical data collections will reveal subtle, previously unrecognized subgroups and patterns in the natural history and intermediate phenotypes of these diseases, potentially associated with underlying genetic variation. We further hypothesize that recognition of specific sub-phenotypes will prove to be of clinical, prognostic and pharmacogenomic importance in subsequent focused genetic studies and pharmacogenetic clinical trials. In collaboration with Cores 1, 2 and 4, we will develop and apply natural language processing, machine learning, classification and related data mining methods to large clinical databases, to search for and catalog variants of these two clinical disease phenotypes.

2. Apply the methods developed in specific aim 1 to the experimental design of genetic and pharmacogenetic research.

We hypothesize that decreased phenotypic heterogeneity among cases will lead to greater statistical power to detect association. In collaboration with Cores 1, 2 and 4, we will apply and generalize the techniques and tools developed in Specific Aim 1 for identifying phenotypically homogeneous subsets of patients, using them to identify potential subjects suitable for recruitment into specifically targeted genetic and pharmacogenetic studies, while ensuring appropriate safeguards for patient privacy and medical record confidentiality.
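
The power argument in this aim can be illustrated with a small calculation. The sketch below is not the project's method: it assumes a simple two-sided two-proportion z-test on allele frequencies, and it models heterogeneity as a case group in which only a fraction of cases belong to a sub-phenotype with a raised risk-allele frequency. The function names, allele frequencies and sample size are all hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def diluted_case_frequency(p_subtype, p_control, fraction_subtype):
    """Allele frequency seen in a mixed case group.

    Assumption: only `fraction_subtype` of cases carry the raised
    frequency `p_subtype`; the remaining cases look like controls.
    """
    return fraction_subtype * p_subtype + (1.0 - fraction_subtype) * p_control


def power_two_proportions(p_case, p_control, n_alleles_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the standard normal approximation with a pooled variance under
    the null hypothesis and unpooled variance under the alternative.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    p_bar = (p_case + p_control) / 2.0
    se_null = (2.0 * p_bar * (1.0 - p_bar) / n_alleles_per_group) ** 0.5
    se_alt = ((p_case * (1.0 - p_case) + p_control * (1.0 - p_control))
              / n_alleles_per_group) ** 0.5
    delta = abs(p_case - p_control)
    # Probability of rejecting in either tail.
    upper = z.cdf((delta - z_crit * se_null) / se_alt)
    lower = z.cdf((-delta - z_crit * se_null) / se_alt)
    return upper + lower


if __name__ == "__main__":
    # Hypothetical numbers: 2,000 subjects per group gives 4,000 alleles.
    p_control, p_subtype, n_alleles = 0.20, 0.25, 4000
    for fraction in (1.0, 0.5, 0.25):
        p_case = diluted_case_frequency(p_subtype, p_control, fraction)
        power = power_two_proportions(p_case, p_control, n_alleles)
        print(f"fraction of cases in sub-phenotype={fraction:.2f} "
              f"power={power:.3f}")
```

Under these assumptions, power falls as the case group becomes more mixed, which is the motivation for identifying homogeneous subsets before recruitment.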

3. Develop and apply bioinformatics methods and tools that identify high-yield, potentially functional subsets of single nucleotide polymorphisms (SNPs) to the SNP selection phase of the design of genetic and pharmacogenetic experiments.

We hypothesize that we can predict asthma exacerbations (hospitalizations and ER visits) with a combination of clinical and genetic variables.

4. Utilize our newly developed i2b2 resource of asthma cases and controls as an externally generalizable population for genome-wide association studies of predictors of asthma exacerbations.

We have developed genetic and non-genetic tests to predict asthma exacerbations, and we will use the i2b2 asthma cases and controls to validate these newly developed predictive tests.


· Using the Partners Healthcare Research Patient Data Registry (RPDR) of more than 4 million patients, we have identified a cohort of 97,000 asthmatic patients. Records from these patients have been parsed for coded data, and variables such as smoking history, comorbidities, and medication history have been extracted by natural language processing of unstructured text.

· To enable identification of the subpopulation experiencing frequent exacerbation despite treatment with inhaled corticosteroids, we used data from a longitudinal clinical study of childhood asthma to build a predictive model for application to the RPDR cohort. However, because the proportion of variance in asthma exacerbation risk explained by the model was small, and many of the controlled variables measured in the clinical trial are not available from routine clinical records, we have concluded that extensive phenotypic data will not serve as useful predictors of exacerbations in this study.

· Using hospitalization frequency as a measure of exacerbation, we have selected a cohort of 2,000 asthmatics assumed to be refractory to standard therapy (high service users) and 2,000 controlled asthmatics (low service users) for a genome-wide association study to determine whether the extreme subphenotype can be explained by genetic variation.

· We are employing a novel system (“Crimson”) developed at the Pathology Department at Brigham and Women’s Hospital to collect anonymized, discarded blood samples from these patients for a genome-wide association study (funding applied for). This IRB-compliant system is yielding thousands of samples a year.

· Mining of the clinical data repository for phenotypic variables has driven development and refinement of the i2b2 Clinical Research Chart elements, including a comprehensive datamart, new visualization tools, and an NLP processing suite.

· Dr. Lazarus has made significant progress in developing new tools for managing, analyzing and visualizing whole genome association data; these are freely available at http://bioconductor.org.
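
The extreme-phenotype selection described in the bullets above (high versus low service users, ranked by hospitalization frequency) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: the function name `select_extremes`, the patient identifiers and the counts are hypothetical, and the source does not describe any selection criteria beyond hospitalization frequency.

```python
from typing import Dict, List, Tuple


def select_extremes(hospitalizations: Dict[str, int],
                    n_per_group: int) -> Tuple[List[str], List[str]]:
    """Return (high_users, low_users) ranked by hospitalization count.

    `hospitalizations` maps a (hypothetical) patient ID to the number of
    asthma hospitalizations recorded for that patient.
    """
    if n_per_group < 0 or 2 * n_per_group > len(hospitalizations):
        raise ValueError("cohort too small for two non-overlapping groups")
    # Sort by count, breaking ties by patient ID so results are reproducible.
    ranked = sorted(hospitalizations,
                    key=lambda pid: (hospitalizations[pid], pid))
    low_users = ranked[:n_per_group]
    high_users = ranked[len(ranked) - n_per_group:]
    return high_users, low_users


if __name__ == "__main__":
    # Toy data; the study itself selected 2,000 patients per group.
    counts = {"p01": 0, "p02": 7, "p03": 1, "p04": 0, "p05": 4, "p06": 2}
    high, low = select_extremes(counts, n_per_group=2)
    print("high service users:", high)
    print("low service users:", low)
```

Sampling only the two tails of the distribution discards the intermediate patients, which is the design choice that makes the comparison an extreme-subphenotype contrast.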

Publications arising from this work:

  1. Zeng Q.T., Goryachev S., Weiss S.T., Sordo M., Murphy S.N., Lazarus R. “Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system”, BMC Medical Informatics and Decision Making, 2006 6:30.
  2. Carey V.J., Morgan M., Falcon S., Lazarus R., Gentleman R. “GGtools: analysis of genetics of gene expression in Bioconductor”, Bioinformatics. 2007 23(4):522-523.
  3. Warnes G., Chaselow S., Montana G., O’Connell M., Henderson D., Jain N., Qiu W., Cheng J. and Lazarus R. “The RGenetics Project”, Paper presented to the UseR conference, Vienna, Austria, August 2006.
  4. Lazarus R., Raby B., Qiu W., Silverman E.K., Weiss S.T., "Quantifying the effects of allele frequency differences and allelic phase on LD captured by tag SNPs derived from incompletely ascertained data: Theoretical basis, models and impact on LD mapping", Oral presentation and Proceedings of the American Society of Human Genetics Annual Meeting 2006, New Orleans.
  5. Himes B.E., Kohane I.S., Ramoni M.F., Weiss S.T.  Characterization of patients who suffer asthma exacerbations using data extracted from electronic medical records.  AMIA Annu Symp Proc.  2008 Nov 6; 308-12.  PMID:18999057.
  6. Himes B.E., Dai Y., Kohane I.S., Weiss S.T., Ramoni M.F.  Prediction of Chronic Obstructive Pulmonary Disease (COPD) in asthma patients using electronic medical records.  J Am Med Inform Assoc.  2009 May-Jun;16(3):371-9.
  7. Himes B.E., Klanderman B., Kohane I.S., Weiss S.T.   Assessing the reproducibility of asthma genome-wide association studies in a general clinical population.  J Allergy Clin Immunol.  2011 Apr;127(4):1067-9.

©2018 Partners Healthcare