Genome-wide association study for rheumatoid arthritis susceptibility genes
Principal Investigator: Robert M. Plenge, M.D., Ph.D.1,2
Co-Investigators: Katherine P. Liao, M.D.1, Soumya Raychaudhuri, M.D., Ph.D.1, Elizabeth W. Karlson, M.D.1, Fina Kurreeman, Ph.D.1
1 Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital
2 Broad Institute of MIT and Harvard
Current Status: Completed 2011.
1. What questions did we ask in this DBP?
Could we develop a computational algorithm that uses existing clinical records to establish a collection of patients with rheumatoid arthritis (RA) with high sensitivity and positive predictive value?
RA is a chronic disease that affects up to 1% of the adult population. Certain clinical characteristics are quite specific to the diagnosis of RA, including seropositivity for auto-antibodies (e.g., anti-CCP antibodies and rheumatoid factor [RF]), the presence of radiographic erosions, and the use of disease-modifying antirheumatic drugs (DMARDs). We used structured data mining and natural language processing (NLP) of unstructured text to define a collection of RA patients based on these specific clinical characteristics. A collection of controls was matched to RA cases based on demographic data. Subsequently, we used this patient collection (a) to collect DNA for identifying genetic variants that influence risk of RA, (b) to investigate clinical outcomes such as response to treatment, and (c) to examine the co-occurrence of other autoimmune diseases, environmental risk factors (e.g., smoking), and comorbid conditions.
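A phenotype algorithm of this kind can be pictured as a logistic model that combines structured and NLP-derived features into a probability of RA, with patients above a cut-off classified as cases. The sketch below assumes that form; the feature set, the `log1p` count transform, the weights, the intercept, and the 0.9 threshold are all hypothetical placeholders, not the fitted model used in this project.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class PatientFeatures:
    # Hypothetical features; real inputs come from structured data and NLP output.
    ra_icd_count: int      # number of RA billing codes in structured data
    nlp_ra_mentions: int   # NLP mentions of rheumatoid arthritis in notes
    seropositive: bool     # anti-CCP or RF positive
    erosions: bool         # radiographic erosions reported
    on_dmard: bool         # ever prescribed a DMARD

# Hypothetical coefficients; a real model is fitted to chart-reviewed training data.
INTERCEPT = -5.0
WEIGHTS = {
    "log_ra_icd_count": 1.2,
    "log_nlp_ra_mentions": 0.8,
    "seropositive": 1.5,
    "erosions": 1.0,
    "on_dmard": 1.3,
}

def ra_probability(p: PatientFeatures) -> float:
    """Logistic model: P(RA) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    x = {
        # log(1 + count) damps the effect of very long records
        "log_ra_icd_count": math.log1p(p.ra_icd_count),
        "log_nlp_ra_mentions": math.log1p(p.nlp_ra_mentions),
        "seropositive": float(p.seropositive),
        "erosions": float(p.erosions),
        "on_dmard": float(p.on_dmard),
    }
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))

def classify_cases(patients, threshold=0.9):
    """Keep patients whose predicted probability of RA meets the cut-off."""
    return [p for p in patients if ra_probability(p) >= threshold]

strong = PatientFeatures(20, 10, True, True, True)
weak = PatientFeatures(0, 0, False, False, False)
print(round(ra_probability(strong), 3), round(ra_probability(weak), 3))
```

The threshold trades sensitivity against positive predictive value, which is why a reviewed subset of records is needed to choose it.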
2. How did i2b2 help us answer these questions?
The i2b2 team provided the technical expertise to mine and interpret electronic medical records, including NLP interrogation of written notes and development of a statistics-based selection algorithm, to identify RA patients and relevant clinical data. In addition, the i2b2 team contributed its expertise to the genetic association study and the interpretation of its results. The Workbench developed by i2b2 was used to rapidly review the medical records of a subset of patients to determine sensitivity and positive predictive value.
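Chart review yields the counts behind both metrics: positive predictive value (PPV) is the fraction of algorithm-flagged patients confirmed as RA, and sensitivity is the fraction of confirmed RA patients that the algorithm flags. The sketch below is illustrative only; the `ReviewCounts` structure, the example counts, and the use of a Wilson score interval to express uncertainty in a reviewed subset are assumptions, not details reported by this project.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewCounts:
    true_pos: int   # flagged by the algorithm, RA confirmed on chart review
    false_pos: int  # flagged by the algorithm, RA not confirmed
    false_neg: int  # not flagged, but RA confirmed on chart review

def ppv(c: ReviewCounts) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return c.true_pos / (c.true_pos + c.false_pos)

def sensitivity(c: ReviewCounts) -> float:
    """Sensitivity: TP / (TP + FN)."""
    return c.true_pos / (c.true_pos + c.false_neg)

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Illustrative counts only (not results from this study).
counts = ReviewCounts(true_pos=90, false_pos=10, false_neg=30)
print(ppv(counts), sensitivity(counts))
print(wilson_interval(counts.true_pos, counts.true_pos + counts.false_pos))
```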
3. What tools did we anticipate would be developed from our work that would be of value to others?
Our project relied heavily on NLP text-extraction utilities. Specifically, we required tools (the HiTex suite) to extract information on auto-antibody status, radiographic erosions, response to medications, and environmental exposures (e.g., smoking). We developed a validated computational algorithm to identify RA patients and matched controls from electronic medical records; this algorithm has since been shown to perform accurately when transferred to other sites. Our team worked closely with the software infrastructure group to develop and test the Workbench, which, once available, greatly helped end users annotate patient records.
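HiTex was the extraction suite used here; the snippet below neither uses nor reproduces HiTex. It is a minimal rule-based sketch of the kind of extraction described, assuming that a result word or a "+" appearing shortly after an antibody name, within the same clause, reports that antibody's status. The pattern, the 25-character window, and the `antibody_results` function are hypothetical; production pipelines add negation handling, section detection, and lab-value parsing.

```python
import re

# Hypothetical pattern: antibody name, a short gap within the same clause, then a result.
MENTION = re.compile(
    r"\b(?P<ab>anti-?ccp|ccp|rheumatoid factor|rf)\b"  # antibody name
    r"[^.,;\n]{0,25}?"                                  # short gap, no clause break
    r"(?P<result>\+|positive|negative|pos\b|neg\b)",    # reported result
    re.IGNORECASE,
)

def antibody_results(note: str) -> dict[str, str]:
    """Map 'ccp' and/or 'rf' to 'positive' or 'negative'; the last mention wins."""
    results = {}
    for m in MENTION.finditer(note):
        ab = m.group("ab").lower()
        key = "rf" if ab in ("rf", "rheumatoid factor") else "ccp"
        result = m.group("result").lower()
        results[key] = "negative" if result.startswith("neg") else "positive"
    return results

print(antibody_results("Labs: anti-CCP positive (>250). RF negative."))
```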
4. What new clinical discoveries do we anticipate may arise from our work?
Our primary goal is to identify new genetic variants that influence risk of RA. Our secondary goals include: (a) identify genetic variants that predict response to powerful, yet potentially toxic medications (e.g., methotrexate and anti-TNF therapy), (b) identify gene-environment interactions, and (c) examine the epidemiology and genetics of multiple autoimmune diseases within individual patients.
Background (As originally defined):
Rheumatoid arthritis (RA) is a chronic inflammatory disease with a worldwide prevalence of up to 1% in the adult population. While the root cause remains unknown, it is clear that inherited DNA variation accounts for more than half of disease risk. To date, genetic variation within 5 genes has been unambiguously associated with RA susceptibility (HLA-DRB1, PTPN22, 6q23/TNFAIP3, STAT4, TRAF1-C5), but together these explain less than half of the overall genetic burden. These alleles primarily influence risk in a subset of RA patients marked by the presence of auto-antibodies against cyclic citrullinated peptides (CCP). To identify additional RA susceptibility genes, several large-scale genome-wide association (GWA) studies are underway in anti-CCP+ RA patients. Initial analyses of GWA studies conducted by us and our collaborators demonstrate that (a) common alleles that influence risk of RA can be identified using the GWA approach, and (b) additional patient samples are required to identify common genetic variants of modest effect (odds ratios [ORs] of 1.10–1.50 per copy of the allele).
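Why modest ORs demand more samples can be seen from a standard power calculation. The sketch below uses the normal approximation for comparing risk-allele frequencies between equal numbers of cases and controls. The genome-wide significance threshold of 5e-8, 80% power, a control allele frequency of 0.3, and the assumption that each person contributes two independent alleles (Hardy-Weinberg equilibrium) are illustrative choices, not parameters taken from the studies described here.

```python
import math
from statistics import NormalDist

def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    case_odds = odds_ratio * p_control / (1 - p_control)
    return case_odds / (1 + case_odds)

def cases_needed(p_control: float, odds_ratio: float,
                 alpha: float = 5e-8, power: float = 0.8) -> int:
    """Cases needed (with an equal number of controls) for a two-sided allelic test."""
    p1, p0 = case_allele_freq(p_control, odds_ratio), p_control
    p_bar = (p1 + p0) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Two-proportion sample size, counted in alleles per group
    n_alleles = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2 / (p1 - p0) ** 2
    return math.ceil(n_alleles / 2)  # each person contributes two alleles

for or_ in (1.10, 1.25, 1.50):
    print(or_, cases_needed(p_control=0.3, odds_ratio=or_))
```

Under these assumptions the required number of cases grows steeply as the OR approaches 1, which is the quantitative reason modest-effect alleles call for large collections.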
The current i2b2 proposal is to identify a collection of RA patients from the Partners Health System (including Brigham and Women’s Hospital and Massachusetts General Hospital) with high sensitivity and positive predictive value. Once they are identified, we will collect DNA from all patients and conduct a GWA study on a subset of patients with the most severe disease. The simple yet ambitious goal is to identify new genetic variants that influence RA susceptibility. Our approach will use existing i2b2 infrastructure: we will identify RA patients using the Partners Health System’s Research Patient Data Registry (RPDR); we will use NLP text-extraction utilities to identify which patients are CCP+ and have severe disease; and we will work closely with i2b2 researchers to refine our phenotype search criteria to maximize the number of homogeneous RA samples collected during a two-year period. Furthermore, we will leverage the genomic resources of the Broad Institute of MIT and Harvard to extract and store DNA and to genotype up to 750 patients using the Affymetrix 6.0 GeneChip.
Specific Aim 1: Establish a carefully phenotyped collection of RA patients from the Partners clinical database, using structured information and NLP to search for features that indicate severe disease (e.g., medication history, radiographic erosions, CCP level) and significant environmental exposures (e.g., smoking). Controls will be matched to each case based on demographic data. Initial estimates suggest that up to 4,000 RA patients are available in the Partners Health System.
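Demographic matching can be sketched as greedy matching without replacement. The source does not list the matching variables, so sex, race, and birth year within a two-year window are assumptions here, as are the hypothetical `Person` record and `match_controls` function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    pid: str         # hypothetical de-identified patient ID
    sex: str
    birth_year: int
    race: str

def match_controls(cases, pool, per_case=1, max_age_diff=2):
    """Greedy matching without replacement on sex, race, and birth year."""
    available = list(pool)
    matches = {}
    for case in cases:
        # Candidates with identical sex and race within the birth-year window
        eligible = [c for c in available
                    if c.sex == case.sex and c.race == case.race
                    and abs(c.birth_year - case.birth_year) <= max_age_diff]
        # Prefer the closest birth years; the stable sort keeps pool order on ties
        eligible.sort(key=lambda c: abs(c.birth_year - case.birth_year))
        chosen = eligible[:per_case]
        for c in chosen:
            available.remove(c)  # without replacement: each control is used once
        matches[case.pid] = [c.pid for c in chosen]
    return matches

cases = [Person("A", "F", 1960, "R1"), Person("B", "F", 1960, "R1")]
pool = [Person("x", "F", 1961, "R1"), Person("y", "M", 1960, "R1"), Person("z", "F", 1959, "R1")]
print(match_controls(cases, pool))
```

Greedy matching is simple but order-dependent; optimal matching over all cases at once is a common alternative when controls are scarce.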
Specific Aim 2: Collect DNA and plasma from all RA patients using the innovative Crimson biorepository system. Colleagues at Brigham and Women's Hospital have established an IRB-approved program that enables matching of anonymous patient identifiers with “discarded” blood samples collected during routine care. We will collect blood samples from our targeted RA case-control collection to measure serum auto-antibodies and to extract DNA for genotyping. Our genetic studies will focus on CCP+ RA patients to minimize heterogeneity in the collected patient material and to maximize our power to detect true RA susceptibility variants.
Specific Aim 3: Conduct a genetic study of established RA susceptibility variants to validate our sample collection and to estimate the clinical predictive value of genetic risk in an unbiased collection of RA patients. We will use the Illumina BeadXpress platform to genotype 384 SNPs in up to 3,000 case-control samples. For each susceptibility allele, we will calculate the OR and compare it with published estimates. We will calculate a genetic burden score from these ORs and determine the genetic risk profile of each RA patient. We estimate that a small fraction of patients (~1% of all RA patients) will have up to a 20-fold increased risk of RA based on this genetic burden score.
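The OR and burden-score calculations can be sketched as follows. The sketch assumes a multiplicative (log-additive) model in which each copy of a risk allele multiplies the odds of RA by that allele's OR, so the score is a sum of risk-allele counts weighted by ln(OR). The SNP names and numbers are hypothetical, and the Woolf (log-scale) confidence interval is one common choice rather than a method stated in this proposal.

```python
import math

def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic OR from a 2x2 table of allele counts, with a Woolf (log-scale) CI."""
    or_ = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

def genetic_burden_score(genotype, odds_ratios):
    """Sum over SNPs of risk-allele count (0, 1, or 2) times ln(OR)."""
    return sum(genotype.get(snp, 0) * math.log(or_)
               for snp, or_ in odds_ratios.items())

# Hypothetical per-allele ORs and one patient's risk-allele counts.
odds_ratios = {"snp_a": 2.0, "snp_b": 1.5, "snp_c": 1.2}
patient = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = genetic_burden_score(patient, odds_ratios)
print(allelic_odds_ratio(300, 700, 200, 800))
print(math.exp(score))  # odds multiplier relative to carrying no risk alleles
```

Comparing each patient's score with the distribution in controls, for example via exp(score − mean control score), is one way to express fold-increases such as the 20-fold figure in this Aim; these are relative odds, which approximate relative risk for an uncommon disease like RA.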
Specific Aim 4: Measure autoantibodies in all RA patients and a subset of controls to determine whether non-RA autoantibodies are more frequent in RA patients. In Aim 3, we will genotype not only RA risk alleles but also risk alleles associated with other autoimmune diseases (e.g., lupus, type 1 diabetes). In this Aim, we will correlate genotypes with disease-specific autoantibodies.
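Genotype can be correlated with autoantibody status in several ways; one common choice, assumed here, is the Cochran-Armitage trend test, which asks whether the proportion of antibody-positive patients rises with the number of risk alleles carried (0, 1, or 2). The `trend_test` function and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

def trend_test(positives, totals, weights=(0, 1, 2)):
    """Cochran-Armitage trend test: autoantibody positivity vs. risk-allele count.

    positives[i]: antibody-positive patients with genotype i
    totals[i]:    all tested patients with genotype i
    Returns (z statistic, two-sided p-value).
    """
    n_total = sum(totals)
    p_bar = sum(positives) / n_total  # overall positivity rate
    t_stat = sum(w * (r - n * p_bar) for w, r, n in zip(weights, positives, totals))
    sum_wn = sum(w * n for w, n in zip(weights, totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, totals))
    variance = p_bar * (1 - p_bar) * (sum_w2n - sum_wn ** 2 / n_total)
    z = t_stat / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: positivity rises from 10% to 30% across genotypes.
print(trend_test(positives=[10, 20, 30], totals=[100, 100, 100]))
```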
New Grants arising from this work:
- PGRN U01 GM092691: Genetic predictors of response to anti-TNF therapy in rheumatoid arthritis; PI Robert Plenge, MD, PhD.*
- K08 AR060257: Genetic Predictors of Coronary Artery Disease; PI Katherine Liao, MD.
- R01 AR04880: Clinical Risk Prediction Modelling in RA; PI Elizabeth Karlson, MD.
Publications arising from this work:
- Liao KP, Cai T, Gainer V, Goryachev S, Zeng-Treitler Q, Raychaudhuri S, Szolovits P, Churchill S, Murphy S, Kohane IS, Karlson E, Plenge R. Utilizing electronic medical records for discovery research in rheumatoid arthritis. Arthritis Care Res. 2010;62(8):1120-1127. doi:10.1002/acr.20184.
- Kurreeman F, Liao K, Chibnik L, Hickey B, Stahl E, Gainer V, Li G, Bry L, Mahan S, Ardlie K, Thompson B, Szolovits P, Churchill S, Murphy SN, Cai T, Raychaudhuri S, Kohane I, Karlson E, Plenge R. Genetic basis of autoantibody positive and negative rheumatoid arthritis risk in a multi-ethnic cohort derived from electronic health records. Am J Hum Genet. 2011;88:57-69. doi:10.1016/j.ajhg.2010.12.007.
- Carroll RJ, Thompson WK, Eyler AE, Mandelin AM, Cai T, Zink RM, Pacheco JA, Boomershine CS, Lasko TA, Xu H, Karlson EW, Perez RG, Gainer VS, Murphy SN, Ruderman EM, Pope RM, Plenge RM, Kho AN, Liao KP, Denny JC. Portability of an algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform Assoc. 2012;19(e1):e162-e169. PMID: 22374935.
*In conjunction with this new award.