Announcement of Data Release and Call for Participation
2016 CEGS N-GRID Shared-Tasks and Workshop
on Challenges in Natural Language Processing for Clinical Data
    
Tentative Timeline 
Registration: begins May 2016 
Data Release for Sight Unseen Track: 6th June 2016 
System Outputs Due for Sight Unseen Track: 10th June 2016 
Training Data Release: 11th June 2016 
Test Data Release: 10th August 2016 (12am Eastern Time) 
System Outputs Due: 12th August 2016 (11:59pm Eastern Time) 
Abstract Submission: 1st September 2016 
Workshop: 11th November 2016, Chicago, IL, USA 
Journal Submissions: TBD
  
  
 
The 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID) challenge, 
also known as the RDoC for Psychiatry challenge, aims to extract symptom severity from neuropsychiatric clinical records. Research Domain 
Criteria (RDoC) is a framework developed under the aegis of the National Institute of Mental Health (NIMH) that facilitates the study of 
human behavior, from normal to abnormal, in various domains. The goal of the challenge is to classify symptom severity in a domain for a 
patient, based on the information included in their initial psychiatric evaluation.
 
This challenge will be conducted on initial psychiatric evaluations (one per patient) that have been fully de-identified and scored by 
clinical experts in a symptom domain. The data for this task are provided by Partners Healthcare and the Neuropsychiatric Genome-Scale and 
RDoC Individualized Domains (N-GRID) project (HMS PI: Kohane; MGH PI: Perlis) of Harvard Medical School, and will be released under 
a Rules of Conduct and Data Use Agreement. Obtaining the data requires completing the registration, which will start in May 2016.
 
 
    All data are fully de-identified and manually annotated for RDoC.
     
    
The Tracks 
The 2016 CEGS N-GRID challenge consists of three NLP tracks:
     
    
Track 1: De-identification: Removing protected health information (PHI) is a critical step in making medical records accessible 
to more people, yet it is a very difficult and nuanced task. This track addresses the problem of de-identifying medical records over a 
new set of ~1000 initial psychiatric evaluation records, with surrogate PHI for participants to identify. We intend to run two versions 
of the de-id track (a minimal illustrative sketch of a de-id system follows the list):

- Sight unseen track: this track involves running existing home-grown de-id systems on the RDoC data without any training or 
  modification of the systems, as a way of measuring how well existing systems generalize to brand-new data. The RDoC data will be 
  provided for this track without any gold standard training annotations, and system outputs will be collected within 3 days of data 
  release.

- Regular track: this track will allow the development and training of de-id systems on the RDoC training data. Evaluation 
  will be on the RDoC test data.
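
By way of illustration, a minimal rule-based de-id system of the kind the sight unseen track is designed to stress-test might look 
like the Python sketch below. The PHI categories, regular expressions, and placeholder format shown here are illustrative assumptions 
only; the challenge data come with their own annotation schema and surrogate PHI.

    import re

    # Illustrative PHI categories and patterns; the challenge defines its
    # own annotation schema, so these are assumptions, not the official spec.
    PHI_PATTERNS = {
        "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
        "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
        "MRN": re.compile(r"\bMRN[:\s]*\d{6,}\b"),
    }

    def deidentify(text: str) -> str:
        """Replace each matched PHI span with a bracketed category placeholder."""
        for category, pattern in PHI_PATTERNS.items():
            text = pattern.sub(f"[{category}]", text)
        return text

    print(deidentify("Seen on 03/14/2015, MRN: 1234567, call 617-555-0100."))
    # -> Seen on [DATE], [MRN], call [PHONE].

A system built this way tends to do well on the formats it was written for and poorly on anything else, which is exactly the 
generalization gap the sight unseen track is meant to measure.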
        
 
     
    
    
Track 2: RDoC classification: 
The goal of RDoC classification is to determine symptom severity in a domain for a patient, based on information included in their 
initial psychiatric evaluation. Experts have rated the domain on an ordinal scale of 0-3: 0 (absent), 1 (mild: modest significance), 
2 (moderate: requires treatment), and 3 (severe: causes substantial impairment). There is one judgment per document, and one document 
per patient.
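
Since the severity labels are ordinal rather than nominal, distance-sensitive scoring is a natural way to think about this task: 
predicting 3 for a gold label of 0 should cost more than predicting 1. The official evaluation metric is not specified in this 
announcement; the Python sketch below only illustrates one plausible option, macro-averaged mean absolute error, and both the 
function and the metric choice are assumptions for illustration.

    from collections import defaultdict

    def macro_mae(gold: list[int], predicted: list[int]) -> float:
        """Macro-averaged mean absolute error over the 0-3 severity labels.

        MAE respects the ordinal structure of the labels, and averaging the
        per-class MAEs weights each severity level equally regardless of how
        often it occurs. Illustrative only, not the challenge's official score.
        """
        errors = defaultdict(list)
        for g, p in zip(gold, predicted):
            errors[g].append(abs(g - p))
        return sum(sum(e) / len(e) for e in errors.values()) / len(errors)

    print(macro_mae([0, 1, 2, 3], [0, 2, 2, 1]))  # -> 0.75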
     
    
Track 3: Novel Data Use: The data released for this 2016 challenge are the first set of mental health records released to the 
research community. These data can be used for mental health-related research questions that go beyond those posed by the challenge 
organizers. This track is for participants who want to build on their existing systems, or on the systems developed for Tracks 1 and 2, 
with the aim of addressing new research questions.
 
     
    
Evaluation Dates and Format 
The evaluation for the NLP tracks will be conducted using withheld test data. Participating teams are asked to stop development as soon as 
they download the test data. Each team is allowed to upload (through this website) up to three system runs for each of the tasks. System 
output is expected to be submitted in the exact format of the ground truth annotations to be provided by the organizers.
     
    
Dissemination 
Participants are asked to submit a 500-word abstract describing their methodologies. Abstracts may also include a graphical summary of 
the proposed architecture. The document should not exceed 2 pages (1.5 line spacing, 12-point font). The authors of top-performing 
systems or particularly novel approaches will be invited to present or demonstrate their systems at the workshop. A special issue of a 
journal will be organized following the workshop.
     
    
Organizing Committee 
Ozlem Uzuner, co-chair, SUNY at Albany 
Amber Stubbs, co-chair, Simmons College 
Michele Filannino, co-chair, SUNY at Albany 
Tianxi Cai, Harvard School of Public Health 
Susanne Churchill, Harvard Medical School 
Isaac Kohane, Harvard Medical School 
Thomas H. McCoy, MGH, Harvard 
Roy H. Perlis, MGH, Harvard 
Peter Szolovits, MIT 
Uma Vaidyanathan, NIMH 
Philip Wang, American Psychiatric Association 
     
    
Contact 
Please see the announcements for more information. Questions about the challenge can be addressed to the organizers.
     
                                
                            