2009 NLP Shared Task

Announcement of Data Release and Call for Participation

Third i2b2 Shared-Task and Workshop
Challenges in Natural Language Processing for Clinical Data
Medication Extraction Challenge

Data Release: 1 June, 2009
Evaluation: 17 August, 2009 (9:00am EST) to 19 August, 2009 (11:59pm EST)
Workshop: 13 November, 2009 in San Francisco, CA

Organizer: Informatics for Integrating Biology and the Bedside, i2b2, a National Center for Biomedical Computing

The medication extraction challenge aims to encourage the development of natural language processing systems that extract medication-related information from narrative patient records. The targeted information includes medications, dosages, modes of administration, frequencies of administration, and reasons for administration. To encourage the development of semi-supervised and unsupervised systems for medication extraction, the development data for the challenge will be distributed unannotated. Participants will be allowed to create their own annotations; annotation guidelines and sample annotated records will be provided for this purpose.

The challenge opens for registration on 1 April, 2009. Development data for the challenge will be released on 1 June, 2009. Test data will be available for only three days, from 17 August to 19 August, 2009, and will be used only for evaluation purposes. The results of the challenge will be presented at the workshop organized by i2b2.

Data for the medication extraction challenge will be released under a Data Use Agreement. Obtaining the data requires completing a registration and signing the Data Use Agreement. Downloading the data implies commitment on the part of the downloading team to participate in the medication extraction challenge. Data can be kept and used for research purposes beyond the duration of the challenge.

Evaluation Dates, File Formats, and Evaluation Metrics

The medication extraction challenge is inspired by the Question Answering track of the Text REtrieval Conference (TREC) organized by NIST. Following the standards of NIST, systems will be evaluated on the test data, and the evaluation metrics will resemble those of NIST. Participating teams are asked to stop development as soon as they download the test data. Each team may upload (through this website) up to three system runs. System output is expected in the form of standoff annotations, following the exact format of the ground truth annotations to be provided by i2b2.
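As an illustration of what a standoff annotation is, the sketch below stores one medication entry as character offsets into the untouched record text, covering the five kinds of information targeted by the challenge. This is only a hypothetical Python representation: the class names, field names, and the sample sentence are invented for illustration, and the official file format is the one defined in the annotation guidelines provided by i2b2.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Span:
    """A standoff reference: character offsets into the record text."""
    start: int  # inclusive
    end: int    # exclusive

    def text(self, record: str) -> str:
        return record[self.start:self.end]


@dataclass(frozen=True)
class MedicationEntry:
    """Hypothetical container; NOT the official i2b2 format."""
    medication: Span
    dosage: Optional[Span] = None
    mode: Optional[Span] = None
    frequency: Optional[Span] = None
    reason: Optional[Span] = None


FIELDS = ("medication", "dosage", "mode", "frequency", "reason")


def find_span(record: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase (raises ValueError if absent)."""
    start = record.index(phrase)
    return Span(start, start + len(phrase))


def resolve(entry: MedicationEntry, record: str) -> Dict[str, Optional[str]]:
    """Turn the offsets back into the strings they point to."""
    resolved = {}
    for name in FIELDS:
        span = getattr(entry, name)
        resolved[name] = span.text(record) if span is not None else None
    return resolved


# Invented sample sentence, not taken from the challenge data.
record = "Patient was given aspirin 81 mg po daily for chest pain."
entry = MedicationEntry(
    medication=find_span(record, "aspirin"),
    dosage=find_span(record, "81 mg"),
    mode=find_span(record, "po"),
    frequency=find_span(record, "daily"),
    reason=find_span(record, "chest pain"),
)
```

Because the annotation holds only offsets, the patient record itself is never modified, and the same record can carry annotations from several systems or annotators side by side.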

Test data will be annotated by the challenge participants. After uploading their system outputs to the i2b2 website, each team will be asked to annotate 10 records per person. Multiple annotations will be obtained for each record before the ground truth is finalized. Downloading the development data constitutes a commitment on the part of the challenge participants to annotate 10 records per person from the test data.

Authors of top performing systems will be invited to present or demo their systems at the workshop. A journal special issue will be organized for a subset of the top ten systems.

Tentative Schedule
1 April, 2009   Registration Open
1 June, 2009   Development Data Release
17 August, 2009   Test Data Release at 9am EST
19 August, 2009   System Output on Test Data due at i2b2 at 11:59pm EST
20 August, 2009   Annotation Allocation at 9am EST
4 September, 2009   Annotation Collection at 11:59pm EST
13 November, 2009   Workshop and Announcement of Results

Organizing Committee:

Ozlem Uzuner, Chair, SUNY at Albany and Middle East Technical University Northern Cyprus Campus
Imre Solti, University of Washington
Peter Szolovits, MIT CSAIL
Isaac Kohane, Partners HealthCare

Please see the FAQs and announcements for more information. Questions on the challenge can be addressed to Ozlem Uzuner, i2b2nlp@albany.edu.


