


Hua Xu, Jon Duke, George Hripcsak, Nigam Shah, Noemie Elhadad, Karthik Natarajan, Anupama Gururaj


  1. Updates from Annual meeting
  2. IRB for use of clinical text
  3. Clinical text data storage and representation schema
  4. NLP tools/pipelines for ETL
  5. Use cases, e.g., phenotyping for cohort selection using NLP outputs
  6. Discussion


  1. Updates from Annual meeting
    • The OHDSI community showed extensive interest in the text processing work. Suggestions for improving the current projects were received during the meeting.
  2. IRB for use of clinical text
    • IRB language pertaining to the textual part of the record is being compiled from multiple sources.
    • Anu will collect and generate a generic document for use as an example.
    • Once approval of the document is obtained from the contributors, the document will be posted online for use by the OHDSI community.
  3. Clinical text data storage and representation schema
    • Alex (Columbia) will generate a minimum set of modifiers for all clinical entities, to support the use of rules to derive clinical concepts.
    • To classify notes for the representation schema, note metadata will be generated, with note types defined in detail and mapped to LOINC codes.
    • Note types from different institutions will be collected. George will share hierarchical note type metadata. Also, we will collect note type metadata from Josh Denny at Vanderbilt. All the collected material will be aggregated by Karthik.
  4. NLP tools/pipelines for ETL
    • The plan is to develop a set of wrappers for multiple NLP tools (currently cTAKES and MetaMap) for conversion of output to the OHDSI textual data schema.
    • To get an idea of the updates in cTAKES, we need to invite Guergana Savova to present and demo cTAKES during the January call.
    • To prioritize the work, focus first on positive concepts, for high-confidence extraction of named entities from text.
  5. Use cases, e.g., phenotyping for cohort selection using NLP outputs
    • To define the syntax for storing phenotypes, two aspects can be considered:
      1. set of data elements or features on which an algorithm functions
      2. formulation of the phenotype definition
    • In order to represent the NLP output, query-based phenotyping will be the first focus of the group.
    • For machine-learning-based algorithms, the NLP output will be accessed outside of the CDM.
    • Is ElasticSearch a good first step in this area? ES should be considered here more as a tool for cohort building and selection than for phenotyping. For this purpose, it is a good starting point.
    • Finding patients for clinical trials will be used as a use case here. ES could serve as an explorer for feature selection in the phenotyping process.
    • Action item: Min will set up a simple search over the MT samples by the next meeting.
    • Use MIMIC-II and MIMIC-III as demo datasets for the tools being developed by the group.
  6. Discussion
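To make items 4 and 5 more concrete, here is a minimal Python sketch of the wrapper idea: tool-specific output is converted into one common record, negated mentions are dropped so that only positive concepts remain, and a query-based phenotype is expressed as a set of required concept codes. Everything in it is an assumption made for illustration: the `NoteConcept` record is hypothetical and is not the OHDSI textual data schema, the input dictionaries are simplified stand-ins rather than the real cTAKES or MetaMap output formats, and `CUI_A` and `CUI_B` are placeholder codes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class NoteConcept:
    """Hypothetical common record; NOT the actual OHDSI textual data schema."""
    person_id: int
    note_id: int
    concept_code: str
    negated: bool
    source_tool: str


def from_ctakes_like(person_id: int, note_id: int, mention: dict) -> NoteConcept:
    # Assumed input shape: {"cui": str, "polarity": int}, where -1 means negated.
    return NoteConcept(person_id, note_id, mention["cui"],
                       mention["polarity"] == -1, "cTAKES")


def from_metamap_like(person_id: int, note_id: int, mention: dict) -> NoteConcept:
    # Assumed input shape: {"cui": str, "negated": bool}.
    return NoteConcept(person_id, note_id, mention["cui"],
                       bool(mention["negated"]), "MetaMap")


def positive_only(records: Iterable[NoteConcept]) -> List[NoteConcept]:
    # Keep only mentions that are not negated (the "positive concepts first" priority).
    return [r for r in records if not r.negated]


def select_cohort(records: Iterable[NoteConcept], required_codes: Set[str]) -> Set[int]:
    """Query-based phenotype: persons with a positive mention of every required code."""
    seen: Dict[int, Set[str]] = {}
    for r in positive_only(records):
        seen.setdefault(r.person_id, set()).add(r.concept_code)
    return {pid for pid, codes in seen.items() if required_codes <= codes}


if __name__ == "__main__":
    records = [
        from_ctakes_like(1, 10, {"cui": "CUI_A", "polarity": 1}),
        from_metamap_like(1, 11, {"cui": "CUI_B", "negated": False}),
        from_ctakes_like(2, 20, {"cui": "CUI_A", "polarity": -1}),  # negated mention
        from_metamap_like(2, 21, {"cui": "CUI_B", "negated": False}),
    ]
    print(select_cohort(records, {"CUI_A", "CUI_B"}))  # prints {1}
```

Person 2 is excluded because the only mention of `CUI_A` for that person is negated. Keeping the per-tool converters separate from the cohort query means a further NLP tool would only need its own converter, which is one possible reading of the wrapper plan above, not a settled design.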
projects/workgroups/wg_meeting_11042015.1448902881.txt.gz · Last modified: 2015/11/30 17:01 by anu_gururaj