====== Minutes_Meeting_02032016 ======

==== Attendees ====

Hua Xu, Jon Duke, George Hripcsak, Karthik Natarajan, Anupama Gururaj, Mark Khayter, Min Jiang, Alexandre Yahi, Noemie Elhadad, Juan M Banda, Olga Patterson, Lian Hu

==== Agenda ====

{{:projects:workgroups:nlp_wg_meeting_02032016_final.pdf|}}

  - Minimal Model Presentation – Alex
  - Note-type mapping Presentation – Karthik
  - Share existing ontologies from Vanderbilt (Hua) and Regenstrief (Jon)
  - Share strategies for combining data from different searches – Jon
  - Report on WG for commenting – Hua
  - Wrappers for cTAKES and MetaMap – Min
  - Improvements to search engine set up using MT samples – Min
  - Textual Data Representation – Discussion
  - Goals of 2016
  - Change of meeting time

=== Minutes ===

  - Minimal model presentation - Alex {{:projects:workgroups:ohdsi_nlp_wg_yahi.pdf|}}
    - The model is based on the SHARE-N model and adapted to the current data structure. It incorporates other semantic types, and not all of the modifiers are available in cTAKES yet.
    - The processed notes came from the eMERGE cohort at Columbia: about 60,000 notes covering 1700 patients. The original patient number was 3200.
    - In theory, a set containing the combination of minimal modifiers can be generated. In practice, can we trust the data enough to add it to the OHDSI tables?
    - Only the highest-confidence data (with maximum PPV) should be added to the tables.
    - Next steps:
      - Look at the note sections to determine the errors.
      - Work with Sunny to generate the NLP outputs for the phenotyping data
      - Evaluate by comparison with structured data
      - Make the system more robust
      - Generate a protocol and/or annotation guidelines
      - Share the data as a gold standard with manually annotated CUIs
      - Try Alex's script on different datasets and evaluate it across notes from different institutions
      - Identify a minimal set of notes to work with when making recommendations to the OHDSI community
      - Identify sets of concepts that are not reliable; negation is a very good example
  - Continue discussion of NLP system evaluation across different sites
  - The NLP-WG will meet on the second Wednesday of every month

=== Action Items ===

  - Note-type mapping Presentation - Karthik
  - Share existing ontologies from Vanderbilt (Hua) and Regenstrief (Jon)
  - Share strategies for combining data from different searches - Jon
  - Report on WG for commenting - Hua
  - Wrappers for cTAKES and MetaMap - Min
  - Improvements to search engine set up using MT samples - Min
  - Textual Data Representation - Discussion
  - NLP system evaluation across different sites - Discussion