This shows you the differences between two versions of the page.

documentation:software:usagi [2014/12/17 19:29]
ericavoss created
documentation:software:usagi [2018/08/24 06:49]
schuemie [Importing Source Codes into Usagi]
Line 5: Line 5:
 ===== Introduction =====
-Usagi is a software tool created by the Observational Health Data Sciences and Informatics (OHDSI) team and is used to help in the process of mapping codes from a source system into terminologies, preferably standard ones, stored in the Observational Medical Outcomes Partnership (OMOP) Vocabulary ([[http://​​data-standardization/​vocabulary-resources/​]]).  The word Usagi is Japanese for rabbit and was named after the first mapping exercise it was used for; mapping source codes used in a Japanese dataset into OMOP Vocabulary concepts.
+Usagi is a software tool created by the Observational Health Data Sciences and Informatics (OHDSI) team to help map codes from a source system into the standard terminologies stored in the Observational Medical Outcomes Partnership (OMOP) Vocabulary ([[http://​​data-standardization/​vocabulary-resources/​]]). The word Usagi is Japanese for rabbit; the tool was named after the first mapping exercise it was used for: mapping source codes used in a Japanese dataset into OMOP Vocabulary concepts.

 Mapping source codes into the OMOP Vocabulary is valuable for two main reasons:
Line 13: Line 13:
 ==== Scope and purpose ====
-A source codes file that needs mapping are loaded into the Usagi (if the codes are not in English additional translations columns are needed).  A term similarity approach is used to connect source codes to OMOP Vocabulary concepts (currently only OMOP Vocabulary V5).  At a high level this term similarity approach works by 1) leveraging Unified Medical Language System (UMLS) to find synonyms for concepts in the OMOP Vocabulary (i.e. if the concept in the OMOP Vocabulary is “Myocardial infarction” a synonym for that concept is “heart attack”) and 2) map source code descriptions (in English) to the OMOP Vocabulary concepts by using by using a term similarity score. However these code connections need to be manually reviewed and Usagi provides an interface to facilitate that.
+Source codes that need mapping are loaded into Usagi (if the codes are not in English, additional translation columns are needed). A term similarity approach is used to connect source codes to Vocabulary concepts. However, these code connections need to be manually reviewed, and Usagi provides an interface to facilitate that.
-Usagi currently does not currently translate non-English codes to English.  We suggest using Google Translate ([[https://​​]]).  You can paste an entire column of non-English terms into Google Translate, and it will return that same column translated to English.
+Usagi currently does not translate non-English codes to English. We suggest using Google Translate ([[https://​​]]). You can paste an entire column of non-English terms into Google Translate, and it will return that same column translated to English.
+Usagi will only propose concepts that are marked as **standard concepts** in the Vocabulary.
 ==== Process Overview ====
Line 39: Line 41:
 ==== Importing Source Codes into Usagi ====
-Export source codes from source system into a CSV or Excel (.xlsx) file.  This should at least have the columns SOURCE_CODE and SOURCE_CODE_DESCRIPTION however additional information about codes can be brought over as well (e.g. DOSE_UNIT).  In addition to information about the source codes, the frequency of the code should also be brought over as FREQUENCY, this can help prioritize which codes should receive the most effort in mapping (i.e. you can have 1,000 source codes but only 100 are truly used within the system).  If any source code information needs translating to English, use Google Translate to do that.  Add the English translations to your file.
+Export source codes from the source system into a CSV or Excel (.xlsx) file. This should at least have columns containing the **source code** and an English **source code description**; however, additional information about codes can be brought over as well (e.g. dose unit, or the description in the original language if translated). In addition to information about the source codes, the **frequency of the code** should preferably also be brought over, since this can help prioritize which codes should receive the most effort in mapping (e.g. you can have 1,000 source codes but only 100 are truly used within the system). If any source code information needs translating to English, use Google Translate to do that.
 Note: source code extracts should be broken out by domain (i.e. drugs, procedures, conditions, observations) and not lumped into one large file.
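The export described above can be scripted. A minimal Python sketch, assuming a hypothetical extract in which each record carries a `domain` field, writes one Usagi import file per domain with the source code, English description and frequency, most frequent codes first. The column names are illustrative only; which column plays which role is chosen later in Usagi's //Column mapping// section.

```python
import csv
import io
from collections import defaultdict

# Illustrative column names; the role of each column is assigned
# in Usagi's "Column mapping" section when the file is imported.
FIELDS = ["source_code", "source_code_description", "frequency"]


def split_by_domain(records):
    """Group records by their (hypothetical) 'domain' field and return
    {domain: csv_text}, one import file per domain, with codes sorted
    from most to least frequent."""
    by_domain = defaultdict(list)
    for rec in records:
        by_domain[rec["domain"]].append(rec)

    files = {}
    for domain, recs in by_domain.items():
        recs.sort(key=lambda r: r["frequency"], reverse=True)
        buf = io.StringIO()
        # extrasaction="ignore" drops the 'domain' key from each output row.
        writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(recs)
        files[domain] = buf.getvalue()
    return files


if __name__ == "__main__":
    # Made-up records for illustration, not real data.
    extract = [
        {"domain": "condition", "source_code": "C-2",
         "source_code_description": "Headache", "frequency": 40},
        {"domain": "condition", "source_code": "C-1",
         "source_code_description": "Cough", "frequency": 1200},
        {"domain": "drug", "source_code": "D-1",
         "source_code_description": "Paracetamol 500 mg tablet", "frequency": 300},
    ]
    for domain, text in split_by_domain(extract).items():
        print(f"--- {domain} ---")
        print(text)
```

Each returned string can be saved as its own CSV file and imported separately, which keeps the domains apart as the note recommends.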
Line 45: Line 47:
 Source codes are loaded into Usagi from the File --> Import codes menu. From here an “Import codes ...” dialog will display, as seen in Figure 1.
-{{ :documentation:loadingscreen1.png?direct |}}
+{{ :documentation:software:usagiimport.png?direct |}}
 **Figure 1:  Usagi Source Code Input Screen**
-In Figure 1, the source code terms were in Dutch and were also translated into English.  Usagi will leverage the English translations to map to the standard vocabulary (SNOMED in this case).
+In Figure 1, the source code terms were in Dutch and were also translated into English. Usagi will leverage the English translations to map to the standard vocabulary.
-{{ :documentation:loadingscreen2.png?direct |}}
+{{ :documentation:software:usagiimport2.png?direct |}}
 **Figure 2:  Telling Usagi how to Read Input File**
 As seen in Figure 2, the //Column mapping// section is where you tell Usagi how to use the imported CSV. If you hover the mouse over the drop-downs, a pop-up will appear defining each column. Usagi will not use the //Additional info column(s)// to associate source codes with Vocabulary concepts; however, this additional information may help the individual reviewing the source code mapping and should be included.
-Finally you can tell Usagi what OMOP Vocabulary terminologies you plan to map into.  For example, in Figure 3, the user is mapping the source codes to the SNOMED standard terminology the OMOP Vocabulary Hover your mouse over the different filters for additional information about the filter.
+Finally, you can set some restrictions for Usagi when mapping. For example, in Figure 3, the user is mapping the source codes only to concepts in the Condition domain. By default, Usagi only maps to Standard Concepts, but if the option 'Filter standard concepts' is turned off, Usagi will also consider Classification Concepts. Hover your mouse over the different filters for additional information about each filter.
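The domain and standard-concept filters can be pictured as predicates over rows of the Vocabulary's CONCEPT table, whose `standard_concept` column holds 'S' for standard concepts, 'C' for classification concepts, and is empty otherwise. The sketch below only models that behavior under those assumptions; `Concept` and `apply_filters` are hypothetical names, not Usagi's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Concept:
    concept_id: int
    concept_name: str
    domain_id: str
    # CONCEPT.standard_concept: 'S' = standard, 'C' = classification,
    # None = non-standard.
    standard_concept: Optional[str]


def apply_filters(candidates: List[Concept],
                  domain: Optional[str] = None,
                  filter_standard: bool = True) -> List[Concept]:
    """Keep candidates allowed by the modeled filter settings: standard
    concepts only, or standard plus classification concepts when
    filter_standard is off; optionally restricted to one domain."""
    allowed = {"S"} if filter_standard else {"S", "C"}
    return [c for c in candidates
            if c.standard_concept in allowed
            and (domain is None or c.domain_id == domain)]


if __name__ == "__main__":
    # Made-up concepts for illustration only.
    candidates = [
        Concept(1, "Cough", "Condition", "S"),
        Concept(2, "Cough (classification)", "Condition", "C"),
        Concept(3, "Cough (source term)", "Condition", None),
        Concept(4, "Cough syrup", "Drug", "S"),
    ]
    print([c.concept_id for c in apply_filters(candidates, domain="Condition")])
    print([c.concept_id for c in apply_filters(candidates, domain="Condition",
                                               filter_standard=False)])
```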
-One special filter is //Filter by automatically selected concepts//.  If there is information that you can use to restrict the search, you can do so by providing a list of CONCEPT_IDs in the column indicated in the //Auto concept ID column// (semicolon-delimited).  For example, in the case of drugs there might be a mapping available to ATC codes.  Even though an ATC code does not uniquely identify a single RxNorm drug code, it does help limit the search space to only those concepts that fall under the ATC code in the Vocabulary.  By providing this list of CONCEPT_IDs in the //Auto concept ID column//, and turning on //Filter by automatically selected concepts//, Usagi will make use of this information.  In the example above, we used a partial mapping derived from UMLS to restrict Usagi to this mapping when available.
+One special filter is //Filter by automatically selected concepts / ATC code//. If there is information that you can use to restrict the search, you can do so by providing a list of CONCEPT_IDs (semicolon-delimited) or an ATC code in the column indicated in the //Auto concept ID column//. For example, in the case of drugs there might already be ATC codes assigned to each drug. Even though an ATC code does not uniquely identify a single RxNorm drug code, it does help limit the search space to only those concepts that fall under the ATC code in the Vocabulary. To use the ATC code, follow these steps:
-{{ :documentation:loadingscreen3.png?direct |}}
-**Figure 3:  Defining OMOP Vocabulary Terminology Usagi Should Plan to Map to**
+  - In the Column mapping section, switch from 'Auto concept ID column' to 'ATC column'.
+  - In the Column mapping section, select the column containing the ATC code as 'ATC column'.
+  - Turn on 'Filter by user selected concepts / ATC code' in the Filters section.
+You can also use sources of information other than the ATC code to restrict the search. In the example shown in the figure above, we used a partial mapping derived from UMLS to restrict the Usagi search. In that case, we need to use the 'Auto concept ID column'.
+{{ :documentation:software:usagiimport3.png?direct |}}
+**Figure 3:  Defining filter rules when mapping**
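Both kinds of restriction narrow the candidate list before ranking. A minimal sketch, assuming the //Auto concept ID column// cell is a semicolon-delimited string and that the concepts falling under an ATC code can be read from pairs shaped like the CONCEPT_ANCESTOR table (ancestor_concept_id, descendant_concept_id). The function names are hypothetical, and this is an illustration of the idea rather than how Usagi implements the filter.

```python
from typing import Iterable, List, Set, Tuple


def parse_auto_concept_ids(cell: str) -> Set[int]:
    """'1;2; 3' -> {1, 2, 3}; an empty cell yields an empty set."""
    return {int(part) for part in cell.split(";") if part.strip()}


def atc_descendants(atc_concept_id: int,
                    ancestor_rows: Iterable[Tuple[int, int]]) -> Set[int]:
    """All concepts listed under the ATC concept in CONCEPT_ANCESTOR-style
    (ancestor_concept_id, descendant_concept_id) pairs. That table already
    holds every ancestor/descendant pair, so no recursive walk is needed."""
    return {desc for anc, desc in ancestor_rows if anc == atc_concept_id}


def restrict(candidate_ids: List[int], allowed: Set[int]) -> List[int]:
    """Keep only allowed candidates; an empty 'allowed' means no restriction."""
    if not allowed:
        return list(candidate_ids)
    return [cid for cid in candidate_ids if cid in allowed]


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    candidates = [10, 11, 12, 13]
    print(restrict(candidates, parse_auto_concept_ids("11;13")))
    ancestors = [(500, 500), (500, 12), (500, 13), (501, 10)]
    print(restrict(candidates, atc_descendants(500, ancestors)))
```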
 Once all your settings are finalized, click the "Import" button to import the file. The import can take a few minutes, because Usagi runs the term similarity algorithm to map the source codes.
 ==== Reviewing Source Code to OMOP Vocabulary Concept Maps ====
-Once you have set up your input file of source codes, the mapping process begins.
+Once you have imported your input file of source codes, the mapping process begins.
-{{ :documentation:usagiscreen1.png?direct |}}
+{{ :documentation:software:usagiscreen1.png?direct |}}
 **Figure 4:  Usagi Matching Screen**
Line 76: Line 84:
 === Approving a Suggested Mapping ===
-In the Overview Table, Usagi tries to make suggested related concepts to the source codes it was provided.  In the example in Figure 4, the English names of Dutch condition codes were mapped to SNOMED conditions; Usagi searches for concept names and synonyms (taken from UMLS) based on whatever English text it is given.  If Usagi is unable to make a mapping, it will map to the CONCEPT_ID = 0.  If you noticed all the suggested mappings are 0 there may be an issue with the initial index generated by Usagi, use the //Help --> rebuild index// option to rebuild the index.
+The Overview Table shows the current mapping of source codes to concepts. Right after importing source codes, this mapping contains the automatically generated suggested mappings based on term similarity and any search options. In the example in Figure 4, the English names of Dutch condition codes were mapped to standard concepts in the Condition domain, because the user restricted the search to that domain. Usagi compared the source code descriptions to concept names and synonyms to find the best match. Because the user had selected 'Include source terms', Usagi also considered the names and synonyms of all source concepts in the vocabulary that map to a particular concept. If Usagi is unable to make a mapping, it will map to CONCEPT_ID = 0.
-{{ :documentation:usagiscreen2.png?direct |}}
+{{ :documentation:software:usagiscreen2.png?direct |}}
 **Figure 5:  Reviewing an Usagi Match**
-It is suggested that someone with experience with coding systems help map source codes to their associated standard vocabulary.  That individual will work through code by code in the Overview Table to either accept the mapping Usagi has suggested or choose a new mapping.  For example in Figure 5 we see that the Dutch term “Pijn toegeschreven aan hart” which was translated to the English term “Heart pain”.  Usagi used “Heart pain” and mapped it to the OMOP Vocabulary concept of “4145057-Cardiac chest pain”.  There was a matching score of 0.39 associated to this matched pair (matching scores are typically 0 to 1 with 1 being a confident match), a score of 0.39 signifies that Usagi is not very sure of how well it has mapped this Dutch code to SNOMED.  Let us say in this case, we are okay with this mapping, we can approve it by hitting the green “Approve” button in the bottom right hand portion of the screen.
+We suggest that someone with experience with coding systems help map source codes to their associated standard vocabulary. That individual will work code by code through the Overview Table, either accepting the mapping Usagi has suggested or choosing a new mapping. For example, in Figure 5 we see the Dutch term “Hoesten”, which was translated to the English term “Cough”. Usagi used “Cough” and mapped it to the OMOP Vocabulary concept “4158493-C/O - cough”. This matched pair has a matching score of 0.58 (matching scores typically range from 0 to 1, with 1 being a confident match); a score of 0.58 signifies that Usagi is not very sure how well it has mapped this Dutch code to SNOMED. Let us say that in this case we are okay with this mapping; we can approve it by hitting the green “Approve” button in the bottom right-hand portion of the screen.
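To make the idea of a 0-to-1 matching score concrete, here is a minimal sketch that scores a description against each concept's name and synonyms using cosine similarity over character trigrams, keeps the best candidate, and falls back to CONCEPT_ID 0 when nothing overlaps. This is an assumed stand-in chosen for illustration, not Usagi's actual scoring algorithm, so its scores will not reproduce values like the 0.58 above.

```python
import math
from collections import Counter
from typing import List, Tuple


def char_trigrams(text: str) -> Counter:
    """Count character trigrams of the lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(a: str, b: str) -> float:
    """Cosine similarity of trigram count vectors, in [0, 1]."""
    va, vb = char_trigrams(a), char_trigrams(b)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def best_match(description: str,
               concepts: List[Tuple[int, List[str]]]) -> Tuple[int, float]:
    """Return (concept_id, score) of the best-scoring concept, comparing the
    description with every name and synonym; (0, 0.0) if nothing overlaps."""
    best_id, best_score = 0, 0.0
    for concept_id, names in concepts:
        score = max((similarity(description, n) for n in names), default=0.0)
        if score > best_score:
            best_id, best_score = concept_id, score
    return best_id, best_score


if __name__ == "__main__":
    # Made-up concepts and synonyms for illustration only.
    concepts = [(1, ["Cough", "Coughing"]), (2, ["Headache", "Cephalgia"])]
    print(best_match("cough", concepts))
    print(best_match("xyz", concepts))
```

Because the score only reflects textual overlap, a high value does not guarantee a clinically correct match, which is why each suggestion still needs review.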
 === Searching for a New Mapping ===
-{{ :documentation:usagiscreen3.png?direct |}}
+{{ :documentation:software:usagiscreen3.png?direct |}}
 **Figure 6:  Searching for a New Concept**
-There will be cases where Usagi suggests a map and the user will be left to either try to find a better mapping or set the map to no concept (CONCEPT_ID = 0).  In the example given in Figure 6, we see for the Dutch Term “Symptomen/klachten potentie [ex. P07,P08]”, which was translated to “Impotence NOS”.  Usagi was unable to make a proper map because the UMLS derived mapping used a non-valid concept, and therefore mapped it to CONCEPT_ID = 0.  In the Search Facility, we could search for other concepts using either the actual term itself or a search box query.
+There will be cases where Usagi suggests a map and the user will be left to either try to find a better mapping or set the map to no concept (CONCEPT_ID = 0). In the example given in Figure 6, we see the Dutch term “Hoesten”, which was translated to “Cough”. Usagi's suggestion was restricted by the concept identified in our automatically derived mapping from UMLS, and the result might not be optimal. In the Search Facility, we could search for other concepts using either the actual term itself or a search box query.
-When using the manual search box, there are some things to keep in mind:  Usagi’s search algorithm is based on complete words, so the search ‘cardi’ will not find terms containing the word ‘cardiac’.  To use partial words you can insert a wildcard ‘*’, so for example ‘cardi*’ will find both ‘cardiac’ and ‘cardiology’.  Usagi is able to deal with plurals, so ‘child’ will also find ‘children’.  You can use simple boolean logic in the search box, for example ‘cardiac AND arrest’ will find only those terms containing both words, whereas ‘cardiac OR heart’ will find all terms containing one or both of the two words.
+When using the manual search box, keep in mind that Usagi uses a fuzzy search and does not support structured search queries; for example, boolean operators like AND and OR are not supported.
-To continue our example, suppose we used the search term “Impotence NOS” to see if we could find a better mapping.  On the right of the //Query// section of the Search Facility, there is a //Filters// section, this provides options to trim down the results from the OMOP Vocabulary when searching for the search term.  In this case we know we want to only find SNOMED terms, we only want valid concepts, and we are looking for concepts in the CONDITION domain.
+To continue our example, suppose we used the search term “Cough” to see if we could find a better mapping. On the right of the //Query// section of the Search Facility, there is a //Filters// section, which provides options to trim down the results from the OMOP Vocabulary when searching for the search term. In this case we know we only want to find standard concepts, and we allow concepts to be found based on the names and synonyms of source concepts in the vocabulary that map to those standard concepts.
-When we apply these search criteria we find “4216771-Impotence” and feel this may be an appropriate Vocabulary concept to map to our Dutch code, in order to do that we can hit the “Replace concept”, which you will see the Selected Source Code section update, followed by the “Approved” button.  There is also an “Add concept” button, this allows for multiple standardized Vocabulary concepts to map to one source code (e.g. some source codes may bundle multiple diseases together while the standardized vocabulary may not).
+When we apply these search criteria, we find “254761-Cough” and feel this may be an appropriate Vocabulary concept to map our Dutch code to. To do that, we hit the “Replace concept” button, after which the Selected Source Code section updates, followed by the “Approve” button. There is also an “Add concept” button, which allows multiple standard Vocabulary concepts to be mapped to one source code (e.g. some source codes may bundle multiple diseases together while the standard vocabulary may not).
+=== Concept information ===
+When looking for appropriate concepts to map to, it is important to consider the 'social life' of a concept. The meaning of a concept might depend partially on its place in the hierarchy, and sometimes there are 'orphan concepts' in the vocabulary with few or no hierarchical relationships, which would be ill-suited as target concepts. Usagi will often report the number of parents and children a concept has, and it is also possible to show more information by pressing ALT + C, or by selecting //view// --> //Concept information// in the top menu bar.
+{{ :documentation:software:conceptinformation.png?direct |}}
+**Figure 7:  Concept information panel**
+Figure 7 shows the concept information panel. It shows general information about a concept, as well as its parents, children, and other source codes that map to the concept. Users can use this panel to navigate the hierarchy and potentially choose a different target concept.
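As a rough offline check for orphan concepts, one could count a concept's direct parents and children. A minimal sketch, assuming rows shaped like the CONCEPT_ANCESTOR table (ancestor_concept_id, descendant_concept_id, min_levels_of_separation) and treating a separation of exactly 1 as a direct parent-child link. The helper names are hypothetical, and this is not what the Concept information panel runs internally.

```python
from typing import Iterable, Set, Tuple

# (ancestor_concept_id, descendant_concept_id, min_levels_of_separation)
AncestorRow = Tuple[int, int, int]


def parents_and_children(concept_id: int,
                         rows: Iterable[AncestorRow]) -> Tuple[Set[int], Set[int]]:
    """Direct parents and children: pairs exactly one level apart."""
    parents, children = set(), set()
    for anc, desc, levels in rows:
        if levels != 1:
            continue  # skip self-pairs (0) and more distant relatives
        if desc == concept_id:
            parents.add(anc)
        elif anc == concept_id:
            children.add(desc)
    return parents, children


def is_orphan(concept_id: int, rows: Iterable[AncestorRow]) -> bool:
    """True when the concept has no direct parents and no direct children."""
    parents, children = parents_and_children(concept_id, rows)
    return not parents and not children


if __name__ == "__main__":
    # Made-up hierarchy: 1 -> 2 -> 5, and 3 stands alone.
    rows = [(1, 1, 0), (2, 2, 0), (3, 3, 0), (5, 5, 0),
            (1, 2, 1), (2, 5, 1), (1, 5, 2)]
    print(parents_and_children(2, rows))
    print(is_orphan(3, rows))
```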
 === Auto Mapped ===
Line 100: Line 117:
 When you import your source codes, there is an option to specify an “Auto concept ID column”. If there is information already known that will allow you to map your source data to a CONCEPT_ID, you can include that in the file you upload into Usagi. Once loaded, the Overview Table will list these codes with a status of “Auto mapped to 1” if only one CONCEPT_ID was provided, or just “Auto mapped” if there were more. You will still be required to approve these auto mappings using the “Approve” button, or, if you really trust the underlying information, you can sort by status, select all codes with status “Auto mapped to 1”, and click //Edit --> Approve selected//.
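The status rule described above, together with the bulk-approval shortcut, can be modeled in a few lines. A minimal sketch, assuming the Auto concept ID cell is a semicolon-delimited string; the helper names and the use of `None` for codes without auto-mapping information are hypothetical, not Usagi's internal representation.

```python
from typing import Dict, List, Optional


def auto_status(auto_concept_ids: str) -> Optional[str]:
    """Status label for one code, following the rule described above."""
    ids = [part for part in auto_concept_ids.split(";") if part.strip()]
    if not ids:
        return None  # no auto-mapping information supplied
    return "Auto mapped to 1" if len(ids) == 1 else "Auto mapped"


def codes_for_bulk_approval(auto_ids_by_code: Dict[str, str]) -> List[str]:
    """Codes that would be selected before Edit --> Approve selected."""
    return sorted(code for code, cell in auto_ids_by_code.items()
                  if auto_status(cell) == "Auto mapped to 1")


if __name__ == "__main__":
    # Made-up codes and concept IDs.
    cells = {"C-1": "101", "C-2": "11;12", "C-3": ""}
    for code, cell in cells.items():
        print(code, auto_status(cell))
    print(codes_for_bulk_approval(cells))
```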
-Continue to move through this process, code by code, until all codes have been checked.  In the list of source codes at the top of the screen, by selecting the column heading you can sort the codes.  Often we suggest going from the highest frequency codes to the lowest (often you will find the two set of codes cover most of the data)
+Continue to move through this process, code by code, until all codes have been checked. You can sort the codes by clicking a column heading in the list of source codes at the top of the screen. We often suggest going from the highest-frequency codes to the lowest. In the bottom left of the screen you can see the number of codes that have approved mappings, and how many code occurrences that corresponds to.
+It is possible to add comments to mappings, which could be used to document why a particular mapping decision was made.
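The counts shown in the bottom left of the screen amount to a simple aggregation. A minimal sketch, assuming each mapping row is a hypothetical dict with `source_code`, `frequency` and `status` keys, computes the number of approved codes, the code occurrences they cover, and a frequency-ordered worklist of the codes still to review.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def coverage(rows: List[Row]) -> Tuple[int, int, float]:
    """(approved code count, occurrences covered, fraction of all occurrences)."""
    approved = [r for r in rows if r["status"] == "Approved"]
    total = sum(r["frequency"] for r in rows)
    covered = sum(r["frequency"] for r in approved)
    return len(approved), covered, (covered / total if total else 0.0)


def worklist(rows: List[Row]) -> List[str]:
    """Codes still to review, highest frequency first."""
    pending = [r for r in rows if r["status"] != "Approved"]
    pending.sort(key=lambda r: r["frequency"], reverse=True)
    return [r["source_code"] for r in pending]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"source_code": "C-1", "frequency": 900, "status": "Approved"},
        {"source_code": "C-3", "frequency": 20, "status": "Auto mapped"},
        {"source_code": "C-2", "frequency": 80, "status": "Auto mapped to 1"},
    ]
    print(coverage(rows))
    print(worklist(rows))
```

Sorting the remaining codes by frequency mirrors the advice to work from the most frequent codes down, since those cover the most occurrences per unit of effort.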
 Best Practices:
Line 107: Line 125:
   * By clicking on a column name you can sort the Overview Table by that column. It may be valuable to sort on “Match Score”: reviewing first the codes Usagi is most confident about may quickly knock out a significant chunk of codes. Sorting on “Frequency” is also valuable, since spending more effort on frequent codes than on infrequent ones is important.
   * It is okay to map some codes to CONCEPT_ID = 0; some codes may not be worth the effort of finding a good map, and others may simply lack a proper map.
+  * It is important to consider the context of a concept, specifically its parents and children.
 ==== Export the Usagi Map Created ====
Line 115: Line 134:
 After selecting the SOURCE_VOCABULARY_ID, you give your export CSV a name and save it to a location. The export CSV follows the structure of the SOURCE_TO_CONCEPT_MAP table. This mapping could be appended to the OMOP Vocabulary’s SOURCE_TO_CONCEPT_MAP table. It would also make sense to append a single row to the VOCABULARY table defining the SOURCE_VOCABULARY_ID you defined in the step above. Finally, it is important to note that only mappings with the “Approved” status will be exported into the CSV file; the mapping needs to be completed in Usagi in order to export it.
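Conceptually, the export keeps only approved mappings and writes one row per target concept, so a code given extra targets with “Add concept” yields several rows. A minimal sketch, assuming the OMOP CDM v5 SOURCE_TO_CONCEPT_MAP column names and placeholder values for source_concept_id and the validity dates; Usagi's own export may fill these fields differently, and the input structure and `export_stcm` name are hypothetical.

```python
import csv
import io
from typing import Dict, List

STCM_COLUMNS = [
    "source_code", "source_concept_id", "source_vocabulary_id",
    "source_code_description", "target_concept_id", "target_vocabulary_id",
    "valid_start_date", "valid_end_date", "invalid_reason",
]


def export_stcm(mappings: List[Dict], source_vocabulary_id: str) -> str:
    """Return CSV text with one row per (approved source code, target concept)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=STCM_COLUMNS)
    writer.writeheader()
    for m in mappings:
        if m["status"] != "Approved":
            continue  # only approved mappings are exported
        for target_id, target_vocab in m["targets"]:
            writer.writerow({
                "source_code": m["source_code"],
                "source_concept_id": 0,            # placeholder assumption
                "source_vocabulary_id": source_vocabulary_id,
                "source_code_description": m["description"],
                "target_concept_id": target_id,
                "target_vocabulary_id": target_vocab,
                "valid_start_date": "1970-01-01",  # placeholder assumption
                "valid_end_date": "2099-12-31",    # placeholder assumption
                "invalid_reason": "",
            })
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up mappings; "MY_SOURCE_VOCAB" is a hypothetical SOURCE_VOCABULARY_ID.
    mappings = [
        {"source_code": "C-1", "description": "Cough", "status": "Approved",
         "targets": [(101, "SNOMED")]},
        {"source_code": "C-2", "description": "Cough and fever", "status": "Approved",
         "targets": [(101, "SNOMED"), (102, "SNOMED")]},
        {"source_code": "C-3", "description": "Headache", "status": "Auto mapped",
         "targets": [(103, "SNOMED")]},
    ]
    print(export_stcm(mappings, "MY_SOURCE_VOCAB"))
```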
+==== Updating an Usagi mapping ====
+Often a mapping is not a one-time effort. As data is updated, new source codes may be added, and the vocabulary is also updated regularly; either may require an update of the mapping.
+When the set of source codes is updated, the following steps can be followed to support the update:
+  - Import the new source code file.
+  - Choose //File// --> //Apply previous mapping//, and select the old Usagi mapping file.
+  - Identify codes that haven't inherited approved mappings from the old mapping, and map them as usual.
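The idea behind //Apply previous mapping// is a lookup by source code. A minimal sketch, assuming the old mapping has been read into a hypothetical dict from source code to (status, target concept IDs); it is not Usagi's implementation and ignores details such as changed descriptions.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape of the old mapping: {source_code: (status, [target_concept_id, ...])}
OldMapping = Dict[str, Tuple[str, List[int]]]


def apply_previous_mapping(new_codes: Iterable[str],
                           old_mapping: OldMapping) -> Tuple[Dict[str, List[int]], List[str]]:
    """Split the new code list into codes that inherit an approved mapping
    and codes that still need to be mapped."""
    inherited: Dict[str, List[int]] = {}
    todo: List[str] = []
    for code in new_codes:
        previous = old_mapping.get(code)
        if previous is not None and previous[0] == "Approved":
            inherited[code] = list(previous[1])
        else:
            todo.append(code)  # new code, or previously not approved
    return inherited, todo


if __name__ == "__main__":
    # Made-up codes and concept IDs.
    old = {"C-1": ("Approved", [101]), "C-2": ("Auto mapped", [102])}
    inherited, todo = apply_previous_mapping(["C-1", "C-2", "C-9"], old)
    print(inherited)
    print(todo)
```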
+When the vocabulary is updated, follow these steps:
+  - Download the new vocabulary files from Athena.
+  - Rebuild the Usagi index (//Help// --> //Rebuild index//).
+  - Open the mapping file.
+  - Identify codes that map to concepts that are no longer Standard concepts in the new vocabulary version, and find more appropriate target concepts.
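The last step can be approximated by checking each target against the new CONCEPT table. A minimal sketch, assuming a hypothetical dict `standard_by_id` holding the `standard_concept` value ('S', 'C' or None) of each concept in the new vocabulary version; concepts missing from it are flagged as well, and CONCEPT_ID 0 is ignored.

```python
from typing import Dict, List, Optional


def stale_mappings(mapping: Dict[str, List[int]],
                   standard_by_id: Dict[int, Optional[str]]) -> List[str]:
    """Source codes with at least one target that is not a Standard concept
    ('S') in the new vocabulary version. Targets absent from standard_by_id
    (e.g. removed concepts) are flagged too; CONCEPT_ID 0 is skipped."""
    return sorted(
        code for code, targets in mapping.items()
        if any(t != 0 and standard_by_id.get(t) != "S" for t in targets)
    )


if __name__ == "__main__":
    # Made-up mapping and vocabulary flags for illustration only.
    mapping = {"C-1": [101], "C-2": [102], "C-3": [0], "C-4": [101, 999]}
    new_vocab = {101: "S", 102: None}
    print(stale_mappings(mapping, new_vocab))
```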
 ==== Menu Options ====
documentation/software/usagi.txt · Last modified: 2021/04/09 18:56 by maximmoinat