documentation:laertes_etl

Differences

This shows you the differences between two versions of the page.

documentation:laertes_etl [2015/05/26 18:49]
lee [PUBMED / MEDLINE data feed]
documentation:laertes_etl [2015/06/23 10:58] (current)
lee
Line 33: Line 33:
 This model decouples the data sources from the various copies of the
 sources that might have been processed in many different ways. It also
-decouples what can be said about and evidence item (i.e., the semantic
+decouples what can be said about an evidence item (i.e., the semantic
 tags) from the information artifact. All of this allows for greater
 flexibility with respect to inclusion of sources and
Line 227: Line 227:
   * Run python scripts to convert the data into RDF ntriple graph data
   * Load the RDF ntriple graph data into the Virtuoso database
+  * Manually load the annotation URIs into the URL Shortener MySQL database using the MySQL command line client
   * Manually run Virtuoso SPARQL query to export the drug/hoi combinations along with the adverse event counts into an export file
   * Run Python script to load the export file into the PostgreSQL public schema database
Line 234: Line 235:
 The details for this data feed are documented and maintained here:
 https://github.com/OHDSI/KnowledgeBase/tree/master/LAERTES/SPLICER
-
 ==== SemMED data feed ====
 The Semantic MEDLINE Database is a repository of semantic predications (subject-predicate-object triples) extracted by SemRep, a semantic interpreter of biomedical text.
Line 244: Line 244:
   * Run python scripts to convert the data into RDF ntriple graph data
   * Load the RDF ntriple graph data into the Virtuoso database
+  * Manually load the annotation URIs into the URL Shortener MySQL database using the MySQL command line client
   * Manually run Virtuoso SPARQL query to export the drug/hoi combinations along with the adverse event counts into an export file
   * Run Python script to load the export file into the PostgreSQL public schema database
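
The first step in the lists above (converting source data into RDF ntriple graph data) might look roughly like the following. This is a minimal sketch, not the project's actual conversion script: the `http://example.org/laertes/` namespace, the predicate names, the record fields, and the helper names `escape_literal`, `iri`, and `record_to_ntriples` are all hypothetical and chosen only to show the one-triple-per-line N-Triples shape.

```python
# Illustrative sketch only. The namespace, predicate names, and record
# fields are hypothetical placeholders, not the real LAERTES schema.

BASE = "http://example.org/laertes/"  # assumed namespace
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape_literal(text):
    """Escape backslash, quote, and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def iri(value):
    """Wrap an absolute IRI in angle brackets."""
    return "<" + value + ">"


def record_to_ntriples(record):
    """Return the N-Triples lines for one drug/HOI evidence record."""
    subject = iri(BASE + "evidence/" + str(record["id"]))
    count = '"%d"^^%s' % (record["count"], iri(XSD_INTEGER))
    title = '"%s"' % escape_literal(record["title"])
    triples = [
        (iri(BASE + "drug"), iri(record["drug_uri"])),
        (iri(BASE + "hoi"), iri(record["hoi_uri"])),
        (iri(BASE + "count"), count),
        (iri(BASE + "title"), title),
    ]
    # Each N-Triples statement is "subject predicate object ." on one line.
    return ["%s %s %s ." % (subject, p, o) for p, o in triples]


example = {
    "id": 1,
    "drug_uri": "http://example.org/drug/123",
    "hoi_uri": "http://example.org/hoi/456",
    "count": 7,
    "title": 'Report of "rash"',
}
for line in record_to_ntriples(example):
    print(line)
```

Writing one complete statement per line keeps the output easy to split into batches for bulk loading. The real feeds may use a different vocabulary and an RDF library rather than string formatting.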
documentation/laertes_etl.1432666197.txt.gz · Last modified: 2015/05/26 18:49 by lee