2026 Global Symposium Tutorials

Morning Sessions (8 am - 12 pm ET)

An Introduction to the Journey from Data to Evidence Using OHDSI

This tutorial describes the journey from raw health data to reliable evidence, covering core concepts such as the OMOP Common Data Model, the Standardized Vocabularies, and open-source tools such as ATLAS and HADES. It is aimed at newcomers and explains how data standards, tools, and community practices enable large-scale, open-science research, from data transformation (ETL) to study execution and interpretation.

An Introduction to ATLAS

Each year, many OHDSI attendees are new to the community and need a practical, hands-on introduction to the OMOP Common Data Model and the primary tools for reproducible observational research. This tutorial will teach attendees how to use Atlas to search the standardized vocabularies, define concept sets and cohorts, explore data, and assemble study-ready outputs. The session will include hands-on walkthroughs, a brief high-level introduction to Strategus concepts (how Atlas outputs can feed distributed execution), and a preview of an updated Atlas user interface that streamlines study creation and review. Note: we will not perform full Strategus study design in the tutorial, and Strategus developer availability for detailed questions will be limited.

Topics:

  • Datasource characterization in Atlas to identify which concepts appear in a dataset over time
  • Searching and browsing the OMOP vocabulary and saving reusable concept sets (a brief code sketch for retrieving a saved concept set programmatically follows this list)
  • Building and validating cohort definitions using Atlas’ cohort editor
  • Estimating descriptive measures such as incidence rates and cohort prevalence to understand outcome/exposure distributions
  • Discovering event patterns and treatment pathways with Atlas visualizations
  • Brief, high-level overview of Strategus concepts: how Atlas outputs map to Strategus inputs, the typical execution flow, and considerations when preparing a study for distributed execution (no full Strategus study design)
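
The Atlas artifacts covered above can also be retrieved programmatically through WebAPI. Below is a minimal sketch, assuming a reachable WebAPI endpoint and an existing saved concept set; the URL and concept set ID are placeholders, and the ROhdsiWebApi function names should be checked against the package documentation.

```r
# Minimal sketch: retrieve a concept set saved in Atlas via WebAPI and
# resolve it to the individual standard concepts it covers.
# The baseUrl and conceptSetId below are placeholders for your environment.
library(ROhdsiWebApi)

baseUrl <- "https://your-atlas-server/WebAPI"   # hypothetical WebAPI endpoint
conceptSetId <- 123                             # ID of a concept set saved in Atlas

# Fetch the concept set definition (the expression built in the Atlas UI)
conceptSetDefinition <- getConceptSetDefinition(conceptSetId = conceptSetId,
                                                baseUrl = baseUrl)

# Resolve the expression to the full list of included concepts
resolvedConcepts <- resolveConceptSet(conceptSetDefinition = conceptSetDefinition,
                                      baseUrl = baseUrl)
head(resolvedConcepts)
```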

Bringing FAIR to Imaging Research with the Medical Imaging OMOP Extension

This tutorial introduces the Medical Imaging OMOP extension (MI-CDM) and provides a practical, end-to-end guide for incorporating imaging data into OMOP-based observational research. While imaging features and radiomics are increasingly central to clinical and translational studies, imaging data remain largely siloed from EHR-derived data and are often difficult to make Findable, Accessible, Interoperable, and Reusable (FAIR). This tutorial demonstrates how the MI-CDM enables FAIR-aligned imaging research by representing DICOM metadata and derived imaging features within the OMOP CDM.

The session is designed for researchers, data scientists, informaticians, and OHDSI community members who want to launch their own imaging-enhanced OMOP dataset. Participants will gain hands-on experience moving from raw imaging archives to analysis-ready, multimodal data that integrate imaging, clinical, and demographic information.

We will begin with an overview of the MI-CDM design principles and how imaging concepts are harmonized with existing OMOP structures. The tutorial will then walk through a concrete workflow: indexing a clinical or research DICOM archive, extracting and characterizing DICOM metadata at scale, transforming those metadata using standardized DICOM terminology, and loading them into searchable OMOP tables. Through this process, participants will experience how previously siloed imaging data (pixels and metadata) can be made findable.

Next, we will demonstrate how imaging-derived features can be added to the OMOP ecosystem. Using an example imaging algorithm, we will show how to generate quantitative imaging features and represent them within the OMOP data model, preserving provenance and supporting reuse.

Finally, the tutorial will demonstrate multimodal analytics using standard OHDSI tools. Participants will see how imaging features can be analyzed alongside clinical covariates using Atlas to perform data characterization and exploratory analyses. This step highlights how, within OMOP’s privacy-by-design framework, standardized queries can be executed without moving large image files, providing an accessible way to incorporate imaging data into traditional OMOP-based research workflows.

By the end of this tutorial, attendees will understand how to operationalize imaging within OMOP, apply FAIR principles to imaging research, and build imaging-enhanced datasets ready for observational and multimodal studies within the OHDSI ecosystem.
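
To make the loading step concrete, here is a minimal sketch of writing already-extracted DICOM study metadata into an imaging-extension table. The image_occurrence table and its column names are assumptions based on the MI-CDM proposal, and the connection, identifiers, and URI are placeholders; the workflow demonstrated in the tutorial may differ.

```r
# Minimal sketch: load previously extracted DICOM metadata into an
# MI-CDM style image_occurrence table. Table and column names are
# assumptions based on the Medical Imaging extension; adapt to your schema.
library(DBI)
library(duckdb)

con <- dbConnect(duckdb::duckdb())   # stand-in for your CDM database connection

# Metadata already pulled from a DICOM archive (e.g., with a DICOM toolkit)
image_occurrence <- data.frame(
  image_occurrence_id      = 1L,
  person_id                = 1001L,                  # placeholder person
  procedure_occurrence_id  = 5001L,                  # link to the ordering procedure
  image_occurrence_date    = as.Date("2024-03-15"),
  modality_concept_id      = 0L,                     # e.g., a DICOM modality concept (CT/MR)
  anatomic_site_concept_id = 0L,
  wadors_uri               = "https://pacs.example.org/wado-rs/studies/..."  # pointer, not pixels
)

dbWriteTable(con, "image_occurrence", image_occurrence)
dbGetQuery(con, "SELECT * FROM image_occurrence")
dbDisconnect(con)
```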

Complex Phenotyping at Scale with and without LLMs Using PhenotypeR

This tutorial will teach students how to run and understand study-specific diagnostics using the OHDSI R package PhenotypeR (https://github.com/OHDSI/PhenotypeR), which enables complex phenotype diagnostics, including drug and measurement diagnostics, matched control sampling, and survival analysis. This can be done with or without support from large language models (LLMs) to help you interpret your results. 

Part 1: Theory

  • We will start with an introduction to clinical phenotyping and its central importance to generating reliable evidence. The different steps in the process will be explained, including identifying relevant codes, characterising cohorts and their matched controls as a benchmark, drug- and measurement-specific diagnostics, the use of survival estimates, and population diagnostics including incidence rates and prevalence.
  • The role of diagnostics review by clinical experts, and the support provided by LLMs, will be introduced and their merits discussed. LLMs are used to contextualise the findings from phenotype characterisation against prior knowledge, e.g. on disease presentation (signs/symptoms), work-up (e.g. laboratory tests or procedures), and treatments. Throughout this session, lessons learned from case studies will be shared. Results from different LLMs (e.g. Google Gemini, OpenAI, and Mistral) are used to provide examples of the support available.

Part 2: Practical

  • We will run an interactive session using the PhenotypeR package in R (no prior coding experience necessary!). Participants will be given access to an environment with all required packages installed so that they can code along. Using synthetic data, we will first show you how to create cohorts and then how to run diagnostics against these cohorts. Lastly, we will show how to incorporate clinician- or LLM-based expectations against which the diagnostics can be checked. A minimal code sketch of this workflow follows.
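
For orientation, here is a minimal sketch of the kind of code used in the practical session. It builds a cohort on synthetic data with CDMConnector and runs PhenotypeR diagnostics; the function names (eunomiaDir(), cdmFromCon(), generateConceptCohortSet(), phenotypeDiagnostics(), shinyDiagnostics()) follow current package documentation and may differ slightly in the version used on the day.

```r
# Minimal sketch (assumptions noted above): build a cohort on synthetic
# data and run PhenotypeR diagnostics against it.
library(CDMConnector)
library(PhenotypeR)
library(duckdb)
library(DBI)

# Connect to the synthetic Eunomia dataset bundled for CDMConnector examples
con <- dbConnect(duckdb::duckdb(), dbdir = eunomiaDir())
cdm <- cdmFromCon(con, cdmSchema = "main", writeSchema = "main")

# Create a simple concept-based cohort (the concept ID here is illustrative:
# 192671 = gastrointestinal hemorrhage)
cdm <- generateConceptCohortSet(cdm,
                                conceptSet = list(gi_bleed = 192671L),
                                name = "gi_bleed")

# Run the phenotype diagnostics and review them in the interactive shiny app
result <- phenotypeDiagnostics(cdm$gi_bleed)
shinyDiagnostics(result, directory = tempdir())
```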

OHDSI Leadership Storytelling Workshop

This workshop addresses communication and storytelling around OHDSI, including the story or pitch needed when engaging the different types of stakeholders who are, or should be, part of the community.

The intended impact would be to:

  • Enable OHDSI community members to grow the community more effectively
  • Attract a more diverse group of stakeholders beyond researchers
  • Support current thought leaders by enabling more members to confidently take the stage and lead sessions
  • Address community needs related to communication, presentation, storytelling, pitching, and public speaking, which are currently not covered by an active working group

Why would OHDSI community members join?

  • To acquire or refine skills for presenting the OHDSI story to relevant stakeholders
  • To learn practical methods for public speaking and pitching
  • To gain reusable materials such as slides and narrative frameworks
  • To surface real challenges and receive feedback on possible solutions
  • To practice with peers and receive constructive feedback

Who is this for?

  • Community members who want to develop their thought leadership
  • Members struggling with OHDSI or OMOP adoption within their organization
  • Those aiming to lead OHDSI or OMOP initiatives locally
  • National Node Leads who want to grow their region in size and stakeholder diversity
  • Study leads seeking to resource network studies and onboard data partners
  • Data engineers and scientists who depend on OHDSI or OMOP adoption to support their work

Mastering OMOP: Transforming EHR Data with Practical Strategies, Best Practices, and OHDSI Integration

Join us for an immersive 4-hour workshop tailored to professionals and researchers working with electronic health record (EHR) data, whether you are new to the OHDSI community or have been actively involved for years. Led by an experienced OHDSI instructor alongside seasoned veterans from top-tier academic medical centers, this workshop will strengthen your understanding of the OMOP Common Data Model (CDM) while offering practical strategies for its adoption, implementation, and use within health systems.

The first hour of the session will explore the foundation and evolution of the OMOP CDM, addressing critical questions about why this model is essential and the broader mission of OHDSI. We will discuss the challenges of working with EHR data and how OMOP provides a unique framework for driving cross-institutional, reproducible research. You will gain insight into how OMOP compares to other data models conceptually, preparing you for the practical considerations ahead.

The second section delves into the real-world intricacies of creating and maintaining a pragmatic OMOP CDM. Through engaging lectures and relatable best practices, you will learn how to tailor the OMOP CDM to your organization’s specific needs, align vocabulary mappings to international standards, and maintain high-quality data through customized quality checks. This segment will demystify some of the most technical aspects of OMOP adoption, ensuring attendees leave with a concrete understanding of how to start or refine their processes.
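
As one concrete example of the vocabulary-mapping work discussed above, the sketch below shows the standard 'Maps to' lookup that takes a source code to its OMOP standard concept. The connection details, schema name, and example ICD-10-CM code are placeholders.

```r
# Minimal sketch: translate a source code (here an ICD-10-CM code, as a
# placeholder) to its OMOP standard concept via the 'Maps to' relationship.
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             server = "localhost/cdm",  # placeholder
                                             user = "user", password = "pass")
conn <- connect(connectionDetails)

sql <- "
SELECT src.concept_code,
       src.concept_name  AS source_name,
       std.concept_id    AS standard_concept_id,
       std.concept_name  AS standard_name
FROM @vocab.concept src
JOIN @vocab.concept_relationship rel
  ON rel.concept_id_1 = src.concept_id
 AND rel.relationship_id = 'Maps to'
 AND rel.invalid_reason IS NULL
JOIN @vocab.concept std
  ON std.concept_id = rel.concept_id_2
WHERE src.vocabulary_id = 'ICD10CM'
  AND src.concept_code = 'E11.9'   -- type 2 diabetes without complications, as an example
"

renderTranslateQuerySql(conn, sql, vocab = "vocabulary")
disconnect(conn)
```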

In the final hour, we will transition into broader discussions focused on joining and leveraging OHDSI’s collaborative networks. You will discover opportunities to contribute to global research initiatives, build partnerships within disease-based or location-based working groups, and collaborate with others. A facilitated networking session will allow attendees to interact with OHDSI experts, share experiences, and expand their professional connections within the community.

This interactive workshop blends lectures, small group discussions, and networking opportunities to ensure attendees receive practical insights and actionable strategies for their work. Whether you are embarking on your OHDSI journey or are a seasoned contributor, this session promises to provide value, tools, and connections to empower your work with EHR data.

Afternoon Sessions (1 pm - 5 pm ET)

Building and Using the OHDSI Evidence Network: From Data Partner to Federated Study Execution

The OHDSI Evidence Network is a global, open, federated research infrastructure designed to support large-scale real-world evidence generation while preserving local data governance and institutional autonomy. As participation in the network grows, there is increasing demand for clear, practical guidance on how to engage with the network, both from the perspective of data partners contributing data and from that of study leads conducting multi-site research.

This tutorial is structured as two complementary 2-hour sessions, designed to align expectations across the network and reduce friction in federated research.
 
Session 1: Participating in the Evidence Network as a Data Partner (2 hours)
This session is intended for data custodians, analysts, and institutional stakeholders at organizations with OMOP CDM–mapped data. Topics include:
– An overview of the Evidence Network’s federated, opt-in operating model
– How to communicate the value and return on investment of participation to institutional leadership
– Common governance and oversight considerations (e.g., IRB review, data sharing boundaries)
– What participation in a network study entails, including roles, responsibilities, and timelines
– How data partners engage with feasibility, execution, and iterative study runs
The goal of this session is to equip data partners with a clear understanding of how and why to participate, and what to expect when engaging in network studies.
 
Session 2: Conducting Studies Through the Evidence Network as a Study Lead (2 hours)
This session is intended for investigators and methodologists interested in leading or coordinating federated studies. Topics include:
– Designing studies suitable for federated execution
– The Evidence Network study lifecycle, from idea through synthesis
– Roles and responsibilities of study leads, data partners, and coordinating teams
– Execution, QA/QC, and managing cross-site variability
– Synthesizing and publishing results from multi-site studies
 
Learning Outcomes
Across both sessions, participants will gain a shared mental model of how the Evidence Network operates, practical guidance for participation or leadership, and tools to support high-quality federated research within the OHDSI community. 

From Multi-Modal Data to Real-World Evidence: Hands-on with the Data2Evidence Platform for OMOP Data Curation and Analytics

Data2Evidence is an end-to-end platform for generating credible real-world evidence from multi-modal healthcare data, providing users with tools to go from a healthcare dataset to interpretable results. Data2Evidence was used to demonstrate the end-to-end OMOP journey with the GUSTO mother-child cohort dataset during the 2026 OHDSI APAC scientific forum series. The GUSTO cohort is Singapore’s first OMOP CDM in the OHDSI Evidence Network. This 4-hour tutorial trains OHDSI collaborators (clinical researchers, students, data scientists, industry professionals, and regulator-facing teams) to similarly use Data2Evidence to design, execute, and communicate evidence generation in a way that is transparent, shareable, and aligned with OHDSI best practices.

The session blends short talks, live demonstrations, and hands-on exercises. Participants will learn how to use Data2Evidence to load a dataset with an easy drag-and-drop ETL flow, how to structure an evidence request, how these choices map to OHDSI components, and how to run the study using Strategus libraries.

Hands-on activities will walk participants through an example study from specification to execution and review. Attendees will (1) extract raw test data (e.g., CSV, database, EHR export) and transform it into the OMOP CDM format, (2) load the transformed data into an OMOP CDM database and validate it with the Data Quality Dashboard (DQD) to ensure data quality, (3) configure a Data2Evidence study from a template, (4) run the Data2Evidence workflow on the provided dataset or a local OMOP CDM connection, (5) review key diagnostics and sensitivity checks for the study, and (6) produce an evidence “packet” suitable for internal review or multi-site collaboration. Participants can also discuss with the developers how to translate their own research questions into Data2Evidence-ready specifications for scalable evidence generation.
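
For step (2), data-quality validation is typically run with the OHDSI DataQualityDashboard package. A minimal sketch, with connection details and schema names as placeholders:

```r
# Minimal sketch: run DQD checks against a freshly loaded OMOP CDM instance.
# Connection details, schema names, and the source name are placeholders.
library(DataQualityDashboard)
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             server = "localhost/cdm",   # placeholder
                                             user = "user", password = "pass")

executeDqChecks(connectionDetails = connectionDetails,
                cdmDatabaseSchema = "cdm",          # schema holding the OMOP tables
                resultsDatabaseSchema = "results",  # schema where check results are written
                cdmSourceName = "tutorial_test_data",
                outputFolder = "dqd_output")
```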
 
Prerequisites: basic familiarity with OMOP CDM and cohort concepts is not required but helpful. Participants should bring a laptop and will receive pre-tutorial setup instructions (software, credentials, and test data options). By the end, attendees will be able to author an OHDSI network study specification, execute it reproducibly, interpret diagnostics, and share a standardized evidence artifact for collaboration and decision-making.

Integrating Geospatial Data Into OMOP CDM

This hands-on tutorial introduces the OHDSI GIS toolchain for integrating environmental exposures and social determinants into health research. Participants will discover, process, and analyze place-based health determinants using OMOP extensions.
 
Session 1: Cataloging and Data Discovery (1 hour)
Discover geospatial datasets using gaiaCatalog’s Schema.org-compliant interface. Participants will search datasets, understand metadata documentation, and author functional metadata for automated data retrieval. Exercises cover environmental, social, and demographic data sources relevant to health research.
 
Session 2: The Gaia Pipeline (1 hour)
Learn the workflow from raw geospatial data to OMOP-standardized tables. Deploy gaiaDocker locally, ingest public datasets (EPA air quality), perform spatial transformations using gaiaDb/PostGIS, and populate the external_exposure table. Includes geocoding with DeGauss, spatial joins, and privacy-preserving aggregation.
 
Session 3: OMOP Integration (1 hour)
Explore external_exposure and location_history tables that extend OMOP CDM for geospatial analytics. Understand vocabulary integration (OMOP GIS, Exposome, SDoH) and query exposure data alongside clinical observations. Calculate temporal exposure metrics (e.g., PM2.5 during pregnancy) and link exposures to cohort definitions.
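
To make the temporal exposure metric concrete, here is a hedged sketch of the kind of query involved. The external_exposure column names are assumptions based on the GIS extension, and the pregnancy cohort table and PM2.5 concept ID are placeholders.

```r
# Minimal sketch: mean PM2.5 exposure during each subject's cohort window
# (e.g., a pregnancy cohort). external_exposure column names are assumptions
# from the OHDSI GIS extension and may differ in your deployment.
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                             server = "localhost/cdm",   # placeholder
                                             user = "user", password = "pass")
conn <- connect(connectionDetails)

sql <- "
SELECT c.subject_id,
       AVG(e.value_as_number) AS mean_pm25
FROM @results.pregnancy_cohort c            -- placeholder cohort table
JOIN @cdm.external_exposure e
  ON e.person_id = c.subject_id
 AND e.exposure_start_date >= c.cohort_start_date
 AND e.exposure_start_date <= c.cohort_end_date
WHERE e.exposure_concept_id = 0             -- placeholder concept ID for PM2.5
GROUP BY c.subject_id
"

meanExposure <- renderTranslateQuerySql(conn, sql, cdm = "cdm", results = "results")
disconnect(conn)
```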
 
Session 4: Analytical Applications (1 hour)
Integrate HADES tools for geospatially-informed research. Use FeatureExtraction for spatial covariates and PatientLevelPrediction with environmental features. Explore prototype extensions (GeoFeatureExtraction, SpatialCohortMethod), privacy-preserving visualization, and federated network studies.
Target Audience: Researchers and informaticians interested in environmental epidemiology or social determinants. Basic OMOP familiarity helpful.

Introduction to OHDSI Phenotype Development & Evaluation

Accurate phenotyping is a cornerstone of reliable observational research, yet it remains one of the greatest methodological challenges in real‑world data analytics. Observational data are inherently prone to misclassification, and these errors can meaningfully influence study validity. This mid‑level OHDSI tutorial is designed for participants who are familiar with the OHDSI standardized vocabularies and already comfortable developing cohorts in ATLAS, and who now seek to deepen their scientific understanding and strengthen their applied phenotype development skills.

The tutorial will begin with an overview of the science of misclassification error, covering key concepts such as sensitivity, specificity, predictive value, and index event misclassification. Participants will learn how these errors arise within observational data sources and how they directly affect effect estimation, transportability, and downstream decision‑making.

Building on this foundation, the session will transition to OHDSI’s established best practices for phenotype development and evaluation. Using real examples, we will walk through tools and methods that support transparent, reproducible, and scalable phenotype definitions, including cohort diagnostics, standardized vocabulary exploration, and benchmarking against population characteristics.

The latter portion of the tutorial introduces an emerging opportunity for improving phenotype development: an AI‑aided iterative workflow. We will demonstrate how AI/LLMs can support iterative refinement as part of phenotype development.
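
To ground the misclassification concepts above, a small worked example: even a phenotype with high sensitivity and specificity can yield a modest positive predictive value when the condition is rare. A minimal sketch of the standard calculation:

```r
# Minimal sketch: positive predictive value (PPV) from sensitivity,
# specificity, and prevalence using Bayes' rule.
ppv <- function(sensitivity, specificity, prevalence) {
  true_positives  <- sensitivity * prevalence
  false_positives <- (1 - specificity) * (1 - prevalence)
  true_positives / (true_positives + false_positives)
}

# A phenotype with 90% sensitivity and 95% specificity applied to a
# condition with 1% prevalence yields a PPV of roughly 15%.
ppv(sensitivity = 0.90, specificity = 0.95, prevalence = 0.01)
```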
 
By the end of this tutorial, participants will have a richer understanding of the scientific foundations of phenotype misclassification, practical experience applying OHDSI best practices, and early exposure to how AI‑enabled workflows can enhance rigor and reproducibility. This session is ideal for researchers ready to advance from simply using ATLAS to mastering phenotype design as a scientific discipline. 

OHDSI Standardized Vocabularies on FHIR: A Deep Dive Using the Echidna Terminology Server

This hands-on tutorial teaches participants to use Echidna (https://echidna.fhir.org), the authoritative source for the OHDSI Standardized Vocabularies on a FHIR Terminology Server. Beginning with the conceptual foundations of terminology management across FHIR, OMOP, and openEHR, participants will progress to practical skills executing FHIR API calls against a production terminology server, including concept lookup, value set expansion for phenotype development, and source code translation to OMOP Standard Concepts. The tutorial concludes with an overview of the HL7 Vulcan FHIR-to-OMOP Implementation Guide and patterns for integrating Echidna into ETL pipelines and OHDSI workflows.
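
As a flavor of the API calls covered, the sketch below issues a standard FHIR CodeSystem/$lookup request with the httr2 package. The base URL is an assumed placeholder for Echidna's FHIR endpoint; consult the Echidna documentation for the actual base URL, authentication, and supported operations.

```r
# Minimal sketch: a standard FHIR terminology call against a server such as
# Echidna. The base URL below is an assumption; verify before use.
library(httr2)

base_url <- "https://echidna.fhir.org/fhir"   # assumed FHIR base

# CodeSystem/$lookup: look up a SNOMED CT concept
# (22298006 = myocardial infarction)
lookup_url <- paste0(base_url, "/CodeSystem/$lookup")

resp <- request(lookup_url) |>
  req_url_query(system = "http://snomed.info/sct", code = "22298006") |>
  req_perform() |>
  resp_body_json()

# The returned Parameters resource includes the display name and properties
str(resp, max.level = 2)
```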
 
Presented by Jean Duteau (Dogwood Health Consulting), Davera Gabriel (Evidentli), and Dr. Guy Tsafnat (Echidna Systems).

Using the OMOP Model in Registry and Clinical Trial Standardization Contexts: Conventions, Past Use Cases, SDTM & Regulatory Considerations, Challenges

Registry and clinical trial (CT) data are often needed to fill gaps not covered by traditional RWD. If the RWD follow OMOP, using the same model to harmonize registry and CT data has advantages.
Outline:
1. Special considerations found in registry and CT data, e.g., converting relative dates to OMOP-compliant dates (see the sketch after this outline), case report form (CRF) data, and adverse drug events (seriousness, severity, and relatedness between drug and AE)
2. Use of 2B concepts (custom vocabularies) vs OMOP vocabularies
3. Use cases published in literature (e.g., UK Biobank, OHDSI 2024 poster: Application of OMOP Common Data Model to Disease Registry Data, AllOfUs)
4. Other use cases (AACR GENIE)
5. OMOP and SDTM comparison (+ FDA regulatory considerations)
6. Considerations for combining trial data and RWD (external control arms): data granularity mismatch, RWD limited to routine healthcare, lack of advanced COAs, and data collection that is not regulatory grade
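
For item 1, the relative-date conversion is mechanically simple once an anchor date has been agreed. A minimal sketch, assuming SDTM-style study days (day 1 falls on the reference date and there is no day 0):

```r
# Minimal sketch: convert SDTM-style relative study days to calendar dates
# for OMOP loading, given an agreed anchor (reference) date per subject.
# SDTM convention: study day 1 falls on the reference date, negative days
# precede it, and there is no day 0.
study_day_to_date <- function(study_day, reference_date) {
  offset <- ifelse(study_day > 0, study_day - 1, study_day)
  reference_date + offset
}

reference_date <- as.Date("2020-06-01")        # placeholder anchor date
study_day_to_date(c(-7, 1, 15), reference_date)
# [1] "2020-05-25" "2020-06-01" "2020-06-15"
```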