Can you discuss your background and career journey?
I earned my PhD in Biomedical Informatics from Columbia University and subsequently held faculty positions at Vanderbilt University and The University of Texas Health Science Center. I now serve as the Robert T. McCluskey Professor and Vice Chair for Research and Development in the Department of Biomedical Informatics & Data Science at Yale School of Medicine, where I also serve as Associate Dean for Biomedical Informatics. My early research centered on developing natural language processing (NLP) methods and systems to extract, standardize, and interpret real-world data from electronic health records. Over time, my work has expanded to encompass large-scale data integration, AI model development, computational infrastructure, and translational applications of AI across multiple biomedical domains.
How did you first get involved with OHDSI, and what has motivated you to stay actively engaged with the community?
I first became involved with OHDSI around 2015, when we were exploring how to map unstructured clinical data into the OMOP Common Data Model. I was immediately impressed by the openness, rigor, and collaborative spirit of the OHDSI community. Since then, I have been deeply engaged—leading the NLP Working Group, contributing to data standards discussions, and supporting various network studies. What keeps me motivated is the shared mission of making real-world data useful for improving health, and the fact that OHDSI has created a truly global, inclusive, and intellectually vibrant community. It’s rare to find a network where clinicians, epidemiologists, data scientists, and informaticians work so seamlessly toward a common goal.
You lead the Natural Language Processing Workgroup for OHDSI. How does NLP impact OHDSI’s mission of generating real-world evidence in healthcare, and what are a couple of recent NLP advances the community should know about?
NLP plays a critical role in unlocking the vast amount of unstructured data—all types of clinical notes—that remains underutilized in most real-world datasets. By converting this information into structured, standardized representations aligned with OMOP, NLP greatly expands the scope and precision of real-world evidence. The OHDSI NLP Working Group has played a pivotal role in this effort by designing tables to represent textual data in the OMOP Common Data Model, developing and sharing tools for text processing, and enabling multi-site studies using clinical notes. I am sincerely grateful to all members of the NLP Working Group for their dedication and contributions.
Recently, the field has seen transformative advances driven by large language models (LLMs) and AI agents. Within OHDSI, we are exploring how LLMs can automate entity extraction, vocabulary mapping, and phenotyping across multiple institutions. Our group has also been developing novel methods, benchmark datasets, and tools to make state-of-the-art NLP technologies more accessible to the community. These advances bring us closer to the goal of fully integrating unstructured data into federated network studies.
Yale has been involved in exciting research within the OHDSI community. As Professor and Vice Chair for Research and Development at Yale, what excites you most about what is happening there?
At Yale, we are fostering a dynamic ecosystem that bridges biomedical informatics, data science, artificial intelligence, and clinical and translational research. We actively promote OHDSI tools and the OMOP Common Data Model across several national initiatives, such as the IMPACT-MH Data Coordination Center and the RADx-rad Coordination Center, that emphasize data standards, interoperability, and scalable health AI applications. Currently, we are particularly excited about advancing real-world foundation models built upon the OMOP CDM to accelerate causal inference, treatment effect estimation, and patient-level prediction tasks. What excites me most is how OHDSI’s open science philosophy aligns with Yale’s mission to advance data-driven discovery, enabling global collaboration, reproducibility, and trustworthy AI for improving healthcare.
A theme at the recent global symposium was building greater global collaboration in network studies. As a longtime member of OHDSI, how exciting has it been to witness the global growth of the community, and why is it so important for research to be truly global?
It has been remarkable to watch OHDSI grow from a U.S.-centered initiative into a global movement spanning many countries. I have been actively involved in the OHDSI Asia-Pacific (APAC) community, which has played an important role in strengthening collaborations and advancing research across the region. This global growth is essential—not just for inclusiveness, but because health data, clinical practice, and disease patterns vary widely across populations. Global collaboration ensures that our evidence is more generalizable, equitable, and impactful. Personally, I find it inspiring to see researchers across continents using a shared data model and open tools to answer meaningful clinical questions together. It demonstrates what the global scientific community can achieve when knowledge and data are shared responsibly.
You are a leader of the China Chapter, and China will be hosting the 2025 Asia-Pacific symposium in December. How pleased were you to bring the APAC Symposium to China, and what do you hope people take away from the event?
I am thrilled that China will host the 2025 OHDSI Asia-Pacific Symposium. It’s a wonderful opportunity to showcase the rapid growth of real-world data research and health informatics in the region. I started the OHDSI China Chapter several years ago, and it is deeply rewarding to see it now in capable hands, continuing to grow and expand its impact. The Chapter has made tremendous progress in building data networks, developing local OMOP implementations, and connecting with global collaborators. My hope is that the symposium will foster deeper international collaboration, encourage open science, and inspire the next generation of researchers to join the OHDSI mission. It will also highlight the importance of aligning data standards and AI technologies across countries to advance global health research.
What are some of your hobbies, and what is one interesting thing that most community members might not know about you?
My wife often jokes that I’m a bit boring—probably because I tend to spend too much time talking about data—but I do my best to keep life fun outside of work. Recently, I switched from badminton to pickleball, which has quickly become my favorite way to stay active and connect with friends. Maybe we should organize a friendly pickleball match at the next OHDSI symposium—nothing brings people together like a bit of friendly competition! Lately, I’ve also been experimenting with AI tools beyond biomedicine, exploring their applications in areas such as finance purely out of curiosity and enjoyment.