In the latest edition of the Collaborator Spotlight, Desai discusses her career journey, the STARR-OMOP initiative, why she believes medicine mirrors astrophysics, watching her daughter enter the health data science research field, and more.
Can you discuss your background and career journey?
I grew up in India and studied Physics at the Indian Institute of Technology, and then I came to UT Austin to pursue a PhD in Theoretical Math. A detour led me to a dream role at the Harvard-Smithsonian Center for Astrophysics, where I worked on NASA’s Chandra X-ray Observatory. I spent over a decade as a research scientist analyzing X-ray spectra of Sun-like stars—essentially using data to understand underlying physical processes.
Later, at Stanford’s Solar Physics Center, while studying a phenomenon called white-light flares, I became curious about applying machine learning methods. This was around 2012, just as Coursera launched. A group of us began experimenting with ML to predict solar flares, and I realized what we’d been doing in astrophysics—looking for signal in all the noise, extracting patterns from massive datasets—was now called data science.
I became increasingly interested in this field and moved to the Genetics department to develop algorithms for a cloud-based recommender system for Biomedical literature called Scireader. When I was growing up, Biology was not a computational science, but I was watching that paradigm change happen in front of my eyes. The HITECH Act had been passed, and healthcare was undergoing a digital transformation—encompassing electronic health records, wearable technology, and cheaper storage—suddenly making it possible to link and analyze huge volumes of multimodal clinical data.
What struck me was how medicine mirrored astrophysics: different data modalities—labs, imaging, vitals—offer fragmented views of the same patient, just like different wavelengths reveal different facets of the same star. To truly understand either, you need to bring those pieces together.
As a data junkie, I was fascinated by the amount of data we were collecting. There was genetic data, patient data, wearables data, and socio-economic data. In 2016, Stanford initiated the development of a multimodal data lake to integrate clinical data across various modalities. When I learned about that effort, I knew I wanted to be part of it. The mission deeply resonated with my interests, and that’s what led me to where I am today, working to advance data-driven healthcare.
How did you first get involved with OHDSI, and what aspects of the community do you find most inspiring?
I first discovered OHDSI when I joined the team developing STARR and was introduced to the OMOP Common Data Model. I attended my first OHDSI symposium in 2018 and was struck by the warmth, camaraderie, and genuinely welcoming atmosphere. As someone new to biomedical informatics, I was acutely aware of how much I had yet to learn, but the community’s openness and support made a lasting impression. That experience made me want to be part of the OHDSI community.

Priya stands with members of the Stanford team
What is STARR-OMOP, and what role do you play in that initiative at Stanford?
STARR-OMOP is Stanford’s next-generation clinical data warehouse, containing Electronic Health Records (EHR) data from the three Stanford hospitals, as well as associated clinics, in OMOP CDM version 5.3. It is a core component of the STAnford Medicine Research Data Repository (STARR) ecosystem, which is Stanford’s single integrated data lake containing clinical data of different modalities. STARR contains structured and unstructured data, both raw and “analysis-ready,” along with a HIPAA-compliant Big Data computing platform and tools to analyze this data, all hosted entirely on Google Cloud Platform.
We have three flavors of OMOP, all in BigQuery: a fully identified STARR-OMOP containing clinical notes and flowsheets; a PHI-scrubbed STARR-OMOP “confidential” version of the fully identified STARR-OMOP; and a STARR-OMOP “confidential lite,” which contains only the structured data, i.e., without notes or flowsheets. The key distinguishing aspect of the OMOP data at Stanford is that the PHI-scrubbed OMOP is available to any Stanford researcher pre-IRB via the HIPAA-compliant big data platform. We currently refresh our OMOP.
I am part of the larger team that builds and maintains the infrastructure for STARR, and I lead the Data Science and Service team, which is responsible for ensuring data integrity, quality, and standardization of STARR products, as well as custom data extracts for researchers.
Are there any current or recent projects at Stanford that you’re particularly excited about and would like to share?
Yes! We have a couple of very cool initiatives that the entire team is working on. In fact, we submitted abstracts on almost all of them for the 2025 OHDSI Global Symposium. One major initiative is to integrate all multimodal oncology data into STARR and subsequently bring it into OMOP. As part of that project, we have recently expanded our CDM to include the image_occurrence table, which links all the radiology DICOMs to the OMOP procedure_occurrence, visit_occurrence, and potentially the PHI-scrubbed DICOM. We have brought the Stanford Cancer Registry into STARR and are in the process of bringing the genomics data into STARR. The holy grail would be to bring all these data modalities into OMOP.
OHDSI is deeply rooted in open science and collaboration. How important are those principles to the work you do?
Collaboration is at the heart of our approach, be it at the team level, department level, institute level or globally. Stanford has benefited immensely from being part of the OHDSI community. We are now part of multiple network studies, which have led to some really interesting findings using real-world observational data. As our faculty, students and staff have become more involved in the community, they have been able to identify and collaborate with like-minded researchers, forming working groups that focus on their specific research interests; the Perinatal and Reproductive Health Working Group is a great example of that. In turn, this has led to the integration of more clinical data domains into OMOP and has allowed new partnerships to arise. Working as a multi-institutional extended team has given us the opportunity to learn from one another, share insights and code, and foster mentorship and professional development. The OHDSI community is a living example of the transformative impact large-scale collaboration can have.

Priya and Pooja Desai, from the 2023 Global Symposium
Beyond being a researcher, you’re also a parent to a future researcher—your daughter is pursuing her PhD at Columbia. What advice do you share about getting involved in biomedical informatics, and what excites you about how the next generation might shape the future of healthcare?
You know, it’s interesting—my daughter and I came into biomedical informatics through very different paths, and yet we’ve found ourselves meeting in the middle through this incredible, interdisciplinary community. Watching her pursue her PhD at Columbia has been both humbling and inspiring.
The advice I always give her—and really, to anyone entering the field—is to lead with curiosity and remain a lifelong student. Talk to as many people as you can, even outside your immediate area of focus. Ask questions, even the ones that feel obvious. Some of the best insights come from conversations where you’re just genuinely trying to understand how someone else thinks.
What excites me most about the next generation is their openness to questioning long-held assumptions. They’re not afraid to challenge existing routines, to reimagine what’s possible, and to bring in tools and perspectives that didn’t exist a decade ago. That kind of creativity and courage is exactly what we need to shape a more equitable, innovative future for healthcare.
What are some of your hobbies, and what is one interesting thing that most community members might not know about you?
In my free time, I enjoy hiking, gardening, and cooking. I am also an artist!