Health Outcomes and Medical Effectiveness Research (HOMER): Risk identification and comparative effectiveness system, enabling real-time exploration of the effects of medical products.
Observational health care data, such as electronic health records and administrative claims, offer tremendous opportunities to study health, disease, and medical products. The current paradigm for big data analysis in health care is largely episodic in nature: a researcher posits a hypothesis about a specific association, such as a medical product-outcome relationship, designs an observational analysis plan to test that hypothesis, executes the analysis against an available observational dataset, and — particularly when traditional measures of statistical significance (p<0.05) are met — attempts to disseminate the findings through peer-reviewed publications and other forums. Typically, the test for causal effects focuses on producing an unbiased estimate of the strength of association and determining whether a relative risk metric is of sufficient magnitude to reject a null hypothesis of no effect. Several problems arise from this paradigm:
- The current research process is highly inefficient — evidence is generated to support one hypothesis at a time, and the number of questions about disease and medical products that patients and providers deserve reliable evidence about is growing at a pace that outstrips the output of the entire research enterprise. For example, across all pharmaceutical drugs and all health outcomes of interest, only 4% of combinations have evidence in the published literature from randomized clinical trials or observational studies; 96% of the potential questions remain unasked and therefore unanswered.
- The evidence from our research process is not reliable — estimates of strength of association from observational database analyses are subject to systematic error, which bedevils the field of epidemiology. Repeated examples illustrate the challenge in conducting proper analyses: different research groups attempting to answer the same question on the same data generate conflicting results (such as bisphosphonate-esophageal cancer, pioglitazone-bladder cancer), and findings across observational databases on the same issue fail to replicate (such as fluoroquinolone-retinal detachment, dabigatran-bleeding). Issues of data source heterogeneity and method parameter sensitivity make it critically important to explore multiple databases and multiple analysis choices when addressing a particular product-outcome association, but conducting multiple large-scale analyses across disparate sources and synthesizing results across the analyses is difficult.
- The evidence we generate is insufficient to address questions of causality — most observational database analyses provide estimates of strength of association, and when statistically significant findings are observed, offer post-hoc rationalizations for biologic plausibility. Austin Bradford Hill outlined in 1965 many facets that bear consideration when evaluating a causal effect, including strength of association, plausibility, consistency, temporality, biologic gradient, analogy, specificity, experiment, and coherence. These viewpoints have been applied to specific pharmacovigilance analyses, but have not been consistently adopted or systematically applied in the context of observational data studies. An open opportunity for the novel use of observational data involves developing exploratory analyses for each of these causal dimensions, as well as some novel dimensions, to strengthen the interpretation of any purported effect. In addition, we propose to develop quantitative metrics associated with each of these dimensions.
- The current use of observational healthcare data and results are static — patient-level data are summarized in a series of statistics that populate tables in a manuscript. The level of detail provided about the underlying data and analysis methods applied to the data is often not sufficiently transparent to evaluate the integrity of the study, and because the patient-level data are not publicly available, the analyses are often not reproducible. Yet, most study results stimulate more questions than they answer. For example, if we find that dabigatran causes bleeding, the community will immediately want to go further to ask: is the effect observed for all indications of the treatment or for all patient subgroups within each indication? Do other anticoagulants have similar effects? If the drug causes gastrointestinal bleeding, are there other hemorrhagic conditions that it is also associated with? Do observed associations persist as observational data accumulate, health care delivery evolves, and the practice of medicine learns from prior work to develop interventions intended to maximize benefits and minimize risks of treatments?
Addressing these questions requires a more iterative approach to data analysis, one that facilitates exploration of summary results while protecting patient privacy, through coordination of an observational data network of disparate sources that provides timely access to current summary analysis results on an ongoing basis.
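The episodic paradigm described above reduces, at its core, to a single computation: estimate a relative risk from a 2x2 table and test it against a null of no effect. A minimal sketch of that computation, using hypothetical counts rather than any real study data:

```python
import math

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Relative risk with a 95% Wald confidence interval on the log scale."""
    rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
    # Standard error of log(RR) for independent binomial samples
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / unexposed_cases - 1 / unexposed_total)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi

# Hypothetical counts: 30 outcomes among 1,000 exposed vs. 15 among 1,000 unexposed
rr, lo, hi = relative_risk(30, 1000, 15, 1000)
# The null of no effect is rejected when the 95% CI excludes 1.0
```

The point of the sketch is how little this computation engages with Hill's other causal considerations: an unbiased point estimate and a significance test are the entire output.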
To address these challenges, we are in the process of designing, implementing, and deploying the Health Outcomes and Medical Effectiveness Research (HOMER) system. HOMER is an interactive visualization platform to enable researchers to explore medical associations across a network of observational health care databases. We will develop open-source large-scale standardized analytics that extract summary statistics from patient-level longitudinal databases, and a web-based interface that allows real-time exploration of all summary statistics in a way that facilitates transparency of the underlying data and analysis results while protecting patient privacy.
This work will require big data thinking on multiple dimensions: observational healthcare databases are growing substantially, with many databases covering over 100 million patients and capturing tens of billions of clinical observations. Across a data network, there can be tremendous variety in the populations covered and the data capture process employed within electronic health records and administrative claims. The veracity of these data sources principally lies in our ability to translate the data captured into an accurate and complete depiction of each patient’s health experience, and from those depictions, attempt to infer unbiased population-level effects of medical products. Healthcare data offers substantial velocity, with clinical observations captured every day, and large-scale analyses are expected to be executed on a regular basis, if not in real time, to ensure the timeliness of all evidence generated. However, the “big data” problem goes beyond the patient-level sources; for example, with tens of thousands of medical interventions and thousands of health outcomes of interest to patients, a comprehensive summary of all potential effects constitutes “big results”; if we estimate that 1,000 summary statistics will be needed to properly characterize a single drug-outcome effect, then the result set to explore all drugs and all outcomes should be expected to reach 10 billion or more.
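The arithmetic behind the “big results” claim is straightforward; the counts below are the order-of-magnitude figures from the text, not exact inventories:

```python
drugs = 10_000           # order of magnitude: tens of thousands of medical interventions
outcomes = 1_000         # thousands of health outcomes of interest
stats_per_pair = 1_000   # summary statistics needed to characterize one drug-outcome effect

total_results = drugs * outcomes * stats_per_pair
# 10_000 * 1_000 * 1_000 = 10,000,000,000 summary statistics (10 billion)
```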
Thus, the summary result set requires a novel approach to exploration, as no human being can manually review all this information to discern effects of potential interest or to gauge the accuracy of the information needed to rule out purported effects. It requires a large-scale exploration framework built on interactive visualizations that allow the researcher to filter, zoom, and pan across results, and to link to orthogonal analysis components to learn and tell an evidence-based story about effects of medical products or reasons for associations. Moreover, large-scale analysis results will allow for large-scale evaluation of the performance and reliability of the analysis methods, providing further evidence about how to learn from the results and how much confidence can be placed on any given data point.
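As a toy illustration of the filter step in such an exploration framework, consider narrowing a result set to one drug and one effect-size threshold. The record schema and field names below are hypothetical, chosen for illustration, and are not the HOMER data model:

```python
# Toy summary-result records; in practice these would number in the billions.
results = [
    {"drug": "dabigatran", "outcome": "GI bleeding", "database": "claims_a", "rr": 1.6},
    {"drug": "dabigatran", "outcome": "GI bleeding", "database": "ehr_b", "rr": 1.4},
    {"drug": "dabigatran", "outcome": "stroke", "database": "claims_a", "rr": 0.7},
]

def filter_results(results, drug=None, outcome=None, min_rr=None):
    """Filter step: narrow the full result set to the slice a researcher zooms into."""
    keep = results
    if drug is not None:
        keep = [r for r in keep if r["drug"] == drug]
    if outcome is not None:
        keep = [r for r in keep if r["outcome"] == outcome]
    if min_rr is not None:
        keep = [r for r in keep if r["rr"] >= min_rr]
    return keep

hits = filter_results(results, drug="dabigatran", min_rr=1.5)
```

Zooming and panning are then successive refinements of the same filter, with linked views driven by the surviving records.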
The HOMER framework will begin from Sir Austin Bradford Hill’s causal considerations. We will develop large-scale analysis solutions for each causal component: strength of association, consistency, temporality, experiment, plausibility, coherence, biologic gradient, specificity, and analogy. Each component will comprise both a novel approach to produce summary statistics from patient-level data as evidence generation, plus a novel approach to visualize the summary results as part of the evidence dissemination strategy. Within the application, users will be able to select any drug and any health outcome of interest and explore all evidence associated with that drug-outcome relationship within each of Hill’s criteria. Doing these analyses at scale means developing tools that are computationally efficient enough to produce summary statistics from datasets containing millions of patients, using sophisticated regularization strategies for confounding adjustment across millions of covariates, and being able to apply the tools across the millions of drug-outcome pairs that users may be interested in exploring.
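One natural way to organize per-pair evidence along Hill's dimensions is a keyed structure with one quantitative metric per criterion. The sketch below is illustrative only, under the assumption of one scalar score per criterion; it is not the HOMER data model:

```python
from dataclasses import dataclass, field

# Hill's nine causal considerations, as enumerated in the text
HILL_CRITERIA = (
    "strength", "consistency", "temporality", "experiment", "plausibility",
    "coherence", "biologic_gradient", "specificity", "analogy",
)

@dataclass
class CausalEvidence:
    """Evidence profile for one drug-outcome pair, one metric per Hill criterion."""
    drug: str
    outcome: str
    scores: dict = field(default_factory=dict)  # criterion -> quantitative metric

    def add(self, criterion, score):
        if criterion not in HILL_CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        self.scores[criterion] = score

    def missing(self):
        """Criteria not yet evaluated for this pair."""
        return [c for c in HILL_CRITERIA if c not in self.scores]

profile = CausalEvidence("dabigatran", "GI bleeding")
profile.add("strength", 1.6)     # e.g. an adjusted relative risk
profile.add("consistency", 0.8)  # e.g. fraction of databases replicating the effect
```

A profile like this makes the gaps explicit: the `missing()` list identifies which causal dimensions still lack evidence for a given pair, which is exactly the kind of completeness a per-criterion visualization would surface.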
All software developed as part of HOMER will be made open source and publicly available, so that all stakeholders with patient-level data can execute analyses, explore their results themselves, and share their data source summary results with the broader community.