Key takeaways
- The “fast science” of health economics and outcomes research (HEOR) is becoming even faster through advances in AI
- Human oversight is critical—manual (human) validation of results generated through AI must still be performed
- Generative AI will play a prominent role in reshaping the profession of health economics and outcomes research
“Don’t speed. Any mistake is less likely to be catastrophic when driving the speed limit.” This was the first driving lesson many of us received. It was good advice then and it still holds true today, particularly as we witness the rapid application and utilization of artificial intelligence (AI) in healthcare research.
Health economics and outcomes research (HEOR) is the confluence of two fields, with one focused on measuring and valuing the outcomes of healthcare interventions and the other on scientific disciplines that evaluate the effect of healthcare interventions on patients.1 The thought of HEOR as a “fast science” can make some HEOR experts uncomfortable.
This notion has become even more acute with the introduction of generative artificial intelligence (Gen AI) into HEOR. At present, AI remains somewhat divisive within the field. Some researchers are exhilarated; others are cautious or skeptical. But whatever their enthusiasm or trepidation, industry thought leaders are afraid of missing out on the potential of Gen AI to transform how HEOR is conducted, or of being left behind.
HEOR is a scientific discipline, grounded in deliberate and methodical processes built up over the past twenty years, that provides powerful data and insights for healthcare decision makers. As payers and health technology agencies increasingly relied on real-world evidence to make payment decisions for new healthcare interventions, HEOR pioneered new methods to curate and analyze these data. These methods include systematic literature reviews (SLRs), converting real-world data to real-world evidence, forecasting the budget impact of a new therapy based on cost-effectiveness models, estimating the head-to-head advantage of one therapy over another, and measuring patient-reported outcomes. Together they help payers make informed choices about the drugs, devices, and diagnostics that they pay for.2
Instead of waiting years for a conclusive randomized controlled trial (RCT) to report the economic and patient benefits of a healthcare intervention, payers can estimate the costs and improved outcomes of a new product using HEOR.3 On the Autobahn of evidence, HEOR is the vehicle of choice compared with slower research methods such as RCTs and prospective registries. While RCTs and prospective registries are considered the most robust methods to determine cause and effect, they can take several years to complete.4
Speed: Applying Gen AI to systematic literature reviews for HEOR
Gen AI represents the most significant upgrade to HEOR’s capabilities in a generation. HEOR researchers have demonstrated that some tasks that previously took weeks can now be performed in hours using AI.5 For instance, the first broad application of Gen AI in HEOR has been conducting systematic literature reviews (SLRs). The papers published on a major topic can number in the thousands, more than any one individual can analyze and summarize. AI is an attractive way to automate this process, reducing the time and labor typically required for an SLR.
The potential for faster research is like a new engine that promises a significant boost in speed, flexible fuel, and raw power. Initial road tests have demonstrated Gen AI’s ability to create new content, summarize data, and even synthesize data. Gen AI models, particularly large language models (LLMs), can ingest natural language text and perform tasks that require reasoning.5 These models are now being applied to the field of HEOR to generate computer code for data analysis, summarize cost-effectiveness findings across hundreds of research papers, and draft reports for payers when they make value assessments for new healthcare interventions.5 HEOR has always tested the speed limit, but the performance capabilities of the Gen AI engine will necessitate a rethinking of the vehicle’s entire design.
SLRs currently require a trained human (often a graduate student) to read and review hundreds of full-text articles to assess them for quality, bias, and appropriate fit with the intended summary. More advanced SLRs add data extraction, and the most advanced add meta-analysis, a quantitative method that pools results from a number of studies to assess statistical significance.6 LLMs can streamline this process by assisting with the most time-consuming tasks. For example, LLMs can be instructed to apply predetermined inclusion and exclusion criteria to quickly identify key studies.7 A human can then work iteratively with the LLM to develop a more robust search strategy based on the preliminary results. LLMs can then be queried to provide reasoning and justification for their selections. Finally, these models can generate concise summaries of the selected studies. There are also emerging examples of LLMs assisting with bias assessment by prompting the researcher to answer a series of questions regarding article selection.
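The screening step described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any published workflow: the criteria, the `Abstract` class, and the `screen` function are hypothetical, and simple keyword matching stands in for the LLM call. What it is meant to show is the shape of the output: every decision carries a recorded reason, and anything that matches no criterion is routed to a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class Abstract:
    study_id: str
    text: str


# Hypothetical criteria for illustration only; a real SLR protocol defines its own.
INCLUDE_TERMS = ("cost-effectiveness", "budget impact")
EXCLUDE_TERMS = ("animal model", "editorial")


def screen(abstract: Abstract) -> dict:
    """Apply predetermined criteria and record the reason for the decision.

    Keyword matching is a stand-in for an LLM; exclusion takes precedence,
    and unmatched abstracts are sent to a human reviewer.
    """
    text = abstract.text.lower()
    excluded = [term for term in EXCLUDE_TERMS if term in text]
    included = [term for term in INCLUDE_TERMS if term in text]
    if excluded:
        decision, reason = "exclude", f"matched exclusion term(s): {excluded}"
    elif included:
        decision, reason = "include", f"matched inclusion term(s): {included}"
    else:
        decision, reason = "human_review", "no criterion matched"
    return {"study_id": abstract.study_id, "decision": decision, "reason": reason}


# Invented example abstracts, not real studies.
abstracts = [
    Abstract("S1", "A cost-effectiveness analysis of therapy A versus therapy B."),
    Abstract("S2", "Editorial: the future of cost-effectiveness research."),
    Abstract("S3", "Patient-reported outcomes after surgery."),
]
results = [screen(a) for a in abstracts]
for result in results:
    print(result)
```

Keeping the reason alongside each decision mirrors the step in which the LLM is queried to justify its selections, and gives the human reviewer something concrete to check.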
Operator: The role of human oversight in AI for HEOR
While AI models can achieve break-neck speed in performing essential SLR tasks, using this technology is not without risk. Reports are emerging of “hallucinations” in the form of fake citations. Through chain-of-thought prompting intended to enhance LLM performance on reasoning-based tasks, the LLM may identify the “perfect” citation for a report, but that perfect citation may not actually exist.8 For this reason, manual (human) validation of results generated through AI still needs to be performed.
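One simple form of manual validation is to compare every citation an AI tool proposes against a bibliography that a human has already verified. The Python sketch below assumes such a verified list exists; the `flag_unverified` function and the identifiers are hypothetical placeholders, not real records. A flagged citation is not necessarily fabricated; it is only one that a person still needs to look up.

```python
def flag_unverified(ai_citations, verified_ids):
    """Return AI-suggested citations whose identifier is not in the verified set."""
    return [c for c in ai_citations if c["id"] not in verified_ids]


# Placeholder identifiers; in practice these might be DOIs or PubMed IDs
# taken from a search that a human has run and checked.
verified_ids = {"REF-001", "REF-002"}

ai_citations = [
    {"id": "REF-001", "title": "Known study"},
    {"id": "REF-999", "title": "Plausible-sounding study"},
]

needs_review = flag_unverified(ai_citations, verified_ids)
print(needs_review)
```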
Since there is currently no operator’s manual in the form of best practices and guidelines for AI-enabled HEOR research, it is not yet clear if engine failures are due to the user’s driving skills or the inherent limitations of the current LLMs. As researchers in the field of HEOR gain more experience with AI, good practices and guidelines will emerge. However, this will require researchers to be rigorously transparent in sharing their mistakes for the benefit of the community. Good practices and guidelines will be needed to solidify the critical role of human oversight in AI-assisted HEOR.
New fuel: The refining of unstructured data
An increasingly vast amount of patient-generated data is being collected for research, but currently only 20 percent of this data exists in a structured format. At the moment, every researcher who works with real-world data is competing for access to the same structured datasets.
AI has the potential to increase access to unstructured data. One example of this new “fuel source” is electronic medical record (EMR) data. Only a fraction of EMR data is structured in a format that can be analyzed with real-world evidence methods. The potential to automate the manual task of reviewing patient charts can accelerate the classification of unstructured data into analyzable datasets. It should be noted, however, that the accuracy of current AI models in mapping descriptive text to medical codes has been questioned. As a result, researchers are working to improve the prompts they provide for these tasks with the goal of increasing the accuracy of converting unstructured text to the correct structured data field.9
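Questions about mapping accuracy can be made concrete by scoring a model's output against a sample of charts that a human has coded by hand. The Python sketch below is only an illustration under that assumption: the `mapping_accuracy` function, the note identifiers, and the code labels are invented placeholders rather than real medical codes or real model output.

```python
def mapping_accuracy(predicted, reference):
    """Compare model-assigned codes with human-assigned codes.

    Returns the share of reference notes coded identically, plus the
    disagreements as {note_id: (model_code, human_code)}.
    """
    mismatches = {
        note_id: (predicted.get(note_id), human_code)
        for note_id, human_code in reference.items()
        if predicted.get(note_id) != human_code
    }
    accuracy = 1 - len(mismatches) / len(reference)
    return accuracy, mismatches


# Placeholder labels standing in for structured medical codes.
human_coded = {"note1": "CODE_A", "note2": "CODE_B", "note3": "CODE_C", "note4": "CODE_A"}
model_coded = {"note1": "CODE_A", "note2": "CODE_B", "note3": "CODE_A", "note4": "CODE_A"}

accuracy, mismatches = mapping_accuracy(model_coded, human_coded)
print(accuracy)
print(mismatches)
```

Rerunning the same comparison after each prompt revision gives a rough way to see whether a change improved the mapping, and the list of disagreements shows which notes to reread.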
Engine power: The impact of foundational models on HEOR
Foundational models are large, pre-trained AI systems that can understand and generate text, images, or other data. They serve as a base for a wide range of applications, allowing users to fine-tune them for specific tasks, like improving healthcare decision-making or automating processes. These models help boost efficiency and innovation across various industries, including healthcare.10
This power comes at a cost, however, as Gen AI is not green. Generative, multipurpose AI systems require significant energy consumption to perform tasks. For example, generating a single AI image (one of the most energy-intensive tasks) uses as much energy as fully charging a smartphone.11 Awareness of the environmental costs related to energy consumption will increase as professions such as HEOR incorporate LLMs into their routine work. Most HEOR work is currently centered on text classification, where model training is among the least energy-intensive tasks. However, as the field progresses towards broad adoption of model inference tasks – where a previously trained learning model draws conclusions or makes predictions from a new dataset – the environmental costs will increase significantly.12
An HEOR-specialized model—trained specifically on HEOR methods, data, and publications—is not on the immediate horizon. The previously mentioned engineering challenges of developing good practices will need to occur before the HEOR engine can be deployed responsibly at scale. Any task that involves qualitative aspects, such as assessment for publication bias, appropriateness of statistical methods, and judgment of conclusions to be fair and balanced, highlights the continued importance of human oversight.13
Navigating the AI road ahead
While artificial intelligence can advance the speed of innovation, this technology will not help us with navigation. Simply put, you can go in the wrong direction faster. Pursuing multiple dead-end roads at a faster rate will eventually help us to plot the right course, but there is currently no GPS feature built into Gen AI. Ironically, as the sheer volume of Gen AI-enabled HEOR publications increases, the demand for AI-driven literature reviews will also increase. AI will create its own headaches and its own markets by creating new challenges that will require AI to solve – generating its own demand in a circular fashion.
Faced with the prospect of a radically redesigned profession, new questions are emerging for HEOR. Should we expect HEOR evidence to be synthesized faster? Should we expect faster healthcare decision-making to translate into faster patient access to new therapies? What is the risk of adopting AI too slowly? It’s becoming harder to recall those common-sense lessons from the past. Researchers who are willing to test the limits of the Gen AI engine will set new world records…or crash spectacularly. The wisest path for most of us is likely a gradual acceleration, relying on human oversight, prudent judgment in selecting data inputs, and manual validation of outputs. One thing is certain, however: the Gen AI era will eventually render the old speed limit obsolete.
For more insights from The Professional Society for Health Economics and Outcomes Research (ISPOR) visit: https://www.ispor.org/
References
1. ISPOR. (2024) Available from: https://www.ispor.org/heor-explained
2. Sanders G D et al. (2016) JAMA, 316(10):1093-103 Available from: https://jamanetwork.com/journals/jama/article-abstract/2552214
3. Power. (2024) Available from: https://www.withpower.com/guides/health-economics-and-outcomes-research-heor
4. Wong C. et al. (2019) Biostatistics, 20, 273-286. Available from: https://academic.oup.com/biostatistics/article/20/2/273/4817524
5. Fleurence R L et al. (2024) Value Health. S1098-3015(24)06754-8. Available from: https://pubmed.ncbi.nlm.nih.gov/39536966/
6. Egger M et al. (1997) BMJ, 315(7121):1533-7. Available from: https://pubmed.ncbi.nlm.nih.gov/9432252/
7. Jin Q et al. (2024) Nature Communications, 15(1):9074. Available from: https://www.nature.com/articles/s41467-024-53081-z
8. Sallam M. (2023) Healthcare (Basel), 11(6):887. Available from: https://pubmed.ncbi.nlm.nih.gov/36981544/
9. Fleurence R L et al. (2024) Value Health. 27(6):692-701. Available from: https://pubmed.ncbi.nlm.nih.gov/38871437/
10. Moor M et al. (2023) Nature, 616(7956):259-265. Available from: https://pubmed.ncbi.nlm.nih.gov/37045921/
11. Luccioni A S and Strubell E. (2024) FAccT ’24. Available from: https://facctconference.org/static/papers24/facct24-6.pdf
12. Douwes C et al. arXiv, 2107.02621v2. Available from: https://arxiv.org/abs/2107.02621
13. Jansen JP et al. (2014) Value Health, 17(2):157-73. Available from: https://pubmed.ncbi.nlm.nih.gov/24636374/