Key takeaways
- Evaluation frameworks inherited from medical and pharmaceutical research are ill-suited to dynamic AI and digital health technologies
- Building clinical trust and adopting continuous evidence generation methods are vital for scaling AI
- Structural changes, a shared language, and public-private collaboration are necessary to overcome regulatory challenges and advance health technology assessment globally
The integration of artificial intelligence and digital health technologies into health care promises sweeping solutions to systemic problems such as workforce shortages, soaring costs, and the high burden of chronic diseases.1 Yet a profound gap exists between this potential and practical deployment, largely due to evaluation systems that are not “fit for purpose” for dynamic, evolving technologies.2,3 To discuss a recent report from the London School of Economics and Political Science (LSE), Healthcare Transformers hosted a virtual panel of experts to reflect on this evidence crisis and propose a path forward for health technology assessment and for responsibly scaling digital health and AI technologies.
After a presentation by Robin van Kessel of LSE Health, the panelists discussed a wide range of issues: clinical trust, regulatory fragmentation, evidence generation, and the need for structural change and cross-sector alignment.
Moving beyond promises
Robin van Kessel of LSE Health set the stage by noting that health care systems are in “dire straits” due to financial constraints and accelerating workforce shortages. He cited the finding that investing just 24 US cents per patient per year in digital health could save over two million lives over the next decade.4 However, adoption is stalled because the discourse is driven by “promises, not evidence”. Current evaluation processes, derived from pharmaceuticals, are “heavily skewed towards patients,” often ignoring health care professionals’ unique needs, and are ineffective mechanisms for evaluating digital health and AI technologies. Van Kessel noted that while most AI technologies currently classified under Software as a Medical Device are low-to-medium risk, the impending EU AI Act is likely to classify medical AI systems as high risk. The LSE report’s key contribution is an “evidence-based taxonomy for professional-facing digital health and AI technologies,” a “building blocks framework” based on seven dimensions (e.g., intended use case, data inputs, driving technology) designed to allow evaluators to compare “apples to apples”. Furthermore, while randomized controlled trials (RCTs) remain the gold standard, probabilistic methods such as Bayesian analysis are increasingly recognized for their relevance in communicating the uncertainty inherent in probabilistic AI technologies.
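As a concrete illustration of how such a taxonomy might be encoded, the Python sketch below defines a technology profile from the named building blocks. Since this summary lists only three of the report’s seven dimensions, the remaining fields are hypothetical placeholders, not the report’s actual dimensions.

```python
# A sketch of one possible encoding of the report's "building blocks"
# taxonomy. NOTE: the report defines seven dimensions, but only three are
# named in this summary; dimensions 4-7 below are hypothetical placeholders.
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class TechnologyProfile:
    intended_use_case: str   # e.g., "clinical decision support"
    data_inputs: str         # e.g., "ECG waveforms"
    driving_technology: str  # e.g., "deep neural network"
    # The remaining four dimensions are defined in the LSE report but not
    # enumerated in this summary, so they are stubbed out here.
    dimension_4: str = "unspecified"
    dimension_5: str = "unspecified"
    dimension_6: str = "unspecified"
    dimension_7: str = "unspecified"


def comparable(a: TechnologyProfile, b: TechnologyProfile) -> bool:
    """Two technologies are 'apples to apples' when all building blocks match."""
    return astuple(a) == astuple(b)
```

Grouping tools by matching profiles is what would let evaluators compare “apples to apples” and attach the same evidence expectations to the same class of technologies.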
The imperative of clinical trust for digital health and AI
Jochen Klucken, Professor and Chair of Digital Medicine at the University of Luxembourg, focused on the clinical perspective, emphasizing that scaling AI requires the acceptance and trust of health care professionals. He noted that doctors currently struggle to work with AI and that significant education is necessary. Beyond trust, the evidence must prove real clinical impact, as traditional metrics such as precision and accuracy are not enough. There is also a fundamental methodological mismatch: AI inherently gives a “probabilistic answer”, unlike the binary outcomes of classical RCTs. He advocated for Bayesian models to evaluate AI effectiveness, as they better align with the probabilistic nature of both AI and a doctor’s decision-making, which is built on a priori knowledge.
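To make this concrete, here is a minimal sketch, assuming a simple Beta-Binomial model, of the kind of Bayesian evaluation Klucken describes: a prior encoding a priori clinical knowledge is combined with study data to produce a posterior distribution over the tool’s accuracy, whose credible interval communicates uncertainty directly rather than a binary pass/fail. The prior and the counts are illustrative, not drawn from any cited study.

```python
# An illustrative conjugate Beta-Binomial update, the simplest form of the
# Bayesian evaluation described above. All numbers are invented for the
# example; they do not come from the panel or the LSE report.
from scipy import stats

# A priori clinical knowledge: prior belief that the tool's diagnostic
# accuracy is around 80%, encoded as Beta(8, 2).
prior_a, prior_b = 8, 2

# Hypothetical evaluation data: 178 correct assessments in 200 cases.
correct, incorrect = 178, 22

# Conjugate update: posterior is Beta(prior_a + correct, prior_b + incorrect).
posterior = stats.beta(prior_a + correct, prior_b + incorrect)

lo, hi = posterior.interval(0.95)
print(f"Posterior mean accuracy: {posterior.mean():.3f}")  # ~0.886
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
```

The output is a distribution, not a verdict: a clinician can read off how probable it is that the tool meets a given accuracy threshold, which mirrors how AI itself expresses its answers.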
The risks of regulatory failure in health technology assessment
Antonio Spina of the World Economic Forum noted that “public-private collaboration has always been fundamental” to regulation and health technology assessment, but that the changing nature of digital and AI innovation, with continuous life cycles and new actors, demands new models of collaboration, specifically the co-creation of regulatory assessment. He warned of the consequences if stakeholders fail to convene and align: regulatory fragmentation (inconsistencies and duplication), misdirected regulation (either too restrictive or too lax regarding emerging risks such as model drift), and ultimately a loss of competitiveness. The biggest risk, however, is that already strained health systems will be unable to address modern global health challenges unless technology can be responsibly scaled.
Embracing continuous evidence generation
Speaking from the innovator’s perspective, Chaohui Guo of Roche Diagnostics affirmed that the fundamental challenge is that dynamic AI solutions cannot be measured with traditional static RCTs. Guo advocated for a definitive shift toward a “more pragmatic and continuous evidence generation approach”, embracing pragmatic clinical studies, real-world evidence, and simulation-based methodologies. Guo hoped that stakeholder alignment on such methods would be a “gamechanger for evidence generation” and that harmonization would provide “clarity and transparency” to innovators. Furthermore, if health technology assessment explicitly values data from wearables and electronic health records, it will incentivize investment in robust data infrastructure and encourage a shift towards a health system that learns continuously.
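One way to picture this continuous approach is to extend the Bayesian sketch above so that the evaluation never “closes”: each new batch of real-world evidence updates the running posterior. The monthly batch figures below are invented for illustration.

```python
# A sketch of continuous evidence generation under the same Beta-Binomial
# model as above: the posterior is refreshed as each batch of real-world
# evidence arrives, instead of being frozen at a single trial readout.
from scipy import stats

a, b = 1, 1  # start from an uninformative Beta(1, 1) prior over accuracy

# Hypothetical monthly batches of (correct, incorrect) real-world outcomes,
# e.g., extracted from electronic health records.
monthly_batches = [(92, 8), (88, 12), (95, 5), (79, 21)]

for month, (correct, incorrect) in enumerate(monthly_batches, start=1):
    a += correct    # sequential conjugate update: last month's posterior
    b += incorrect  # becomes this month's prior
    posterior = stats.beta(a, b)
    lo, hi = posterior.interval(0.95)
    print(f"Month {month}: mean accuracy {posterior.mean():.3f}, "
          f"95% CI [{lo:.3f}, {hi:.3f}]")

# A sustained downward drift in the posterior mean would flag the kind of
# model drift mentioned earlier, triggering re-evaluation rather than
# waiting for the next formal assessment cycle.
```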
The grim reality of health technology assessment
Katarzyna Markiewicz-Barreaux, AI Strategic Intelligence Lead at Philips and Co-chair of the Health Technology Assessment Committee at MedTech Europe, provided sobering statistics to underscore the urgency of the issue. She cited research indicating that only a very small percentage of AI medical devices have sufficient evidence to support a full health technology assessment, particularly regarding economic impact and safety, even after receiving a CE mark.5 Markiewicz-Barreaux noted that this lack of evaluation is dangerous: improper use risks the technology making critical mistakes at clinical sites and eroding trust. She described the process for innovators as “Russian roulette” due to the lack of transparency around evidence requirements.
Structural change is essential for progress
George Wharton provided structural guidance, identifying the current inertia as a “problem of coordination” because “no single actor is responsible for the whole process” of approval, HTA, adoption, and reimbursement. For professional-facing tools, the lack of defined “outcome measures” (e.g., workflow efficiency versus time saved) reinforces risk aversion. Distinguishing between “necessities” and “niceties,” Wharton argued that the “essential” interventions are those that address structural problems and integrate entire classes of evolving technologies systematically. The absolute “prerequisite” is a “shared language and taxonomy”, like the one provided in the report, because regulators use risk-based classifications while HTA agencies use function-based categories, making coordination difficult. He emphasized the need for “evidence standards to be aligned with those functions, with those use cases” and for a commitment to “iteration and experimentation” to prevent the evaluation gap from widening.
Building a robust structural foundation to ensure adequate health technology assessment
The discussion confirmed that health care systems are operating under a self-imposed paradox: desiring rapid AI innovation while maintaining static, risk-averse evaluation gates. The speakers issued a clear mandate for next steps: policy efforts must focus on structural, systemic factors. This requires immediate investment in developing a shared language and taxonomy across jurisdictions and stakeholders, coupled with the adoption of continuous, pragmatic, and probabilistic methods (like Bayesian analysis) for evidence generation. Ultimately, public and private sectors must co-create a system that commits to rapid iteration and experimentation, ensuring the regulatory architecture is flexible enough to manage technologies that change and relearn over time. The future of health care hinges on whether policymakers can pivot from incremental improvements to building a robust structural foundation that ensures adequate health technology assessment.
References
- Meskó, B. et al. (2018). Will artificial intelligence solve the human resource crisis in healthcare? Available from: https://bmchealthservres.biomedcentral.com/articles/10.1186/s12913-018-3359-4
- Farah, L. et al. (2024). Suitability of the Current Health Technology Assessment of Innovative Artificial Intelligence-Based Medical Devices: Scoping Literature Review. Available from: https://www.jmir.org/2024/1/e51514/
- Park, Y. et al. (2020). Evaluating artificial intelligence in medicine: phases of clinical research. Available from: https://pubmed.ncbi.nlm.nih.gov/33215066/
- World Health Organization (WHO) & International Telecommunication Union (ITU). (2024). Going digital for noncommunicable diseases: The case for action. Available from: https://www.who.int/publications/b/71552
- Farah, L. et al. (2023). Are current clinical studies on artificial intelligence-based medical devices comprehensive enough to support a full health technology assessment? A systematic review. Available from: https://doi.org/10.1016/j.artmed.2023.102547