Key takeaways
- Generative AI is expanding what’s possible—but not yet what’s dependable
- Trust in clinical AI still begins with specialized medical algorithms
- The future lies in hybrid intelligence—pairing accountability with enhanced performance
There’s a growing buzz in healthcare: “Can’t we just let generative AI do it all?”
In an era where large language models or generative foundation models can predict protein structures, draft clinical notes, generate trial summaries, and assist in diagnostic reasoning, it’s tempting to think we’ve reached the end of specialized medical algorithms—the pre-trained and validated, task-specific clinical AI models that quietly predict disease progression in heart failure and kidney disease, or flag sepsis in real time.
Although these specialized algorithms have only recently begun to enter mainstream medical decision-making, next to the projected capabilities of generative AI an endpoint-driven, single-purpose medical algorithm can feel almost old-school.
We’re standing at a critical juncture in the evolution of clinical AI. Generative models are rewriting what’s possible—but for now, specialized medical algorithms remain the main regulatory-tested pathway to consistent quality, performance, and trust.1,2
Generative AI in healthcare: The seduction of scale
Two landmark studies published in 2025 illustrate both the pace and the limits of progress.
Epic Systems and Microsoft Research unveiled CoMET, a family of massive generative models trained on more than 300 million patient records and 16 billion encounters. CoMET learned to predict medical events—from diagnoses to healthcare interventions—by simulating “future health timelines.” Remarkably, while it matched or outperformed traditional task-specific, specialized models in some disease areas, in others it didn’t—despite the scale of the data it was trained on.3
At the same time, results from the Delphi-2M model, a modified large language model derived from the Generative Pre-trained Transformer GPT-2, were published in Nature by researchers from the European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI). It was trained on data from 400,000 UK Biobank participants and externally validated using data from 1.93 million individuals from Denmark. Delphi-2M models the natural history of human disease, simultaneously predicting the rates of more than 1,000 diseases (specifically 1,258 distinct states) covering the complete ICD-10 list. Instead of forecasting only a single outcome, Delphi-2M predicts multiple diseases and their timing of appearance, enabling the simulation of future health trajectories up to 20 years ahead of disease onset and in a multi-morbidity context.4
Yet both studies also disclose current limitations: generative models are not yet ready for point-of-care use, even setting aside the absence of regulatory frameworks. Delphi-2M reports selection and immortality bias, missingness in its data sources, and limits to its achieved prediction performance. Both Delphi-2M and CoMET show inconsistent accuracy across diseases, and, for the latter, efficacy depends on model scale.3,4
Specialized medical algorithms still perform better in diagnosis and risk prediction of diabetes when compared to Delphi-2M. This is also true for CoMET when considering specific chronic disease outcomes, such as 1–3 year risk for chronic heart failure, stroke, heart attack, or atherosclerotic cardiovascular disease in hyperlipidemia.3,4
Why trust in clinical AI still starts with specialized medical algorithms
The adoption of AI in healthcare requires trust, built on the foundation of high accuracy and reproducibility—key components that specialized medical algorithms still deliver. Namely, they are:
- Validated for specific clinical endpoints and measurable outcomes
- Explainable, auditable, and regulator-approved
- Embedded into workflows, ensuring that every patient benefits automatically and equitably
Specialized algorithms have clear input and output parameters, allowing clinicians to understand the foundation of the inference. Generative AI models trained on native healthcare data, on the other hand, rely on learning the grammar and contextual logic of health (as in CoMET) from massive longitudinal data to achieve reproducibility through aggregation over multiple unique, probabilistic simulations.3
The future value of generative AI models lies in their ability to simulate potential paths in a patient’s health journey, providing quantitative, data-driven insights into what is likely to happen next. In other words, proactive rather than reactive care planning.
A proposal for the future of AI clinical decision support tools: Hybrid intelligence
Given the complementary strengths of the two approaches, clinicians will likely work with both specialized medical algorithms and generative foundation models in the future. This “hybrid intelligence” would pair today’s validated specialized medical algorithms with generative AI that learns in the background—refining the understanding of health trajectories and discovering new patterns.
In a hybrid framework, frontline systems stay powered by endpoint-driven specialized medical algorithms that predict, interpret, and decide. On the back-end, generative models continuously learn from real-world data, identifying subtle population patterns and informing the next generation of clinical tools.
Over time, specialized medical algorithms may benefit from generative AI’s ability to consider the broader clinical context, driving the personalization of care. At the same time, fine-tuning a generative model could improve task-specific predictive performance. This “hybrid architecture” is a win-win that would allow physicians to benefit from the strengths of both models. As an example, a hybrid foundational model could trigger a specialized algorithm when necessary or appropriate.
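The triggering pattern described above can be sketched in code. The sketch below is purely illustrative: the patient record, the foundation model’s simulated risk scores, the specialized algorithm’s scoring formula, and the trigger threshold are all hypothetical stand-ins, not any real clinical model or product.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    """Minimal stand-in for a longitudinal patient record."""
    id: str
    features: dict

def foundation_model_risk(patient: Patient) -> dict:
    """Hypothetical generative back-end: per-condition risk scores that a
    real model would aggregate over many probabilistic trajectory rollouts."""
    hba1c = patient.features.get("hba1c", 5.0)
    # Placeholder heuristic in lieu of an actual simulated health timeline.
    return {"diabetes": min(1.0, max(0.0, (hba1c - 5.0) / 5.0))}

def specialized_diabetes_algorithm(patient: Patient) -> float:
    """Hypothetical validated, endpoint-driven specialized algorithm with
    explicit inputs and an auditable output."""
    hba1c = patient.features.get("hba1c", 5.0)
    bmi = patient.features.get("bmi", 22.0)
    return round(0.1 * hba1c + 0.01 * bmi, 3)

def hybrid_decision(patient: Patient, trigger_threshold: float = 0.3) -> dict:
    """Front end of the hybrid architecture: the foundation model screens
    broadly; when a simulated risk crosses the threshold, the validated
    specialized algorithm is triggered to produce the endpoint-specific score."""
    screening = foundation_model_risk(patient)
    results = {}
    for condition, risk in screening.items():
        if risk >= trigger_threshold:
            results[condition] = specialized_diabetes_algorithm(patient)
    return results

p = Patient(id="demo-001", features={"hba1c": 7.8, "bmi": 31.0})
print(hybrid_decision(p))
```

The design choice worth noting is the division of accountability: only the validated, specialized component emits the score that informs a decision, while the generative component merely decides when that component is worth running.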
Healthcare-specific foundation models improve when connected to trusted clinical knowledge sources, such as knowledge graphs, or when paired with retrieval-augmented generation (RAG) (see Table 2). This leads to more reliable and consistent outputs grounded in facts and strengthened by higher-quality reasoning. Dampening the fear of hallucinations and creating mechanisms to validate and audit the performance of foundational models is critical for building trust and broadening adoption among health systems. The reinforcing mechanism of a “hybrid intelligence” model supports the necessary marriage of accountability with performance.
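The grounding step behind RAG can be shown in a minimal sketch. Everything here is an illustrative assumption: the tiny snippet corpus, the naive keyword-overlap retrieval (a real system would use embeddings or a vetted knowledge graph), and the prompt template are placeholders, not a clinical knowledge base.

```python
# Minimal RAG sketch: prepend retrieved, trusted context to a query so a
# generative model answers from documented facts rather than memory alone.
TRUSTED_SNIPPETS = [
    "Sepsis screening: qSOFA >= 2 warrants escalation.",
    "Heart failure: NT-proBNP supports diagnosis in dyspneic patients.",
    "Type 2 diabetes: HbA1c >= 6.5% on two tests is diagnostic.",
]

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Naive keyword-overlap retrieval, standing in for embedding search."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda s: -len(q_terms & set(s.lower().split())))
    return scored[:k]

def grounded_prompt(question: str) -> str:
    """Build a prompt whose answer must be grounded in retrieved context."""
    context = "\n".join(retrieve(question, TRUSTED_SNIPPETS))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the context above."
    )

print(grounded_prompt("What HbA1c level is diagnostic for diabetes?"))
```

Because the retrieved snippets travel with the prompt, the generated answer can be audited against them—the validation mechanism the paragraph above calls for.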
The adoption of AI in healthcare: Why this matters for healthcare leaders
For healthcare enterprise leadership, the adoption of AI in healthcare demands both vision and continuous improvement related to AI readiness. The health systems that will lead in the next decade are those that redesign their infrastructure in an AI-first way—enabling the rapid adoption of mature AI applications while providing safe environments for innovation. Accordingly, when it comes to AI-enabled clinical decision support, health systems should consider:
- Investing in validated and regulatory-approved specialized medical algorithms: These established technologies deliver clinical performance today, safeguard patients, and will likely remain an important component of high-performance enterprise clinical AI solutions.
- Learning from real-world data, benchmarking against clinical truth, and mitigating bias: Create sandbox environments to run new medical algorithms, generative models enriched by RAGs, and/or knowledge graphs in a safe, non-decision-making environment.
- Building strong literacy and organizational readiness: Establish governance frameworks that treat AI as a continuously learning ecosystem, where accountability, validation, and equity are integral.
Medical algorithms and clinical AI: The path forward
We are entering the next era of medical algorithms and clinical AI—one focused on understanding the language of disease and the patient journey at the individual and population levels. Foundational models, by tokenizing longitudinal health records and other sources of patient health data, are helping us to identify patterns of disease progression in novel ways. Much like how large language models learned grammar by reading billions of sentences, foundational models trained on medical data are learning the latent structures that shape disease onset, progression, and the interaction of those structures across a population.4 Such disease models bring us one important step closer to realizing the so-called “digital twin”: a virtual representation of a patient used to simulate how health conditions may evolve over time and how different interventions might influence outcomes.
The breakthroughs that foundational models bring to medical science are undoubtedly exciting. However, they need not come at the cost of leveraging safe and effective technologies that can help patients now. As we move boldly into the future of clinical AI, a hybrid intelligence model for the clinical implementation of digital solutions will allow physicians to bring the best of both worlds to their patients.
References
- Pantanowitz, L. et al. (2025). Nongenerative Artificial Intelligence in Medicine: Advancements and Applications in Supervised and Unsupervised Machine Learning. Available from: https://doi.org/10.1016/j.modpat.2024.100680
- Shanmugam, D. et al. (2025). Generative Artificial Intelligence in Medicine. Available from: https://doi.org/10.1146/annurev-biodatasci-103123-095332
- Waxler, S. et al. (2025). Generative Medical Event Models Improve with Scale. Available from: https://arxiv.org/abs/2508.12104
- Shmatko, A. et al. (2025). Learning the natural history of human disease with generative transformers. Available from: https://www.nature.com/articles/s41586-025-09529-3 (Accessed: 14 October 2025).
- Henriksson, A. et al. (2023). Multimodal fine-tuning of clinical language models for predicting COVID-19 outcomes. Available from: https://doi.org/10.1016/j.artmed.2023.102695