28 May 2025
It’s time to modernize medical device regulation
European AI regulation is some of the world’s most robust - and rightly so. The 2024 AI Act is a vital first step to establish a clear EU regulatory framework for AI. It prevents harm across diverse industries including the healthcare sector, with risk-based classification, transparency, and human oversight requirements.
The more comprehensive baseline product-safety regulation for European medical AI tools, however, presents a challenge for innovators, physicians, and patients alike.
The Medical Device Regulation’s (MDR) strict design, development, and post-market surveillance requirements mandate that manufacturers certify their products for each distinct intended use case, detailing the specific patient group, inputs, outputs, and clinical claims.
Why the MDR falls short for generative AI
For hardware, this makes sense. But medical generative AI is, by its nature, almost impossible to validate in this way. Medical large language models (LLMs) are particularly good at generalization. They are Swiss Army knives: they can produce a near-infinite range of outputs and are not limited to any one use case. Indeed, medical LLMs are highly likely to have reasoning capabilities that go beyond their intended use.
Notified Bodies have specifically said that “practice has shown that it is difficult for manufacturers to sufficiently prove conformity for AI devices, which update the underlying models using in-field self-learning mechanisms.
Currently, notified bodies do not consider medical devices based on such models to be "certifiable", unless the manufacturer takes measures to ensure the safe operation of the device within the scope of the validation described in the technical documentation."
AI innovators must therefore secure MDR approval for one specific indication and patient population, then request to broaden the scope every time they add a new indication or patient population. Certifying a medical-grade LLM for general clinical use would take decades.
We’ve spoken with various Notified Bodies and have been told that “everything is possible as long as we demonstrate conformity to the GSPRs (General Safety & Performance Requirements), with appropriate clinical data, or non-clinical data in the case of Article 61(10) applying to specific GenAI medical device software [which allows manufacturers, in certain circumstances, to demonstrate conformity based on non-clinical testing methods alone]”.
But this approach is clearly flawed: it leaves every AI company to figure out compliance on its own, with Notified Bodies reviewing devices case by case.
The consequences of inaction: eroding public trust
This approach - brute-forcing a square-peg innovation through round-hole regulation - delays patient care by confining proven innovations to highly limited use cases. As a result, we are likely to see innovation move to the US or other large markets.
It also creates a significant risk of damaging public trust in healthcare regulators, as patients look to countries that have moved faster to regulate emerging technologies according to their risks, benefits, and unique properties.
As Derraz et al. (npj Precision Oncology, 2024) put it, “As the benefits of these therapies become known to the public, they will expect the regulations of their own country to allow access to these therapies. If this does not happen, public support for their national frameworks are likely to be eroded.”
New guidance is required to ensure that all companies are assessed equally and held to the same evidence requirements, so that European AI innovations can reach the market in time to prove their benefits to patients.
Setting the precedent
Recognizing these challenges, some regulators are beginning to forge new paths. South Korea's Ministry of Food and Drug Safety (MFDS) issued specific guidelines in January 2025 for approving generative AI – particularly LLMs and LMMs – as medical devices (Park et al., 2025).
The Korean MFDS guidelines focus specifically on LLM/LMM-based software tools directly involved in patient diagnosis, treatment, or prognosis, distinguishing them from more generalist AI systems or simpler tools like medical dictation software.
Key considerations in their framework include:
Detailed Intended Use & Warnings: To mitigate the higher risk of off-label use associated with versatile LLMs compared to conventional AI, the guidelines explicitly require manufacturers to provide comprehensive intended usage details (defining purpose and indications) along with clear warnings against such off-label use. This information must be delivered via user instructions, similar in concept to medication package inserts.
Specific Performance Evaluation: To address the complexity of evaluating free-text outputs, the guidelines mandate clinical evaluation of model performance by multiple expert clinicians from the relevant fields, using structured methods such as grading the clinical significance of any errors. Automated metrics (e.g., BLEU, ROUGE, METEOR, accuracy, F1 score) play only a supplementary role; a brief sketch of such supplementary scoring follows this list.
Managing Unique LLM Risks: The guidelines acknowledge risks like misinformation, explainability challenges, and the potential for automation bias.
However, rather than mandating specific technical features (such as uncertainty indicators or explainability tools), the primary mitigation strategy is to restrict the device’s use to qualified clinicians in the relevant fields, with this limitation clearly stated in the user instructions.
Potential inconsistency due to prompt sensitivity or stochasticity, and the risk of performance drift over time, are acknowledged; the guidelines recommend periodic evaluation in general terms but do not yet stipulate specific mandatory actions or monitoring requirements.
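To make the supplementary role of those automated metrics concrete, here is a minimal sketch, assuming the open-source nltk and rouge-score Python packages; the reference answers and model outputs below are hypothetical and only illustrate how such scores might be computed alongside the clinician grading that the MFDS framework treats as primary.

```python
# Minimal sketch: supplementary automated metrics (BLEU, ROUGE) for free-text
# LLM outputs. These scores complement, not replace, structured grading of
# clinical error significance by expert clinicians.
# Assumes the open-source `nltk` and `rouge-score` packages; the example
# reference answers and model outputs below are hypothetical.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

# Hypothetical (clinician-written reference, model output) pairs.
cases = [
    ("Obtain blood cultures and start empirical antibiotic therapy.",
     "Start empirical antibiotics after obtaining blood cultures."),
    ("No acute intracranial abnormality on CT.",
     "CT shows no acute intracranial abnormality."),
]

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
smoothing = SmoothingFunction().method1  # avoids zero BLEU on short texts

for reference, prediction in cases:
    bleu = sentence_bleu([reference.split()], prediction.split(),
                         smoothing_function=smoothing)
    rouge = scorer.score(reference, prediction)
    print(f"BLEU={bleu:.2f}  "
          f"ROUGE-1={rouge['rouge1'].fmeasure:.2f}  "
          f"ROUGE-L={rouge['rougeL'].fmeasure:.2f}")
```

In a submission under the Korean framework, scores like these would sit in the technical documentation next to the clinicians’ structured grading of error significance, which remains the primary evidence of performance.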
These guidelines are still preliminary, with significant limitations. Nonetheless, the South Korean initiative provides a tangible example of the dedicated regulatory thinking needed for generative AI in medicine.
In Europe, we need to find a similar approach — developing tailored regulatory frameworks to protect patient safety and data sovereignty without stifling innovation. Learning from international efforts and fostering collaboration between regulators, developers, and clinicians are crucial steps. If we do nothing, the risk is clear: untenable delays to patient care. The time to act is now.