A Comparative Analysis of AI Safety Assurance: From Aerospace’s Formal Verification to Pharmaceutical Risk-Based Validation

Illustration: a split image comparing a rigid geometric blueprint (aerospace) to a flexible network centered on a human hand (pharmaceutical AI).

Executive Summary and Introduction

This report addresses a critical question at the intersection of technology and public safety: how should we assure the safety of artificial intelligence (AI) in high-stakes fields like medicine, and how do those assurance standards compare to the gold standard set by the aerospace industry?

The central argument is that the safety paradigms for aerospace and pharmaceuticals are, and should be, distinct.

This distinction is not a matter of one industry having “lower” standards. Instead, it is a necessary and rational adaptation. This adaptation is driven by fundamentally different risk profiles, technologies, and definitions of success.

The aerospace industry faces the risk of singular, catastrophic failures. It has built a paradigm around achieving provable correctness. This “Safety-I” approach uses deterministic methods, like formal verification, to mathematically prove that a system is free from error. It is a world of absolutes, designed for systems that can be decomposed and whose failure modes are predictable.

The pharmaceutical industry, by contrast, operates in a world of inherent biological variability. Its “Safety-II” paradigm is not about proving perfection. It is about managing risk and variability to ensure success. This has led to a validation framework based on “fitness for intended use,” GxP compliance, and robust data integrity.

The introduction of modern AI breaks the deterministic model of aerospace. This is especially true for generative models that can “hallucinate” or confidently produce false information. However, this new technology can be accommodated by the risk-based, human-centric model of pharmaceuticals.

The industry, with regulatory guidance from the FDA, is developing a sophisticated credibility assessment framework. This framework relies on a defined Context of Use (COU) and a Human-in-the-Loop (HITL) model. In this model, expert scientists serve as the final validation gate.

This report further illustrates these distinctions through a practical case study of Moderna’s AI governance.1 This case study demonstrates how the Human-in-the-Loop (HITL) model is applied. It serves as the crucial validation and safety control for generative AI in a regulated environment.

Ultimately, this analysis concludes that aerospace’s formal verification and the pharmaceutical industry’s risk-based credibility assessment are not equivalent. However, they are equally valid and appropriate. Each is a rigorous and sophisticated response to the unique challenges of its domain.

Part I: The Aerospace Paradigm – A Standard of Provable Correctness

The field of aerospace engineering operates under one of the most demanding safety regimes of any industry. The consequences of system failure are immediate, catastrophic, and public. This has driven a culture and a set of engineering disciplines aimed at achieving the highest possible levels of reliability and safety.

This paradigm provides a benchmark for high-assurance systems. It serves as a critical point of comparison for any technology intended for use in a safety-critical context. The foundation of this approach is not merely to test for failures but to mathematically prove their absence. This creates a standard of provable correctness that shapes every aspect of development.

Foundations of Avionics Software Assurance

The core challenge in aerospace is ensuring the absolute dependability of highly complex systems. These systems operate in harsh, unforgiving environments where the financial and human cost of failure is immense.3

This has led to the development of rigorous, prescriptive standards that govern the entire lifecycle of avionics systems. The most prominent of these are DO-178C for software and DO-254 for hardware.3

  • DO-178C: Software Considerations in Airborne Systems and Equipment Certification
  • DO-254: Design Assurance Guidance for Airborne Electronic Hardware

These are not merely guidelines. They are integrated into the formal certification process required by authorities like the Federal Aviation Administration (FAA).3 The primary objective is to provide a body of evidence. This evidence must demonstrate that a system meets its specified requirements with an exceptionally high degree of confidence.

This entire philosophy is predicated on a specific view of systems. It assumes systems can be logically decomposed into constituent parts.

The behavior of these parts is considered bimodal. They are either functioning correctly according to their specification, or they have failed.5

This bimodal assumption is critical. It enables a systematic approach to failure analysis, such as Fault Tree Analysis or Failure Mode and Effects Analysis (FMEA). Using these methods, engineers can trace the potential propagation of a single component failure through the entire system.

The goal is to design a system where no single failure can lead to a catastrophic outcome. This deterministic, decomposable view of the world is the intellectual bedrock of the entire aerospace safety assurance framework.
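To make this style of analysis concrete, the short sketch below combines invented, independent per-hour failure probabilities through basic fault tree AND/OR gates. It is an illustration of the general technique only, not a figure from any certified analysis.

```python
# Illustrative fault tree arithmetic. Failure probabilities are invented
# per-flight-hour values for demonstration only.

def p_or(*probs):
    """Probability that at least one independent input event occurs (OR gate)."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def p_and(*probs):
    """Probability that all independent input events occur together (AND gate)."""
    result = 1.0
    for p in probs:
        result *= p
    return result

# Hypothetical component failure probabilities per flight hour.
sensor, computer, actuator = 1e-5, 1e-6, 1e-7

# A control channel fails if any of its components fails (OR gate).
channel_failure = p_or(sensor, computer, actuator)

# The top-level hazard requires both the primary and backup channels
# to fail in the same hour (AND gate).
top_event = p_and(channel_failure, channel_failure)

print(f"Single channel failure: {channel_failure:.2e}")  # ~1.11e-05
print(f"Top-level hazard:       {top_event:.2e}")        # ~1.23e-10
```

Driving the top-event probability down by requiring multiple independent failures is the quantitative expression of the "no single failure is catastrophic" design goal.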

Formal Verification Explained: The Pursuit of Absolute Correctness

To meet the stringent requirements of avionics certification, the aerospace industry has embraced methodologies that go far beyond conventional testing. The most rigorous of these is formal verification (FV).

Formal verification (FV) is a set of mathematical techniques.3 It is used to prove or disprove the correctness of a system’s design against a formal, unambiguous specification.

This differs from testing. Testing can only show the presence of bugs for the specific cases tested. Formal verification, in contrast, aims to provide a mathematical proof that the system is free of certain errors for all possible inputs and states.3

The principal advantage of formal verification is its potential to guarantee the “complete absence of error against a design specification”.3

This guarantee is achieved through methods like model checking.8 Model checking involves an exhaustive exploration of the system’s state-space. The goal is to determine if the system can ever enter an undesirable state, such as one that violates a safety property.

By applying these techniques early, engineers can identify and correct deep design flaws that even thorough testing might miss.7 The resulting correctness guarantees are indispensable for systems where the consequences of failure are catastrophic.6

For example, NASA used model checking to analyze safety properties in its Runway Safety Monitor. This process exposed potential design problems before deployment.8
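As a toy illustration of the principle (not a reconstruction of the NASA tool, and with states and transitions invented for this sketch), the code below exhaustively explores every reachable state of a tiny transition system and reports whether a stated safety property can ever be violated.

```python
from collections import deque

# A toy transition system: states are (mode, altitude_band) pairs.
# Safety property: the system must never be in mode "LANDING" while the
# altitude band is "HIGH". All names and transitions here are invented.

def successors(state):
    mode, band = state
    next_states = set()
    if mode == "CRUISE":
        next_states.add(("DESCENT", band))
    if mode == "DESCENT":
        next_states.add(("DESCENT", "LOW"))
        next_states.add(("LANDING", band))   # note: does not force band to LOW
    if mode == "LANDING":
        next_states.add(("LANDING", "LOW"))
    return next_states

def violates_safety(state):
    mode, band = state
    return mode == "LANDING" and band == "HIGH"

def model_check(initial):
    """Breadth-first exploration of every reachable state."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if violates_safety(state):
            return state             # counterexample: a reachable unsafe state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None                       # property holds for all reachable states

counterexample = model_check(("CRUISE", "HIGH"))
print("Unsafe reachable state:", counterexample)  # ('LANDING', 'HIGH')
```

Real model checkers handle vastly larger state spaces and richer temporal properties, but the contrast with testing is the same: every reachable state is examined, not only the scenarios a test suite happens to exercise.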

However, this pursuit of absolute correctness comes at a significant cost. Formal verification is a highly complex, computationally intensive process that requires specialized expertise.3 It is most effective on systems with well-defined, deterministic behavior.

Traditional FV approaches also present a challenge. They are often applied late in a “waterfall” design process. This can lead to significant rework if flaws are discovered.3

In response, developers have created newer methodologies such as “continuous verification,” which integrate formal techniques into modern, agile development workflows.3 Rather than deferring proof until the end of a waterfall process, models are verified partially and iteratively as they are implemented. This allows errors to be detected and corrected early and keeps the verified design representative of the final executable.3

In some implementations, engineers perform this verification on the target hardware itself. This maintains a verified, executable model in real time as the application evolves.3

Redundancy and Fault Tolerance: Designing for Failure

While formal verification strives for perfection, the aerospace paradigm also plans for its absence. Recognizing that perfect components are an impossibility, the paradigm heavily relies on redundancy and fault tolerance.

The objective is to design systems that can continue to function safely even after one or more components have failed. Redundancy is the primary means of achieving this fault tolerance. It is the inclusion of extra components that are not strictly necessary for the system to function under normal conditions.9

Aerospace systems employ multiple layers of redundancy. This can range from simple duplication of critical sensors to complex, fully dispersed architectures with multiple redundant channels, often described as a “brick wall” design.9

For instance, modern fly-by-wire aircraft typically have three or four parallel flight control computers. These computers run concurrently, and their outputs are compared through a voting system. If one computer produces a divergent result, it is identified as faulty and outvoted by the others. This allows the system to continue operating seamlessly.

This approach ensures that a single-point failure, whether in hardware or software, does not lead to a loss of control.9 This design philosophy explicitly assumes that failures will happen. It provides a robust, pre-planned mechanism to manage them, ensuring the system’s survivability and reliability.
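A minimal sketch of the voting idea is shown below, with invented channel outputs; production voters also handle analog tolerances, timing skew, and channel reconfiguration, none of which appear here.

```python
from collections import Counter

def vote(channel_outputs):
    """Majority-vote the outputs of redundant flight control channels.

    Returns the agreed value and the indices of channels whose output
    diverged from the majority (suspected faulty).
    """
    counts = Counter(channel_outputs)
    agreed, votes = counts.most_common(1)[0]
    if votes <= len(channel_outputs) // 2:
        raise RuntimeError("No majority: redundant channels disagree")
    faulty = [i for i, out in enumerate(channel_outputs) if out != agreed]
    return agreed, faulty

# Hypothetical elevator commands (degrees) from three parallel computers.
# Channel 2 has produced a divergent result and is simply outvoted.
command, suspect = vote([2.5, 2.5, 7.1])
print(f"Commanded deflection: {command}, suspected faulty channels: {suspect}")
```

The design choice worth noting is that the fault is masked immediately by majority agreement; diagnosis and maintenance happen later, without interrupting control.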

The “Safety-I” Philosophy: A World Without Error

The various methods of the aerospace approach can be understood through the lens of the “Safety-I” paradigm.5 Safety-I is the traditional and still dominant view of safety in most high-risk industries.

Safety-I Definition: Safety is the absence of adverse outcomes—a state where as few things as possible go wrong.5

The core assumption of Safety-I is that identifiable failures cause accidents. Something breaks, a procedure is violated, or a human makes an error.

The primary function of safety management, therefore, is to be reactive. When an incident occurs, an investigation is launched to find the root cause. Corrective actions are then taken to eliminate that cause and prevent a recurrence.5 Risk assessment focuses on identifying hazards and calculating their likelihood, with the goal of erecting barriers to prevent harm.5

Within this model, human performance is often viewed as a liability. Humans are seen as the most variable and least predictable component of the system. They are a potential source of error that must be constrained through rigid procedures, extensive training, and automation.5

The goal is to minimize deviation from the “work-as-imagined” model, where every action is prescribed and predictable. The entire aerospace safety paradigm is a direct and logical consequence of its unique risk profile: the need to prevent low-frequency, high-consequence, catastrophic events. It is a framework optimized for a deterministic world, dedicated to engineering a system free from error.

Part II: The Pharmaceutical Paradigm – A Framework of Risk-Managed Efficacy

The pharmaceutical industry, like aerospace, is a heavily regulated, high-stakes environment where safety is paramount. However, the nature of the risks, the sources of variability, and the definition of a successful outcome are fundamentally different.

Instead of engineering deterministic mechanical systems, pharmaceutical science contends with the inherent, unpredictable variability of biological systems. This has given rise to a distinct safety and quality paradigm.

This paradigm is not focused on the absolute elimination of error. It is focused on the rigorous management of risk and variability. The goal is to consistently produce safe and effective medicines for a diverse patient population. This framework emphasizes process control, data integrity, and a documented “fitness for intended use” rather than a mathematical proof of absolute correctness.

The GxP Regulatory Environment: Mandating Integrity and Traceability

The foundation of pharmaceutical quality is a set of regulations and guidelines collectively known as Good Practice, or GxP. This umbrella term includes Good Manufacturing Practice (GMP), Good Clinical Practice (GCP), and Good Laboratory Practice (GLP).

The overarching goal of GxP is to ensure that medicinal products are developed, manufactured, and controlled to the quality standards appropriate for their intended use.11 This safeguards patient health. A central pillar of the GxP framework is the emphasis on data integrity and traceability.

In the United States, the FDA’s Code of Federal Regulations, particularly Title 21 CFR Part 11, is foundational. It establishes the criteria for when electronic records and signatures are considered trustworthy, reliable, and equivalent to paper records.11

This regulation is critical in the modern digital era of drug development. It mandates stringent controls for computerized systems. These controls include secure access, user authentication, and the generation of indelible, time-stamped audit trails.11 These trails record every creation, modification, or deletion of data.

The purpose is to create a verifiable and legally binding data trail. This ensures that all information supporting a drug’s safety and efficacy is accurate, complete, and protected from tampering. This focus on data integrity is a cornerstone of the entire regulatory system.
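As a deliberately simplified, hypothetical sketch of what such a record can look like, the code below builds an append-only audit trail with time-stamped entries. A Part 11 compliant system would additionally require secure authentication, tamper-evident storage, retention controls, and its own validation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

@dataclass(frozen=True)
class AuditEntry:
    """One time-stamped audit trail record (simplified, illustrative fields)."""
    timestamp: str
    user: str
    action: str            # "create", "modify", or "delete"
    record_id: str
    old_value: Optional[str]
    new_value: Optional[str]

class AuditTrail:
    """Append-only log: entries can be added and read, never edited or removed."""
    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def log(self, user, action, record_id, old_value=None, new_value=None):
        entry = AuditEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            user=user,
            action=action,
            record_id=record_id,
            old_value=old_value,
            new_value=new_value,
        )
        self._entries.append(entry)
        return entry

    def entries(self):
        return tuple(self._entries)   # read-only view for review or export

# Hypothetical usage: every change to a GxP record leaves a trace.
trail = AuditTrail()
trail.log("asmith", "create", "assay-042", new_value="pH 7.2")
trail.log("bjones", "modify", "assay-042", old_value="pH 7.2", new_value="pH 7.4")
for e in trail.entries():
    print(e)
```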

Computer System Validation (CSV): Fitness for Intended Use

Building on this regulatory foundation, and to comply with GxP regulations such as 21 CFR Part 11, the pharmaceutical industry uses a formal process known as Computer System Validation (CSV).

CSV is the documented process of providing a high degree of assurance that a specific computerized system will consistently produce results meeting its predetermined specifications and quality attributes.13

Guiding Principle of CSV: The key phrase is not “provable correctness” but “fitness for intended use”.11

The objective of CSV is to generate documented evidence that the system does exactly what it is supposed to do.15 It must do so reliably and reproducibly, within its specific operational context.

This is a fundamentally pragmatic and context-dependent approach. A system is not validated in the abstract. It is validated for a particular task in a particular environment.

The process follows a structured lifecycle approach that typically includes several key phases:15

  • 1. Planning and Risk Assessment: Defining the scope of the validation and assessing the system’s potential impact on product quality, patient safety, and data integrity.
  • 2. Specification: Documenting the User Requirements Specification (URS), Functional Specification (FS), and Design Specification (DS).
  • 3. Qualification:
    • Installation Qualification (IQ): Verifying that the system is installed correctly according to specifications.
    • Operational Qualification (OQ): Testing to ensure the system operates as intended across its specified operational ranges.
    • Performance Qualification (PQ): Demonstrating that the system performs reliably and consistently in the actual production environment with real users and data.
  • 4. Change Control and Maintenance: Establishing formal procedures to manage any changes to the system to ensure it remains in a validated state throughout its operational life.

This systematic process ensures that every aspect of the system’s function that can impact GxP requirements is thoroughly tested and documented. This provides a robust body of evidence for regulatory review.
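As a simplified illustration of how these phases generate documented evidence (the structure, field names, and test cases below are invented for this sketch), each qualification phase can be modeled as a set of pre-approved test cases whose executed results become part of the validation record.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TestCase:
    """One pre-approved validation test with its executed result (illustrative)."""
    test_id: str
    requirement: str      # traces back to the URS/FS/DS
    expected: str
    observed: str
    executed_by: str

    @property
    def passed(self) -> bool:
        return self.observed == self.expected

@dataclass
class QualificationPhase:
    name: str             # "IQ", "OQ", or "PQ"
    test_cases: List[TestCase]

    def summary(self) -> str:
        passed = sum(tc.passed for tc in self.test_cases)
        return f"{self.name}: {passed}/{len(self.test_cases)} test cases passed"

# Hypothetical OQ evidence for a laboratory data system.
oq = QualificationPhase("OQ", [
    TestCase("OQ-001", "URS-12: reject out-of-range pH entries",
             expected="entry rejected", observed="entry rejected",
             executed_by="qa_analyst_01"),
    TestCase("OQ-002", "URS-14: audit trail records modification",
             expected="audit entry created", observed="audit entry created",
             executed_by="qa_analyst_01"),
])
print(oq.summary())   # OQ: 2/2 test cases passed
```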

GAMP 5 and the Risk-Based Lifecycle

To guide this context-dependent process, the industry widely adopts the Good Automated Manufacturing Practice (GAMP) guide, currently in its fifth edition (GAMP 5).12

Developed by the International Society for Pharmaceutical Engineering (ISPE), GAMP 5 is not a regulation itself. Instead, it is recognized globally as the leading set of best practices for achieving compliant GxP computerized systems.17

The central tenet of GAMP 5 is its pragmatic, risk-based approach.12 This principle acknowledges that not all systems carry the same level of risk. It holds that validation efforts should be scaled accordingly.

The framework encourages companies to focus their resources on the areas of highest risk to patient safety, product quality, and data integrity.12 For example, a system that directly controls a critical manufacturing parameter for an injectable drug would require far more stringent validation than a system used for internal project management.

This risk-based approach provides flexibility and efficiency. It allows companies to tailor their validation strategy to the specific context and complexity of the system, rather than applying a one-size-fits-all standard.17 The GAMP 5 lifecycle model, often depicted as a “V-model,” provides a structured path from concept through retirement. It emphasizes continuous monitoring, change management, and leveraging supplier documentation.17
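The scaling principle can be sketched as follows; the impact ratings and tiers are invented for illustration, since GAMP 5 calls for documented, system-specific risk assessments rather than a single formula.

```python
def validation_rigor(patient_safety: int, product_quality: int, data_integrity: int) -> str:
    """Map impact ratings (0 = none, 1 = indirect, 2 = direct) to a validation tier.

    A toy scoring scheme for illustration only; real GAMP 5 assessments rest on
    documented risk analyses, supplier assessments, and system categorization.
    """
    score = max(patient_safety, product_quality, data_integrity)
    if score == 2:
        return "Full lifecycle validation: URS/FS/DS, IQ/OQ/PQ, ongoing periodic review"
    if score == 1:
        return "Scaled validation: risk-focused testing plus supplier documentation"
    return "Basic controls: inventory, access control, periodic review"

# A system controlling a critical sterile-filling parameter vs. a project tracker.
print(validation_rigor(patient_safety=2, product_quality=2, data_integrity=2))
print(validation_rigor(patient_safety=0, product_quality=0, data_integrity=1))
```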

The “Safety-II” Philosophy: Managing Variability for Success

The pharmaceutical industry’s quality paradigm aligns closely with the principles of the “Safety-II” philosophy.5 This stands in stark contrast to Safety-I’s focus on preventing failures.

Safety-II Definition: Safety is the ability of a system to succeed under varying conditions.5

The goal is not simply to ensure that as few things as possible go wrong. The goal is to ensure that as many things as possible go right.5

This perspective is exceptionally well-suited to the realities of drug development and healthcare. Biological systems are inherently variable. Patients respond differently to the same treatment. Manufacturing processes have natural fluctuations. It is impossible to eliminate this variability.

The Safety-II approach, therefore, focuses on understanding and managing this variability to achieve consistently successful outcomes.5 It recognizes that everyday performance adjustments are not a source of error. They are essential for adapting to changing conditions and making things work.

In the Safety-II view, humans are not a liability. They are a critical resource for resilience and flexibility.5 The expertise and adaptive capacity of scientists, clinicians, and operators are what allow the system to succeed despite inevitable challenges.

The pharmaceutical paradigm embodies this philosophy. It does not seek to prove that a system can never fail in the abstract. Instead, it builds robust processes, ensures data integrity, and relies on expert human oversight to manage the inherent variability of its domain. The ultimate goal is to consistently deliver safe and effective therapies to patients. The entire framework is an apparatus for managing statistical risk across a population, not for achieving the deterministic, error-free perfection sought in aerospace.

Part III: The Disruptive Challenge of Artificial Intelligence

The advent of AI, particularly deep learning and generative models, presents a profound challenge to the established safety paradigms in both aerospace and pharmaceuticals.

These technologies are not deterministic, decomposable, or easily specified like traditional software. Their statistical, probabilistic nature and their capacity for emergent, unpredictable behavior—most notably “hallucination”—fundamentally clash with the aerospace goal of provable correctness.

At the same time, this complexity pushes the pharmaceutical industry’s risk-based framework into new territory. In response, regulators and industry bodies are developing novel frameworks. They are not trying to force this new technology into old molds but are instead designing ways to manage AI’s unique properties and risks.

AI Hallucination and the Limits of Determinism

A primary concern with modern generative AI models, such as Large Language Models (LLMs), is the phenomenon of “hallucination”.20

This term refers to the tendency of these models to generate outputs that are fluent, confident, and plausible-sounding but are factually incorrect, nonsensical, or not grounded in their training data.21

It is crucial to understand that hallucination is not a “bug” that can be fixed in the traditional sense. It is an inherent artifact of how these models work. They are designed to be generative, to extrapolate and create novel content by identifying statistical patterns, not to function as deterministic databases.21

In a high-stakes, regulated field like pharmaceutical research, the risks posed by hallucinations are severe. An AI system that hallucinates could provide incorrect medical advice or medication dosages, leading directly to patient harm.20

It could also:

  • Misreport scientific data.
  • Invent non-existent citations.
  • Propose flawed experimental designs.

These errors could result in significant financial losses, wasted research efforts, and serious legal and compliance liabilities.20 The potential for a single AI error to have outsized fallout is a major barrier to adoption. It underscores why these tools cannot be treated as just another piece of software within existing validation frameworks.20

The Infeasibility of Formal Verification for Deep Learning

This challenge is particularly acute for the aerospace paradigm. The industry’s gold standard for software assurance, formal verification, is fundamentally incompatible with today’s deep learning models.23

Formal methods are designed for classic, non-machine-learned software. In such software, the system’s logic is explicit, and its behavior can be mathematically defined and proven against a specification.6 Deep neural networks (DNNs) and LLMs, by contrast, are statistical and probabilistic. Their “logic” is not explicitly programmed. It is encoded implicitly in millions or billions of weighted parameters, learned from data.

Applying formal methods to these systems is an active but nascent area of academic research, and it faces monumental challenges.24

The sheer scale and complexity of modern neural networks make an exhaustive exploration of their state-space computationally intractable.25 Furthermore, creating a complete and formal specification for a generative model on an open-ended task is often impossible. How does one formally specify “provide a useful summary of clinical trial data” in a way that can be mathematically proven?

Current research in FV for DNNs is largely confined to narrower, more constrained problems, like verifying the robustness of image classifiers.23 It is not a mature or scalable technology that can be applied to the broad, generative tasks for which a company like Moderna is leveraging OpenAI’s models. The pursuit of provable, absolute correctness is, for the foreseeable future, a dead end for this class of AI.

The FDA’s Regulatory Response: A New Risk-Based Framework

With formal verification off the table for these models, regulators have not tried to retrofit legacy assurance methods. Instead, the U.S. Food and Drug Administration (FDA) has moved to establish a modern regulatory framework.

In January 2025, the agency issued a pivotal draft guidance titled, “Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products”.26

This guidance does not attempt to apply old CSV principles to AI. Instead, it introduces a new, flexible framework specifically designed for it: the risk-based credibility assessment framework.26

This framework is a direct evolution of the pharmaceutical industry’s existing risk-based philosophy. It implicitly accepts that an AI model cannot be proven “correct” in an absolute sense. Instead, it focuses on establishing trust and credibility in the model’s output for a very specific, well-defined purpose.

The framework is organized around a seven-step process:29

  1. Define the question of interest.
  2. Define the Context of Use (COU) for the AI model.
  3. Assess the AI model risk.
  4. Develop a plan to establish the credibility of the model’s output.
  5. Execute the plan.
  6. Document the results.
  7. Determine the adequacy of the model for the COU.

Two concepts are central to this framework.

The first is the Context of Use (COU). This is the specific task, environment, and manner in which the AI model will be deployed.26 The entire credibility assessment is relative to this COU.

The second is the method for assessing risk. This method is based on two factors:26

  • Model Influence: The degree to which the model’s output contributes to a decision.
  • Decision Consequence: The significance of an adverse outcome if the decision is incorrect.

A high-risk example would be a model used as the sole determinant for a critical patient monitoring decision. A lower-risk example would be a model used as an assistive tool to flag potential issues for human review.26
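A purely illustrative sketch of how the two factors might combine into a risk tier is shown below. The draft guidance describes the factors conceptually and does not prescribe a numeric or matrix scheme, so the mapping here is an assumption made for demonstration.

```python
from enum import Enum

class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# Illustrative risk matrix: keys are (model influence, decision consequence).
# The FDA draft guidance defines the two factors but leaves their combination
# to the sponsor's documented judgment; this table is invented.
RISK_MATRIX = {
    (Level.LOW, Level.LOW): "low",
    (Level.LOW, Level.MEDIUM): "low",
    (Level.LOW, Level.HIGH): "medium",
    (Level.MEDIUM, Level.LOW): "low",
    (Level.MEDIUM, Level.MEDIUM): "medium",
    (Level.MEDIUM, Level.HIGH): "high",
    (Level.HIGH, Level.LOW): "medium",
    (Level.HIGH, Level.MEDIUM): "high",
    (Level.HIGH, Level.HIGH): "high",
}

def model_risk(influence: Level, consequence: Level) -> str:
    return RISK_MATRIX[(influence, consequence)]

# Sole determinant of a critical patient monitoring decision: high risk.
print(model_risk(Level.HIGH, Level.HIGH))
# Assistive tool flagging issues for human review of the same decision: lower risk.
print(model_risk(Level.LOW, Level.HIGH))
```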

This framework provides a structured yet adaptable approach for evaluating AI. It focuses regulatory scrutiny where the risk to patients is highest and emphasizes the need for ongoing lifecycle maintenance for adaptive models.29

Industry Adaptation: Integrating AI into GAMP

In parallel with regulatory developments, industry standards bodies are also evolving their best practices. The ISPE, the organization behind GAMP 5, has recognized the need to provide specific guidance for this new class of technology.

The Second Edition of GAMP 5, published in 2022, was updated to explicitly incorporate emerging technologies. It added new appendices covering topics like Artificial Intelligence and Machine Learning (AI/ML).17

More significantly, ISPE has since published a dedicated ISPE GAMP® Guide: Artificial Intelligence.31 This guide provides a comprehensive, holistic framework for the development, validation, and use of AI-enabled systems within a GxP-regulated environment.

It adapts the classic GAMP lifecycle model for ML subsystems, outlining a process that moves from concept and prototyping through production and operation.33

Crucially, it extends GAMP’s core risk-based philosophy to the specific hazards associated with AI. This includes a focus on risks related to the quality of training data, the design of the model, and the need for effective monitoring and control during operation to ensure the human remains in control.18

By providing this detailed guidance, ISPE is helping the industry apply its well-established principles of risk management to the unique challenges posed by AI. This ensures that these systems are developed and deployed in a manner that safeguards patient safety and data integrity.

Part IV: Case Study – AI Governance and Application at Moderna

Moving from the theoretical to the practical, it is essential to examine how a leading biopharmaceutical company is implementing these new frameworks.

Moderna’s strategic adoption of AI, particularly its partnership with OpenAI, provides a compelling case study. It shows how the industry is grappling with the promise and peril of generative AI.

Their approach reveals a nuanced, bifurcated strategy for validation and safety: a classic model-validation approach for traditional, predictive AI, and a distinct, process-oriented approach for generative AI. In both cases, the human expert serves as the lynchpin of the entire system.

The OpenAI Partnership and Scope of Use

Moderna has entered into a strategic partnership with OpenAI. The company is deploying ChatGPT Enterprise to thousands of employees across the entire organization.2 The goal is to empower every function—from legal to clinical development—with advanced AI capabilities. Moderna aims to accelerate the development of new mRNA medicines.2

The applications are diverse. Legal and branding teams use ChatGPT Enterprise to streamline non-GxP tasks like analyzing contracts, answering policy questions, and tailoring communications.2

More critically, AI is being piloted in GxP-adjacent domains. A key example is a custom GPT named Dose ID, which is being developed and validated to assist clinical development teams.2 The purpose of Dose ID is to help scientists review and analyze the complex datasets from clinical trials. This supports the crucial decision of selecting the optimal vaccine dose for late-stage studies.2

The tool is designed to automate parts of the analysis, provide a clear rationale, reference its data sources, and generate informative charts to augment the clinical team’s judgment.2

Validation in Practice – The Human-in-the-Loop (HITL) Imperative

Moderna’s primary safety mechanism is not a technical fix for the model itself. Given the inherent unreliability of generative AI, the company relies on a procedural control: the rigorous implementation of a Human-in-the-Loop (HITL) workflow.1

HITL is a collaborative framework. It intentionally integrates expert human oversight into AI-driven processes.1

This approach is designed to leverage the best of both worlds. It combines the speed, scale, and data-processing power of AI. It pairs this with the critical thinking, domain expertise, and intellectual rigor of human scientists.1

In the specific application of the Dose ID GPT, Moderna’s process is explicitly “led by humans and with AI input”.2 The AI tool serves as a powerful assistant; it does not make the final decision.

The tool augments and enhances the clinical team’s ability to evaluate vast amounts of data. However, the ultimate responsibility for judgment, interpretation, and the final decision rests entirely with the human experts.2

This procedural safeguard is the core defense against the risks of AI error. The human expert acts as the final validation gate for every significant output. They ensure that any AI-generated findings are scientifically sound, aligned with research objectives, and safe before they influence critical development decisions.1 This makes the human not just a user, but an essential component of the system’s safety architecture.
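A minimal, hypothetical sketch of such a gate is shown below. The names, data fields, and placeholder generation step are invented; Moderna's actual control is an organizational process, not a single piece of code.

```python
from dataclasses import dataclass

@dataclass
class AIDraft:
    """An AI-generated analysis awaiting expert review (hypothetical structure)."""
    summary: str
    cited_sources: list
    recommended_dose_mg: float

def generate_draft(trial_data) -> AIDraft:
    # Placeholder for a call to a generative model; the output is a proposal only.
    return AIDraft(
        summary="Dose-response plateaus above 50 mg with comparable tolerability.",
        cited_sources=["table_3_immunogenicity.csv", "ae_listing_phase2.csv"],
        recommended_dose_mg=50.0,
    )

def human_review(draft: AIDraft, reviewer: str) -> dict:
    """The expert is the final validation gate: nothing proceeds without sign-off."""
    approved = bool(draft.cited_sources)  # stand-in for real scientific scrutiny
    return {
        "reviewer": reviewer,
        "approved": approved,
        "rationale": "Sources verified against source data" if approved
                     else "Rejected: unsupported claims",
    }

draft = generate_draft(trial_data=None)
decision = human_review(draft, reviewer="clinical_pharmacologist_01")
if decision["approved"]:
    print(f"Proceed with {draft.recommended_dose_mg} mg, signed off by {decision['reviewer']}")
else:
    print("Returned to AI-assisted analysis for rework")
```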

Situating Moderna’s Approach within Quality Frameworks

Moderna’s practices demonstrate a sophisticated understanding of how to apply different validation strategies to different types of AI. This approach aligns with the principles of the emerging regulatory and industry frameworks.

Their strategy is effectively bifurcated:

1. For traditional, predictive AI:

Moderna employs a classic model validation methodology. A prime example is a custom-trained convolutional neural network (CNN) used for automated Sanger sequencing analysis. This is a critical quality control step for their DNA templates and final mRNA products.

This is a narrow AI with a well-defined task: produce a pass/fail score on sequencing data. To validate this model, Moderna trained it on a massive dataset of over 20,000 labeled data files. The model’s performance was then benchmarked against human experts. It was shown to “exceed individual human performance,” increasing the consistency and quality of the QC process.

Similarly, the company employs a logistic regression algorithm, trained on historical data, to predict which mRNA orders are at risk of producing insufficient material. This is a standard and robust method for validating a predictive machine learning model.
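The validation pattern for this class of model is conventional supervised evaluation against labeled, held-out data. The sketch below, using scikit-learn on synthetic data, shows the general shape of such a check; it is not Moderna's pipeline, and the features, labels, and thresholds are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic stand-in for historical production records; in practice features
# might encode construct length, GC content, and process parameters.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(2000, 5))
# Invented ground truth: a particular feature combination tends to fail.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Performance on held-out data is the evidence reviewed during validation,
# alongside comparison with expert judgment on the same cases.
print(classification_report(y_test, model.predict(X_test), digits=3))
```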

2. For broad, generative AI:

For tools like ChatGPT Enterprise, where direct performance validation is infeasible, their approach shifts to a procedural, risk-based model. This is a practical application of the FDA’s new framework.

Analyzed through the FDA’s lens:

  • The Context of Use (COU) for the Dose ID tool is decision support for clinical data analysis, not automated decision-making.
  • The Model Influence is deliberately kept low, as the tool is assistive and the human expert is the final decider.
  • The Decision Consequence, however, is high, as dose selection is critical to patient safety and trial success.

This combination of low influence and high consequence makes the HITL control an essential risk mitigation measure. Moderna is not attempting to prove that the underlying OpenAI model is flawless. Instead, they have designed a safe and compliant process for using an imperfect but powerful tool within a specific, high-stakes context.

This process-centric approach, combined with a broader digital strategy emphasizing quality and traceability, demonstrates a clear alignment with the foundational principles of GxP and the risk-based future of AI regulation.

Part V: A Synthesis of Standards – Direct Comparative Analysis

The safety paradigms of the aerospace and pharmaceutical industries have evolved from fundamentally different problem sets, risk profiles, and technologies. The introduction of generative AI further accentuates these differences.

This makes a direct, feature-by-feature comparison essential. It helps explain why a single, universal standard for AI safety is neither feasible nor desirable. The aerospace approach seeks provable correctness in a deterministic system. The pharmaceutical approach seeks credible, risk-managed performance in a probabilistic one.

Methodological Divergence: Proof vs. Credibility

The most significant divergence lies in the core objective of the assurance process.

  • Aerospace: The industry’s standard is built on Formal Verification. This methodology has the explicit goal of generating a mathematical proof of a system’s correctness against its formal specification.3 The ambition is to exhaustively analyze the system’s behavior and eliminate entire classes of errors before deployment.
  • Pharmaceuticals (with AI): The emerging standard is the Risk-Based Credibility Assessment, as articulated in the FDA’s guidance.26 This framework’s goal is not absolute proof. Instead, it aims to generate sufficient evidence that a model is credible and trustworthy for its specific Context of Use. The focus is on managing error risk during operation through model evaluation, process controls, and human oversight.

Philosophical Contrasts: Determinism vs. Probabilism

These methodological differences stem from deeper philosophical contrasts.

  • Aerospace (Safety-I): This paradigm operates on the assumption that systems are largely deterministic and decomposable. It views the world in a bimodal fashion—things are either working correctly or they have failed. The primary goal is to prevent failure by identifying and eliminating root causes.5
  • Pharmaceuticals (Safety-II): This paradigm accepts that the systems it deals with—both biological and computational—are inherently variable and probabilistic. Success is not achieved by eliminating variability but by understanding and managing it through robust, adaptive processes and expert judgment.5

The Role of the Human Expert: Operator vs. Validator

The role and perceived value of the human expert differ dramatically between the two paradigms.

  • Aerospace: In a highly automated, formally verified system, the human pilot is an expert operator of a system designed to be provably correct. Within the traditional Safety-I framework, human unpredictability can be viewed as a potential liability to be constrained.5
  • Pharmaceuticals (with AI): In the context of generative AI, the human expert is the central and most critical component of the safety and validation system. Within the Human-in-the-Loop model, the scientist is not merely an operator but the final validator of the AI’s output. They are the essential safety control that mitigates the model’s inherent unreliability.1

A Tale of Two Risk Profiles: Catastrophe vs. Spectrum

Ultimately, the divergence in safety standards is a rational response to two vastly different risk profiles.

  • Aerospace: The risk profile is dominated by the threat of a singular, catastrophic system failure leading to mass casualties. The entire safety framework is engineered to drive the probability of this one type of event to near zero.
  • Pharmaceuticals: The risk profile is a statistical spectrum of adverse events distributed across a large, diverse patient population. Risks range from a drug being ineffective to minor side effects to rare but severe reactions. The regulatory framework is designed to manage this statistical risk, balancing a new medicine’s potential benefits against its potential harms.

Table 1: Comparative Framework of Safety Assurance in Aerospace and Pharmaceuticals

| Feature | Aerospace Approach | Pharmaceutical AI Approach |
| --- | --- | --- |
| Primary Safety Philosophy | Safety-I: Focus on the absence of negative events (accidents, failures).5 | Safety-II: Focus on the presence of positive outcomes (success under varying conditions).5 |
| Core Goal of Assurance | Provable Correctness / Error Elimination: Mathematically prove the system meets its specification.3 | Risk-Managed Fitness for Purpose: Provide evidence the system is credible for a specific use.[11, 26] |
| Primary Verification Method | Formal Verification & Exhaustive Testing: Aims for complete absence of error against a design specification.3 | Risk-Based Credibility Assessment & Human-in-the-Loop (HITL): Manages risk through process controls and expert oversight.[1, 29] |
| View of System Behavior | Deterministic & Decomposable: Assumes predictable behavior and that the system can be broken down into bimodal components.5 | Probabilistic & Holistic: Accepts inherent variability and non-deterministic behavior, especially in AI.[5, 21] |
| Handling of Variability/Error | Eliminate at the source: Identify root causes and engineer them out of the system.5 | Manage and mitigate in operation: Use adaptive processes and human expertise to control for inevitable variability.[1, 5] |
| Role of Human Expert | System Operator (Potential Liability): A component whose variability may need to be constrained.5 | System Validator (Essential Safety Component): The final arbiter of correctness and the primary control against AI error.1 |
| Approach to Third-Party Software | Rigorous supply-chain certification and formal verification of components. | Process-based controls (e.g., HITL) on the use of third-party software, focusing on the application's context.1 |
| Applicability to Generative AI | Fundamentally Incompatible: Formal methods cannot be applied to large-scale, non-deterministic generative models.24 | Natively Designed For It: The risk-based, context-dependent framework is specifically designed to accommodate AI's unique properties.[18, 26] |

The Debate Over Harmonization and Ethical Considerations

Despite the distinct paradigms analyzed, a compelling argument exists for harmonizing safety standards. This argument primarily pushes for elevating healthcare’s AI assurance to the stringent levels seen in aviation.

Proponents argue that because both fields are safety-critical, aviation’s safety culture and human factors engineering could significantly reduce AI deployment risks in medicine and enhance patient safety.

This debate is rooted in fundamental ethical considerations, particularly the principle of non-maleficence (minimizing harm).

If different industries adopt different safety levels for AI, it raises complex questions of accountability.

Consider the “black box” nature of many AI models. Their internal decision-making process is opaque. This makes it inherently challenging to pinpoint the exact cause of an error. It is therefore difficult to assign responsibility when harm occurs.

This challenge has led some to advocate for universal principles of fairness, transparency, and human safety, regardless of the application.

Furthermore, the risk-based framework adopted by the pharmaceutical industry is not without its own critics. Some argue that a “risk-based” approach can be ambiguous, create regulatory gaps, and fail to define clear methods for measuring harm. Others contend that some AI applications are fundamentally incompatible with human rights, and that framing them as a “risk” to be “mitigated” is therefore a flawed premise.

However, the regulatory trajectory for AI appears to be following a path similar to aviation’s. It is evolving from voluntary principles toward binding, risk-based statutes. The core challenge remains balancing innovation with safety, a debate that is far from over.

Conclusion: Equivalence, Adequacy, and Fitness for Purpose

Is the AI used in Moderna’s vaccine development up to the same safety standards as the aerospace industry?

The direct answer is no, nor should it be.

The analysis reveals that a direct comparison of specific methodologies, like formal verification, is a category error. Applying a deterministic proof method from avionics to a probabilistic system like ChatGPT is technologically infeasible and philosophically misaligned.

The standards are not the same. This is because the nature of the technology and the context of the risk are fundamentally different.

This distinction, however, does not imply that pharmaceutical standards are lower or inadequate.

To the contrary, the pharmaceutical industry has developed a robust and highly appropriate safety framework for AI. It has done so in concert with regulators like the FDA and standards bodies like ISPE.

This framework is demonstrably fit for its purpose. That purpose is managing the specific risks of AI within the specific context of drug development.

The reliance on a risk-based credibility assessment, anchored by a defined Context of Use (COU), constitutes a high standard of safety assurance. It is operationalized through rigorous Human-in-the-Loop (HITL) validation.

This approach does not ignore the inherent limitations of AI, such as hallucinations or third-party code. Instead, it explicitly acknowledges these risks. It builds a resilient, human-centric procedural system around the technology to contain them.

The safety of the system does not reside solely in the AI model itself. It resides in the integrated process of its use. This process includes the indispensable role of the human expert as the final arbiter of safety and scientific validity.

Ultimately, the two industries represent two different but equally valid philosophies of safety. Each is born of its own unique challenges. Aerospace engineering strives to build a perfect, error-free system for a predictable physical world. In its application of generative AI, the pharmaceutical industry is focused on building a resilient, expert-driven process to navigate the complex, probabilistic world of biology.

The standards are not equivalent; they are appropriately and rigorously distinct.


Works Cited

  1. metaphacts GmbH, “Human-in-the-Loop for AI: A Collaborative Future in Research Workflows,” 2024, blog.metaphacts.com
  2. OpenAI, “Moderna and OpenAI partner to accelerate the development of life-saving treatments,” 2024-2025, openai.com/index/moderna/
  3. ResearchGate, “Continuous Formal Verification for Aerospace Applications,” 2024, researchgate.net
  4. Federal Aviation Administration (FAA), “Design Assurance of Airborne Electronic Hardware,” 2014, faa.gov
  5. NHS England, “Safety-I and Safety-II White Paper,” 2015, england.nhs.uk
  6. MDPI, “Formal Methods for Verifying the Correctness of IoT Systems,” 2023, mdpi.com
  7. arXiv, “Formal Verification of Software Systems,” 2021, arxiv.org
  8. NASA Technical Reports Server (NTRS), “Formal Verification of Safety Properties for Aerospace Systems,” 2005, ntrs.nasa.gov
  9. DTIC, “Avionics Suite Design,” 1980, apps.dtic.mil
  10. MDPI, “Integrating Safety-I and Safety-II in Near-Miss Management,” 2023, mdpi.com
  11. Intuition, “Understanding GAMP 5 Guidelines for System Validation,” 2025, intuitionlabs.ai
  12. USDM, “GAMP 5: An Overview of Good Automated Manufacturing Practice,” 2024, usdm.com
  13. GetReskilled, “What is Computer Systems Validation (CSV)?” 2024, getreskilled.com
  14. PwC, “How AI is transforming computer system validation (CSV),” 2024, pwc.com
  15. Sware, “What is Computer System Validation,” 2024, sware.com
  16. Intuition, “CSV in Pharmaceutical and Biotech Industries,” 2024, intuitionlabs.ai
  17. ISPE, “GAMP 5 Guide: A Risk-Based Approach to Compliant GxP Computerized Systems (Second Edition),” 2022, ispe.org
  18. ISPE, “Machine Learning Risk and Control Framework,” 2024, ispe.org
  19. Erik Hollnagel, “Safety-I and Safety-II,” erikhollnagel.com
  20. The National Law Review, “AI ‘Hallucinations’ Are Creating Real-World Risks for Businesses,” 2024, natlawreview.com
  21. Wikipedia, “Hallucination (artificial intelligence),” 2024, en.wikipedia.org
  22. PMC, “AI in Medicine and ‘AI Hallucinations’,” 2023, pmc.ncbi.nlm.nih.gov
  23. arXiv, “Formal Verification of Deep Neural Networks in Object Detection,” 2024, arxiv.org
  24. arXiv, “Formal Methods for Machine Learning,” 2021, arxiv.org
  25. arXiv, “Challenges of Formal Verification for Deep Learning,” 2024, arxiv.org
  26. WCG, “The Role of AI in Regulatory Decision-Making for Drugs & Biologics: The FDA’s Latest Guidance,” 2025, wcgclinical.com
  27. U.S. Food and Drug Administration (FDA), “FDA Proposes Framework to Advance Credibility of AI Models Used in Drug and Biological Product Submissions,” 2025, fda.gov
  28. U.S. Food and Drug Administration (FDA), “Considerations for the Use of Artificial Intelligence To Support Regulatory Decision-Making for Drug and Biological Products,” 2025, fda.gov
  29. U.S. Food and Drug Administration (FDA), “Draft Guidance: Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making,” 2025, fda.gov
  30. ISPE, “GAMP 5 Guide (Second Edition),” 2022, ispe.org
  31. ISPE, “ISPE GAMP Guide: Artificial Intelligence,” 2024, ispe.org
  32. ISPE, “GAMP Guide: Artificial Intelligence,” ispe.org
  33. ISPE, “Applying GAMP Concepts to Machine Learning,” 2023, ispe.org
  34. ISPE, “New EU AI Regulation and GAMP 5,” 2023, ispe.org
  35. Inductive Quotient, “Moderna: Harnessing AI for Drug Development,” 2024, inductivequotient.com
  36. metaphacts GmbH, “Human-in-the-Loop for AI: A Collaborative Future,” 2024, blog.metaphacts.com
  37. Moderna, “Moderna Digital WhitePaper,” 2020, modernatx.com
  38. International Journal of TCHM & PR, “Integrating Human Factors in the Healthcare System: Embracing Aviation Methodologies and Artificial Intelligence,” 2024, ijtmrph.org
  39. MIT News, “Stratospheric safety standards: How aviation could steer regulation of AI in health,” 2024, news.mit.edu
  40. Osher Lifelong Learning Institute at Boise State University, “How Aviation Could Steer Regulation of AI in Health,” 2024, boisestate.edu
  41. arXiv, “The Balanced, Integrated and Grounded (BIG) Argument for AI Safety,” 2024, arxiv.org
  42. PMC, “Accountability and Safety in AI-Based Clinical Tools,” 2020, pmc.ncbi.nlm.nih.gov
  43. Capitol Technology University, “Ethical Considerations of Artificial Intelligence,” 2024, captechu.edu
  44. Project Management Institute (PMI), “Top 10 Ethical Considerations for AI Projects,” 2024, pmi.org
  45. RAND Corporation, “Risk Framework for the EU AI Act,” 2024, rand.org
  46. Access Now, “The EU should regulate AI on the basis of rights, not risks,” 2021, accessnow.org
  47. OneGiantLeap, “How AI regulation is following the flightpath of aviation,” 2024, insights.onegiantleap.com
