
Lightning Strikes: When CARA Met Elsa

Written by ConcertAI | Aug 21, 2025

Biopharma Multi-Agents, AI-to-AI Workflows, and Regulatory Interactions

By Jeff Elton, PhD, Vice Chairman of ConcertAI; Pyeush Gurha, CTO of ConcertAI; and Stephen Yip, PhD, CARAai architect of ConcertAI

The pharmaceutical industry stands at a transformative juncture where AI-enhanced drug development and AI-mediated regulatory interactions are becoming reality. The U.S. FDA's introduction of "Elsa," an AI-powered regulatory decision support system, changes how drug sponsors may want to think about preparing and submitting regulatory materials. In this new paradigm, platforms like ConcertAI's CARAai™ become critical bridges—enabling sponsors to anticipate, prepare for, and optimize their interactions with regulatory AI systems.

"When CARA met Elsa” is a metaphor1 for the emerging ecosystem where pharmaceutical companies leverage sophisticated AI platforms to engage with regulatory AI systems. CARAai, with its comprehensive real-world evidence from all 50 U.S. states, integrated clinical trial data, and specialized AI agents, serves as a critical preparation platform that helps sponsors understand and stress test how regulatory AI like Elsa will evaluate their submissions.

The Convergence of AI in Drug Development and Regulation

Throughout 2024 and 2025, we have reviewed and commented on the FDA's systematic evolution from conceptualizing AI's regulatory role to operationalizing it through Elsa. This journey began with the FDA's Preliminary Guidance Document on AI in Support of Regulatory Decisions and culminated in Commissioner Marty Makary's announcement of Elsa—the agency's generative AI system for regulatory assessments, decisions, and safety surveillance. This transformation highlights a fundamental shift in how pharmaceutical development and regulation will interact in the AI era.

Elsa's Foundation and Unique Capabilities

While Elsa's exact architecture remains proprietary, we can infer its foundational components. Like other major AI platforms—OpenAI's GPT-4/5, Anthropic's Claude, and xAI's Grok—Elsa incorporates standard scientific literature, clinical guidelines, toxicology databases, and teratogen registries. These public and government-sponsored resources represent table stakes: the required knowledge base for any credible medical AI system.

Figure 1: OpenAI 4.x Training Data Sources


What truly distinguishes Elsa, however, is its exclusive access to the FDA's institutional knowledge accumulated over decades. The agency possesses unique internal assets that no external system can access: program assessment documents with confidential reviewer commentary; proprietary safety databases from historical submissions; processed case summaries; and data from collaborative initiatives like Sentinel. This combination creates a powerful synergy where public scientific literature provides the knowledge foundation while internal FDA data enables nuanced regulatory judgment through retrieval-augmented generation (RAG), which may be further augmented by task-specific reasoning.

Built on established LLM architecture (likely OpenAI or Anthropic), Elsa has been fine-tuned with this unique combination of the FDA's proprietary and public assets. This enables Elsa to provide analyses, assessments, and recommendations that no other system can match, regardless of how comprehensive its public training data may be. The system maintains strict information compartmentalization, ensuring sponsor confidentiality while leveraging decades of collective regulatory wisdom with unprecedented speed and comprehension. As FDA teams refine their interactions with Elsa—improving prompts, adding specialized agents, and expanding data sources—the system will evolve from a powerful tool into an indispensable partner that fundamentally transforms regulatory science.

The Technical Architecture of Agent-to-Agent Communication

Modern AI systems can communicate with each other through microservices with standardized APIs, enabling sophisticated multi-system workflows across different enterprises. Most LLMs expose APIs that accept message histories as inputs—for example, a conversation thread returned by OpenAI's Chat Completions API can be forwarded to Anthropic's Claude for further analysis, as sketched below. This interoperability enables multi-agent systems where different agents and LLMs handle specialized tasks: one drafts clinical protocols, another assesses patient recruitment feasibility, and a third evaluates protocol amendment risks across global sites.
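To make this concrete, here is a minimal sketch of one such handoff, assuming the official `openai` and `anthropic` Python SDKs with API keys set in the environment; the model IDs and prompts are illustrative.

```python
# Hand an OpenAI conversation thread to Anthropic's Claude for a second opinion.
# Assumes OPENAI_API_KEY and ANTHROPIC_API_KEY are set; model IDs are illustrative.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()
anthropic_client = Anthropic()

# Step 1: draft with an OpenAI model, keeping the full message history.
history = [
    {"role": "user", "content": "Draft a one-paragraph synopsis for a phase 2 NSCLC trial."},
]
draft = openai_client.chat.completions.create(model="gpt-4o", messages=history)
history.append({"role": "assistant", "content": draft.choices[0].message.content})
history.append({"role": "user", "content": "Please critique this synopsis as a regulatory reviewer."})

# Step 2: send the same thread to Claude. Anthropic's Messages API takes the
# system prompt as a separate parameter and expects alternating user/assistant turns.
review = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are an experienced regulatory reviewer.",
    messages=history,
)
print(review.content[0].text)
```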

Intelligent agents orchestrate these LLM interactions through three operational modes: they can automatically chain responses between systems, generate custom outputs based on multiple LLM inputs, or prompt human intervention at critical decision points. These agents can delegate tasks, coordinate multiple LLMs, and maintain context across complex analytical workflows. Additionally, agents enhance LLM capabilities through retrieval-augmented generation (RAG) and autonomous orchestration of reasoning to achieve task objectives and validate outcomes: they access proprietary databases, integrate real-time data, and ground responses in specific regulatory documents. By splitting reference materials into vector-embedded fragments, agents ensure that outputs remain accurate, current, and contextually appropriate for regulatory requirements.
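A minimal sketch of these three response-handling modes, with `call_llm` standing in for any chat-completion call; the function names and mode labels are our illustration, not a specific product's API.

```python
from typing import Callable, Optional

def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call to any LLM provider."""
    return f"[model output for: {prompt!r}]"

def run_step(history: list[str], prompt: str, mode: str,
             transform: Optional[Callable[[str], str]] = None) -> list[str]:
    """Handle one LLM response in one of the three modes described above."""
    response = call_llm(prompt)
    if mode == "auto":        # (a) chain the raw response into the history automatically
        history.append(response)
    elif mode == "custom":    # (b) post-process the response before it enters the record
        history.append(transform(response))
    elif mode == "human":     # (c) pause for a human decision at a critical point
        decision = input(f"{response}\nType 'accept', 'stop', or a replacement: ")
        if decision == "stop":
            raise SystemExit("Workflow terminated by reviewer.")
        history.append(response if decision == "accept" else decision)
    return history

steps = run_step([], "Summarize the safety findings.", mode="auto")
```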

Biopharma's Rapid Adoption of AI-to-AI Workflows

Leading pharmaceutical companies are rapidly deploying multi-LLM and multi-agent environments across their R&D operations. Current applications include AI-augmented scientific writing, protocol development, and regulatory document preparation. Within a matter of months, R&D workflows will increasingly use LLMs and agents for assessing programs from discovery through first-in-human studies, designing clinical trials, selecting clinical sites, and providing active surveillance of trial performance and safety.

This rapid transformation extends beyond efficiency gains. As each step of the development process becomes supported and documented by LLMs and agents, sponsors face a critical new challenge: understanding how an outside reviewer—particularly regulatory AI—might analyze the same information. They must consider how their data fits within the context of current standard-of-care outcomes, evaluate whether they have addressed the safety and tolerability issues of existing therapeutic solutions, and demonstrate relative value compared to OTC, generic, alternative, or non-interventional approaches. The fundamental question evolves from "What does our data show?" to "How will Elsa see this program, and what would Elsa's questions and assessments be?"

CARAai as the Strategic Bridge: Anticipating Regulatory AI Perspectives

ConcertAI's CARAai platform enables sponsors to stress-test their AI-enriched submissions through a regulatory-like AI lens before formal FDA interaction. When sponsor LLMs prepare submissions, they can integrate with the CARAai platform through exposed APIs, accessing comprehensive real-world data from all 50 U.S. states, medical claims data, standard-of-care guidelines, and clinical trial intelligence. This allows sponsors to validate their claims, benchmark against current treatments, and proactively identify potential regulatory concerns.
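As a purely hypothetical illustration of what such an integration could look like on the sponsor side: the endpoint, payload, and response fields below are invented for this sketch and are not a published CARAai API.

```python
# Hypothetical sponsor-side call to a real-world-evidence benchmarking service.
# The URL, payload shape, and response fields are invented for illustration.
import requests

claim = {
    "indication": "metastatic NSCLC",
    "endpoint": "overall survival",
    "sponsor_estimate_months": 18.2,  # the sponsor's own trial estimate
}
resp = requests.post(
    "https://api.example.com/rwe/benchmark",  # placeholder endpoint
    json=claim,
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()
benchmark = resp.json()
# A sponsor could then compare its estimate against a real-world
# standard-of-care cohort before any formal regulatory interaction.
print(benchmark)
```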

The future regulatory process will involve iterative AI-to-AI dialogue between sponsor and FDA systems. Sponsor LLMs will prepare submissions enriched with the accumulated prompts and assessments underlying their analyses. These can be provided to Elsa for preliminary evaluation, generating initial assessments and identifying areas that require clarification. While this AI-mediated interaction might seem risky, platforms like CARAai reduce uncertainty by providing robust, well-documented evidence that helps anticipate regulatory perspectives. Though CARAai cannot access the FDA's proprietary data like Elsa can, it offers the next best thing: comprehensive and up-to-date real-world evidence and analytical capabilities that help sponsors prepare for regulatory scrutiny.

The Evolution to Cross-Enterprise AI Ecosystems

 "CARA meets Elsa" is a metaphor for the AI decision-interaction world we are entering. For all companies, AI-to-AI interactions will become standard following the patterns described above. This transformation will begin within Virtual Private Cloud environments of the largest companies, integrating their first-party and licensed third-party data. As these multi-LLM and agent-based scientific, clinical, and business processes evolve to contain the main body of analyses, assessments, decisions, and submission documents, workflows will naturally extend across enterprise and agency boundaries, enabling LLM-to-LLM interactions, submissions, queries, and AI agent-to-agent debates.

Why are we so certain of this trajectory? Two factors make it inevitable. First, prompt-level interactions and responses maintained and abstracted by LLM systems in life science companies are becoming integral parts of the scientific and clinical records used for internal funding decisions, phase transitions, and regulatory submissions. These digital artifacts naturally flow into the regulatory review process. Second, it will serve both FDA efficiency and submission standardization for the agency to open APIs and enable certain levels of prompt access in their systems. This represents a natural evolution from the database transitions and consolidations of recent years, enabling the FDA to more effectively ensure public safety while advancing needed medicines more rapidly.

Top-performing biopharma companies will push this transformation forward with alacrity, integrating complementary platforms such as CARAai to ensure that they have the most robust data and intelligence underpinning their decisions and programs. By doing so, they will predict with increasing accuracy how the FDA and other agencies will view their plans and submissions, transforming regulatory interaction from a high-stakes, one-shot submission into an iterative, AI-mediated dialogue that ultimately accelerates patient access to innovative therapies.

Since January of 2025, we've commented extensively on the series of formal and informal documents the FDA has released on its position on AI, on how AI can support regulatory decisions, and on the agency's own use of advanced AI tools in its internal workflows and as decision support. This started with the release of the FDA's Preliminary Guidance Document on AI in Support of Regulatory Decisions and continued through the formal announcement of the agency's use of generative AI solutions in support of regulatory assessments, decisions, and safety surveillance. This system was unveiled as "Elsa" in the FDA blog and in social postings by Commissioner Marty Makary.

While the exact evolution and training of Elsa is not publicly known, we can be confident it includes a broad body of scientific literature, clinical guidelines, toxicology databases [2], teratogen databases [3], etc. Much of this is "table stakes" for the major generative AI solutions, such as those from OpenAI, Anthropic, and xAI. For example, when asking OpenAI's GPT-4.x what was included in its training, most of the expected public and government-sponsored assets are provided as resources (see Figure 1).

But the FDA is a unique organization with a broad set of internal assets [4], such as program assessment documents with internal commentary and the analyses supporting those assessments; internal safety databases from past studies, summaries of cases processed, etc. [5]; and collaborative initiatives such as Sentinel [6]. When this combination of internal proprietary data (i.e., non-public and containing confidential manufacturer or sponsor information) and internal public data (with some limitations on what the public-side tools can access) is brought together, the result is powerful: the standard public reference sources and literature can function as a knowledge graph or enhance FDA staff queries through retrieval-augmented generation (RAG).

Let's assume for the moment that Elsa is a unique version of one of the existing large language model (LLM) architectures and processing workflows (e.g., OpenAI, Anthropic) that has been tuned, and even allowed to extend its training, on a domain of proprietary and unique assets (e.g., FDA assets that are public, FDA assets for which there are only public-facing summaries, submitted documents that are confidential to the submitter, and internally generated datasets, analyses, and tools). Elsa could therefore provide a set of analyses, assessments, and recommendations that no other system could, no matter how broad and contemporary that system's training. No confidential information from sponsors or manufacturers is compromised in this process, as Elsa works much as an experienced FDA program lead or evaluator would have worked through the years; the difference is speed and breadth of coverage. Over time, the interactions between Elsa and FDA experts will become even more synergistic as prompts are refined for specific analyses, data is augmented, new sources are added, multiple LLMs with specialized capabilities are introduced, and agents increasingly act as pre-processors and extensions of the expert humans in the agency.

Before we discuss how sponsors might start interacting with Elsa, we want to develop how one LLM might interact with another. Most LLMs have APIs that take a list of messages as their inputs. For example, OpenAI's GPT-4 has a Chat Completions API that can accept a conversation history from a user (or agent!), followed by a sequence of alternating messages. In this case, a set of interactions in OpenAI can be sent through the API to Anthropic's Claude for further analysis, as sketched below.
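As a small sketch of this message-list format, and of why a transformation step is often needed between vendors: the helper below (our own illustration, not a library function) moves an OpenAI-style system message into the separate parameter that Anthropic's Messages API expects.

```python
# Convert an OpenAI-style message history into Anthropic's expected shape:
# the system prompt becomes a separate parameter, and the remaining turns
# must alternate between user and assistant.
def to_anthropic(history: list[dict]) -> tuple[str, list[dict]]:
    system = " ".join(m["content"] for m in history if m["role"] == "system")
    turns = [m for m in history if m["role"] in ("user", "assistant")]
    return system, turns

history = [
    {"role": "system", "content": "You are a biostatistician."},
    {"role": "user", "content": "Summarize the interim analysis plan."},
    {"role": "assistant", "content": "The plan uses an O'Brien-Fleming stopping boundary."},
    {"role": "user", "content": "Flag any multiplicity concerns."},
]
system_prompt, messages = to_anthropic(history)
# system_prompt and messages can now be passed to
# anthropic.Anthropic().messages.create(...)
```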

This is where it gets interesting. This same ability to use multiple LLMs through APIs allows the structuring of chained or multi-agent systems where different LLMs are assigned specific roles or tasks [7]. For example, one LLM can develop software code, another can review it, and a third can test it. Or, one LLM can draft a clinical trial protocol, another can assess the prospective patient accrual time, and a third can assess the likelihood of amendments across settings and countries.
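A minimal sketch of such a role-chained pipeline, with `call_llm` again standing in for any vendor's chat API; the role names and prompts are invented for illustration.

```python
def call_llm(role: str, task: str) -> str:
    """Stand-in for a chat-completion call with a role-specific system prompt."""
    return f"[{role} output for: {task[:60]}]"

def protocol_pipeline(concept: str) -> dict:
    """Chain three role-specialized LLM calls, feeding each output forward."""
    draft = call_llm("protocol-writer", f"Draft a protocol for: {concept}")
    accrual = call_llm("feasibility-analyst",
                       f"Estimate patient accrual time for:\n{draft}")
    amendments = call_llm("risk-assessor",
                          f"Assess amendment likelihood across countries for:\n{draft}\n{accrual}")
    return {"draft": draft, "accrual": accrual, "amendment_risk": amendments}

results = protocol_pipeline("phase 2, second-line metastatic NSCLC")
```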

These interactions can be abstracted such that an agent acts as a message transformer, summarizing any conversation or interaction history. These agent-abstracted and -managed LLM interactions can take different forms—for example, (a) the automated response can be added to the LLM history automatically, (b) a specific or custom response can be generated, or (c) the agent can prompt the human user who, in turn, can sustain, intervene in, or terminate the process. This all scales to systems where agents can delegate tasks to other agents or LLMs, and where multiple agents with very different roles can be orchestrated together.
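Delegation can be sketched in the same style: an orchestrator routes tasks to named sub-agents, each of which could wrap a different LLM. The agent names and plan below are illustrative.

```python
def writer_agent(task: str) -> str:
    return f"[draft produced for: {task}]"

def reviewer_agent(task: str) -> str:
    return f"[review produced for: {task}]"

# Registry of sub-agents the orchestrator can delegate to; each entry could
# wrap a different LLM with its own role prompt.
AGENTS = {"write": writer_agent, "review": reviewer_agent}

def orchestrate(plan: list[tuple[str, str]]) -> list[str]:
    """Run (agent, task) pairs in order, threading each output into the next task."""
    outputs, context = [], ""
    for agent_name, task in plan:
        result = AGENTS[agent_name](task + context)
        outputs.append(result)
        context = " | prior: " + result
    return outputs

orchestrate([("write", "trial synopsis"), ("review", "trial synopsis")])
```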

Finally, agents can be augmented with retrieval capabilities. For example, an agent can extract data from a private or proprietary database, integrate new data that isn't within the LLM's training set, or constrain and ground responses to assure they are consistent with the question, a protocol, guidelines, or another specific document. By using a set of reference documents split into shards, mapped to embedding vectors, and then aligned with a vector store, an agent can override the default chat behavior and provide a context-aligned, up-to-date, and validated response. These are critical capabilities for any demanding clinical or scientific analysis or inferencing.
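A minimal sketch of that shard/embed/retrieve loop; the toy `embed` function below stands in for a real embedding model, and the shard texts are invented for illustration.

```python
import math

def embed(text: str) -> list[float]:
    """Toy embedding: character frequencies. A real system would call an embedding model."""
    return [text.lower().count(c) / max(len(text), 1) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# 1. Split reference documents into shards and index each with its embedding vector.
shards = [
    "Guideline: first-line therapy for this indication is regimen X.",
    "Protocol: the investigational dosing is 10 mg once daily.",
]
index = [(shard, embed(shard)) for shard in shards]

# 2. Retrieve the shard nearest to the query and ground the prompt in it.
query = "What is the recommended dosing?"
best_shard = max(index, key=lambda item: cosine(item[1], embed(query)))[0]
grounded_prompt = f"Answer using only this reference:\n{best_shard}\n\nQuestion: {query}"
```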

Now, consider the biopharma sponsor. Most life sciences organizations will find exceptional value and power in building a large-scale multi-LLM environment for their discovery, translational, clinical development, and safety operations. Indeed, most large pharma companies have already moved toward AI-augmented scientific writing, coding, and regulatory submission document preparation. It will be only a matter of months before more and more R&D workflows use LLMs and agents to assess which programs move from discovery to first-in-human studies, design clinical trials, select clinical sites, and provide active surveillance of trial performance and safety.

As each step of the process is supported and documented by models and agents, it will be equally critical to understand how an outside partner might analyze the same information, place it into the context of current standard-of-care outcomes, evaluate whether the safety and tolerability issues of current therapeutic solutions have been improved upon, and look at the relative value for the health system compared to a range of OTC, generic, alternative, or non-interventional approaches. Or, stated differently: how will Elsa see this program, and what would Elsa's questions and assessments be?

Just as multiple agents can be coordinated around different task areas, so too can agents be set to assess each other and to debate [8]. The sponsor's LLM(s) can prepare a submission as a function of both the standard outputs and the accumulated results of the abstracted prompts and assessments that underlie a trial design, interpretations of trial outcomes, and accompanying statistical analyses. This can be provided to Elsa for a set of queries and follow-up requests. Elsa, in turn, can generate a first set of assessments, areas of follow-up, and even a suggested agenda for an internal review session and a session with the sponsor.

This process might feel as though it is rife with unknowns and risk. Our belief is that this risk can be reduced through the multi-LLM and agent-based approaches we've described above. For example, the sponsor's multi-LLM environment could have APIs open to ConcertAI's CARAai platform, which contains one of the largest-scale datasets of cancers from all 50 U.S. states and medical claims on the entire U.S. population, all integrated with standard-of-care guidelines, the authoritative references that supported the development of those treatment approaches, a broad set of clinical trials, etc., along with a range of specialized agents for different outcomes analyses, safety assessments, and clinical research. Any sponsor's early plans, results, or conclusions could be corroborated and tested against CARAai. While this is not the base of information and training that's available within the FDA for Elsa, it does provide a robust set of insights with exquisite documentation that can support the best possible interaction with Elsa.
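Returning to the debate pattern mentioned at the start of this passage, here is a minimal sketch with two debating agents and a judge, in the spirit of the LLM-debate work cited in note [8]; `call_llm` is again a stub for any chat API.

```python
def call_llm(role: str, prompt: str) -> str:
    """Stand-in for a chat-completion call with a role-specific system prompt."""
    return f"[{role} responds to: {prompt[:50]}]"

def debate(claim: str, rounds: int = 2) -> str:
    """Two agents argue a claim for several rounds; a judge renders a verdict."""
    transcript = [f"Claim: {claim}"]
    for _ in range(rounds):
        pro = call_llm("advocate", "\n".join(transcript))
        con = call_llm("skeptic", "\n".join(transcript))
        transcript += [f"Advocate: {pro}", f"Skeptic: {con}"]
    return call_llm("judge", "Which side is better supported?\n" + "\n".join(transcript))

verdict = debate("The trial demonstrates superior overall survival versus standard of care.")
```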

"When CARA met Elsa" is a metaphor for the AI decision-interaction world we are entering. For all companies, LLM-to-LLM and agent-to-agent interactions will be standard, following the patterns we've described above. This will begin within the Virtual Private Cloud environments of the largest companies, proximate to their first-party and licensed third-party data. But as these multi-LLM and agent-based scientific, clinical, and business processes evolve and come to contain the main body of analyses, assessments, decisions, and documents for submissions, we will move into workflows that cross enterprises and agencies, allowing LLM-to-LLM interactions, submissions, queries, and debates.

Why are we so sure? First, prompt-level interactions and responses will be maintained and abstracted by the LLM systems being deployed in most life science companies. These will constitute part of the scientific and clinical records that are used for major internal funding decisions, phase transitions, and regulatory submissions. Second, it will serve the efficiency and standardization of submissions for the FDA to open APIs and a certain level of prompt access in their systems. This is a natural evolution of the transitions and consolidations of databases of the last several years, and it will enable the agency to be more effective in all aspects of assuring the safety of the public and ensuring that needed medicines are advanced as rapidly as possible.

Top-performing biopharma companies will push this forward with alacrity, integrating complementary platforms such as CARAai to assure that they have the most robust data and intelligence underpinning their decisions, and that they can predict with increasing accuracy how the FDA and other agencies might view their plans and submissions.

 

[1] Aside from an obvious reference to the Rob Reiner/Nora Ephron film When Harry Met Sally..., where the characters move from a dislike for each other to believing they may be the perfect match—something that may become true across AI systems as well.

[2] For examples, see https://research.lib.buffalo.edu/tox-In-depth/databases and https://pubchem.ncbi.nlm.nih.gov; former TOXNET sources: https://www.nlm.nih.gov/toxnet/index.html; a broad summary of sources: https://libguides.anl.gov/c.php?g=471498&p=3227056.

[3] There are many that are maintained by U.S. academic centers (e.g., https://terisweb.deohs.washington.edu/content/about-teris) and various national health services (e.g., https://uktis.org). For a summary of sources, see https://pmc.ncbi.nlm.nih.gov/articles/PMC2943188/.

[4] For a summary as of May 2025, see https://www.congress.gov/crs-product/R48133.

[5] See https://fis.fda.gov/sense/app/95239e26-e0be-42d9-a960-9a5f7f1c25ee/sheet/7a47a261-d58b-4203-a8aa-6d3021737452/state/analysis.

[6] See https://www.fda.gov/safety/fdas-sentinel-initiative and https://www.sentinelinitiative.org/about/who-involved#fda-sentinel-leadership.

[7] There are a number of solutions for this. Examples include https://langroid.github.io/langroid/quick-start/, https://www.langchain.com/langgraph, and https://www.crewai.com/enterprise.

[8] The area of LLM debating is evolving rapidly. One overview, including the potential role of a "judge," can be found at https://aws.amazon.com/blogs/machine-learning/improve-factual-consistency-with-llm-debates/.