Building the Trustworthy Data Foundation for Healthcare AI

The Hidden Challenge Behind Healthcare AI

Healthcare AI is advancing rapidly, with growing focus on multimodal models, precision oncology, and large-scale real-world evidence generation. But beneath that momentum lies a foundational challenge that is often overlooked: clinical data is highly inconsistent across healthcare systems.

Even when healthcare organizations measure the same biological signal, they often represent laboratory tests with different coding systems, naming conventions, units, or institution-specific implementation patterns. Within a single institution, those inconsistencies may be manageable. When oncology laboratory data is aggregated across hundreds of healthcare organizations for analytics, research, and AI development, however, those differences become a major interoperability challenge.

Without harmonization, AI systems and downstream analytics may interpret clinically equivalent laboratory measurements as entirely different variables. This can introduce hidden variability, reduce reproducibility, and create institution-specific artifacts that weaken downstream insights and model reliability.

Our team, spearheaded by ConcertAI’s Parvati Naliyatthaliyazchayil, recently published research in JMIR Medical Informatics [1] that addresses this challenge. The paper describes a scalable harmonization framework that standardizes laboratory tests, together with their associated units, across heterogeneous healthcare systems. I sat down with Parvati to learn more about her work and its role in improving healthcare interoperability; here are some thoughts based on our conversation.

Creating Clinically Consistent Data at Scale

Rather than relying on downstream AI systems to interpret fragmented laboratory inputs, the framework applies a deterministic and clinically grounded approach to harmonize laboratory measurements and their associated units before analytics and AI model development occur. 
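To make the idea of deterministic, table-driven harmonization more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the published method. The knowledge tables, site names, status labels, and the `harmonize` function are all hypothetical.

The only external facts it relies on are:
- the LOINC codes for hemoglobin in blood (718-7) and creatinine in serum or plasma (2160-0);
- standard unit conversions (10 g/L hemoglobin = 1 g/dL; 88.4 µmol/L creatinine ≈ 1 mg/dL).

Each record is mapped from a local test name to a LOINC code, its unit spelling is normalized, and its value is converted to one canonical unit per code. Records that cannot be resolved are flagged rather than guessed, and the original values are kept for provenance.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical knowledge tables for illustration only (not the paper's tables).
# (site, local test name) -> LOINC code
LOCAL_TO_LOINC = {
    ("site_a", "HGB"): "718-7",                # Hemoglobin [Mass/volume] in Blood
    ("site_b", "Hemoglobin, blood"): "718-7",
    ("site_a", "CREAT"): "2160-0",             # Creatinine [Mass/volume] in Serum or Plasma
}

# One canonical unit per LOINC code
CANONICAL_UNIT = {"718-7": "g/dL", "2160-0": "mg/dL"}

# Lower-cased spelling variants -> normalized unit string
UNIT_SYNONYMS = {
    "g/dl": "g/dL", "gm/dl": "g/dL", "g/l": "g/L",
    "mg/dl": "mg/dL", "umol/l": "umol/L",
}

# (LOINC code, normalized source unit) -> multiplicative factor to canonical unit
CONVERSIONS = {
    ("718-7", "g/dL"): 1.0,
    ("718-7", "g/L"): 0.1,            # 10 g/L = 1 g/dL
    ("2160-0", "mg/dL"): 1.0,
    ("2160-0", "umol/L"): 1 / 88.4,   # 88.4 umol/L creatinine ~= 1 mg/dL
}


@dataclass
class HarmonizedLab:
    # Source fields are kept unchanged for traceability and provenance.
    site: str
    source_name: str
    source_value: float
    source_unit: Optional[str]
    # Harmonized fields (None when the record could not be resolved).
    loinc: Optional[str]
    value: Optional[float]
    unit: Optional[str]
    status: str  # "ok" or the reason harmonization stopped


def normalize_unit(raw: Optional[str]) -> Optional[str]:
    """Map a raw unit string to its normalized spelling; None if missing or blank."""
    if raw is None or not raw.strip():
        return None
    return UNIT_SYNONYMS.get(raw.strip().lower(), raw.strip())


def harmonize(site: str, name: str, value: float, unit: Optional[str]) -> HarmonizedLab:
    """Deterministically harmonize one lab record, flagging anything unresolved."""
    loinc = LOCAL_TO_LOINC.get((site, name))
    if loinc is None:
        return HarmonizedLab(site, name, value, unit, None, None, None, "unmapped_test")
    norm = normalize_unit(unit)
    if norm is None:
        return HarmonizedLab(site, name, value, unit, loinc, None, None, "missing_unit")
    factor = CONVERSIONS.get((loinc, norm))
    if factor is None:
        return HarmonizedLab(site, name, value, unit, loinc, None, None, "unknown_unit")
    return HarmonizedLab(site, name, value, unit, loinc,
                         round(value * factor, 4), CANONICAL_UNIT[loinc], "ok")


if __name__ == "__main__":
    records = [
        ("site_a", "HGB", 13.5, "g/dl"),
        ("site_b", "Hemoglobin, blood", 135.0, "G/L"),
        ("site_a", "CREAT", 88.4, "umol/L"),
        ("site_a", "CREAT", 1.1, None),
    ]
    for rec in records:
        print(harmonize(*rec))
```

In this sketch, the two hemoglobin records from different sites end up with the same LOINC code, value, and unit. The creatinine record with no unit is flagged as `missing_unit` instead of being assigned one. Keeping the rules in lookup tables rather than in code means that supporting a new institution only requires adding table rows. This loosely mirrors the knowledge-table-driven design mentioned later in this article, though the actual table structure in the paper may differ.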

“As healthcare AI scales across institutions, ensuring clinical data remains consistent and interoperable becomes increasingly important,” Parvati told me. “This work focused on helping create a more reliable data foundation for oncology AI and real-world evidence.”

At its core is scalable harmonization logic that resolves laboratory variability across multisource oncology datasets while preserving data integrity, traceability, and provenance throughout downstream workflows.

The framework was evaluated on approximately 6.3 billion laboratory records from roughly 10 million oncology patients, spanning multiple healthcare systems and EHR environments. Across these fragmented multisource datasets, it increased laboratory unit accuracy and completeness from approximately 70% to over 99%. The result was substantially more consistent unit assignment and better interoperability for downstream analytics workflows.

“Semantic variation in how clinical measurements are captured can introduce complexity when combining healthcare data from multiple sources,” Parvati noted. “This work focused on helping harmonize those signals before downstream AI modeling begins.”

Why This Matters for Oncology and AI

The work is especially critical in oncology, where laboratory measurements are foundational for biomarker analysis, treatment response evaluation, toxicity monitoring, longitudinal patient tracking, and precision medicine initiatives.

As healthcare organizations scale enterprise AI and real-world evidence programs, the ability to reliably aggregate and standardize laboratory data across institutions grows in importance. High-quality laboratory harmonization improves analytical consistency across multisource datasets and strengthens the reliability and reproducibility of downstream research and AI workflows.

By harmonizing both laboratory tests and their associated units before downstream analytics and AI model development occur, the framework helps create more interoperable and AI-ready oncology datasets at scale. The resulting harmonized laboratory layer supports a broad range of downstream applications, including oncology analytics, biomarker research, longitudinal patient studies, treatment response analysis, translational research, and AI development workflows.

The work also contributed to ConcertAI’s oncology data processing pipelines supporting scalable analytics and AI-ready clinical data infrastructure across heterogeneous healthcare environments.

More broadly, the research reinforces an increasingly important industry reality: trustworthy AI requires trustworthy, interoperable clinical data underneath it. Creating traceable, transparent AI applications is a cornerstone of everything we do at ConcertAI.

Supporting the Future of Trustworthy Healthcare AI

The research reflects a broader shift happening across healthcare AI: as organizations scale enterprise AI initiatives, the focus is expanding beyond model development to the quality and interoperability of the underlying clinical data infrastructure.

By addressing a foundational interoperability challenge in multisource oncology data, the work reinforces ConcertAI’s broader focus on oncology real-world evidence, clinical AI, and scalable healthcare analytics. The harmonization framework was incorporated into ConcertAI’s multisource oncology data processing pipelines. Those pipelines support real-world evidence generation, translational research, and precision oncology product offerings across partner healthcare systems and research organizations.

Beyond laboratory harmonization itself, the framework helps strengthen the reliability, consistency, and scalability of downstream analytics and AI systems built on top of multisource oncology data.

Looking ahead, the framework’s knowledge-table-driven design makes it well suited to agentic AI. In that setting, autonomous systems could apply harmonization logic, validate laboratory values, and maintain provenance across institutions without manual intervention, enabling more scalable and reliable oncology workflows.

References

[1] Naliyatthaliyazchayil P, Stenerson T. Harmonizing Logical Observation Identifiers Names and Codes (LOINC) Codes and Units in Real-World Oncology Data: Method Development and Evaluation. JMIR Med Inform. 2026 Mar 9;14:e81254. doi: 10.2196/81254. PMID: 41802234; PMCID: PMC13010070.