Decoding healthcare’s unstructured data with a novel multi-agent system

Healthcare runs on data, but most of it is unstructured text. Clinical notes, discharge summaries, test results, and now AI-generated records are piling up faster than ever. Without structure, this information is difficult to use for patient safety, population health, or hospital operations.
Medical coding is one of the most established ways to bring structure. By translating clinical language into standardized medical codes - such as ICD-10 - it safeguards patient records, ensures hospitals are reimbursed correctly, and provides structured data that supports research and policy. But despite its importance, progress in automating coding has stalled. Models still miss rare codes, fail to generalize, and lag behind system updates, leaving records vulnerable to errors, compliance risks, and brittle performance that slows progress across the field.
Since 2018, most models have relied on memorizing patterns from annotated datasets through supervised (or semi-supervised) learning. This approach falters with rare codes, struggles across specialties, and breaks whenever coding systems are updated. Developers encounter the same limitations in the machine learning models they use to build their healthcare applications. As a result, many builders fall back on off-the-shelf (or poorly fine-tuned) LLMs with few-shot prompting, which offer a more versatile interface for coding suggestions. The problem is that these LLMs tend to be inaccurate and unreliable coders in their own right.
A new framework for medical coding
At Corti, we take on the toughest problems in healthcare AI - and our latest research paper, Code Like Humans, introduces a new multi-agent framework for coding. Today we are pleased to share that this has just been accepted at EMNLP 2025: one of the world’s leading AI conferences.
Code Like Humans rethinks medical coding as a reasoning process rather than brute-force memorization of 100,000+ codes. The framework, built from LLMs with targeted roles, uses four agents that mirror how professional coders work:
- Evidence extractor – isolates conditions in a clinical note that need to be coded.
- Index navigator – searches the ICD alphabetical index to find candidate codes.
- Tabular validator – verifies candidates against the tabular list and official guidelines.
- Code reconciler – reconciles the candidate list, removing invalid codes and sequencing the final output.
Consider, for example, a case of chronic obstructive pulmonary disease (COPD). The evidence extractor could flag COPD as a codeable condition in the note. The index navigator would surface COPD in the ICD index and suggest a candidate like J44. The tabular validator might then refine this against the tabular list, specifying J44.0 if the note indicated an acute lower respiratory infection. Finally, the code reconciler would place this in sequence with other codes to produce a compliant final set.
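To make the flow above concrete, here is a minimal, hypothetical sketch of the four-stage pipeline in Python. It is not Corti's implementation: each agent is reduced to a plain function with stubbed lookups, where the real framework would back each stage with an LLM prompt plus the ICD-10 alphabetical index and tabular list.

```python
def evidence_extractor(note: str) -> list[str]:
    """Isolate conditions in the note that need coding (stubbed keyword match)."""
    known_conditions = ["COPD", "acute lower respiratory infection"]
    return [c for c in known_conditions if c.lower() in note.lower()]

def index_navigator(conditions: list[str]) -> list[str]:
    """Look up candidate codes in a toy stand-in for the ICD alphabetical index."""
    toy_index = {"COPD": "J44"}
    return [toy_index[c] for c in conditions if c in toy_index]

def tabular_validator(candidates: list[str], conditions: list[str]) -> list[str]:
    """Refine candidates against a toy tabular list: J44 becomes J44.0 when an
    acute lower respiratory infection is also documented."""
    refined = []
    for code in candidates:
        if code == "J44" and "acute lower respiratory infection" in conditions:
            refined.append("J44.0")
        else:
            refined.append(code)
    return refined

def code_reconciler(codes: list[str]) -> list[str]:
    """Deduplicate and sequence the final code set."""
    return sorted(set(codes))

def code_note(note: str) -> list[str]:
    """Run the four agents in order, each consuming the previous stage's output."""
    conditions = evidence_extractor(note)
    candidates = index_navigator(conditions)
    validated = tabular_validator(candidates, conditions)
    return code_reconciler(validated)

note = "Patient with COPD presenting with an acute lower respiratory infection."
print(code_note(note))  # ['J44.0']
```

The point of the sketch is the shape of the data flow: each stage narrows or refines the previous stage's output, so errors can be localized to a single agent rather than hidden inside one end-to-end model.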
Because the agents are modular, each can be improved and evaluated independently - creating opportunities for developers to benchmark, extend, or even swap in specialized components. Even when using general-purpose language models without fine-tuning, the framework outperformed conventional approaches on rare codes and showed adaptability to new ICD versions. Building on that foundation, our teams have already refined the approach, prepared it for real-world use, and shaped the next release of products that Corti will deliver for developers in the weeks ahead. Follow docs.corti.ai for the latest technical updates.
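That kind of modularity can be illustrated with a small Python sketch. This is an illustrative assumption about the architecture, not Corti's code: if every agent shares one interface, a specialized component can replace a general one without touching the rest of the pipeline.

```python
from typing import Protocol

class Agent(Protocol):
    """Shared interface: each stage reads and extends a running state dict."""
    def run(self, state: dict) -> dict: ...

class KeywordEvidenceExtractor:
    """Baseline extractor, stubbed with a keyword match."""
    def run(self, state: dict) -> dict:
        state["conditions"] = ["COPD"] if "copd" in state["note"].lower() else []
        return state

class SpecialtyEvidenceExtractor:
    """A drop-in replacement - e.g. a model tuned for pulmonology notes."""
    def run(self, state: dict) -> dict:
        state["conditions"] = ["COPD", "acute exacerbation"]  # illustrative only
        return state

def run_pipeline(agents: list[Agent], note: str) -> dict:
    state = {"note": note}
    for agent in agents:  # stages execute in order, each seeing prior output
        state = agent.run(state)
    return state

# Swapping one stage changes behavior without modifying the pipeline itself.
base = run_pipeline([KeywordEvidenceExtractor()], "COPD follow-up")
spec = run_pipeline([SpecialtyEvidenceExtractor()], "COPD follow-up")
print(base["conditions"])  # ['COPD']
print(spec["conditions"])  # ['COPD', 'acute exacerbation']
```

Because each agent only depends on the shared state interface, benchmarking a new component means swapping one entry in the list - the property that makes independent evaluation possible.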
Why this matters
Accurate coding safeguards patient safety and ensures the right care is documented. It also provides the structured data that drives research and policy. When coding fails, the consequences ripple out: distorted records, compliance risk, delayed payments, and billions in wasted revenue and time.
In testing, Code Like Humans showed strength in handling rare codes, a known blind spot for conventional models. Its agent-based structure also enables adaptability to new ICD versions and across specialties - two areas where existing approaches repeatedly fail. The EMNLP acceptance signals that medical coding can move past stagnation toward systematic, reliable automation.
What’s ahead
Acceptance at EMNLP is an important validation step, but the real impact must be felt in the clinic, not in the lab. In the months ahead, we will introduce the first commercial products built on this research, giving developers tools to code like humans at scale. These products will be:
- Compliance-ready from day one (HIPAA, GDPR, audit trails, sovereign options)
- Developer-first - with APIs that minimize wasted cycles
- Built for production speed - moving from pilot to deployment in weeks, not years
Corti has always been research-driven, validating ideas in peer-reviewed forums before translating them into commercial-grade solutions. What sets us apart is our ability to turn breakthroughs into infrastructure that is safe, fast, and ready for the complexity of real-world healthcare. With Code Like Humans, we are one step closer to solving one of the most costly and complex challenges in healthcare.
Stay tuned for more on this soon.