SNOMED CT: healthcare’s new common language (and why AI should care)

In 1837, William Farr, the world's first medical statistician, faced a fundamental problem: how do you count instances of tuberculosis when different hospitals record it under various names such as consumption, phthisis, and lung disease? A disease was frequently recorded under multiple names, while identical terms were applied to entirely different conditions. This lack of standardization made it nearly impossible to track disease patterns, compare mortality data across regions, or conduct meaningful statistical analysis of public health trends.

To address this challenge, Farr developed a classification system that standardized disease terminology across medical institutions. His system organized diseases not only by medical category but also incorporated synonyms and local terms that captured the various regional names used for the same conditions. This systematic approach allowed for consistent recording and comparison of disease data across different hospitals and geographic areas.

Farr's classification is widely regarded as the predecessor to the modern International Classification of Diseases (ICD) system used worldwide today. In the ICD system, each disease, condition, or injury is assigned a unique alphanumeric code that serves as its standardized identifier. From its modest beginnings in 1893 with just 161 codes for causes of death, the ICD has expanded to over 14,000 codes in ICD-10. This growth reflects both medicine's advancing complexity and the system's evolution beyond mortality statistics. Today, these codes enable multiple critical functions: clinical documentation, health research, resource allocation, and billing and reimbursement.

Even though most countries use ICD-10, monitoring global health trends remains challenging. ICD-10’s 14,000 codes may seem like a lot, but they are not enough. This became evident in the early 2000s, when carbon monoxide poisoning from charcoal burning rapidly became a common suicide method across Hong Kong, Taiwan, Japan, Korea, and Singapore. Because ICD-10 has no code for “suicide by charcoal burning,” the trend was slow to be detected, and it was difficult to monitor the effectiveness of policies meant to prevent it. This is where SNOMED CT comes in.

SNOMED CT is a powerful clinical terminology that can express almost any clinical concept in a machine-readable way. Not only does it comprise more than 350,000 concepts, but these concepts can also be combined to express new ones. So, what does this mean? While there is no existing concept for suicide due to charcoal burning, you can combine “Suicide (event) 44301001” and “Open charcoal fire (physical object) 257204008” using the “Due to” relationship. This way, you can express a practically unlimited number of concepts without needing a unique code for every possible combination.
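To make the combination above concrete, here is a minimal Python sketch that renders it as a single text expression in the style of SNOMED CT's compositional grammar (focus concept, then an attribute and its value). The `Concept` class and `refine` function are hypothetical helpers written for this post, not part of any SNOMED CT library. The identifier used for the “Due to” attribute is an assumption that should be verified against your SNOMED CT release, and the sketch does not check whether the concept model permits this particular pairing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A SNOMED CT concept: its identifier plus a human-readable term."""
    sctid: str
    term: str

    def render(self) -> str:
        # Concepts are written as: identifier |term|
        return f"{self.sctid} |{self.term}|"


def refine(focus: Concept, attribute: Concept, value: Concept) -> str:
    """Render a focus concept refined by one attribute-value pair."""
    return f"{focus.render()} : {attribute.render()} = {value.render()}"


# The two concepts named in the text above.
suicide = Concept("44301001", "Suicide (event)")
charcoal_fire = Concept("257204008", "Open charcoal fire (physical object)")

# Assumption: identifier for the "Due to" attribute; verify before relying on it.
due_to = Concept("42752001", "Due to (attribute)")

expression = refine(suicide, due_to, charcoal_fire)
print(expression)
```

The point of the sketch is that the new meaning is assembled from existing building blocks at the time of recording, so no one has to add a dedicated code to the terminology first.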

How does this fit with ICD-10 and other coding systems?

If SNOMED CT is so powerful, why do we still use coding systems such as ICD-10 and CPT? They serve different purposes. SNOMED CT is built for capturing clinical detail at the point of care. ICD-10 and CPT are increasingly used for billing. To connect the two, there are mapping tools: rule sets that translate SNOMED CT codes into ICD-10 or CPT codes.

At Corti today, our AI models go straight from clinical notes to ICD-10 or CPT codes (for our American customers). However, we imagine a future workflow like this:

  • Clinical note → SNOMED CT
  • SNOMED CT → ICD-10 / CPT

That would mean capturing rich meaning first, and only later translating it into the billing codes that systems require. 
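The two steps above can be sketched in Python as follows. Everything here is a toy stand-in rather than a description of Corti's models or of any official map: the keyword lookup takes the place of a model that would read the note, and the two mapping entries are illustrative examples only. The names `CodedConcept`, `note_to_snomed`, and `snomed_to_icd10` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedConcept:
    """A SNOMED CT concept extracted from a note."""
    sctid: str
    term: str


# Stage 1 stand-in: a real system would use a model here, not keyword matching.
KEYWORD_TO_SNOMED = {
    "myocardial infarction": CodedConcept(
        "22298006", "Myocardial infarction (disorder)"
    ),
    "hypertension": CodedConcept(
        "38341003", "Hypertensive disorder, systemic arterial (disorder)"
    ),
}

# Stage 2 stand-in: illustrative entries only, not taken from an official map.
SNOMED_TO_ICD10 = {
    "22298006": "I21.9",  # acute myocardial infarction, unspecified
    "38341003": "I10",    # essential (primary) hypertension
}


def note_to_snomed(note: str) -> list[CodedConcept]:
    """Clinical note -> SNOMED CT concepts (toy keyword matcher)."""
    text = note.lower()
    return [concept for keyword, concept in KEYWORD_TO_SNOMED.items() if keyword in text]


def snomed_to_icd10(concepts: list[CodedConcept]) -> list[str]:
    """SNOMED CT concepts -> ICD-10 codes; unmapped concepts are skipped."""
    codes = []
    for concept in concepts:
        code = SNOMED_TO_ICD10.get(concept.sctid)
        if code is not None:
            codes.append(code)
    return codes


note = "Patient with known hypertension admitted with acute myocardial infarction."
concepts = note_to_snomed(note)
billing_codes = snomed_to_icd10(concepts)
print([c.term for c in concepts])
print(billing_codes)
```

Keeping the intermediate `concepts` list around is the design point: the detailed clinical representation stays available for other uses, and the billing codes are derived from it as a separate, replaceable step.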

Where Corti fits in

At Corti, we build the infrastructure layer for healthcare AI: models and APIs that can listen to consultations, structure what they hear, and hand that data to the systems that need it.

One of the things our infrastructure already supports is automated medical coding. Today, our models can take a clinical note and predict ICD-10 or CPT codes: formats widely used for billing, reimbursement, and reporting across different healthcare systems.

But that’s just the start. In the future, AI models may use SNOMED CT as a transparent and precise symbolic representation of every patient journey. From that representation, we could derive codes in any coding system, retrieve any clinical information about patients, and even generate clinical notes anchored in the SNOMED CT representation. SNOMED CT may be the key to making medical AI transparent and explainable, solving the critical black box problem that currently limits trust in healthcare AI systems.

See Joakim’s original blog post here.