Neuroscience of AI

Our R&D team has built a benchmark-leading method for revealing how LLMs work. Joakim explains why we need tools like this and how they work.

In this Builder Series conversation, Corti researcher Joakim Edin examines a problem many AI teams face: modern neural networks work well, but their internal logic is hard to explain.

That opacity becomes costly when models fail. Hallucinations, looping behaviour, or confident errors are difficult to fix without understanding the mechanisms behind them. Teams often rely on trial and error, retraining models and hoping performance improves.

Mechanistic interpretability aims to change that. Instead of treating models as black boxes, it seeks to explain how neural networks compute their outputs, neuron by neuron, so improvements become engineering decisions rather than guesses.
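
To make that concrete, here is a minimal sketch, in PyTorch, of the kind of access this work starts from: a forward hook that captures a layer's activations so individual neurons can be inspected rather than just the final output. The toy model and hooked layer are illustrative choices, not Corti's tooling.

```python
import torch
import torch.nn as nn

# Toy model standing in for a real network; the hook pattern is the same.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
captured = {}

def save_hidden(module, inputs, output):
    # Record the hidden activations every time the hooked layer runs.
    captured["hidden"] = output.detach()

# Hook the ReLU so we see the post-activation values of the 16 hidden neurons.
model[1].register_forward_hook(save_hidden)

logits = model(torch.randn(1, 8))
print(captured["hidden"].shape)  # torch.Size([1, 16]): one value per neuron
```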

Edin’s work focuses on circuit discovery, identifying which neurons and connections drive specific behaviours. Standard methods analyse neurons in isolation, but many behaviours only emerge from interactions between neurons. Gradient Interaction Modifications, or GIM, addresses this by surfacing which neurons matter together.
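
As a rough illustration of why isolated analysis falls short, consider this toy ablation sketch (ours, not the GIM algorithm itself): when two neurons back each other up, ablating either one alone shows almost no effect, yet ablating both together collapses the behaviour. The sketch shows only the failure mode an interaction-aware method has to catch, not how GIM catches it.

```python
import torch

def output(h):
    # Redundant "OR"-style unit: either neuron alone keeps the output high.
    return torch.clamp(h[0] + h[1], max=1.0)

hidden = torch.tensor([1.0, 1.0])      # activations of neurons h0 and h1
base = output(hidden)                  # 1.0

# Ablate each neuron in isolation: the other one covers for it.
drop_h0 = base - output(torch.tensor([0.0, 1.0]))    # 0.0
drop_h1 = base - output(torch.tensor([1.0, 0.0]))    # 0.0

# Ablate both together: the behaviour disappears.
drop_both = base - output(torch.tensor([0.0, 0.0]))  # 1.0

# The interaction is the part of the joint effect that per-neuron
# analysis cannot see.
interaction = drop_both - (drop_h0 + drop_h1)
print(drop_h0.item(), drop_h1.item(), drop_both.item(), interaction.item())
# 0.0 0.0 1.0 1.0 -> neither neuron "matters" alone, but they matter together
```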

This approach topped the Mechanistic Interpretability benchmark on Hugging Face. The result also speaks to why interpretability matters in healthcare AI: trust, safety, and explainability are not optional, and understanding why a model fails is a prerequisite for fixing it.