We’ve all worked with someone who could follow instructions but couldn’t adapt when things changed. We even have a derogatory term to describe a person like this: a tool. Dependable, perhaps, but narrow, rigid, and ultimately more work for the rest of the team.
A partner, on the other hand, thinks with you. They anticipate. They adapt. They contribute reasoning to the shared mission.
Today’s robots, for all their mechanical precision, have largely fallen into this category: they are powerful tools, yet they are still just tools. They have no agency. They can follow a pre-scripted sequence of movements, but if the environment changes unexpectedly – a box is out of place, a new obstacle appears, the lighting shifts – they get confused, fail, or require extensive retraining.
If an agent or member of a team (whether human or machine) can only follow instructions, with no ability to adapt or to reason about the mission, then they are not a true “agent.” They can only operate in a narrow lane. They might execute their specific part fine, but they’re useless (and sometimes dangerous) when conditions change. Robots like this require constant human micromanagement. In unpredictable environments, this isn’t just inefficient – it’s unsafe.
What we really need are partners – robotic teammates with agency that can read the situation, adapt in real time, and reason about the mission alongside us.
And now, for the first time, recent research from VERSES AI has shattered that barrier through Active Inference.
A new paper quietly dropped on July 23, 2025, titled “Mobile Manipulation with Active Inference for Long-Horizon Rearrangement Tasks.” In it, the VERSES AI research team, led by world-renowned neuroscientist Dr. Karl Friston, demonstrates the blueprint for a new robotics control stack that achieves what has never been possible before: an inner-reasoning architecture of multiple Active Inference agents within a single robot body, working together for whole-body control and able to adapt and learn in real time in unfamiliar environments. This hierarchical, multi-agent Active Inference framework enables robots to adapt in real time, plan over long sequences, and recover from unexpected problems, all without retraining.
Unlike current robots, which operate as one monolithic controller, this new blueprint operates as a collective of intelligent agents, each powered by Active Inference: every joint, every limb, every movement controller is itself an Active Inference agent with its own local understanding of the world, all coordinating under a higher-level Active Inference model – the robot itself.
Think of it as a miniature society of decision-makers living inside a single robot’s body:
Lower-level agents handle precise control (like moving a gripper), while higher-level agents plan sequences of actions to achieve goals.
Each agent maintains intrinsic beliefs (about its own state) and extrinsic beliefs (about its relation to the world).
These agents continuously share belief updates and prediction errors.
In effect, the robot is a complex adaptive system of collective intelligence – much like a human body’s coordination between muscles, reflexes, and conscious planning.
All of these agents communicate continuously, updating their beliefs according to Chief Scientist Dr. Karl Friston’s Free Energy Principle – the same mathematical framework that underlies human perception, learning, and action. The Active Inference agents within the robot body share their belief states about the world with each other and continually update their actions based on prediction errors. This is the same process humans use when walking across a crowded room carrying coffee: constant micro-adjustments at the joint level, limb coordination for balance, and high-level planning for navigation.
This means these Active Inference robots don’t just execute pre-programmed actions; they perceive, predict, and plan in concert, adjusting instantly if the world doesn’t match their expectations. That’s something reinforcement learning (RL) robots struggle to do, because adapting a trained RL policy to new conditions typically requires retraining.
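To make that concrete, here is a minimal, purely illustrative Python sketch of the pattern, not code from the paper: a low-level agent holds a belief about one joint, fuses a top-down goal (prior) from a higher-level planner with noisy sensing, acts to reduce the remaining error, and reports the residual upward. The class name, the one-dimensional state, and the simple update rule are all simplifying assumptions.

```python
import numpy as np

class JointAgent:
    """Toy low-level Active Inference agent for a single joint.

    It keeps a belief about the joint angle, fuses noisy sensor readings
    with a top-down goal (prior) from the level above, issues a motor
    command that shrinks the remaining error, and passes the residual
    prediction error back up so higher levels can re-plan if needed.
    """

    def __init__(self, angle_estimate=0.0, lr=0.2):
        self.angle_estimate = angle_estimate  # current belief about the joint angle
        self.lr = lr                          # step size for belief and action updates

    def step(self, goal_from_above, sensed_angle):
        # Bottom-up prediction error: what the sensors say vs. what we believed
        sensory_error = sensed_angle - self.angle_estimate
        # Top-down prediction error: our belief vs. what the planner expects
        goal_error = goal_from_above - self.angle_estimate
        # Perception: nudge the belief toward the evidence
        self.angle_estimate += self.lr * sensory_error
        # Action: command a move that reduces the goal error
        motor_command = self.lr * goal_error
        return motor_command, goal_error      # residual error flows upward


# A "planner" agent sends a goal angle; the joint agent tracks it online.
true_angle, goal = 0.0, 1.2
joint = JointAgent()
for _ in range(100):
    sensed = true_angle + np.random.normal(0.0, 0.02)   # noisy proprioception
    command, residual = joint.step(goal_from_above=goal, sensed_angle=sensed)
    true_angle += command                                # the world responds
print(f"final angle ~ {true_angle:.2f}, residual error ~ {residual:.2f}")
```

In the real architecture, many such agents run in parallel at different levels and timescales, and a persistently large residual error is exactly the signal a higher-level agent uses to switch plans rather than push harder.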
This is not just an upgrade to robotics.
It’s a redefinition of what a robot is.
This breakthrough doesn’t stand alone – it’s built on two other major VERSES innovations:
AXIOM: A new scale-free Active Inference architecture that unifies perception, planning, and control in a single generative model.
VBGS (Variational Bayes Gaussian Splatting): A probabilistic, “uncertainty-aware” method for building high-fidelity 3D maps from sensor data.
Together, AXIOM and VBGS give Active Inference robots both brains and senses tuned for real-time teaming.
Until now, the field of robotics has struggled to meet this standard. Most control systems are either rigidly pre-scripted or trained end-to-end with reinforcement learning.
Reinforcement Learning (RL) has driven impressive demos, but it hits walls in the real world: enormous data requirements, brittleness when conditions drift away from training, and costly retraining whenever the task or environment changes.
The result? Reinforcement Learning robots are powerful tools that may shine in lab benchmarks for fixed contexts, but in a real-world setting like a warehouse, hospital, or disaster zone, they’re too rigid and brittle to be trusted as independent operators, and they are expensive to retune at every new variation.
In Active Inference, logic isn’t bolted on after the fact as a “rule set” – it emerges naturally from each agent’s generative model and its drive to minimize free energy.
That means logic here is probabilistic and contextual: conclusions are beliefs held with degrees of confidence, updated as new evidence arrives.
VERSES’ new research changes the game for robotics.
Instead of a single, monolithic Reinforcement Learning (RL) policy, their architecture creates a hierarchy of intelligent agents inside the robot, each running on the principles of Active Inference and the Free Energy Principle for seamless learning and adaptation in real-time.
Here’s what’s different: instead of one policy trying to do everything, perception, planning, and control are distributed across agents that each reason at their own scale.
The result is a system that can walk into an environment it has never seen before, understand the task, and execute it, adapting continuously as conditions change.
What we are talking about is a complex adaptive system that reasons at every scale of the body.
This mimics proprioception, the human body’s sense of its position and movement in space: the ability to move without conscious thought, like walking without looking at our feet. That sense relies on sensory receptors in muscles, joints, and tendons that send information to the brain about body position and movement.
Every level runs the same principle: minimize free energy by aligning predictions with sensations and goals.
Top‑down: higher agents send preferences/goals (priors) to lower ones.
Bottom‑up: lower agents send prediction errors upward when reality deviates.
This circular flow yields real‑time adaptation: if the wrist feels unexpected torque, the arm adjusts, the base repositions for leverage, and the planner switches to an alternate grasp — without halting or reprogramming.
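For readers who want the quantity behind “minimize free energy,” here is the variational free energy in a standard form from the Active Inference literature (the notation is ours, not taken from the paper): each agent holds an approximate posterior q(s) over its hidden states s and scores it against its generative model p(o, s) of how observations o arise.

```latex
F \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
  \;=\; \underbrace{D_{\mathrm{KL}}\!\left[q(s)\,\middle\|\,p(s)\right]}_{\text{complexity}}
  \;-\; \underbrace{\mathbb{E}_{q(s)}\!\left[\ln p(o \mid s)\right]}_{\text{accuracy}}
```

Top-down preferences and goals enter as the prior p(s); bottom-up prediction errors are what the accuracy term penalizes; every agent, at every level, is descending the same quantity.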
Base–arm coupling: The base isn’t a separate “navigation mode.” Its free‑energy minimization includes arm prediction errors, so the robot walks its body to help the arm (extending reach, improving approach geometry). That’s how you get whole‑body manipulation.
Robots need a world model that updates online without forgetting. VBGS provides that by:
Representing the scene as a mixture of Gaussians over 3D position + color (a probabilistic radiance/occupancy field).
Learning by closed‑form variational Bayes updates (CAVI with conjugate priors), so it can ingest streamed RGB‑D without replay buffers or backprop (a toy illustration of this kind of update follows this list).
Maintaining uncertainty for every component – perfect for risk‑aware planning and obstacle avoidance.
Including a component reassignment heuristic that quickly covers under‑modeled regions, supporting continual learning without catastrophic forgetting.
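Here is the toy example promised above: a hedged Python sketch of why conjugate, closed-form updates allow streaming. This is a one-component, known-noise Normal–Normal update of our own construction, far simpler than VBGS’s full mixture and CAVI machinery, but it shows how each new observation folds into the posterior without gradients, replay buffers, or retraining.

```python
import numpy as np

# Toy conjugate (Normal-Normal, known noise) update for one map point.
# Illustrative only: VBGS updates full Gaussian mixtures over position and
# color with CAVI; the principle of closed-form assimilation is the same.

prior_mean = np.zeros(3)     # initial guess for where a surface point sits
prior_precision = 1.0        # 1 / prior variance (isotropic, for simplicity)
obs_precision = 25.0         # sensor confidence (1 / measurement noise variance)

def assimilate(mean, precision, observation):
    """Fold one observation into the posterior in closed form."""
    new_precision = precision + obs_precision
    new_mean = (precision * mean + obs_precision * observation) / new_precision
    return new_mean, new_precision

rng = np.random.default_rng(0)
true_point = np.array([0.8, -0.2, 1.5])
mean, precision = prior_mean, prior_precision
for _ in range(30):                               # streamed depth readings
    obs = true_point + rng.normal(0.0, 0.2, 3)    # noisy RGB-D-like sample
    mean, precision = assimilate(mean, precision, obs)
print(np.round(mean, 2), "posterior std ~", round(float(1 / np.sqrt(precision)), 3))
```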
In the robot: VBGS builds a probabilistic map of obstacles, articulated surfaces (drawers, fridge doors), and free space. The controller reads this map to plan paths and motions, assigning higher “costs” to occupied or risky regions. Because the map is Bayesian, regions of high uncertainty push the policy toward conservative behavior (slow down, keep distance) or trigger “active sensing”: an information-gathering move like a brief re-scan or viewpoint change before committing to contact.
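And here is an equally rough sketch of the “use” side, again our own simplification rather than the paper’s controller: the scene is a tiny Gaussian mixture over 3-D points, and a query returns an occupancy score plus a crude uncertainty that the planner converts into motion cost.

```python
import numpy as np

class GaussianMixtureMap:
    """Toy stand-in for an uncertainty-aware map: K Gaussian components over
    3-D position, each weighted by how much evidence supports it."""

    def __init__(self, means, covs, weights):
        self.means = np.asarray(means)      # (K, 3) component centers
        self.covs = np.asarray(covs)        # (K, 3, 3) component covariances
        self.weights = np.asarray(weights)  # (K,) evidence per component

    def query(self, point):
        """Return (occupancy score, uncertainty) at a 3-D point."""
        point = np.asarray(point)
        densities = []
        for mu, cov, w in zip(self.means, self.covs, self.weights):
            diff = point - mu
            maha = diff @ np.linalg.inv(cov) @ diff
            norm = np.sqrt((2 * np.pi) ** 3 * np.linalg.det(cov))
            densities.append(w * np.exp(-0.5 * maha) / norm)
        occupancy = float(np.sum(densities))
        # Toy heuristic: regions supported by little evidence count as "unsure";
        # the real map tracks uncertainty per component rather than like this.
        uncertainty = 1.0 / (1.0 + occupancy)
        return occupancy, uncertainty


def motion_cost(scene_map, point, occupied_penalty=10.0, unsure_penalty=3.0):
    """Planner-side cost: penalize likely-occupied space and poorly known space."""
    occupancy, uncertainty = scene_map.query(point)
    return occupied_penalty * occupancy + unsure_penalty * uncertainty


# One obstacle component near (1, 0, 0); a far corner the sensors never saw.
scene = GaussianMixtureMap(means=[[1.0, 0.0, 0.0]],
                           covs=[np.eye(3) * 0.05],
                           weights=[50.0])
for p in ([1.0, 0.0, 0.0], [0.0, 2.0, 0.0]):
    print(p, "cost =", round(motion_cost(scene, p), 2))
```

A planner reading costs like these naturally slows down or detours in poorly known regions, and a high uncertainty value is also a natural trigger for the “re-scan before contact” behavior described above.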
AXIOM complements embodied control with a world‑model and planning core that is fast, interpretable, and expandable:
sMM (slot mixture model): parses pixels into object‑centric latents (position, color, extent) via mixtures.
iMM (identity mixture): assigns type tokens (object identity) from continuous features; type‑conditioned dynamics generalize across instances.
tMM (transition mixture): switching linear dynamics (SLDS) discover motion primitives (falling, sliding, bouncing) shared across objects.
rMM (recurrent mixture): learns sparse interactions (collisions, rewards, actions) linking objects, actions, and mode switches.
Growth and Bayesian Model Reduction: expands components on‑the‑fly when data demands it; later merges redundant clusters to simplify and generalize.
Planning by expected free energy: trades off utility (reward) with information gain (learn what matters), choosing actions that both progress goals and reduce uncertainty (a standard form of this quantity is shown below).
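The expected free energy mentioned in the last bullet has a standard form in the Active Inference literature (again, our notation, not the paper’s): each candidate policy π is scored by how well its predicted observations o match the robot’s preferences C and how much those observations are expected to teach it about hidden states s.

```latex
G(\pi) \;=\;
  \underbrace{-\,\mathbb{E}_{q(o \mid \pi)}\!\left[\ln p(o \mid C)\right]}_{\text{expected cost (negative utility)}}
  \;-\;
  \underbrace{\mathbb{E}_{q(o \mid \pi)}\!\left[ D_{\mathrm{KL}}\!\left[q(s \mid o, \pi)\,\middle\|\,q(s \mid \pi)\right] \right]}_{\text{expected information gain}}
```

Choosing the policy with the lowest G is what lets the same planner both pursue goals and deliberately gather information when it is unsure.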
AXIOM shows how Bayesian, object‑centric models learn useful dynamics in minutes (no gradients), clarifying how Active Inference can scale beyond low‑level control to task‑level understanding and planning, AND interoperate with human reasoning.
Since the system is built from agents that are themselves adaptive learners, the robot doesn’t need exhaustive pre-training for every possible variation; it can adjust its plans and skills on the fly. In the paper’s benchmarks, Active Inference proves superior, adapting in real time with no offline training.
Benchmark Tasks (long‑horizon, mobile manipulation): “Tidy House”, “Prepare Groceries”, and “Set Table.”
Active Inference vs. Reinforcement Learning (RL) baselines
Success/completion rates were averaged over 100 episodes per task.
Training burden (baseline Reinforcement Learning): ~6,400 episodes per task + 100M steps per skill (7 skills) to train.
Active Inference: no offline training; hand‑tuned skills over a handful of episodes; adapts online (skills retry/compose autonomously).
Active Inference Adaptation: recovers from sub‑task failures by re‑planning (alternate approach directions, base repositioning) without retraining.
Significance: This is the first demonstration that a fully hierarchical Active Inference architecture can scale to modern, long‑horizon robotics benchmarks and outperform strong RL baselines on success and adaptability — without massive offline training.
Active Inference robots are uniquely suited to team up with humans. They reason in a way that is compatible with us, sharing a common sense-making framework: perceiving the environment, predicting outcomes, and adjusting actions to minimize uncertainty.
Here’s what that means in practice:
Safety through shared situational awareness: The robot can share its internal 3D “belief map” of the environment, including areas of uncertainty, with its human partner in real time. If it’s unsure whether a space is safe, that uncertainty becomes a joint decision point.
Safety through uncertainty management: The robot doesn’t just act; it calculates confidence. If confidence is low, it seeks input, pauses, or adapts, reducing risk.
Fluid division of labor: Just as two humans can read each other’s intentions and adjust roles dynamically, an Active Inference robot can anticipate when to lead, when to follow, and when to yield control for safety or efficiency.
Adaptive Role Switching: If a robot sees the human struggling with a task, it can take over — or vice versa — without a full reset.
In safety-critical domains like manufacturing, disaster response, or healthcare, this means a human and a robot can operate as true partners, each understanding, predicting, and adapting to the other’s actions without rigid scripting.
When both humans and robots operate on Active Inference principles, the synergy is remarkable: both sides can perceive, predict, and adapt to one another, reasoning about the shared mission together.
When one side doesn’t have that capacity, the reasoning partner must micromanage the non-reasoning partner, creating delays, errors, and frustration.
When you combine this internal multi-agent structure with the Spatial Web Protocol, the collaboration scales beyond a single robot. Internal coordination becomes networked coordination through HSTP and HSML. A team of robots (or a robot and a human) can operate as if they’re part of the same organism, with shared awareness of objectives, risks, and opportunities, so knowledge gained by one agent can inform all others.
HSTP (Hyperspace Transaction Protocol): Enables secure, decentralized exchange of belief/goal updates, constraints, and state updates between distributed agents — whether human, robot, or infrastructure. No central brain required.
HSML (Hyperspace Modeling Language): Gives all agents shared semantic 3D models of the environment, places, objects, tasks, and rules, so every agent reads the same plan the same way.
The result: The same belief propagation that coordinates a robot’s elbow and wheels can coordinate two robots, a human coordinator, and a smart facility, instantly and safely.
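To give a flavor of the kind of exchange this enables, here is a purely hypothetical Python sketch of a belief/goal update one agent might share with its teammates. The field names and structure are invented for illustration; they are not the actual HSML schema or HSTP API.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict

@dataclass
class BeliefUpdate:
    """Hypothetical belief/goal update shared between agents.
    Field names are illustrative only, not the real HSML vocabulary."""
    sender: str              # which agent is speaking
    entity: str              # what the belief is about (a place, object, or task)
    state: Dict[str, float]  # estimated state of that entity
    confidence: float        # 0..1, how sure the sender is
    goal: str                # what the sender intends to do next

def to_message(update: BeliefUpdate) -> str:
    """Serialize for transport; a real deployment would speak HSTP, here we just emit JSON."""
    return json.dumps(asdict(update))

# A delivery robot tells the team it believes a corridor is partially blocked.
print(to_message(BeliefUpdate(
    sender="delivery_robot_07",
    entity="corridor_B2",
    state={"blocked_fraction": 0.6},
    confidence=0.72,
    goal="reroute_via_corridor_B3",
)))
```

The point is not the format but the content: state, confidence, and intent travel together, so a teammate (human or machine) can weigh how much to trust the update and adjust its own plan.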
Imagine a hospital delivery robot, a nurse, and a smart inventory system, all operating as if they were parts of one coordinated organism, sharing the same mission context in real time. This level of cohesive interoperability across agents and platforms is one of the most profoundly beautiful aspects of this new technology stack.
So what might this look like in the real world?
Aviation Ground Ops:
Current RL-driven robots require scripted contingencies for every deviation from plan.
An Active Inference robot spots a luggage cart blocking a service bay, predicts the delay’s ripple effects, and instantly negotiates a new task order with human supervisors — avoiding knock-on delays.
Disaster Response:
RL robots can detect hazards, but often lack the reasoning framework to weigh competing risks without retraining.
An Active Inference robot in a collapsed building senses structural instability, flags the uncertainty level, and suggests alternate search routes in collaboration with human responders.
Industrial Logistics:
In a smart factory, robots equipped with Active Inference and VBGS mapping adapt to new conveyor layouts without reprogramming, while humans focus on production priorities.
VERSES’ broader research stack – AXIOM for world-modeling and planning, VBGS for uncertainty-aware perception, and the Spatial Web protocols HSTP and HSML for shared context – ties this directly into scalable, networked intelligence.
Together, these form the technical bridge from a single robot as a teammate to globally networked, distributed intelligent systems, where every human, robot, and system can collaborate through a shared understanding of the world.
The levels of interoperability, optimization, cooperation, and co-regulation are unprecedented and staggering. Every industry will be touched by this technology, and smart cities all over the globe will come to life through it.
This isn’t just a robotics upgrade — it’s a paradigm shift.
Where RL robots are powerful but brittle tools, Active Inference robots are reasoning teammates capable of operating in the fluid, unpredictable reality of human environments.
This is happening right now, and it changes what we can expect from robotics forever.
VERSES AI’s remarkable work shows for the first time that a robot can evolve beyond current Reinforcement Learning (RL) limitations. It is the first public demonstration that Active Inference scales to the complexity of real-world robotics while outperforming current paradigms on efficiency, adaptability, and safety – without the data and maintenance burden of Reinforcement Learning.
VERSES AI’s research stack resets the bar for what we should expect from robots in homes, hospitals, airports, factories, AND from the teams we build with them.
Want to learn more? Join me at AIX Learning Lab Central where you will find a series of executive training and the only advanced certification program available in this field. You’ll also discover an amazing education repository known as the Resource Locker — an exhaustive collection of the latest research papers, articles, video interviews, and more, all focused on Active Inference AI and Spatial Web Technologies.
Membership is FREE, and if you join now, you’ll receive a special welcome code for 30% off all courses and certifications.
Join the conversation every month for Learning Lab LIVE!