What comes after ChatGPT may not look like a chatbot

Foundation models are starting to move from language into the physical world.

April 28, 2026 · 8 min read

For many people, it now feels obvious why ChatGPT was surprising.

But in a way, most of us noticed the wave late.

Between 2015 and 2022, language models kept improving, OpenAI was already building them, and the field was moving quickly. But for most people, it did not feel like "the future" yet.

Early OpenAI office

Then ChatGPT launched, and suddenly the shift was obvious.

I think something similar may be happening in robotics.

Not because humanoids will be doing everything inside our homes tomorrow. That will probably take much longer than people want.

But because the direction of the field is changing.

We are moving from robots programmed task by task to models that learn more general principles for acting in the physical world.

That is the part worth paying attention to.

Many serious players are already betting on this direction: companies such as Tesla, OpenAI, Google, Amazon, DoorDash, and Waymo, along with robotics labs and a growing set of foundation-model teams.

Robotics was already huge, just not in the way people imagined

Robots have been useful for decades.

Industrial arms weld cars. Warehouse robots move boxes. Surgical robots assist doctors. Factories already depend on automation.

Industrial automation at scale

Each case required specific hardware, specific sensors, specific software, specific integration, and a relatively controlled environment.

So the point is not that robotics is new.

The point is that the old model was highly specialized. If you wanted a robot to do a task, you engineered much of the system around that one task, including the environment it worked in.

The newer thesis is deeper than "more robots in factories." It is that robots may stop being machines programmed for one narrow job and start becoming systems that learn how to act in the physical world.

From specialized robots to generalist models

Early industrial robotics

The commercial story goes back to the 1960s, when Unimate brought industrial robotics into automotive production. Its value was clear: doing repetitive, dangerous, or physically heavy work with more consistency than a human.

Around the same time, academic research was already imagining robots that could perceive, reason, and act.

Shakey robot

Shakey, developed by SRI between 1966 and 1972, was one of the first mobile robots capable of perceiving its environment, planning routes, and rearranging simple objects. It was an early demonstration of a powerful idea: a robot not just as a mechanical machine, but as a system that connects perception, reasoning, and action.

These systems work extremely well when the world is controlled.

A factory can be designed around the robot. The lighting is stable. The parts are known. The process is repetitive.

The hard part is the messy world.

A warehouse that changes. A hospital. A kitchen. A home. A back room where objects move, people interrupt, doors get stuck, floors are wet, and the environment was not designed for the machine.

That variability is what makes general robotics hard.

Why "simple" physical tasks are not simple

There is an old idea in AI called Moravec's paradox.

Things that feel hard to humans, like chess, math, or symbolic reasoning, became tractable for computers relatively early.

Things that feel easy to humans, like walking, grabbing a cup, folding clothes, cleaning a table, or moving through a room, remain extremely hard for machines.

The usual explanation is that our physical abilities, such as vision, balance, touch, force control, coordination, spatial intuition, and fine manipulation, are the product of a very long evolutionary process.

We do not notice how much is happening because we do it without thinking.

That is why many important robotics demos look boring to a normal person.

A robot folding clothes or cleaning a kitchen is less viral than a humanoid doing a backflip.

But technically, the boring demo can be much more impressive.

A robot doing acrobatics may be performing in a controlled environment, executing preprogrammed movements under very specific conditions.

Cleaning an unknown kitchen is different. It requires recognizing objects, inferring their function, deciding what goes where, coordinating fine movements, and reacting to mistakes.

The real test is not whether a robot can perform a scripted movement.

The real test is how much variability it can handle.
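At bottom, the difference between a scripted movement and one that handles variability is open-loop versus closed-loop control. The toy sketch below assumes a one-dimensional gripper reaching for a cup that gets nudged partway through; the function names, the gain, and all positions are hypothetical. The scripted version replays a plan made up front, while the feedback version re-observes the target on every step.

```python
from typing import Callable

# Toy 1-D comparison of a scripted (open-loop) reach and a feedback
# (closed-loop) reach. Positions, gain, and the disturbance are hypothetical.


def scripted_reach(start: float, planned_target: float, steps: int) -> float:
    """Replay a trajectory planned once, before moving; never re-observes."""
    pos = start
    step_size = (planned_target - start) / steps
    for _ in range(steps):
        pos += step_size
    return pos


def feedback_reach(start: float, observe_target: Callable[[int], float],
                   steps: int, gain: float = 0.5) -> float:
    """Re-observe the target every step and close part of the remaining gap."""
    pos = start
    for t in range(steps):
        target = observe_target(t)    # fresh observation each step
        pos += gain * (target - pos)  # proportional correction
    return pos


def cup_position(t: int) -> float:
    """The cup starts at 1.0 and is nudged to 1.3 at step 10 (hypothetical)."""
    return 1.0 if t < 10 else 1.3


if __name__ == "__main__":
    steps = 30
    final_target = cup_position(steps)
    scripted = scripted_reach(0.0, cup_position(0), steps)
    feedback = feedback_reach(0.0, cup_position, steps)
    print(f"scripted miss: {abs(final_target - scripted):.3f}")  # 0.300
    print(f"feedback miss: {abs(final_target - feedback):.3f}")  # 0.000
```

Under these assumptions the scripted reach ends about 0.3 units away from the moved cup, while the feedback reach ends essentially on it. Real robots close the loop over cameras, force sensors, and far richer state, but the principle is the same.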

What language models changed

ChatGPT changed public perception because it showed that a general model could handle many language tasks without being programmed one by one.

Before, we had separate systems for translation, summarization, classification, writing assistance, and other specific jobs.

Then LLMs learned broad patterns of language and started adapting to many tasks.

Robotics is trying to make a similar transition.

This is where vision-language-action (VLA) models come in.

In simple terms, these models connect what a system sees, what it understands, and what it does.

Vision language action models

Google DeepMind's RT-2 was one early sign of this direction: using web-scale visual and language knowledge, plus robot data, to translate what the system sees and understands into physical actions.
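So that one model can output both words and movements, RT-2 represents each robot action as a short sequence of discrete tokens, with each action dimension split into 256 bins. The sketch below shows only that general idea, uniform binning and decoding, under stated assumptions: every action dimension is pre-normalized to [-1, 1], and `encode_action`/`decode_action` are illustrative names, not RT-2's code.

```python
# Uniform action binning: continuous robot actions <-> discrete token ids.
# NUM_BINS matches the 256 bins described for RT-2; the [-1, 1] range per
# dimension is an assumption (real systems normalize each dimension first).

NUM_BINS = 256
LOW, HIGH = -1.0, 1.0


def encode_action(action: list[float]) -> list[int]:
    """Map each continuous action dimension to an integer bin id in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)
        scaled = (clipped - LOW) / (HIGH - LOW)  # now in [0.0, 1.0]
        tokens.append(min(int(scaled * NUM_BINS), NUM_BINS - 1))
    return tokens


def decode_action(tokens: list[int]) -> list[float]:
    """Map bin ids back to the center of each bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (token + 0.5) * width for token in tokens]


if __name__ == "__main__":
    # Hypothetical 7-D command: 3 position deltas, 3 rotation deltas, 1 gripper value.
    action = [0.10, -0.25, 0.0, 0.0, 0.05, -0.9, 1.0]
    tokens = encode_action(action)
    print(tokens)
    print(decode_action(tokens))
```

Decoding returns the center of each bin, so for in-range values the round-trip error is at most half a bin width (about 0.004 in these units). That small loss of precision is the price of letting a language model emit actions the same way it emits words.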

That matters because a robot needs more than language understanding.

If you tell a robot to clean a spill, it needs to recognize the spill, understand the goal, find a tool, plan the movement, use the right force, adjust when the cloth folds, and recover if the object moves.

Language gives context.

Vision gives state.

Action is where everything becomes real.

There is no internet of movement

Language models had a gift: the internet.

There was an enormous amount of text available: books, websites, documentation, forums, code, articles.

Robotics does not have the same thing.

There is no massive internet of what it feels like to grab a cup, how much force closes an old drawer, how a fabric changes when it is folded, or how to react when a door gets stuck.

That is one of the biggest bottlenecks in the field.

Robotics data bottleneck

Robots need physical data.

And physical data is expensive, slow, messy, and operationally hard to collect.

That is also why I became interested in the data layer for robotics. The scarce thing is not only better hardware or more compute. It is the loop between real-world experience, demonstrations, feedback, evaluation, and deployment.
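Part of what makes this data hard is that each useful example bundles sensing, action, and an outcome someone has to judge. One hypothetical way to log it is sketched below; the `Step` and `Episode` fields, the source labels, and `success_rate` are illustrative assumptions, not the format of any particular dataset or company. Demonstrations, autonomous deployment runs, evaluation results, and operator feedback all land in the same record, which is what lets the loop close.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One control tick of logged experience (field names are illustrative)."""
    camera_frame: str             # reference to a stored image, e.g. a file path
    joint_positions: List[float]  # what the robot sensed about its own body
    action: List[float]           # the command actually sent to the motors


@dataclass
class Episode:
    """One attempt at a task, from a human demonstration or an autonomous run."""
    instruction: str                # e.g. "wipe up the spill"
    source: str                     # hypothetical labels: "teleop_demo" or "autonomous"
    steps: List[Step] = field(default_factory=list)
    success: Optional[bool] = None  # filled in later by evaluation or human review
    notes: str = ""                 # operator feedback: interventions, failure causes


def success_rate(episodes: List[Episode], source: Optional[str] = None) -> Optional[float]:
    """Share of graded episodes that succeeded, optionally for one data source."""
    graded = [
        e for e in episodes
        if e.success is not None and (source is None or e.source == source)
    ]
    if not graded:
        return None
    return sum(e.success for e in graded) / len(graded)
```

Tracking `success_rate(episodes, "autonomous")` across model versions is one simple way to check whether new demonstrations are actually improving deployed behavior.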

Real time matters

There is another huge difference between a chatbot and a robot.

If ChatGPT takes two extra seconds to answer, it is annoying but usually fine.

If a robot takes two extra seconds to decide its next movement, it can pause, shake, collide, lose balance, or look useless.

Physical AI needs to act in real time.

That means progress is not only about making models smarter. It is also about making them act smoothly, predict ahead, and update while the world changes.

Real-time robot action models

This is where robotics becomes much more than "LLMs with arms."

It is perception, planning, control, hardware, safety, latency, data, and operations all at once.

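One idea from the robotics company Physical Intelligence is real-time chunking: instead of deciding one tiny movement at a time, the model predicts short sequences (chunks) of movement and starts generating the next chunk while the robot is still executing the current one, so motion does not stall while the model thinks.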
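Physical Intelligence's published method is more involved, but the scheduling idea behind overlapping inference with execution can be simulated in a few lines. The sketch below assumes a policy that returns `chunk_len` actions per call and needs `delay` control ticks to run; `count_idle_ticks` and all numbers are hypothetical. It models timing only: the placeholder chunks carry no real actions, and it ignores how the real system keeps each new chunk consistent with the one already executing.

```python
from collections import deque
from typing import Optional


def count_idle_ticks(total_ticks: int, chunk_len: int, delay: int, overlap: bool) -> int:
    """Simulate a fixed-rate control loop fed by an action-chunking policy.

    chunk_len: actions the (hypothetical) policy predicts per call.
    delay:     control ticks one policy call takes to finish.
    overlap:   False: call the policy only after the current chunk runs out.
               True:  call it once at most `delay` actions remain queued, so
                      the next chunk is prepared while the robot keeps moving.
    Returns how many ticks the robot had no action to execute.
    """
    if chunk_len < 1 or delay < 1:
        raise ValueError("chunk_len and delay must be at least 1")

    queue: deque = deque()          # actions waiting to be executed
    ready_at: Optional[int] = None  # tick when the in-flight chunk arrives
    executed_while_waiting = 0      # old-chunk actions run during inference
    trigger = delay if overlap else 0
    idle = 0

    for tick in range(total_ticks):
        # 1. Deliver a finished chunk. Its first entries cover time the robot
        #    already spent executing the old chunk, so skip that many.
        if ready_at == tick:
            fresh_chunk = list(range(chunk_len))  # stand-in for predicted actions
            queue.extend(fresh_chunk[executed_while_waiting:])
            ready_at = None

        # 2. Start the next policy call once the queue is short enough.
        if ready_at is None and len(queue) <= trigger:
            ready_at = tick + delay
            executed_while_waiting = 0

        # 3. Execute one action this tick, or stall.
        if queue:
            queue.popleft()
            if ready_at is not None:
                executed_while_waiting += 1
        else:
            idle += 1

    return idle


if __name__ == "__main__":
    for overlap in (False, True):
        idle = count_idle_ticks(200, chunk_len=8, delay=3, overlap=overlap)
        print(f"overlap={overlap}: idle for {idle} of 200 ticks")
```

With 200 ticks, 8-action chunks, and a 3-tick delay, the wait-then-call version stalls for 56 ticks (3 of every 11), while the overlapped version stalls only for the 3 ticks before its first chunk arrives. In this simple scheme, overlap hides the delay completely only when each chunk is at least twice as long as the inference delay.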

Why humanoids get so much attention

Humanoids are interesting for a simple reason: the human world was built for human bodies.

If you want to automate spaces already designed for people, you have two options.

You redesign the environment for machines.

Or you build machines that can operate in human environments.

Humanoids are attractive because they promise to work inside spaces that already exist.

But they also carry the hardest constraints: balance, hands, locomotion, fine manipulation, energy, safety near people, manufacturing cost, maintenance, and reliability.

So I do not think the future is simply "humanoids replace all specialized robots."

The more likely outcome is a mix.

Factories, warehouses, mining, agriculture, hospitals, and industrial kitchens will still use many specialized robots because specialization is often cheaper and more reliable.

Homes, offices, retail, and certain human spaces may eventually make more sense for humanoids if the reliability and cost get good enough.

What still needs to happen

The technical progress is real, but the bottlenecks are still large.

Reliability is the first one. In software, a model that is right 80 percent of the time can still be useful with human supervision. In robotics, 80 percent can be terrible. A company does not want a robot that almost always handles a part correctly if the remaining errors stop the line.
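The gap between 80 percent and "good enough" is easier to see with a little arithmetic. The sketch below assumes every attempt succeeds or fails independently at a fixed rate, which real robots do not strictly satisfy; the function names, the 1,000-attempt shift, and the 10-step task are hypothetical figures, not data from any deployment.

```python
# Back-of-the-envelope reliability math. Assumes every attempt succeeds or
# fails independently at a fixed rate; the attempt and step counts below
# are hypothetical, not measurements from any deployment.


def expected_failures(success_rate: float, attempts: int) -> float:
    """Expected number of failed attempts out of `attempts`."""
    return (1.0 - success_rate) * attempts


def chance_all_succeed(success_rate: float, steps: int) -> float:
    """Probability that all `steps` of a multi-step task succeed in a row."""
    return success_rate ** steps


if __name__ == "__main__":
    for rate in (0.80, 0.99, 0.999):
        fails = expected_failures(rate, attempts=1_000)
        clean_run = chance_all_succeed(rate, steps=10)
        print(f"{rate:.1%}: ~{fails:.0f} failures per 1,000 attempts, "
              f"{clean_run:.1%} chance a 10-step task has no errors")
```

Under these assumptions, 80 percent works out to about 200 failures per 1,000 attempts and only about an 11 percent chance of finishing a 10-step task without a single error. Even 99 percent still leaves roughly 10 failures per 1,000 attempts, and any one of them might stop the line.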

Cost is the second. A robot is not just software. It has a body, sensors, batteries, parts, maintenance, installation, support, insurance, and downtime.

Data is the third. Robots need to operate in the real world to improve. But to operate in the real world, they need to be good enough. That loop is only starting.

Safety is the fourth. A digital agent can produce a bad document. A physical agent can move a tool, hit an object, spill liquid, or get too close to a person.

Integration is the fifth. Many companies are not ready for physical automation. Their operations depend on exceptions, informal communication, messy layouts, and tacit knowledge from operators.

Before adding robots, many businesses will have to redesign workflows, physical spaces, operational data, and lines of responsibility.

So when does it arrive?

It depends on what we mean by "arrive."

Robots are already here in factories, logistics, and industrial environments.

What is new is flexibility.

In the next few years, I think the most important progress will happen in controlled commercial environments: manufacturing, warehouses, food, industrial inspection, mining, specialized agriculture, hospitals, and services where tasks are repeatable, measurable, and economically clear.

Homes will be harder.

Not because demand is low, but because homes are chaotic, emotional, diverse, and not standardized. Every home is different. Every family organizes things differently. The tolerance for mistakes is low. The cost has to fall a lot.

My current view is simple:

Robotics used to be mostly a story about specialized machines.

Now it is becoming a story about models that learn general principles of physical action.

And if that transition works, the next ChatGPT moment may not happen on a screen.

It may happen when a machine starts doing useful work in the real world.