Alan Arguello

Robotics does not have a hardware bottleneck. It has a data bottleneck.

In physical AI, the scarce layer is not compute. It is high-quality multimodal data and feedback loops.

4 min read
February 7, 2026

Over the last few months I spoke with many people across xAI, OpenAI, Anthropic, and data-labeling companies.

I became obsessed with finding an entry point to build a high-quality data collection and labeling company.

The reason was simple: 2025 felt like the year the market made something explicit that many people already suspected.

Models do not improve through architecture alone. They improve when you unlock better data and better feedback loops.

We saw strong signals:

  • Meta bought a large stake in Scale AI for $14B.
  • Mercor reached $500M annual run rate and a $10B valuation in three years.
  • Surge AI reached $1B annual run rate without outside funding.

And those are only a few examples among other fast-growing players like Handshake, Micro1, Turing, and Snorkel.

So where is data demand actually coming from?

The pattern: scaling still rewards data and compute, but easy data is gone

Practical scaling laws express a simple intuition: if you invest more compute and more well-selected data, models tend to improve.

It is not magic, not infinite, and not equally true for every domain. But as a rule of thumb, it has been consistent.
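
One widely cited version of this intuition is the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022), where loss falls as a power law in both model size N and training data size D:

  L(N, D) = E + A / N^α + B / D^β

E is the irreducible loss, and the fitted exponents α and β both come out around 0.3. That is exactly why throwing only one ingredient at the problem stalls: scale N while holding data fixed and the B / D^β term eventually dominates, so only more (or better-selected) data moves the number.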

The problem is that easy data (cheap, abundant, scrapeable, low-friction) is no longer the unlock it was in 2018. Today the bottleneck looks different:

  • Rights and permissions
  • Quality (not just quantity)
  • Specialized domains (math, science, finance, code, rare tasks)
  • And above all: evaluation and verification (how useful the output actually is)

That is why human-data companies exploded: the market realized progress was not only "more GPUs" but "more high-quality feedback."

What these companies actually do

At the core, they build a production and quality-control factory for human data:

  • Recruit labor at scale (from crowd workers to domain experts)
  • Filter by task-specific skill
  • Run operational onboarding and training
  • Design tasks (what to collect, how to measure, how to compare)
  • Run QA (consistency, auditing, anti-fraud, evaluator calibration; a toy calibration check is sketched below)

At large scale, projects can involve thousands or tens of thousands of people generating or evaluating data: accented speech, step-by-step reasoning, safety, preferences, edge cases, and more.
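
To give a flavor of the QA piece, here is a toy version of evaluator calibration in Python: every rater labels a shared audit set, and anyone who drifts too far from the majority vote gets flagged for retraining. The rater names, labels, and threshold are all illustrative, not any company's real stack.

```python
# Toy evaluator-calibration check: flag raters who disagree too often
# with the majority vote on a shared audit set.
from collections import Counter

def majority_labels(ratings: dict[str, list[str]]) -> list[str]:
    """Majority vote per item across all raters."""
    n_items = len(next(iter(ratings.values())))
    return [
        Counter(r[i] for r in ratings.values()).most_common(1)[0][0]
        for i in range(n_items)
    ]

def flag_miscalibrated(ratings: dict[str, list[str]], min_agreement: float = 0.8) -> list[str]:
    gold = majority_labels(ratings)
    return [
        rater
        for rater, labels in ratings.items()
        if sum(a == b for a, b in zip(labels, gold)) / len(gold) < min_agreement
    ]

ratings = {
    "rater_a": ["good", "bad", "good", "bad"],
    "rater_b": ["good", "bad", "good", "bad"],
    "rater_c": ["bad", "good", "bad", "bad"],  # consistently disagrees
}
print(flag_miscalibrated(ratings))  # ['rater_c']
```

Real pipelines layer much more on top (gold items with known answers, honeypots for fraud detection, per-task rubrics), but the shape is the same: measure agreement, then act on it.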

Why entering this market is not obvious

After speaking with founders and researchers, the market shape looked clear:

  • Massive supply: every week a new company offers data + workforce + QA.
  • Concentrated demand: large budgets are controlled by a small set of frontier labs and big tech companies.

That creates a hyper-competitive, relationship-driven market.

To win large generic human-data contracts, you usually need one or both:

  1. Distribution/network: direct access to buyers and decision-makers
  2. Truly scarce data: something that is hard to replicate and not a commodity

That was the point where my thesis changed.

The shift: in robotics, scarcity is real and the dataset layer is still early

In text, image, and video, the world already learned how to scale on large scraped datasets. In robotics, the constraints are very different.

You cannot scrape the physical world. You must instrument it.

Robotics needs multimodal data that almost nobody has at scale (a minimal record schema is sketched after this list):

  • RGB + depth
  • IMU / proprioception
  • Force/torque and contact
  • Tactile signals
  • Actions (what the robot or human actually did)
  • Intent, task phase, success/failure
  • Real physical context (mess, friction, odd objects, poor lighting, kids, pets, etc.)
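
As a concrete picture of what "instrumenting the world" produces, here is a minimal sketch of one synchronized sample. The field names and shapes are my assumptions, not a standard robotics log format.

```python
# Illustrative schema for one time-synchronized multimodal sample.
from dataclasses import dataclass, field

@dataclass
class RobotFrame:
    timestamp_ns: int                    # shared clock across all streams
    rgb: bytes                           # encoded camera frame
    depth: bytes                         # aligned depth map
    imu: tuple[float, ...]               # accelerometer + gyroscope readings
    joint_positions: tuple[float, ...]   # proprioception
    wrench: tuple[float, ...]            # force/torque at the wrist
    tactile: tuple[float, ...]           # fingertip pressure array
    action: tuple[float, ...]            # command actually sent to the robot
    task_phase: str                      # e.g. "reach", "grasp", "place"
    success: bool | None = None          # filled in after the episode
    context_tags: list[str] = field(default_factory=list)  # "clutter", "low_light", ...
```

The hard part is not the schema. It is keeping every stream on a shared clock and capturing the bottom half of the list (wrench, tactile, success) at all.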

Off-policy vs on-policy: the real world charges for every bit of learning

A useful way to frame it (a toy loop is sketched after the definitions):

  • Off-policy / offline: learn from logs, human demonstrations, teleoperation, or previously collected data.
  • On-policy / online: collect while acting, learn from environment feedback, and improve through iteration.
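
Here is that framing as a toy Python loop. The environment, policy representation, and update rule are all stand-ins; the point is only where the samples come from and what each one costs.

```python
# Toy contrast between offline and online data collection.
import random

def update(policy: dict, state: int, action: int, reward: float, lr: float = 0.1):
    """Nudge the value estimate for (state, action) toward the observed reward."""
    key = (state, action)
    policy[key] = policy.get(key, 0.0) + lr * (reward - policy.get(key, 0.0))

def train_offline(policy: dict, logged_data: list):
    """Off-policy: learn from a fixed log (teleoperation, demonstrations).
    No robot time is spent; the same log can be replayed forever."""
    for state, action, reward in logged_data:
        update(policy, state, action, reward)

def train_online(policy: dict, env_step, episodes: int = 100):
    """On-policy: every sample requires acting in the environment,
    which in robotics means wall-clock time, supervision, and wear."""
    for _ in range(episodes):
        state = random.randrange(5)
        action = max(range(3), key=lambda a: policy.get((state, a), 0.0))
        reward = env_step(state, action)   # the expensive part
        update(policy, state, action, reward)

policy: dict = {}
train_offline(policy, [(0, 1, 1.0), (2, 0, 0.0), (0, 1, 1.0)])
train_online(policy, env_step=lambda s, a: 1.0 if a == s % 3 else 0.0)
```

Offline, you can replay the same log a thousand times for free. Online, every additional bit of learning is paid for in real rollouts.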

Autonomous vehicles make this obvious: mapping, route validation, rare scenario collection, HD map maintenance, and continuous updates are expensive and slow. Even with fleet scale, the world keeps changing.

Now imagine humanoids or home robots.

The home-robot problem: almost no one records what robots need to learn

For household robotics, datasets are absurdly scarce. Not because tasks are conceptually impossible, but because:

  • The physical world has infinite variation (kitchens, tools, textures, sizes, clutter)
  • Many tasks require contact, force, and fine control
  • Useful ground truth is not just video, it includes physical signals

Simulation helps, but it is not a silver bullet. High-fidelity simulation of contact, friction, deformation, tactile feedback, and rare real-world effects is still hard. Domain randomization helps, but sim-to-real gaps remain significant.
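
For intuition, domain randomization usually looks something like this sketch: resample physics and visual parameters every episode so a policy cannot overfit one simulated world. The parameter names and ranges are illustrative, and reset_simulator is a placeholder, not a real API.

```python
# Minimal domain-randomization sketch: new physics/visual params per episode.
import random

def sample_sim_params() -> dict:
    return {
        "friction": random.uniform(0.4, 1.2),          # surface friction coefficient
        "object_mass_kg": random.uniform(0.05, 0.8),
        "object_scale": random.uniform(0.9, 1.1),
        "light_intensity": random.uniform(0.3, 1.0),
        "camera_jitter_deg": random.uniform(-2.0, 2.0),
        "sensor_noise_std": random.uniform(0.0, 0.02),
    }

for episode in range(3):
    params = sample_sim_params()
    # reset_simulator(params)  # placeholder: rebuild the scene with these values
    print(f"episode {episode}: {params}")
```

The catch is that you can only randomize what your simulator models, and contact, deformation, and tactile dynamics are precisely the parts it models worst.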

And on the real-world side, the hardware needed to capture robust depth, force/torque, tactile, and other reliable streams is not deployed in millions of homes.

My thesis: the biggest gap in robotics is data and loops

After going down this rabbit hole, my conclusion is this: for many real-world tasks, the limiting factor is not the robot. It is the dataset and the learning loop.

But "data" here does not mean collecting videos. It means building a full system:

  • How you capture (instrumentation)
  • How you structure (formats and multimodal synchronization)
  • How you label (actions, phases, rewards, errors)
  • How you run QA (consistency and auditability)
  • How you convert all of that into measurable policy/model gains (sketched below)
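
That last bullet is what turns a pile of files into a product. A minimal version of "measurable gains" is an A/B evaluation of the policy before and after training on a given batch. In this sketch the rollouts are faked with fixed success odds purely to show the shape, and the version names are made up.

```python
# Sketch: attach a measurable gain to a data batch via before/after evaluation.
import random

def success_rate(policy_version: str, n_trials: int = 200) -> float:
    """Run n_trials task attempts and count successes. Here the rollouts
    are simulated with fixed odds; a real harness would run the robot."""
    odds = {"baseline": 0.62, "finetuned_on_batch_042": 0.71}[policy_version]
    return sum(random.random() < odds for _ in range(n_trials)) / n_trials

before = success_rate("baseline")
after = success_rate("finetuned_on_batch_042")
print(f"batch_042 value: {after - before:+.1%} success rate on the eval suite")
```

Once every batch of data has a number like that attached, you can price it, prioritize what to collect next, and close the loop.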

That is why I think this is one of the best moments to start in robotics. Models and hardware are crossing a capability threshold, while the physical-world data layer is still immature.

Whoever builds that layer with real advantages (capture + structure + QA + loops) is not selling "datasets."

They are selling progress.