
“The ChatGPT moment for physical AI is here,” Jensen Huang, CEO of NVIDIA, declared at CES 2026, “when machines begin to understand, reason and act in the real world.”
It’s a bold claim. Huang has made similar predictions before - at CES 2025, he said the moment was “just around the corner.” But this time, the evidence is harder to dismiss. At Airtree, we’ve been tracking robotics for years, waiting for the right moment. Is this it?
Chinese startup Unitree is now shipping a humanoid robot for $5,900. Robots at Physical Intelligence have made coffee for 13 hours straight without human intervention. Google DeepMind released models that can control any robot form factor (arms, humanoids or mobile platforms) from a single neural network. Morgan Stanley projects a $5 trillion annual market by 2050. Over $10 billion flowed into robotics in 2025 alone.
Something is clearly happening. But do these stories signal a real breakthrough, one that finally turns a research field into commercial momentum and reshapes the economy?
The short answer is yes and no. The AI foundations are falling into place faster than anyone expected. But robots need bodies, bodies need factories, and factories need supply chains, which are increasingly concentrated in a single country. Now felt like the right moment to dive back into the industry and form a thesis on what we believe is likely, when, and where the startup winners will emerge.
The most significant development in robotics over the past two years has been the emergence of Vision-Language-Action (VLA) models. These represent a fundamental shift in how robots learn, and early evidence suggests they may follow similar scaling laws to those that made large language models so powerful.

To appreciate why this matters, consider how robotics evolved. For decades, the field operated on a premise that seemed sensible: break the problem into manageable pieces. Engineers built separate systems for perception, planning, and control. Each component would be optimised independently, then connected through carefully designed interfaces. A perception system using convolutional neural networks would identify objects and estimate their positions, outputting structured data like “there’s a cup at coordinates (0.3, 0.15, 0.2).” A separate planning system, often classical algorithms with no machine learning at all, would compute collision-free paths. A control layer would convert those paths into motor commands.
This architecture produced robots that could perform narrow, pre-programmed tasks with impressive precision. But when anything unexpected happened, they failed catastrophically. The perception system might estimate an object’s position slightly wrong; the planner, trusting this estimate completely, would compute a path to the wrong location; the controller would execute that path perfectly—and miss the object entirely. Each layer operated in isolation, losing information at every handoff.
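The failure mode described above can be sketched in a few lines. This is a toy illustration rather than any real robotics stack: the `Point` type, the stage functions and the 1 cm grasp tolerance are all hypothetical, and we assume the cup coordinates from the earlier example are in metres.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    z: float


def perceive(true_position: Point, error_m: float) -> Point:
    """Perception stage: returns an estimate offset from the truth along x."""
    return Point(true_position.x + error_m, true_position.y, true_position.z)


def plan(estimate: Point) -> Point:
    """Planning stage: trusts the estimate completely and targets it."""
    return estimate


def execute(target: Point) -> Point:
    """Control stage: reaches the commanded target exactly."""
    return target


def grasp_succeeds(gripper: Point, obj: Point, tolerance_m: float) -> bool:
    """A grasp works only if the gripper ends within tolerance of the object."""
    distance = ((gripper.x - obj.x) ** 2
                + (gripper.y - obj.y) ** 2
                + (gripper.z - obj.z) ** 2) ** 0.5
    return distance <= tolerance_m


cup = Point(0.3, 0.15, 0.2)  # the cup from the example above (metres assumed)
for error in (0.0, 0.005, 0.03):  # hypothetical perception errors
    gripper = execute(plan(perceive(cup, error)))
    print(error, grasp_succeeds(gripper, cup, tolerance_m=0.01))
```

Nothing downstream of perception ever sees the real cup, so a flawless planner and controller cannot recover from a 3 cm estimation error.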
VLAs abandon this separation entirely. They take raw camera images, combine them with natural language instructions, and output motor commands directly. One end-to-end neural network handles everything. As researchers at Physical Intelligence put it: “When I try to pick up this glass, I don’t think about it in terms of perception and then planning and then control. I just go for it.”
Google DeepMind’s Gemini Robotics can adapt to robot forms ranging from bi-arm platforms to Apptronik’s Apollo humanoid, learning new tasks from as few as 50 demonstrations. Their Gemini Robotics-ER model achieves state-of-the-art performance on spatial understanding benchmarks, enabling reasoning about 3D environments in ways previous systems couldn’t. Generalist AI has demonstrated what they call an “intelligence threshold” at around 7 billion parameters, where models trained on over 270,000 hours of manipulation data (robots handling objects) show consistent improvement with additional scale. NVIDIA’s GR00T N1 enables humanoid robots to understand ambiguous instructions and execute complex tasks across different robot bodies.
This pattern - more data and compute producing better performance - is what researchers hoped to see. If it holds, the field has converged on an architecture that works like large language models do: learning general capabilities from diverse data rather than engineering narrow solutions. The evidence is promising, though robotics scaling laws remain less proven than their language model equivalents. The datasets are smaller, the results are newer, and much of the evidence comes from companies with incentives to be optimistic.

Not everyone is convinced VLAs are the right path. Jim Fan, who leads robotics research at NVIDIA, points out that most parameters in these models serve language and knowledge rather than physics. The vision system discards low-level details that matter for handling delicate objects.
His team is exploring approaches where robots build internal simulations of their environment and reason about physics before acting, rather than mapping pixels directly to motion. Whether this matters more than scaling VLAs with more data remains an open question.
Here is where the optimistic narrative runs into hard realities. Large language models were trained on essentially unlimited data - the entire publicly available internet. Robotics has no equivalent.
“Today’s leading AI technology, such as large language models, remain wordsmiths in the dark; eloquent but inexperienced, knowledgeable but ungrounded.”
Fei-Fei Li, CEO of World Labs
There is no internet of robot actions. Training data must be painstakingly collected through simulation, human teleoperation or careful demonstration. Even the largest proprietary datasets represent orders of magnitude less data than language models consume.
The challenge runs deeper than volume. Robot training data must capture not just what movements look like, but also how objects respond to force, how materials deform under pressure, how liquids slosh and powders scatter. The diversity problem is equally severe: a robot trained to fold white towels may fail on terry cloth, and one trained in well-lit laboratories struggles in real homes.
This bootstrap problem - getting models capable enough to deploy and begin learning from real-world experience - remains the central bottleneck. As the Physical Intelligence research team described it:
“[The industry is] in a bootstrap phase where basically anything goes. Whatever you can figure out how to add to the model, it’s good. Whether you can add sim, human videos, handheld devices, or teleoperation, it doesn’t matter. You just need to figure out some way to bootstrap yourself to the point where you can deploy.”
Enormous resources are flowing toward solving this problem and multiple approaches are showing promise.
Startups like MicroAGI and Build are focused on creating and commercialising task-specific datasets.
Simulation offers another path. World Labs, founded by Fei-Fei Li, transforms text prompts into editable 3D worlds that serve as robot training environments.
NVIDIA’s Cosmos and Omniverse work together as a pipeline to create more diverse data for training. A robot trained on 1,000 videos of a kitchen learns that specific kitchen. Cosmos Transfer takes those videos and generates thousands of variants: different lighting conditions, different surface textures, different times of day. The original trajectory stays the same, but the visual world around it shifts.
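To make the idea concrete, here is a minimal sketch of visual augmentation that keeps the trajectory fixed. It is not the Cosmos API: the `Demo` type, `vary_lighting` and `augment` are hypothetical names, and a simple brightness gain stands in for the far richer changes (textures, time of day) a generative model would produce.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Demo:
    frames: tuple   # each frame: a tuple of pixel brightness values in [0, 1]
    actions: tuple  # one motor command per frame


def vary_lighting(demo: Demo, gain: float) -> Demo:
    """Scale pixel brightness (clamped to [0, 1]); the actions are reused as-is."""
    new_frames = tuple(
        tuple(min(1.0, max(0.0, pixel * gain)) for pixel in frame)
        for frame in demo.frames
    )
    return Demo(frames=new_frames, actions=demo.actions)


def augment(demo: Demo, n_variants: int, seed: int = 0) -> list:
    """Produce visual variants of one demonstration with identical actions."""
    rng = random.Random(seed)
    return [vary_lighting(demo, rng.uniform(0.5, 1.5)) for _ in range(n_variants)]


original = Demo(frames=((0.2, 0.8), (0.4, 0.6)), actions=("reach", "grasp"))
variants = augment(original, n_variants=5)
print(len(variants), all(v.actions == original.actions for v in variants))
```

Each variant pairs new pixels with the original commands, which is why this kind of augmentation helps with how a scene looks but says nothing new about how objects respond to force.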
Agility Robotics is using Cosmos Transfer for large-scale synthetic data generation. 1X Technologies trains its humanoid NEO Gamma using both Cosmos Predict and Cosmos Transfer. The Cosmos models have been downloaded over two million times. For robotics teams without the resources to collect millions of real-world trajectories, synthetic augmentation offers a shortcut.
The caveat is that synthetic data still struggles with contact physics. You can change how something looks, but predicting how novel materials deform under pressure remains hard. Cosmos helps with visual generalisation. It doesn’t solve manipulation generalisation.
Sunday Robotics attempts to solve this problem through their “Skill Capture Glove” that records how humans actually move through everyday tasks. The gloves cost around $400 per pair. The company has shipped over 2,000 to “Memory Developers” who perform household chores in their own homes while wearing them. To date, they’ve collected nearly 10 million trajectories of real-world manipulation data across more than 500 American homes.
But the most compelling results come from deployment itself. The ultimate goal is what researchers call the “deployment phase,” where robots are reliable enough to work autonomously in commercial settings, generating training data as a byproduct of economically valuable work.
Recent results show what this looks like. Teams have deployed robots on actual commercial tasks: building cardboard boxes, operating espresso machines and folding laundry. As the robots work, humans provide feedback, marking successes and failures, occasionally correcting the approach. This data flows back into training.

The results are impressive. Robots running for hours continuously. Throughput more than doubling compared to demonstration-only baselines. And crucially, this approach handles failures that simulation could never anticipate. When a new shipment of cardboard arrived with imperfect perforations that caused sheets to stick together, the robot learned to handle it. No simulator would have predicted that problem.
This is the flywheel that makes robotics scaling plausible. Once models are good enough to deploy, deployment generates data, data improves models, better models enable broader deployment. The cost of data collection shifts from positive (paying humans to teleoperate) to negative (robots do useful work while learning).
Software scales infinitely at near-zero marginal cost. Robots face the opposite economics: every deployment requires a physical body with motors, sensors, batteries, and structural components.
Current humanoid robots remain expensive. Agility Robotics’ Digit costs around $250,000. Tesla projects Optimus will reach $20,000-$30,000 at scale, but early production costs will be higher. Even collaborative robot arms represent significant capital expenditure for smaller manufacturers.
But the cost curve is moving, driven primarily by China.
Unitree’s $5,900 humanoid and Noetix’s $1,370 entry-level unit demonstrate price points that seemed years away. Over 150 humanoid robot companies now operate in China, creating an ecosystem where iteration happens rapidly and suppliers compete intensely on price. Government policy explicitly targets humanoid dominance, with subsidies flowing and a national plan to secure the complete innovation ecosystem.
China dominates key component manufacturing and controls approximately 90% of rare earth processing, and virtually all of the heavy rare earths essential for motors. US trade restrictions designed to limit Chinese access to advanced technology have, instead, reinforced China’s push toward self-reliance.
Western robotics companies face a strategic choice: source from China and accept supply chain risk, or pay significantly more while Chinese competitors undercut on price. Neither option is comfortable. Tesla’s Optimus reportedly sources components from Chinese manufacturers rather than Japanese incumbents, suggesting even the largest Western players find the cost differential hard to ignore.
Production capacity constraints are real. Chinese manufacturers shipped roughly 13,000 humanoids worldwide in 2025 - meaningful growth, but far from mass scale. Tesla’s ambitious roadmap projects 50,000-100,000 units in 2026, scaling toward “millions” by 2027, but this requires manufacturing infrastructure that hasn’t been demonstrated.
“The winner won’t be the one who ships the first humanoid. It’ll be the one who ships the first million — and makes them smarter every day.”
Brett Adcock, Figure AI
Precision components are a constraint today, though perhaps not for long. Harmonic drives, the strain wave gears enabling smooth movement, have long been dominated by Japan’s Harmonic Drive Systems. But Chinese manufacturers are catching up fast, with over $2.5 billion invested in manufacturing expansion from 2023-2025. Companies like Leaderdrive, Laifual, and Han’s Motion Technology now offer lower-cost alternatives. Planetary roller screws remain the more acute bottleneck. Swiss specialists GSA and Rollvis hold over 50% of the global market, and Chinese import dependence sits around 80%. But here too, capacity is racing to catch up: Beite Technology alone is building a $260 million facility targeting 1 million annual units by 2026, and several other Chinese manufacturers are scaling production. The question is whether supply expansion can keep pace with demand projected to surge from thousands of humanoids to millions by decade’s end.
Battery life tells a similar story. Current humanoids run for roughly two hours on a single charge, nowhere near a full work shift. Bain & Company estimates this could reach six hours by 2030, a meaningful improvement but still not eight hours. Some manufacturers aren’t waiting for better batteries. UBTech’s Walker S2 can autonomously swap its own battery packs in under three minutes, walking to a charging station and replacing depleted cells without human help. It’s an engineering workaround rather than a breakthrough, but it works. The constraint is real; so is the progress.
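A quick back-of-the-envelope calculation shows why swapping is a workable stopgap. The sketch below uses the figures quoted above (two hours of runtime, an eight-hour shift, swaps of up to three minutes) and assumes the robot starts each shift on a full pack. The function names are ours, and the model ignores any runtime lost while the robot walks to the station.

```python
import math


def swaps_per_shift(shift_hours: float, runtime_hours: float) -> int:
    """Battery swaps needed to cover a shift, starting on a full pack."""
    return max(0, math.ceil(shift_hours / runtime_hours) - 1)


def swap_downtime_minutes(shift_hours: float, runtime_hours: float,
                          swap_minutes: float) -> float:
    """Total time spent swapping, treating swap_minutes as an upper bound."""
    return swaps_per_shift(shift_hours, runtime_hours) * swap_minutes


print(swaps_per_shift(8, 2))           # today's roughly two-hour runtime
print(swap_downtime_minutes(8, 2, 3))  # minutes lost to swaps per shift
print(swaps_per_shift(8, 6))           # with the six hours projected for 2030
```

On these assumptions a two-hour battery needs three swaps and costs at most nine minutes of an eight-hour shift, while a six-hour battery would still need one swap.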
Another constraint is edge inference. When a robot needs to catch a falling object or adjust its grip on something slippery, it has milliseconds to react. Sending data to a server and waiting for a response, even on fast internet, is too slow. The robot has to think on its own hardware, in real time.
This is harder than it sounds. The most capable AI models are enormous, and running them requires serious computing power. Current models manage around 5-10 decisions per second on high-end hardware. Smooth, responsive movement needs 30-50. It’s like trying to run a Hollywood film on a phone from 2015. New chips from NVIDIA and on-device models from Google DeepMind are closing this gap, but it remains a real bottleneck.
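The gap is easier to see as a time budget per decision. The sketch below only converts the rates quoted above into milliseconds and speedup factors; the function names are ours.

```python
def step_budget_ms(decisions_per_second: float) -> float:
    """Time available for each decision at a given control rate."""
    return 1000.0 / decisions_per_second


def required_speedup(current_hz: float, target_hz: float) -> float:
    """How many times faster inference must run to reach the target rate."""
    return target_hz / current_hz


for hz in (5, 10, 30, 50):
    print(hz, "decisions/s ->", round(step_budget_ms(hz), 1), "ms per decision")

print(required_speedup(10, 30))  # best case: fastest current vs lowest target
print(required_speedup(5, 50))   # worst case: slowest current vs highest target
```

Today's models take 100 to 200 ms per decision, while smooth movement leaves only 20 to 33 ms, so inference needs to get roughly three to ten times faster.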
Reliability requirements vary dramatically. A robot folding laundry can fail occasionally. A robot in a surgical theatre cannot. Robotics veteran Rodney Brooks captures the challenge: “It’s got to be very, very reliable. And it’s got to work with a bunch of nines — 99.999% of the time.”
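It helps to translate "a bunch of nines" into failure counts. The sketch below is plain arithmetic with hypothetical function names; the one-million-task volume, the 95% comparison and the 100-step task are illustrative assumptions, and the chained calculation assumes each step fails independently.

```python
def expected_failures(success_rate: float, attempts: int) -> float:
    """Average number of failed attempts at a given per-attempt success rate."""
    return (1.0 - success_rate) * attempts


def chained_success(per_step_success: float, steps: int) -> float:
    """Chance a multi-step task succeeds if every step must succeed independently."""
    return per_step_success ** steps


tasks = 1_000_000  # illustrative volume, not a figure from any deployment
print(round(expected_failures(0.99999, tasks)))  # five nines
print(round(expected_failures(0.95, tasks)))     # a 95% robot, for comparison
print(round(chained_success(0.99, 100), 3))      # 100 steps at 99% each
```

Five nines means about 10 failures per million tasks, against about 50,000 at 95%. And because long tasks chain many steps, a robot that is 99% reliable per step completes a 100-step task only about 37% of the time.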
Standard Bots founder Evan Beard, in a piece co-written with Packy McCormick, said:
“If you believe, like we do, that there is a continuous spectrum of economically valuable jobs, many of which robots can do today, then the best thing to do is to get your robots in the field early and get to work.
Each deployment teaches you where you are on the gradient. Success shows you what’s stable, failure shows you where the model breaks, and both tell you exactly what to work on fixing next. You iterate. You take small steps.”
The applications that work first will be the boring ones.
Industrial manipulation in structured environments leads the way. Robots handling known objects in predictable layouts, such as warehouse picking, assembly tasks and materials handling, can achieve the reliability required for commercial viability. Amazon’s partnership with Agility Robotics, BMW’s deployment of Figure’s robots, and logistics providers adopting mobile manipulation all point in this direction. The cobot (collaborative robot) market has already crossed $3.5 billion. These systems work alongside humans, handling repetitive tasks while people manage exceptions.
Semi-structured commercial environments come next. Think commercial kitchens, laundromats and fulfilment centres, where tasks are repetitive enough for robots to learn, failures are tolerable enough to absorb, and human supervisors are available for edge cases. This category is closer than many expected.
Consumer home robots remain the most ambitious goal and the most distant.

Every home is a unique, unstructured environment. In my house, you’re likely to find different lights on each day, toys on the floor, shoes near the door and children running around. Many people have pets. A robot must operate flawlessly around unpredictable humans and fragile belongings. The reliability bar is higher than in factories, and the environments are infinitely more variable. Industry consensus places meaningful home deployment in the 2028-2030 window at the earliest, with mass adoption in the 2030s.
Outdoor environments, such as construction sites, agricultural fields and urban streets, face similar challenges. Healthcare and personal care, where failure consequences can be severe, will expand in supervised contexts but remain distant for direct patient interaction.
Even reliable, affordable robots need buyers willing to navigate significant barriers.
The ROI calculation is harder than it looks. A robot that works 95% of the time isn’t automatically better than a human worker. Integration often costs more than the robot itself. Add training, maintenance contracts, insurance, and the political cost of workforce displacement. Large companies like Amazon and BMW can absorb these costs for strategic learning. Most mid-sized manufacturers can’t justify the risk.
Safety certification creates a gatekeeping function that few startups are prepared for. Companies deploying robots internally, Tesla using Optimus in its own factories for instance, can move faster since they’re not selling to third parties. But commercial sales trigger a web of requirements: CE marking for Europe, OSHA compliance and ANSI standards for the US, third-party testing from organisations like UL or TÜV Rheinland. Humanoid-specific standards don’t exist yet. Companies selling humanoids today must navigate a patchwork of industrial and service robot regulations never designed for their products. Most robotics startups lack the in-house expertise to manage this.
The skilled labour shortage cuts both ways. The same shortage that creates demand for robots makes it harder to find people who can deploy, maintain, and supervise them. Early adopters report that the bottleneck often isn’t the robot; it’s finding technicians who understand both mechanical systems and AI software. The technician workforce won’t develop until deployment scales. But deployment can’t scale without technicians.
Consumer demand remains assumed rather than proven. Industrial use cases have clear ROI: a robot that moves boxes faster than a human, without breaks or injuries, pays for itself in measurable ways. Consumer use cases are less clear. What problem does a home robot solve that justifies a purchase priced like a car? The “robot butler” vision assumes people want household help badly enough to pay for it, learn to operate it, and trust it around their families. Smartphones succeeded because they solved universal problems everyone already knew they had. It’s not obvious that most households feel the same urgency about automating laundry.
At the start of a market trend, when we don’t know exactly how a market will be structured long term but we believe it will be large, we typically look to invest in “picks and shovels” companies: those that build the infrastructure others will build their robotic applications on.
Data infrastructure represents the most obvious gap. Companies building robots need ways to manage, visualise, version, and analyse massive training datasets. They need tools to identify failure modes and prioritise data collection. Robotics teams spend a disturbingly large percentage of engineering time on data processing rather than on model improvement.
When Joe Harris, founder of Alloy, described how complex it is to manage multi-modal data over time, and how critical it is to simplify data processing for continual learning, it immediately clicked. Software tools like Alloy can meaningfully accelerate robotic deployment by enabling teams to analyse data in hours, rather than days, unlocking more frequent model iterations and performance improvements.
Data infrastructure doesn’t stop at training. When a robot fails mid-task in a customer’s facility, someone needs to know. Traditional software monitoring doesn’t translate directly; robot failures are physical and contextual. The emerging category is what you might call mission and deployment operations. At the job level, this means tracking whether individual tasks completed successfully, flagging anomalies, and routing failures to the right response. At the customer level, it means dashboards showing uptime, throughput, and incident history, allowing robotics-as-a-service companies to diagnose, audit, and report against customer SLAs.
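As a rough sketch of what job-level and customer-level tracking could look like, the snippet below rolls per-task records up into the numbers an SLA report would need. Everything here is hypothetical: the `TaskRecord` fields, the sample log and the 90% target are illustrative, not any vendor's schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TaskRecord:
    robot_id: str
    task: str
    succeeded: bool
    duration_s: float


def success_rate(records: list) -> float:
    """Share of tasks that completed successfully."""
    if not records:
        raise ValueError("no task records to summarise")
    return sum(r.succeeded for r in records) / len(records)


def failures_by_task(records: list) -> Counter:
    """Job-level view: which task types are failing, and how often."""
    return Counter(r.task for r in records if not r.succeeded)


def meets_sla(records: list, target_success_rate: float) -> bool:
    """Customer-level view: is the deployment meeting its agreed success rate?"""
    return success_rate(records) >= target_success_rate


# Made-up sample log for illustration only.
log = [
    TaskRecord("r1", "box_build", True, 41.0),
    TaskRecord("r1", "box_build", False, 95.0),
    TaskRecord("r2", "fold_towel", True, 60.0),
    TaskRecord("r2", "fold_towel", True, 58.0),
]
print(success_rate(log), dict(failures_by_task(log)), meets_sla(log, 0.9))
```

A real system would also need the physical context around each failure, such as video and sensor traces, which is exactly what traditional software monitoring lacks.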
Real-world and synthetic data generation will be important as data quantity and diversity requirements increase. NVIDIA’s Isaac Sim provides foundational capabilities, but opportunities exist to create data for domain-specific environments, like hotels or hospitals.
Safety and compliance tooling will become essential as robots move into environments with untrained humans. Human and robot collaboration is only considered safe when all hazards are identified and reduced to acceptable levels. Cobot safety standards rely on risk assessments tailored to each application. Startups building systems to help robot companies navigate certification and regulatory compliance address real barriers to deployment. Valgo and Saphira are early examples of startups tackling this pain point.
There will also be opportunities for vertical-specific platforms, where a robot’s training, form factor, and certification are designed for a specific application or environment, such as logistics or laundry, allowing high performance and five-nines reliability, with swift integration and deployment. This is most likely to be in more structured commercial environments in the near term.
And finally, there will be companies that “do the work” rather than sell into existing operators, investing to build a vertically-integrated product that delivers a service more efficiently than existing incumbents, such as Anduril in defence, or Waymo in autonomous vehicles.
So is this the ChatGPT moment for physical AI? Not yet. But the pieces are falling into place faster than most observers expected. VLA models are showing early signs of the scaling laws that transformed language AI. Hardware costs are declining, driven by consumer electronics, Chinese competition and manufacturing scale. The data flywheel, where deployment generates training data that enables better deployment, is about to start turning.
At Airtree, we believe the largest opportunities lie in the picks and shovels of this transition and in vertical-specific platforms focused on well-structured commercial use-cases.
If you are building in this space, particularly if you are solving the practical problems that separate impressive demos from commercial deployments, we would like to hear from you.
And we also want to hear from you if you disagree with anything written here. This article is designed to help start a conversation that helps us refine our perspective. Reach out at jackie@airtree.vc.