Imitation Learning
Imitation learning (also called "learning from demonstration") is a training paradigm where a robot learns to perform tasks by observing and replicating human demonstrations, rather than through hand-coded programs or reinforcement learning reward signals. A human shows the robot what to do — via teleoperation, kinesthetic teaching (physically guiding the robot's arm), or video demonstration — and the robot learns a policy that generalizes from those examples to new situations. Demonstrations collected this way are the primary data source for the 2026 generation of VLA models and humanoid robot foundation models.
Methods
Behavioral cloning is the simplest approach: treat demonstrations as supervised learning data, mapping observations to actions. It works well when sufficient diverse demonstrations are available, but suffers from compounding errors — small deviations from the demonstrated trajectory can cascade into states the model has never seen. DAgger (Dataset Aggregation) addresses this by iteratively collecting new demonstrations in states the robot actually visits, correcting the distribution mismatch. More sophisticated approaches include inverse reinforcement learning (inferring the reward function the human was implicitly optimizing), generative adversarial imitation learning (training a policy that a discriminator can't distinguish from human demonstrations), and diffusion policies (modeling the demonstration distribution as a denoising diffusion process).
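A minimal sketch of the first two methods, using a toy linear setting (the "expert" controller, dynamics, and all constants here are invented for illustration): behavioral cloning is just regression from states to actions, and a DAgger-style loop relabels the states the learned policy actually visits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert for illustration: a linear controller that
# drives a 2-D state toward the origin (action = -0.5 * state).
def expert(states):
    return -0.5 * states

def rollout(policy_W, start, steps=20):
    """Roll out a linear policy W in toy dynamics: s' = s + a + noise."""
    states, s = [], start
    for _ in range(steps):
        states.append(s)
        s = s + s @ policy_W + rng.normal(scale=0.01, size=2)
    return np.array(states)

def fit_bc(states, actions):
    """Behavioral cloning: least-squares regression state -> action."""
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W

# 1) Clone from demonstrations drawn from the expert's own states.
states = rng.normal(size=(100, 2))
actions = expert(states)
W = fit_bc(states, actions)

# 2) DAgger: visit states under the *learned* policy, have the expert
#    label them, aggregate with the old data, and refit. This corrects
#    the distribution mismatch behind compounding errors.
for _ in range(3):
    visited = rollout(W, start=rng.normal(size=2))
    states = np.vstack([states, visited])
    actions = np.vstack([actions, expert(visited)])
    W = fit_bc(states, actions)

print(W)  # recovers approximately -0.5 * identity, the expert's gain
```

In this noiseless linear case cloning is exact; the point of the loop is structural, showing where DAgger's extra expert queries enter relative to plain behavioral cloning.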
Data Scaling Laws
A landmark 2025–2026 finding is that robotic imitation learning follows power-law scaling similar to language models, but with a crucial difference: environment and object diversity matters far more than raw volume. Research demonstrated that a policy's generalization to new objects and environments scales as a power law with the number of unique training environments and objects. Once demonstrations per environment reach a threshold (roughly 50), additional repetitions in the same setting yield minimal improvement. The practical implication: four data collectors working a single afternoon across 32 environments can produce policies that achieve 90% success in entirely novel settings. This makes data collection tractable at commercial scale.
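The power-law claim can be made concrete with a small sketch: if generalization error follows error ≈ c · n_envs^(−α), the exponent α is recoverable as the slope of a log-log regression. The numbers below are synthetic stand-ins, not values from the cited study.

```python
import numpy as np

# Synthetic data following an assumed power law in the number of
# distinct training environments: error = c * n_envs ** (-alpha).
# alpha_true and c are made-up illustrative constants.
n_envs = np.array([1, 2, 4, 8, 16, 32])
alpha_true, c = 0.35, 0.8
error = c * n_envs ** (-alpha_true)

# Power laws are straight lines in log-log space:
#   log(error) = log(c) - alpha * log(n_envs),
# so the exponent falls out of a linear fit.
slope, intercept = np.polyfit(np.log(n_envs), np.log(error), 1)
alpha_hat = -slope
print(round(alpha_hat, 2))  # -> 0.35
```

This is the standard procedure scaling-law studies use to estimate exponents, applied here to fabricated data so the fit is exact.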
Data Sources
The dominant data collection method is teleoperation: a human operator uses VR controllers or wears an exoskeleton to control the robot in real time while the system records observations and actions. Figure AI's Helix model was trained on 500+ hours of teleoperated data. But teleoperation is labor-intensive and scales linearly with human hours.
Alternatives are emerging. Video learning extracts manipulation skills from internet videos of humans performing tasks — the data is virtually unlimited (YouTube alone has billions of hours of humans doing things with their hands), but the mapping from third-person human video to first-person robot actions is nontrivial. Synthetic demonstrations from simulation bypass human labor entirely: NVIDIA's Cosmos generated 780,000 synthetic trajectories in 11 hours, equivalent to nine months of human demonstrations. The likely endgame is a mixture: human demonstrations for seed data, simulation for massive augmentation, and internet video for concept grounding.
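The "mixture" endgame can be sketched as a weighted sampler over demonstration sources. The source names and weights below are hypothetical, chosen only to reflect the qualitative roles described above (scarce seed data, massive simulation, broad video).

```python
import random

# Hypothetical mixture weights over demonstration sources.
SOURCES = {
    "teleop": 0.2,      # human seed data: highest fidelity, scarcest
    "simulation": 0.6,  # synthetic trajectories for massive augmentation
    "video": 0.2,       # internet video for concept grounding
}

def sample_source(rng):
    """Pick the source for the next training batch, proportional to weight."""
    names, weights = zip(*SOURCES.items())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in SOURCES}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
print(counts)  # counts roughly proportional to the weights
```

Real training pipelines typically tune such weights empirically, or anneal them over training; fixed weights are the simplest version of the idea.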
Relationship to Reinforcement Learning
Imitation learning and reinforcement learning are complementary, not competing. Imitation learning provides a strong starting policy ("do roughly what the human did"), and RL fine-tunes it for performance ("now do it better than the human"). This combination — imitation for initialization, RL for optimization — is the standard training recipe for most 2026 robot foundation models. Pure RL from scratch is sample-inefficient for complex manipulation; pure imitation hits a ceiling at demonstrator skill level. Combining them avoids both limitations.
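The two-stage recipe can be illustrated on a deliberately tiny problem (everything here is invented for illustration): a demonstrator acts well but suboptimally, behavioral cloning initializes the policy at the demonstrator's level, and a simple score-based RL update then pushes performance past that ceiling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: choose a scalar action; reward peaks at a = 1.0.
def reward(a):
    return -(a - 1.0) ** 2

# Hypothetical demonstrator: skilled but suboptimal, acting near 0.7.
demos = rng.normal(loc=0.7, scale=0.05, size=100)

# Stage 1, imitation: behavioral cloning initializes the policy at the
# demonstrator's mean action ("do roughly what the human did").
theta = demos.mean()
bc_reward = reward(theta)

# Stage 2, RL fine-tuning: an evolution-strategies-style update samples
# actions around theta and moves toward the rewarding ones, improving
# past the demonstrator's skill ceiling.
sigma, lr = 0.1, 0.05
for _ in range(200):
    noise = rng.normal(scale=sigma, size=32)
    rewards = reward(theta + noise)
    advantages = rewards - rewards.mean()
    theta += lr * (advantages * noise).mean() / sigma**2

print(round(theta, 2))  # near the optimum 1.0, above the demo level 0.7
```

Real systems use the same shape at vastly larger scale: a BC-pretrained network in place of the scalar, and PPO-style or offline RL updates in place of the toy gradient estimate.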
Further Reading
- Data Scaling Laws in Imitation Learning for Robotic Manipulation — ICLR 2026
- The State of AI Agents in 2026 — Jon Radoff