Overview
What Kite solves
The robotics stack is fragmented — simulation, training, data generation, deployment, and hardware integration live in separate tools. Kite merges them into one IDE so teams spend time on policies, not plumbing.
Without Kite, engineers wire together MuJoCo or Isaac for simulation, write custom scripts for synthetic data, provision cloud GPUs by hand, maintain URDFs across repos, and re-implement glue code for every new robot. A new team typically takes weeks to reach its first meaningful training run.
With Kite, you open the IDE, pick a robot, describe the environment, and run a policy. Everything underneath — sim backends, controllers, sensors, datasets, and compute — is already wired and validated. Sim-to-real compatibility is baked into the project layout rather than bolted on afterwards.
Unified workspace
Agents, models, data, simulation, and monitoring in one surface.
Sim-to-real on day one
Compatibility validated at project creation, not at deploy time.
Cloud compute
GPU and CPU provisioned per run. No rig to buy or maintain.
Zero setup
URDFs, packages, libraries, and drivers handled upfront.
New team? Start from a preset project (quadruped, arm, or humanoid). Everything — URDF, sim, datasets, and compute — is already configured so you can run a training job in minutes.
Platform
Pre-supported robots & URDF upload
Kite ships with curated URDFs for common research platforms and accepts your own robot definition through drag-and-drop upload.
Unitree G1 (humanoid), Unitree Go2 (quadruped), Boston Dynamics Spot, and the SO-100 arm come preloaded with validated URDFs, collision meshes, actuator models, and reference controllers. Switch between them from the IDE's robot picker without modifying project code.
Drop a custom URDF (single file or zipped package with meshes) into the project. Kite parses joints, links, transmissions, and sensors; validates the kinematic chain; flags physical inconsistencies in inertia, joint limits, and meshes; and rebuilds the simulation scene automatically.
Built-in platforms
Unitree G1, Unitree Go2, Boston Dynamics Spot, and SO-100.
URDF validation
Inertia, joint limits, and mesh integrity checked on upload.
Dependency-free
Packages and libraries resolved automatically.
Sim-to-real parity
One URDF drives sim, visualization, and real hardware.
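To make the upload checks concrete, here is a minimal sketch of URDF validation using only the Python standard library. It is not Kite's validator: the function name `validate_urdf`, the sample robot, and the specific rules (positive mass, ordered joint limits, known parent and child links, a single root link) are assumptions chosen to illustrate the kinds of checks described above. Mesh integrity is not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-link robot used only for illustration.
EXAMPLE_URDF = """
<robot name="two_link_arm">
  <link name="base"><inertial><mass value="1.0"/></inertial></link>
  <link name="forearm"><inertial><mass value="0.5"/></inertial></link>
  <joint name="elbow" type="revolute">
    <parent link="base"/>
    <child link="forearm"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="1"/>
  </joint>
</robot>
"""


def validate_urdf(urdf_xml: str) -> list:
    """Return a list of human-readable problems found in a URDF string."""
    robot = ET.fromstring(urdf_xml)
    problems = []
    links = {link.get("name") for link in robot.findall("link")}

    # Inertia: every declared mass must be strictly positive.
    for link in robot.findall("link"):
        mass = link.find("inertial/mass")
        if mass is not None and float(mass.get("value", "0")) <= 0.0:
            problems.append(f"link '{link.get('name')}': non-positive mass")

    children = set()
    for joint in robot.findall("joint"):
        name = joint.get("name")
        parent, child = joint.find("parent"), joint.find("child")
        if parent is None or child is None:
            problems.append(f"joint '{name}': missing <parent> or <child>")
            continue
        parent_link, child_link = parent.get("link"), child.get("link")
        for ref in (parent_link, child_link):
            if ref not in links:
                problems.append(f"joint '{name}': unknown link '{ref}'")
        if child_link in children:
            problems.append(f"link '{child_link}': more than one parent joint")
        children.add(child_link)

        # Joint limits: revolute and prismatic joints need lower <= upper.
        if joint.get("type") in ("revolute", "prismatic"):
            limit = joint.find("limit")
            if limit is None:
                problems.append(f"joint '{name}': missing <limit>")
            elif float(limit.get("lower", "0")) > float(limit.get("upper", "0")):
                problems.append(f"joint '{name}': lower limit exceeds upper limit")

    # Kinematic chain: exactly one link should have no parent joint.
    roots = links - children
    if len(roots) != 1:
        problems.append(f"expected one root link, found {len(roots)}")
    return problems
```

Calling `validate_urdf(EXAMPLE_URDF)` returns an empty list for the sample robot. Swapping the limit bounds or setting a mass to zero produces one message per problem, which is the shape of feedback an upload step could surface.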
Robotics agent
A coding agent tuned for robotics — aware of URDFs, controllers, sensors, and the Kite SDK. It writes policy scaffolding, debugs training runs, and suggests fixes grounded in the current project state.
The agent reads your project files, URDF, sim config, and run logs. It proposes code edits, generates reward functions, fixes observation and action spaces, and explains why a training run is diverging — with references to the exact file and line.
A fast coding agent ships on the Free plan for single-file assistance. The Pro plan unlocks a deeper agent with multi-file refactors, longer context, and access to training-run telemetry as a first-class tool.
Project-grounded
Suggestions reference real joints, links, and sensors.
Multi-file edits
Cross-file refactors and project-wide search.
Live telemetry
Streams simulation and training logs during a run.
Models & Data
World generation — World Labs
Generate photorealistic 3D scenes from text or reference imagery using the World Labs integration, then drop them directly into your training environment.
Describe an environment, such as "warehouse with mixed-height shelving" or "cluttered domestic kitchen", and Kite produces a Gaussian-splat and mesh representation with collision geometry. Seeds are preserved so a training run can be reproduced from the prompt alone.
Vary lighting, clutter density, surface materials, and layout per episode to avoid overfitting. Each variation is deterministic given its seed, which keeps evaluation runs reproducible.
Text-to-3D
Prompt or reference image → navigable 3D scene.
Mesh + splat
Both representations available, usable as sim geometry.
Seeded variation
Deterministic randomization for reproducible eval.
Preset worlds
Warehouse, kitchen, and lab bench seeded per project.
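Seeded variation can be sketched as follows. This is an illustration of deterministic randomization in general, not Kite's implementation: the `SceneVariation` fields, the value ranges, and the `MATERIALS` list are hypothetical. The key idea is that each variation draws from its own generator created from the seed, so the same seed always yields the same scene parameters.

```python
import random
from dataclasses import dataclass

# Hypothetical material names, for illustration only.
MATERIALS = ("wood", "steel", "laminate", "concrete")


@dataclass(frozen=True)
class SceneVariation:
    light_intensity: float  # relative brightness multiplier
    clutter_density: float  # objects per square metre
    surface_material: str
    layout_index: int


def sample_variation(seed: int) -> SceneVariation:
    """Draw one scene variation that is fully determined by its seed."""
    rng = random.Random(seed)  # private generator; global state is untouched
    return SceneVariation(
        light_intensity=rng.uniform(0.4, 1.6),
        clutter_density=rng.uniform(0.0, 5.0),
        surface_material=rng.choice(MATERIALS),
        layout_index=rng.randrange(8),
    )
```

Because the generator is local to the call, `sample_variation(7)` returns the same variation every time, regardless of what other code does with Python's global random state.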
Objects with SAM 3D
Segment any object from an image and turn it into a physics-ready 3D asset using the SAM 3D pipeline integrated into the IDE.
Upload an image, click the object you want, and Kite runs SAM 3D to produce a clean mesh with estimated geometry and textures. The result is tagged with a material model and ready to drop into a world.
Generated assets come with inertia estimates, collision primitives, and contact-friendly meshes so they behave predictably in MuJoCo. Batch generation is supported for data-augmentation workflows.
Click-to-lift
SAM 2 for segmentation, SAM 3D for 3D reconstruction.
Sim-ready meshes
Collision primitives auto-paired with each asset.
Material hints
Friction and material estimates attached on import.
Batch mode
Generate object libraries for randomized training.
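As an illustration of what pairing a mesh with a collision primitive and an inertia estimate can involve, the sketch below fits an axis-aligned box to a set of vertices and computes the diagonal inertia of a solid box of uniform density. How the SAM 3D pipeline actually estimates these quantities is not described here, so the function name, the box choice, and the uniform-density assumption are all hypothetical.

```python
def box_collision_primitive(vertices, mass):
    """Fit an axis-aligned box to mesh vertices and estimate its inertia.

    vertices: iterable of (x, y, z) points in metres.
    mass: total mass in kilograms, assumed uniformly distributed.
    """
    xs, ys, zs = zip(*vertices)
    size = (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
    center = (
        (max(xs) + min(xs)) / 2,
        (max(ys) + min(ys)) / 2,
        (max(zs) + min(zs)) / 2,
    )
    w, h, d = size  # full side lengths along x, y, z
    # Solid box about its centre: I_xx = m/12 * (h^2 + d^2), and so on.
    inertia_diag = (
        mass / 12 * (h * h + d * d),
        mass / 12 * (w * w + d * d),
        mass / 12 * (w * w + h * h),
    )
    return {"center": center, "size": size, "inertia_diag": inertia_diag}
```

A box is the crudest possible primitive. Real assets often need convex decompositions or several primitives, but the same pattern applies: derive simple geometry from the mesh, then attach mass properties to it.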
Vision-Language-Action models
Frontier policies from Physical Intelligence — the π-series — are ready to train from the IDE. Kite handles tokenizers, adapters, and cloud training orchestration.
The current registry includes Physical Intelligence π0 and π0-FAST, with additional members of the π-series onboarded as they are released.
Point a training job at a Kite dataset, pick a base checkpoint, and launch. Kite manages image, state, and action normalization, PEFT adapters, and evaluation replay against the sim you trained on.
Policy registry
One-click fine-tune from the registry UI.
PEFT adapters
LoRA-style training fits on a single GPU.
Sim evaluation
Automatic policy rollout after each checkpoint.
Dataset adapters
HuggingFace, LeRobot, and custom formats supported.
Fine-tunes inherit the policy's original image and action normalization — you don't need to re-implement preprocessing to match a base checkpoint.
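The point about inherited normalization can be illustrated with a small sketch. It assumes per-dimension mean and standard-deviation normalization, which is a common choice but may not match what a given base checkpoint uses. The function names and the `base_stats` variable are hypothetical. The pattern to notice is that statistics are fitted once on the base data and then reused unchanged during fine-tuning, rather than refitted on the new dataset.

```python
from statistics import fmean, pstdev


def fit_stats(rows, eps=1e-8):
    """Compute per-dimension mean and std from a list of equal-length vectors."""
    columns = list(zip(*rows))
    return {
        "mean": [fmean(c) for c in columns],
        # Clamp std so constant dimensions do not cause division by zero.
        "std": [max(pstdev(c), eps) for c in columns],
    }


def normalize(vec, stats):
    """Map a raw vector into the normalized space the policy was trained in."""
    return [(x - m) / s for x, m, s in zip(vec, stats["mean"], stats["std"])]


def denormalize(vec, stats):
    """Map a normalized policy output back to raw units."""
    return [x * s + m for x, m, s in zip(vec, stats["mean"], stats["std"])]


# Stats fitted on the (hypothetical) base dataset are reused for fine-tuning.
base_stats = fit_stats([[0.0, 10.0], [2.0, 30.0]])
```

During fine-tuning, new actions pass through `normalize(action, base_stats)` and predictions come back through `denormalize(prediction, base_stats)`, so the base checkpoint keeps seeing inputs on the scale it was trained with.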
Fleet policy evaluation
Run a trained policy across many randomized cloud simulations, measure success and failure rates, and get per-episode video proof — no real robot, no babysitting.
An evaluation run rolls one policy out across N randomized sim episodes in parallel on GPU MuJoCo (MJX). Each episode is scored into a labelled outcome: success, or a failure mode (object dropped, timeout, not placed, not reached). The dashboard aggregates the fleet into a success rate, a failure-mode histogram, and a tile grid that flips between pass and fail as outcomes land. Each tile has a playable rollout video.
Choose the scene from one of your projects, or from a built-in benchmark-environment catalog (e.g. SO-Arm Pick & Place, SO-Arm Reach) so you can evaluate a policy with no project setup at all. ACT, π-series, smolVLA, and reinforcement-learning policies are all supported.
Success / failure rates
Labelled outcomes aggregated into a rate plus a failure-mode histogram.
Video proof
A per-episode MP4 (and Rerun .rrd) for every rollout.
Benchmark catalog
Built-in environments to test a policy against, decoupled from projects.
Massively parallel
N environments batched into one GPU job via MJX.
Runs reconcile server-side, so a fleet closes out with its success rate even if you close the tab mid-run.
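The aggregation step described above can be sketched in a few lines. The outcome labels mirror the failure modes named in this section, but the exact identifier strings and the `summarize` function are assumptions for illustration, not Kite's API.

```python
from collections import Counter

# Hypothetical label strings for the outcomes named in this section.
OUTCOMES = ("success", "object_dropped", "timeout", "not_placed", "not_reached")


def summarize(outcomes):
    """Aggregate per-episode outcome labels into a rate and a histogram."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    total = len(outcomes)
    return {
        "episodes": total,
        "success_rate": counts["success"] / total if total else 0.0,
        # Every failure mode appears in the histogram, even with zero count.
        "failure_histogram": {
            label: counts[label] for label in OUTCOMES if label != "success"
        },
    }
```

For example, `summarize(["success", "success", "timeout", "object_dropped"])` reports four episodes, a success rate of 0.5, and one count each for `timeout` and `object_dropped`.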
Dataset Augmentation Studio
Paste a HuggingFace LeRobot dataset, describe a visual change — lighting, surface colour, texture, added or removed objects — and Kite generates an expanded dataset with new videos and matching joint trajectories, ready to push back to your own HuggingFace account for the next training round.
Augmentation is a standalone, in-memory flow: import a dataset, preview a few episodes, approve, and the pipeline generates visually varied episodes with Google's Gemini models while preserving each episode's motion. The joint/action trajectory is carried over from the source so the new episodes stay physically consistent. An optional Claude vision pass is available for object-changing edits.
Nothing is stored on Kite's servers — you review the generated episodes and push the result to HuggingFace under your account using a write token you connect in Settings → Integrations. The pipeline is provider-pluggable (mock for offline dev, per-frame image editing, Veo, and Gemini Omni via the Vertex Agent Platform).
Prompt-driven variation
Lighting, surface, texture, and object changes from plain text.
Motion preserved
Joint trajectories carried from the source episodes.
LeRobot in/out
Reads and writes valid LeRobot v3.0 datasets.
Push to your Hub
Results go to your own HuggingFace account, not Kite's.
Use augmentation to cheaply grow training variety without re-teleoperating: keep the same motions, change the scene, and fine-tune on the expanded set.
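The "keep the motion, change the scene" idea can be expressed as a small data-structure sketch. The `Episode` class and `augment` function are hypothetical and do not reflect the LeRobot schema or Kite's pipeline. Frames are stand-in strings here, and `edit_frame` stands in for whatever visual edit is applied. What the sketch shows is that only the frames are replaced, while the action trajectory is carried over untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Episode:
    frames: tuple   # one visual observation per timestep
    actions: tuple  # one joint/action vector per timestep


def augment(episode: Episode, edit_frame) -> Episode:
    """Return a new episode with edited frames and the original actions."""
    new_frames = tuple(edit_frame(frame) for frame in episode.frames)
    if len(new_frames) != len(episode.actions):
        raise ValueError("frame count must match the action trajectory length")
    # replace() copies every other field, so actions are reused as-is.
    return replace(episode, frames=new_frames)
```

Because the action tuple is reused rather than regenerated, the augmented episode stays aligned with the source timestep for timestep, which is what keeps it usable as training data for the same motion.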
Physics & simulation engine
MuJoCo is the primary physics backend, extended by Kimodo — Kite's in-house kinematics and motion layer — so motion capture, retargeting, and contact-rich tasks work out of the box.
MuJoCo provides articulated dynamics, soft contacts, sensors, and deterministic rollout. Kite's adapter exposes the state as structured observations and wires your robot's URDF into the scene graph automatically.
Kimodo handles mocap retargeting, joint-space motion blending, and quaternion-safe transform composition. It bridges captured or synthesized motions into the simulator without the usual coordinate-system breakage.
MuJoCo backend
Articulated bodies, soft contacts, and sensors.
Kimodo motion
Mocap retargeting with quaternion-safe transforms.
Sensor config
RGB, depth, IMU, and encoders as project state.
Scene builder
Composes robots, worlds, and objects — no manual XML.
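What "quaternion-safe" means inside Kimodo is not specified here. The sketch below assumes one reasonable reading: rotations use a single (w, x, y, z) convention, and the result of every composition is renormalized so that floating-point drift never produces a non-unit quaternion. All function names are hypothetical.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_normalize(q):
    """Scale a quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero quaternion")
    return tuple(c / norm for c in q)


def rotate(q, v):
    """Rotate vector v by unit quaternion q, computed as q * v * conj(q)."""
    conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return (x, y, z)


def compose(outer, inner):
    """Compose two rigid transforms; inner is applied first, then outer.

    Each transform is a (rotation_quaternion, translation) pair.
    """
    (q1, t1), (q2, t2) = outer, inner
    q = quat_normalize(quat_mul(q1, q2))
    t = tuple(a + b for a, b in zip(t1, rotate(q1, t2)))
    return q, t
```

As a sanity check, composing "rotate 90 degrees about z, then move 1 m along x" with itself gives a 180 degree rotation and a translation of (1, 1, 0), up to floating-point error.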
Synthetic data generation
Kimodo is Kite's synthetic-data engine — a motion generation and retargeting layer that produces physically consistent episodes across any robot in the project. Pair it with domain randomization and you get training corpora at scale, without recording a single frame of real data.
Kimodo sits between your task definition and the simulator. It generates joint-space motions from high-level task primitives, retargets them onto the target robot's kinematic topology, and keeps transforms quaternion-safe across coordinate frames, so the same task runs on a quadruped, an arm, or a humanoid without re-authoring per embodiment.
Each episode produces synchronized observations, proprioception, and actions in the Kite dataset format. Lighting, textures, friction, mass, sensor noise, and camera placement are randomized per episode; seeds are logged so any sample is reproducible. Datasets export directly to HuggingFace, LeRobot, and raw parquet for downstream training.
Kimodo motion engine
Generates joint-space motions from task primitives — the core of Kite's synthetic data.
Cross-embodiment retargeting
Same episode runs on quadrupeds, arms, and humanoids without re-authoring.
Quaternion-safe transforms
Coordinate frames stay consistent across sim, mocap, and export.
Randomization + export
Per-episode domain randomization, with HuggingFace, LeRobot, and parquet outputs.
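One way to make any single sample reproducible is to derive a seed per episode from a base seed and log it next to the sampled parameters. The sketch below shows that pattern using the standard library. The parameter names, ranges, and record layout are hypothetical and are not the Kite dataset format.

```python
import hashlib
import json
import random


def episode_seed(base_seed: int, episode_index: int) -> int:
    """Derive a stable 64-bit seed for one episode from a base seed."""
    digest = hashlib.sha256(f"{base_seed}:{episode_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def sample_params(seed: int) -> dict:
    """Sample physical and sensor parameters from a seed (hypothetical ranges)."""
    rng = random.Random(seed)
    return {
        "friction": rng.uniform(0.5, 1.2),
        "mass_scale": rng.uniform(0.8, 1.2),
        "sensor_noise_std": rng.uniform(0.0, 0.02),
        "camera_yaw_deg": rng.uniform(-15.0, 15.0),
    }


def randomize_episode(base_seed: int, episode_index: int) -> dict:
    """Build the log record for one episode: its index, seed, and parameters."""
    seed = episode_seed(base_seed, episode_index)
    return {"episode": episode_index, "seed": seed, "params": sample_params(seed)}


# One JSON line per episode is enough to regenerate any sample later.
manifest = [json.dumps(randomize_episode(42, i)) for i in range(3)]
```

Given only a logged record, `sample_params(record["seed"])` reproduces that episode's parameters without replaying the episodes before it, because each episode's seed depends only on the base seed and its index.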
Compute
GPU integrations for cloud workloads
Run training and data-generation jobs on cloud compute without managing infrastructure. CPU processing is included in the Basic plan; on-demand GPU compute is included in the Pro plan.
Training jobs run on Google Vertex AI custom jobs and Cloud Run GPU, using the same container image you run locally. Kite mints short-lived tokens per job and streams logs back into the IDE.
Basic-plan workspaces get CPU processing, which is enough to run simulations, generate datasets, and train small policies. Upgrade to Pro to unlock GPU compute on demand: capacity is provisioned per run, bursts during training spikes, and is released when the run ends. No rig to buy, cool, or maintain.
CPU — Basic plan
CPU processing included at the Basic tier.
GPU — Pro plan
On-demand cloud GPUs included with Pro.
Vertex AI & Cloud Run
Same container image used locally and in the cloud.