Robotics recovery data

Human-produced failure and recovery data for soft-object manipulation.

Low-Likelihood Data is building a rights-cleared dataset and hidden benchmark around the messy manipulation episodes that separate demos from usable robotic systems.

01 / What we are building

Rights-cleared recovery episodes for hard manipulation cases.

Low-Likelihood Data is an early-stage robotics data company focused on human-produced failure and recovery data for deformable object manipulation.

We are starting with the cases that are easy for people but still difficult for robots: towels that collapse, bags that will not open, soft packages that slip, layers that are hard to separate, and objects hidden inside soft clutter.

The product is a rights-cleared dataset and hidden benchmark for these recovery episodes. It is designed for robotics foundation-model teams, VLA labs, humanoid companies, simulation teams, and physical-AI data vendors.

02 / Motivation

Most manipulation data overrepresents clean success.

Robots need to know what to do when the object does not behave cleanly. Soft and deformable objects create common failure cases:

The useful corner or edge is hidden
The wrong layer is selected
A bag, liner, or pouch collapses
Two objects are picked together
An object slips, folds, or twists during grasping
A target is covered by cloth or soft packaging
The opening, seam, label, or grasp point faces the wrong way

We focus on these cases because recovery is often the difference between a demo and a usable manipulation system. A model needs not only examples of successful handling but also structured examples of failure detection, state correction, regrasping, and task decomposition.

03 / What the dataset contains

Each episode has a before state, an ambiguity point, a recovery action, and an after state.

Initial task families:

Towel, T-shirt, and pillowcase corner finding
Layer separation after wrong-layer selection
Bad-fold recovery and refolding
Mixed soft-object retrieval from bins
Bag, liner, pouch, and polybag opening
Soft-package retrieval from clutter
Double-pick separation
Regrasping after slippage, bunching, twisting, or collapse

Initial capture stack:

Head-mounted egocentric video
Wrist or forearm video
Static side-view video
Before, failure/recovery, and after keyframes
Timestamps
Task and object metadata

Optional later additions include depth, hand-pose extraction, rough masks, object boxes, and buyer-specific capture variants.

04 / Annotation

A compact structure for failures, causes, recovery actions, and outcomes.

Each accepted episode can include:

Task family
Object family
Material state
Initial state
Subgoal
Failure event
Failure cause
Recovery action
Hand-object contact phase
Keyframes
Affordance tags
Outcome
Quality grade
Benchmark split
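
As a rough illustration, the fields above could be carried in a typed record like the one below. This is a minimal sketch, not our published schema: the class name EpisodeAnnotation, the snake_case field names, the types, the choice of which fields are optional, and the example values are all assumptions made for the example.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class EpisodeAnnotation:
    """Hypothetical container mirroring the annotation fields listed above."""

    # Assumed to be required for every accepted episode.
    task_family: str
    object_family: str
    material_state: str
    initial_state: str
    subgoal: str

    # Assumed optional, since an episode "can include" these fields.
    failure_event: Optional[str] = None
    failure_cause: Optional[str] = None
    recovery_action: Optional[str] = None
    contact_phase: Optional[str] = None
    keyframes: List[float] = field(default_factory=list)  # assumed: seconds from episode start
    affordance_tags: List[str] = field(default_factory=list)
    outcome: Optional[str] = None
    quality_grade: Optional[str] = None
    benchmark_split: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still unset (None) or empty lists."""
        return [name for name, value in asdict(self).items() if value in (None, [])]


# Example with illustrative values only.
record = EpisodeAnnotation(
    task_family="corner_finding",
    object_family="towel",
    material_state="twisted",
    initial_state="partly_occluded",
    subgoal="expose_corner",
    failure_event="wrong_layer_grasp",
)
print(record.missing_fields())
```

A helper like missing_fields is one way a reviewer could see which annotations are still open before an episode is graded.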

05 / Example schema

One episode, structured for model analysis and benchmark use.

Here is an example of what an episode might look like:

{
  "episode_id": "LLD_000001",
  "episode_metadata": {
    "task_family": "corner_finding",
    "object_family": "towel",
    "environment_type": "soft_object_bin",
    "modalities": ["head_view", "wrist_view", "static_side_view"],
    "rights_status": "cleared"
  },
  "state": {
    "initial_state_tags": ["twisted", "partly_occluded", "mixed_bin"],
    "initial_state_caption": "Towel partly twisted in mixed soft-object bin"
  },
  "events": [
    {
      "event_type": "failure",
      "label": "wrong_layer_grasp",
      "start_time": 4.12,
      "end_time": 5.03,
      "cause_tags": ["layer_confusion", "soft_collapse"]
    },
    {
      "event_type": "recovery",
      "labels": ["stabilize_material", "expose_corner", "change_grasp_angle"],
      "start_time": 5.04,
      "end_time": 9.80
    }
  ],
  "affordances": ["corner", "edge", "fold", "hidden_by_other_cloth"],
  "outcome": "success",
  "quality": {
    "grade": "A",
    "review_status": "accepted"
  }
}
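
A consumer of this format might start by checking event timing. The sketch below parses a trimmed copy of the example episode and reports how long each event lasts and whether the events are ordered without overlap. The helper names event_durations and events_are_ordered are hypothetical, and the code assumes that start_time and end_time are seconds from the start of the episode, which the example itself does not state.

```python
import json

# Trimmed copy of the example episode above; only the fields used here are kept.
EPISODE_JSON = """
{
  "episode_id": "LLD_000001",
  "events": [
    {"event_type": "failure", "label": "wrong_layer_grasp",
     "start_time": 4.12, "end_time": 5.03},
    {"event_type": "recovery",
     "labels": ["stabilize_material", "expose_corner", "change_grasp_angle"],
     "start_time": 5.04, "end_time": 9.80}
  ],
  "outcome": "success"
}
"""


def event_durations(episode):
    """Map each event_type to its duration (assumed unit: seconds), rounded to 2 places."""
    return {
        e["event_type"]: round(e["end_time"] - e["start_time"], 2)
        for e in episode["events"]
    }


def events_are_ordered(episode):
    """True if every event ends after it starts and no event overlaps the next one."""
    events = episode["events"]
    each_valid = all(e["start_time"] < e["end_time"] for e in events)
    no_overlap = all(a["end_time"] <= b["start_time"] for a, b in zip(events, events[1:]))
    return each_valid and no_overlap


episode = json.loads(EPISODE_JSON)
print(event_durations(episode))
print(events_are_ordered(episode))
```

Note that in the example the failure event carries a single "label" while the recovery event carries a list of "labels", so code that reads event labels needs to handle both keys.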

06 / Intended use

Not a replacement for robot-native data. A focused complement.

Our dataset is not meant to replace robot-native data. It does not contain the robot's own actions, force signals, tactile data, proprioception, or deployment traces.

It is meant to complement those sources. Likely uses include:

Failure-mode evaluation
Hidden benchmark and regression testing
VLA post-training support
Representation learning
Task decomposition
Affordance and state-transition learning
Retrieval data for model analysis
Simulation scenario generation
Internal annotation schema design

07 / About the company

Low-Likelihood Data is a Finnish startup currently in discovery.

We are validating whether robotics teams want a focused recovery-data and benchmark product before building a large dataset.

The founding team combines machine-learning, operations, and commercial experience. Our current focus is narrow by design: build a small, high-signal specimen, test it with technical buyers, and scale production only if buyers confirm that the data is valuable.

Instead of building a generic chore-video marketplace, we aim to build a reusable data asset around one hard problem: how agents recover when soft-object manipulation goes wrong.

ellis@lowlikelihooddata.com