D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation

Abstract

Mastering dexterous robotic manipulation of deformable objects is vital for overcoming the limitations of parallel grippers in real-world applications. Current trajectory optimisation approaches often struggle to solve such tasks because the search space is large and a cost function provides only limited task information. In this work, we propose D-Cubed, a novel trajectory optimisation method that uses a latent diffusion model (LDM) trained on a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. D-Cubed uses a VAE to learn a skill-latent space that encodes short-horizon actions in the play dataset, and trains an LDM to compose the skill latents into a skill trajectory representing a long-horizon action trajectory in the dataset. To optimise a trajectory for a target task, we introduce a novel gradient-free guided sampling method that adapts the Cross-Entropy method to the reverse diffusion process. In particular, D-Cubed samples a small number of noisy skill trajectories using the LDM for exploration and evaluates them in simulation. D-Cubed then selects the trajectory with the lowest cost for the subsequent reverse process. This effectively explores promising solution areas and optimises the sampled trajectories towards a target task throughout the reverse diffusion process. Through empirical evaluation on a public benchmark of dexterous deformable object manipulation tasks, we demonstrate that D-Cubed outperforms traditional trajectory optimisation and competitive baseline approaches by a significant margin. We further demonstrate that trajectories found by D-Cubed readily transfer to a real-world LEAP hand on a folding task.

Approach

We propose D-Cubed, Latent Diffusion for Trajectory Optimisation in Dexterous Deformable Manipulation. D-Cubed is a novel trajectory optimisation approach that leverages a latent diffusion model (LDM) trained on a task-agnostic play dataset of a robot hand that contains various representative hand motions, such as closing and opening the hand, and moving individual fingers.


(1) A VAE is trained to learn a skill latent representation by reconstructing short-horizon action sequences randomly sampled from the task-agnostic play dataset. (2) A latent diffusion model (LDM) is trained to compose skills into a skill trajectory, representing a long-horizon action trajectory sampled from the dataset. The LDM, capable of generating diverse skill trajectories, facilitates exploration in the large state space of dexterous deformable object manipulation tasks. (3) During trajectory optimisation, the LDM generates a small number of skill trajectories. These trajectories are evaluated in a simulator, and the best one, i.e. the trajectory with the minimum cost, is selected for the subsequent reverse process.
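
Step (3) can be illustrated with a minimal sketch of the select-the-best loop. This is not the authors' implementation: `guided_reverse_sampling`, `toy_denoise_step`, and `toy_cost` are hypothetical names, the toy denoiser merely stands in for one reverse step of a trained LDM, and the toy cost stands in for decoding the skill latents into actions and evaluating them in a simulator.

```python
import random


def guided_reverse_sampling(denoise_step, cost_fn, horizon, latent_dim,
                            num_steps=20, num_candidates=8, seed=0):
    """At every reverse step, draw a few candidate skill trajectories,
    score each one, and carry only the cheapest into the next step."""
    rng = random.Random(seed)
    # Start from Gaussian noise shaped (horizon, latent_dim).
    current = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
               for _ in range(horizon)]
    best_cost = cost_fn(current)
    for t in reversed(range(num_steps)):
        # Stochastic reverse steps from the same input act as exploration.
        candidates = [denoise_step(current, t, rng)
                      for _ in range(num_candidates)]
        costs = [cost_fn(c) for c in candidates]
        best = min(range(num_candidates), key=costs.__getitem__)
        current, best_cost = candidates[best], costs[best]
    return current, best_cost


def toy_denoise_step(traj, t, rng):
    """ASSUMPTION: placeholder for one LDM reverse step. It shrinks the
    sample slightly and adds noise whose scale decreases as t decreases."""
    sigma = 0.05 * (t + 1)
    return [[0.9 * x + rng.gauss(0.0, sigma) for x in row] for row in traj]


def toy_cost(traj, target=0.5):
    """ASSUMPTION: placeholder for a simulator rollout cost. Here it is the
    squared distance of every latent entry to a fixed target value."""
    return sum((x - target) ** 2 for row in traj for x in row)


traj, cost = guided_reverse_sampling(toy_denoise_step, toy_cost,
                                     horizon=4, latent_dim=2)
```

Because selection uses only cost values, no gradient of the simulator or the cost function is required. Keeping a single best candidate per step mirrors the description above; the number of candidates and reverse steps here are arbitrary illustrative values.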

Results

Quantitative Experiments

The average normalised improvement in Earth Mover's distance (EMD) and its standard deviation are reported for each method.
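
This page does not define the metric, so the following is only a sketch under an assumed definition that is common in deformable manipulation benchmarks: the improvement is the reduction in EMD to the goal shape, divided by the initial EMD. The function names and the sample values are hypothetical, and the population standard deviation is an arbitrary choice.

```python
import statistics


def normalised_improvement(emd_initial, emd_final):
    """ASSUMED definition: fraction of the initial EMD to the goal shape
    that was removed by the executed trajectory (1.0 means goal reached)."""
    if emd_initial <= 0:
        raise ValueError("initial EMD must be positive")
    return (emd_initial - emd_final) / emd_initial


def summarise(pairs):
    """Mean and population standard deviation over (initial, final) pairs."""
    scores = [normalised_improvement(d0, d1) for d0, d1 in pairs]
    return statistics.mean(scores), statistics.pstdev(scores)


# Hypothetical EMD values for two runs of one method, not reported results.
mean_score, std_score = summarise([(2.0, 0.5), (1.0, 0.5)])
```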

Qualitative Results

Folding

Rope

Dumpling

Flip

Bun

Wrap

BibTeX

@article{yamada2024dcubed,
  author    = {Yamada, Jun and Zhong, Shaohong and Collins, Jack and Posner, Ingmar},
  title     = {D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation},
  journal   = {arXiv preprint arXiv:2403.12861},
  year      = {2024},
}