Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Rebuttal Additions

Real World Videos

Simulation Trace Visualization

Below, we visualize the diffusion policy's outputs. Each action inference samples an action sequence from a pseudo-random diffusion process. The sampled action sequences are drawn as lines, colored blue at the start of the sequence and red at the end. We show multiple viewpoints per trajectory.
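The coloring scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to render the videos: the function names `gradient_colors` and `colored_segments` are hypothetical, and the linear blue-to-red interpolation in RGB is an assumption, since the page does not state which colormap was used.

```python
def gradient_colors(num_steps):
    """Return num_steps (r, g, b) tuples, blue first and red last.

    Assumption: a plain linear interpolation in RGB; the actual
    colormap used for the videos is not specified.
    """
    if num_steps < 1:
        return []
    if num_steps == 1:
        return [(0.0, 0.0, 1.0)]  # a single step is drawn in blue
    colors = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # 0.0 at the start, 1.0 at the end
        colors.append((t, 0.0, 1.0 - t))
    return colors


def colored_segments(waypoints):
    """Split one sampled action sequence into colored line segments.

    waypoints: list of (x, y, z) positions along the action sequence.
    Returns a list of (start_point, end_point, color) tuples.
    """
    num_segments = max(len(waypoints) - 1, 0)
    colors = gradient_colors(num_segments)
    return [
        (waypoints[i], waypoints[i + 1], colors[i])
        for i in range(num_segments)
    ]


# Example: a hypothetical three-waypoint action sequence.
sample = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.2, 0.0, 0.3)]
segments = colored_segments(sample)
```

Each sampled action sequence would be passed through `colored_segments` separately, so several samples drawn at the same timestep appear as a bundle of lines that each fade from blue to red.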

Mailbox 1

Mailbox 2

Mailbox 3

Mailbox 4

Drawer 1

Drawer 2

Drawer 3

Catapult 1

Catapult 2

Balance 1

Balance 2

Transport 1

Transport 2

Results

The result videos below are from our distilled policy and are sped up by 8x. The exception is the catapult video shown at 1x speed, which is left at real-time speed because the catapult motion is fast.

Balance

Catapult

Closest

Middle

Furthest

Furthest (1x speed; front view and block view, respectively), to visualize the block trajectory

Transport

Mailbox

Drawer