Conference on Robot Learning (CoRL), 2023
We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and highlight the importance of visual and tactile sensing.
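To make the fusion step concrete, below is a minimal, hypothetical sketch of how visual, tactile, and proprioceptive features could be fused with a transformer encoder, in the spirit of the visuotactile transformer described above. The layer sizes, token layout, and the assumption that each modality is already encoded into a flat feature vector (e.g., a point-cloud feature for vision, per-fingertip readings for touch) are illustrative choices, not the authors' implementation.

import torch
import torch.nn as nn


class VisuoTactileTransformer(nn.Module):
    """Illustrative fusion of proprioception, touch, and vision tokens."""

    def __init__(self, d_model=128, n_heads=4, n_layers=2,
                 proprio_dim=16, touch_dim=16, visual_dim=64):
        super().__init__()
        # Per-modality projections into a shared token space.
        self.proprio_proj = nn.Linear(proprio_dim, d_model)
        self.touch_proj = nn.Linear(touch_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # Learned embedding identifying each modality token.
        self.modality_embed = nn.Parameter(torch.zeros(3, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Head producing a fused feature for a downstream policy.
        self.head = nn.Linear(d_model, d_model)

    def forward(self, proprio, touch, visual):
        # Each input: (batch, feature_dim). Stack one token per modality.
        tokens = torch.stack([
            self.proprio_proj(proprio),
            self.touch_proj(touch),
            self.visual_proj(visual),
        ], dim=1) + self.modality_embed      # (batch, 3, d_model)
        fused = self.encoder(tokens)          # self-attention across modalities
        return self.head(fused.mean(dim=1))   # pooled fused embedding


if __name__ == "__main__":
    model = VisuoTactileTransformer()
    out = model(torch.randn(8, 16), torch.randn(8, 16), torch.randn(8, 64))
    print(out.shape)  # torch.Size([8, 128])

The fused embedding would then be consumed by the student policy, which is trained to imitate the oracle policy that has access to ground-truth object shapes and physical properties.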
We show an interactive visualization of rotation over multiple axes. Use your mouse to control the viewing angle!
We plot the relative improvements across object shapes for x-axis rotation. We find that point-cloud input gives the largest improvement on objects with non-uniform w/d/h (width/depth/height) ratios and on objects with irregular shapes, such as the bunny and the light bulb. The improvements on regular objects are smaller but still over 40%.
Similar to what we find in oracle policy training, we observe that the visuotactile policy yields larger improvements on irregular and non-uniform objects.
We show that without point-cloud input the policy suffers a 22% performance drop on out-of-distribution objects, while adding point-cloud input reduces the drop to only 8%. Visuotactile information is critical for OOD generalization: using proprioception only leads to a 41% performance drop, while using vision and touch reduces it to 15%.
@inproceedings{qi2023general,
  author    = {Qi, Haozhi and Yi, Brent and Suresh, Sudharshan and Lambeta, Mike and Ma, Yi and Calandra, Roberto and Malik, Jitendra},
  title     = {{General In-Hand Object Rotation with Vision and Touch}},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2023}
}
The interactive visualization and the mesh visualizations in the paper are created with Viser.
This research was supported as a BAIR Open Research Common Project with Meta. In their academic roles at UC Berkeley, Haozhi Qi and Jitendra Malik are supported in part by DARPA Machine Common Sense (MCS), Brent Yi is supported by the NSF Graduate Research Fellowship Program under Grant DGE 2146752, and Haozhi Qi, Brent Yi, and Yi Ma are partially supported by ONR N00014-22-1-2102 and the InnoHK HKCRC grant. Roberto Calandra is funded by the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft) as part of Germany’s Excellence Strategy – EXC 2050/1 – Project ID 390696704 – Cluster of Excellence “Centre for Tactile Internet with Human-in-the-Loop” (CeTI) of Technische Universität Dresden. We thank Shubham Goel, Eric Wallace, Angjoo Kanazawa, and Raunaq Bhirangi for their feedback. We thank Austin Wang and Tingfan Wu for their help with the hardware. We thank Xinru Yang for her help with the real-world videos.