Abstract
Achieving diverse and stable dexterous grasping for general and deformable objects remains a fundamental challenge
in robotics due to high-dimensional action spaces and perceptual uncertainty. In this paper, we present D3Grasp,
a multimodal perception-guided reinforcement learning framework designed to enable Diverse and Deformable
Dexterous Grasping. First, we introduce a unified multimodal representation that integrates visual and
tactile perception to robustly grasp common objects with diverse properties. Second, we propose an asymmetric reinforcement
learning architecture that exploits privileged information during training while preserving deployment realism, enhancing
both generalization and sample efficiency. Third, we meticulously design a training strategy to synthesize contact-rich,
penetration-free, and kinematically feasible grasps with enhanced adaptability to deformable and contact-sensitive objects.
Extensive evaluations confirm that D3Grasp delivers highly robust performance across large-scale and diverse object categories,
and substantially advances the state of the art in dexterous grasping for deformable and compliant objects, even under perceptual
uncertainty and real-world disturbances. D3Grasp achieves an average success rate of 95.1% in real-world trials, outperforming
prior methods on both rigid and deformable object benchmarks.
Keywords: Diverse Grasping, Dexterous Grasping, Deformable Grasping