DexSinGrasp: Learning a Unified Policy for Dexterous Object Singulation and Grasping in Densely Cluttered Environments

Abstract

Grasping objects in cluttered environments remains a fundamental yet challenging problem in robotic manipulation. While prior works have explored learning-based synergies between pushing and grasping for two-fingered grippers, few have leveraged the high degrees of freedom (DoF) in dexterous hands to perform efficient singulation for grasping in cluttered settings. In this work, we introduce DexSinGrasp, a unified policy for dexterous object singulation and grasping. DexSinGrasp enables high-dexterity object singulation to facilitate grasping, significantly improving efficiency and effectiveness in cluttered environments. We incorporate clutter arrangement curriculum learning to enhance success rates and generalization across diverse clutter conditions, while policy distillation enables a deployable vision-based grasping strategy. To evaluate our approach, we introduce a set of cluttered grasping tasks with varying object arrangements and occlusion levels. Experimental results show that our method outperforms baselines in both efficiency and grasping success rate, particularly in dense clutter. Code, appendix, and videos are available on our project website.

Figure: qualitative results.

Pipeline Overview

Framework of DexSinGrasp. First, we employ clutter arrangement curriculum learning to progressively improve the teacher policy, obtaining two teacher policies at the end of this stage: one for dense and one for random arrangement tasks. We then collect state, action, and point cloud data from the two trained teachers to train a vision-based student policy via behavior cloning, which facilitates real-world deployment.
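
The distillation step described in the caption can be illustrated with a toy behavior cloning loop. This is a minimal sketch under strong assumptions, not the DexSinGrasp implementation: the teacher is a hypothetical fixed linear map, the observation is a 2-D vector standing in for point cloud features, and the student is a linear model fitted by gradient descent on the mean squared error between its actions and the teacher's. All function names are illustrative.

```python
import random


def teacher_action(state):
    # Hypothetical stand-in for a trained teacher policy (assumption):
    # a fixed linear map from a 2-D state to a scalar action.
    return 0.5 * state[0] - 0.25 * state[1]


def collect_demonstrations(n, seed=0):
    # Roll out the teacher on random states and record (state, action) pairs.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        state = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        data.append((state, teacher_action(state)))
    return data


def mse(weights, data):
    # Mean squared error between student and teacher actions.
    total = 0.0
    for state, action in data:
        pred = weights[0] * state[0] + weights[1] * state[1]
        total += (pred - action) ** 2
    return total / len(data)


def behavior_cloning(data, epochs=300, lr=0.1):
    # Fit a linear student by full-batch gradient descent on the MSE loss.
    weights = [0.0, 0.0]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for state, action in data:
            err = weights[0] * state[0] + weights[1] * state[1] - action
            grad[0] += 2.0 * err * state[0] / n
            grad[1] += 2.0 * err * state[1] / n
        weights[0] -= lr * grad[0]
        weights[1] -= lr * grad[1]
    return weights


if __name__ == "__main__":
    demos = collect_demonstrations(200)
    student = behavior_cloning(demos)
    print("student weights:", student, "imitation loss:", mse(student, demos))
```

In the framework above, demonstrations would come from both teachers and the student would consume point clouds; the sketch keeps only the supervised imitation objective.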

Simulation Experiments

We evaluate our teacher and student policies and compare them with GraspReward-only and Multi-state singulation baselines. The evaluation metrics are success rate (SR) and average steps (AS). We denote dense and random arrangements as D-n and R-n, respectively, where n is the number of obstacles.
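
As a rough illustration of how these two metrics could be tabulated per setting, the sketch below groups episode records by setting label and computes SR and AS. The record layout and function names are assumptions for illustration, and AS is averaged over all episodes here; the source does not state whether failed episodes are included.

```python
from collections import defaultdict


def setting_name(arrangement, n_obstacles):
    # Build the D-n / R-n label used in the text.
    prefix = {"dense": "D", "random": "R"}[arrangement]
    return f"{prefix}-{n_obstacles}"


def summarize(episodes):
    # episodes: iterable of (setting, success, steps) tuples (assumed layout).
    grouped = defaultdict(list)
    for setting, success, steps in episodes:
        grouped[setting].append((success, steps))

    summary = {}
    for setting, runs in grouped.items():
        successes = sum(1 for ok, _ in runs if ok)
        total_steps = sum(steps for _, steps in runs)
        summary[setting] = {
            "SR": successes / len(runs),    # fraction of successful episodes
            "AS": total_steps / len(runs),  # mean steps over all episodes (assumption)
        }
    return summary
```

For example, `summarize([(setting_name("dense", 4), True, 100), (setting_name("dense", 4), False, 200)])` yields an SR of 0.5 and an AS of 150.0 for D-4.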

Figure panels: Success Rate; Average Steps. Clutter settings shown: D-4, D-6, D-8 (dense arrangement) and R-4, R-6, R-8 (random arrangement).

To test generalization beyond cuboid clutters, we evaluate our policy on tightly packed irregular clutters formed by slicing a cuboid with random curves. After fine-tuning on 200 cases, we test on 50 unseen ones and achieve satisfactory results compared with baselines.

Figure panels: Irregular Clutter 1; Irregular Clutter 2; Irregular Clutter 3.

Discussion on Dexterity

We evaluate three LEAP Hand variants (Low, Mid, and Full DoF) on dense and irregular clutters. As dexterity increases from 6 to 16 DoF, grasp success improves significantly, highlighting the importance of high-DoF hands in tight clutter scenarios. In the figure, (F) stands for a Flexion & Extension DoF and (A) for an Abduction & Adduction DoF; each arrow represents one DoF.

Figure panels: Full DoF LEAP Hand; Mid DoF LEAP variant; Low DoF LEAP variant.
Plots: Success Rate on Embodiments with Different DoFs; Average Steps on Embodiments with Different DoFs.

Real-World Experiments

We evaluate our policy on D-8, R-8, irregular, and practical clutters with 10 real-world trials per setting, where success is defined as lifting the target by 10 cm within 40 seconds. Although we apply no extensive sim-to-real adaptation, the policy performs well, though it is still affected by dynamic interactions and sim-to-real discrepancies. Notably, the R-8 policy achieves 60% success on practical clutters in a zero-shot setting, demonstrating robust generalization to unseen object shapes and spatial configurations.
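
The success criterion above (a 10 cm lift within 40 seconds) can be written as a simple check. This is a sketch for illustration only: the function names, the per-trial measurement format, and the inclusive treatment of both thresholds are assumptions, not details from the experiments.

```python
LIFT_THRESHOLD_M = 0.10  # target must be lifted by 10 cm
TIME_LIMIT_S = 40.0      # within 40 seconds


def trial_succeeded(lift_height_m, elapsed_s):
    # Inclusive comparisons at both thresholds are an assumption.
    return lift_height_m >= LIFT_THRESHOLD_M and elapsed_s <= TIME_LIMIT_S


def success_rate(trials):
    # trials: list of (lift_height_m, elapsed_s) pairs, one per real-world trial.
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(trial_succeeded(h, t) for h, t in trials) / len(trials)
```

With 10 trials per setting, as in the protocol above, 6 trials that pass the check correspond to a 60% success rate.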

The target object is placed at the center of the clutter, outlined by light green dashes. The first row of clutters is used for evaluation. Results are shown in the following table.
Figure panels: Dense arrangement with 8 obstacles; Random arrangement with 8 obstacles; Irregular clutter; Practical clutter.
Table: Real-World Grasping Success Rate.

We tested on more diverse practical clutters as shown below.

Figure: additional practical clutters.