Transferable Active Grasping and Real Embodied Dataset
Chen, Xiangyu, Ye, Zelin, Sun, Jiankai, Fan, Yuda, Hu, Fang, Wang, Chenxi, and Lu, Cewu

Abstract: Grasping in cluttered scenes is challenging for robot vision systems, as detection accuracy can be hindered by partial occlusion of objects. We adopt a reinforcement learning (RL) framework and 3D vision architectures to search for feasible grasping viewpoints using a hand-mounted RGB-D camera. To overcome the drawbacks of photo-realistic environment simulation, we propose a large-scale dataset, the Real Embodied Dataset (RED), which contains full-viewpoint real samples over the upper hemisphere with amodal annotations and enables a simulator with real visual feedback. Building on this dataset, we develop a practical three-stage transferable active grasping pipeline that adapts to unseen cluttered scenes. Within the pipeline, we propose a novel mask-guided reward to overcome the sparse-reward issue in grasping and to ensure category-irrelevant behavior. The pipeline and its variants are evaluated in extensive experiments, both in simulation and on a real-world UR-5 robotic arm.
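
The mask-guided reward is the device that turns the sparse grasp-success signal into a dense one. Below is a minimal sketch of one way such a reward could be computed from amodal annotations like those in RED; the function name, shaping, and penalty value are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def mask_guided_reward(visible_mask, amodal_mask, prev_ratio):
    """Dense reward sketch for active viewpoint search (illustrative only).

    visible_mask, amodal_mask: boolean HxW arrays for the target object,
    e.g. derived from amodal annotations at the current camera viewpoint.
    prev_ratio: the visibility ratio observed at the previous step.

    Idea: reward the increase in the fraction of the amodal mask that is
    actually visible, so the agent is pushed toward less-occluded views
    without relying on a sparse grasp-success signal or object category.
    """
    amodal_area = amodal_mask.sum()
    if amodal_area == 0:              # target not in the current view
        return -1.0, 0.0              # hypothetical penalty; ratio resets
    ratio = float((visible_mask & amodal_mask).sum()) / float(amodal_area)
    reward = ratio - prev_ratio       # positive when occlusion decreases
    return reward, ratio

At each RL step the environment would evaluate this with the masks for the new camera pose and feed the reward to the policy; because only masks are used, the signal stays independent of object category.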

Bibtex:

@misc{chen2020transferable,
  title = {Transferable Active Grasping and Real Embodied Dataset},
  author = {Chen, Xiangyu and Ye, Zelin and Sun, Jiankai and Fan, Yuda and Hu, Fang and Wang, Chenxi and Lu, Cewu},
  year = {2020},
  eprint = {2004.13358},
  archiveprefix = {arXiv},
  primaryclass = {cs.RO}
}