Generative Attention Learning: a “GenerAL” framework for high-performance multi-fingered grasping in clutter
Wu, Bohan, Akinola, Iretiayo, Gupta, Abhi, Xu, Feng, Varley, Jacob, Watkins-Valls, David, and Allen, Peter K

Publication: Autonomous Robots

Abstract: Generative Attention Learning (GenerAL) is a framework for high-DOF multi-fingered grasping that is not only robust to dense clutter and novel objects but also effective with a variety of parallel-jaw and multi-fingered robot hands. This framework introduces a novel attention mechanism that substantially improves the grasp success rate in clutter. Its generative nature allows the learning of full-DOF grasps with flexible end-effector positions and orientations, as well as all finger joint angles of the hand. Trained purely in simulation, this framework skillfully closes the sim-to-real gap. To close the visual sim-to-real gap, this framework uses a single depth image as input. To close the dynamics sim-to-real gap, this framework circumvents continuous motor control with a direct mapping from pixel to Cartesian space inferred from the same depth image. Finally, this framework demonstrates inter-robot generality by achieving over 92% real-world grasp success rates in cluttered scenes with novel objects using two multi-fingered robotic hand-arm systems with different degrees of freedom.
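The pixel-to-Cartesian mapping mentioned in the abstract is, in its usual formulation, a deprojection of a chosen depth-image pixel through the camera's pinhole intrinsics. The sketch below is a minimal illustration of that idea only, assuming a standard pinhole model with hypothetical intrinsics fx, fy, cx, cy; it is not the paper's implementation.

```python
import numpy as np

def pixel_to_cartesian(u, v, depth_image, fx, fy, cx, cy):
    """Deproject pixel (u, v) of a depth image into a 3D point in the camera
    frame using a pinhole model. Generic sketch, not the paper's exact method;
    fx, fy, cx, cy are assumed (hypothetical) camera intrinsics."""
    z = depth_image[v, u]          # depth in meters at the selected pixel
    x = (u - cx) * z / fx          # back-project along the image x axis
    y = (v - cy) * z / fy          # back-project along the image y axis
    return np.array([x, y, z])

# Usage example: deproject the center pixel of a synthetic 480x640 depth image.
depth = np.full((480, 640), 0.6, dtype=np.float32)  # flat scene at 0.6 m
point = pixel_to_cartesian(320, 240, depth, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
print(point)  # approximately [0.0, 0.0, 0.6]
```

Mapping a grasp point this way lets the policy output a Cartesian target directly from the depth image, rather than relying on simulated motor dynamics.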

Bibtex:

@article{wu2020generative,
  title = {Generative Attention Learning: a ``{GenerAL}'' framework for high-performance multi-fingered grasping in clutter},
  author = {Wu, Bohan and Akinola, Iretiayo and Gupta, Abhi and Xu, Feng and Varley, Jacob and Watkins-Valls, David and Allen, Peter K},
  journal = {Autonomous Robots},
  pages = {1--20},
  year = {2020},
  publisher = {Springer}
}