MG-Grasp: Metric-Scale Geometric 6-DoF Grasping Framework with Sparse RGB Observations

1The Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.
2The Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
Corresponding Author: daiyx23@mails.tsinghua.edu.cn
Overview of MG-Grasp pipeline
Real-world MG-Grasp workflow

Overview of MG-Grasp. From sparse RGB observations, our system reconstructs grasp-oriented geometry and produces reliable 6-DoF grasps.

Abstract

Single-view RGB-D grasp detection remains a common choice in 6-DoF robotic grasping systems, but it requires a depth sensor. Although RGB-only 6-DoF grasping methods have been studied recently, their inaccurate geometric representations are not directly suitable for physically reliable robotic manipulation, which hinders reliable grasp generation. To address these limitations, we propose MG-Grasp, a novel depth-free 6-DoF grasping framework that achieves high-quality object grasping. Leveraging a two-view 3D foundation model together with camera intrinsics and extrinsics, our method reconstructs metric-scale, multi-view-consistent dense point clouds from sparse RGB images and generates stable 6-DoF grasps. Experiments on the GraspNet-1Billion dataset and on a real robot demonstrate that MG-Grasp achieves state-of-the-art (SOTA) grasp performance among RGB-based 6-DoF grasping methods.

Framework Overview

MG-Grasp pipeline

Method Highlights

  • Depth-free 6-DoF grasping: reliable robotic grasp generation from sparse multi-view RGB observations only.
  • Triangulation-based scale recovery: grounds up-to-scale two-view predictions in a shared metric coordinate system.
  • Two-stage multi-view refinement: enforces dense cross-view consistency with confidence-weighted 3D and 2D objectives.
  • Grasp-oriented fusion: refined geometry is filtered and decoded into stable 6-DoF grasps with a local grasp model.
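The scale-recovery idea above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's exact formulation: it assumes the up-to-scale two-view prediction includes a relative camera translation, and recovers one global scale factor by comparing it against the metric baseline given by the known extrinsics (the names `recover_metric_scale`, `t_pred`, and `t_metric` are ours).

```python
import numpy as np

def recover_metric_scale(points_up_to_scale, t_pred, t_metric):
    """Ground an up-to-scale two-view reconstruction in metric units.

    points_up_to_scale : (N, 3) points from the two-view 3D model.
    t_pred             : predicted (up-to-scale) inter-camera translation.
    t_metric           : metric inter-camera translation from extrinsics.
    """
    scale = np.linalg.norm(t_metric) / np.linalg.norm(t_pred)
    return scale * points_up_to_scale, scale

# Toy example: the predicted baseline is half the true metric baseline,
# so every point must be scaled by 2.
pts = np.array([[0.1, 0.2, 0.5]])
metric_pts, s = recover_metric_scale(
    pts,
    t_pred=np.array([0.05, 0.0, 0.0]),
    t_metric=np.array([0.10, 0.0, 0.0]),
)
```

A single scalar suffices here because the scale ambiguity of a calibrated two-view reconstruction is one-dimensional: with rotation and translation direction fixed, only the global scale remains unknown.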

6-DoF Pose Generation

Qualitative examples of generated 6-DoF grasp poses from MG-Grasp.

Example scenes: grasp_0100, grasp_0105, grasp_0108, grasp_0110.

Benchmark Summary

AP results on the GraspNet-1Billion test splits; each cell reports two values for the benchmark's two cameras (RealSense/Kinect).

Method Input Seen Similar Novel Average
PointnetGPD RGB-D 25.96/27.59 22.68/24.38 9.23/10.66 19.29/20.88
GraspNet RGB-D 27.59/29.88 26.11/27.84 10.55/11.51 21.41/23.08
TransGrasp RGB-D 39.81/35.97 29.32/29.71 13.83/11.41 27.65/25.70
HGGD RGB-D 59.36/60.26 51.20/48.59 22.17/18.43 44.24/42.43
FlexLoG RGB-D 72.81/69.44 65.21/59.01 30.04/23.67 56.02/50.67
GraspNeRF RGB 22.49/24.61 14.15/17.67 11.08/12.86 15.91/18.38
VG-Grasp RGB 59.23/54.65 36.34/35.13 10.84/11.85 35.47/33.88
MG-Grasp (Ours) RGB 63.70/66.80 56.03/57.35 23.22/20.47 47.65/48.21
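As a quick consistency check (ours, not from the paper), the Average column is the mean of the three test splits; recomputing it for the MG-Grasp row reproduces the reported values:

```python
# Average AP = mean over the Seen / Similar / Novel splits,
# using the two per-cell values from the MG-Grasp row.
realsense = (63.70 + 56.03 + 23.22) / 3
kinect = (66.80 + 57.35 + 20.47) / 3
print(round(realsense, 2), round(kinect, 2))  # 47.65 48.21
```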

Real-World Robotic Evaluation

We further validate MG-Grasp on a real robotic platform in tabletop scenes, using a UR5e manipulator, a Robotiq 2F-85 adaptive gripper, and a RealSense D435i camera (RGB stream only). Unlike prior single-view real-robot evaluations, MG-Grasp uses 4 sparse RGB views per scene.

  • Success Rate: 35/40 = 87.5%
  • Completion Rate: 35/35 = 100%
  • Transparent Objects: 11/20 = 55.0% success, 11/15 = 73.3% completion
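The percentages above follow directly from the raw counts; a quick recomputation (ours, for verification only):

```python
# Real-robot rates recomputed from the reported raw counts.
rates = {
    "success": 35 / 40,
    "completion": 35 / 35,
    "transparent_success": 11 / 20,
    "transparent_completion": 11 / 15,
}
for name, r in rates.items():
    print(f"{name}: {r:.1%}")
```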

Real-World Robot Experiments

Real-robot grasping demonstrations in tabletop scenes using 4 RGB observations.

Real-world MG-Grasp workflow

Transparent Object Grasping

MG-Grasp also generalizes to transparent-object grasping scenarios in real-world experiments.

BibTeX

@misc{wang2026mggraspmetricscalegeometric6dof,
      title={MG-Grasp: Metric-Scale Geometric 6-DoF Grasping Framework with Sparse RGB Observations}, 
      author={Kangxu Wang and Siang Chen and Chenxing Jiang and Shaojie Shen and Yixiang Dai and Guijin Wang},
      year={2026},
      eprint={2603.16270},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.16270}, 
}