TY - GEN
T1 - FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition
AU - Chen, Wei
AU - Jia, Xi
AU - Chang, Hyung Jin
AU - Duan, Jinming
AU - Shen, Linlin
AU - Leonardis, Ales
PY - 2021/11/2
Y1 - 2021/11/2
N2 - In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and slow inference. To tackle this problem, we propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation-aware autoencoder with 3D graph convolution for latent feature extraction. Thanks to the shift and scale invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. We estimate translation and size by two residuals: the difference between the mean of the object points and the ground-truth translation, and the difference between the mean size of the category and the ground-truth size, respectively. Finally, to increase the generalization ability of FS-Net, we propose an online box-cage-based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. Notably, in category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset.
KW - measurement
KW - training
KW - solid modeling
KW - three-dimensional displays
KW - convolution
KW - pose estimation
KW - training data
UR - https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings
U2 - 10.1109/CVPR46437.2021.00163
DO - 10.1109/CVPR46437.2021.00163
M3 - Conference contribution
SN - 9781665445108
T3 - Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 1581
EP - 1590
BT - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PB - IEEE
Y2 - 20 June 2021 through 25 June 2021
ER -