Global-local motion transformer for unsupervised skeleton-based action learning

Boeun Kim*, Hyung Jin Chang, Jungho Kim, Jin-Young Choi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a new transformer model for the task of unsupervised learning of skeleton motion sequences. The existing transformer model used for unsupervised skeleton-based action learning learns the instantaneous velocity of each joint from adjacent frames, without global motion information. Thus, the model has difficulty learning attention globally over whole-body motions and temporally distant joints. In addition, person-to-person interactions have not been considered in the model. To tackle the learning of whole-body motion, long-range temporal dynamics, and person-to-person interactions, we design a global and local attention mechanism in which global body motions and local joint motions pay attention to each other. In addition, we propose a novel pretraining strategy, multi-interval pose displacement prediction, to learn both global and local attention in diverse time ranges. The proposed model successfully learns local dynamics of the joints and captures global context from the motion sequences. Our model outperforms state-of-the-art models by notable margins on the representative benchmarks. Code is available at https://github.com/Boeun-Kim/GL-Transformer.
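
As a rough illustration of what "multi-interval pose displacement" targets could look like, the following minimal sketch computes frame-to-frame displacements at several intervals and splits them into a global (whole-body) part and a local (per-joint) part. It is not the authors' implementation (see the repository linked above for that). The function name `pose_displacements`, the use of the joint mean as the body center, the zero padding for the first frames, and the interval values are all assumptions made for this example.

```python
import numpy as np

def pose_displacements(seq, intervals=(1, 5, 10)):
    """Hypothetical helper: displacement targets at several frame intervals.

    seq: array of shape (T, J, C) = (frames, joints, coordinates).
    Returns {interval: (global_disp, local_disp)} with shapes
    (T, 1, C) and (T, J, C). Frames without a predecessor at the
    given interval are left as zeros (an assumption of this sketch).
    """
    seq = np.asarray(seq, dtype=float)
    # Assumption: the body center is the mean over joints.
    center = seq.mean(axis=1, keepdims=True)   # (T, 1, C)
    relative = seq - center                    # joints relative to center

    targets = {}
    for n in intervals:
        if n < 1:
            raise ValueError("intervals must be positive integers")
        global_disp = np.zeros_like(center)
        local_disp = np.zeros_like(relative)
        # Displacement between frame t and frame t - n.
        global_disp[n:] = center[n:] - center[:-n]
        local_disp[n:] = relative[n:] - relative[:-n]
        targets[n] = (global_disp, local_disp)
    return targets
```

Larger intervals expose slower, longer-range motion, while an interval of 1 reduces to the adjacent-frame velocity that the abstract attributes to the earlier model.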
Original language: English
Title of host publication: Computer Vision – ECCV 2022
Subtitle of host publication: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV
Editors: Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher: Springer
Pages: 209–225
Number of pages: 17
Edition: 1
ISBN (Electronic): 9783031197727
ISBN (Print): 9783031197710
Publication status: Published - 28 Oct 2022
Event: 17th European Conference on Computer Vision (ECCV 2022) - Tel Aviv, Israel
Duration: 24 Oct 2022 → 28 Oct 2022

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 13664
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th European Conference on Computer Vision (ECCV 2022)
Abbreviated title: ECCV 2022
Country/Territory: Israel
City: Tel Aviv
Period: 24/10/22 → 28/10/22