Technische Universität München Robotics and Embedded Systems


Machine Recognition of 3D Human Action



Machine recognition of human actions is an active research field owing to its many applications in robotics, the automotive industry, video surveillance, and human-machine interaction. Most work in this area uses 2D videos, despite the inherent problems caused by pose and illumination variations. The recent development of depth sensors has created new opportunities to deal with these problems and to advance the field. In this paper we survey recent advances in 3D human action recognition. We present the currently available datasets suitable for 3D action analysis and discuss the novelties of recent research in 3D video representations and in the action classification process. Moreover, we address the limitations of the state of the art and point out challenges and promising directions for future research.

Brief Introduction

We consider the task of classifying human motions into action classes from 3D data (depth video). Interest in this topic is driven by the increasing popularity of novel 3D sensing devices. Low-cost 3D sensors mark a step change in the capabilities of computer systems. They can be regarded as a major step toward the ultimate goal of human computing: a shift from desktop computers to a multiplicity of smart computing devices embedded in our environment.

Over the past decades, machine recognition of 2D human actions has received significant attention among human computing tasks, since actions are the most natural means by which humans regulate their interactions with the environment and with other people. Research has mainly focused on recognizing actions from 2D video data captured by visible-light cameras, and numerous 2D vision datasets have been created to promote the development of human action recognition. Several surveys exist in the area of 2D action recognition \cite{rev-agg,rev-tur,rev-1999,rev-2007,rev-2006,rev-tan}. However, there is no comprehensive study of 3D action recognition. Although 2D action recognition has several inherent connections with 3D action recognition, the two differ in many respects. First, 2D video data has intrinsic limitations (e.g., it is sensitive to illumination changes). Second, a single 2D sensor (a visible-light camera) cannot provide 3D structural information about the environment, which is discriminative information for recovering postures and recognizing human actions. In contrast, 3D video data captures pure geometry and shape cues and provides depth information about the environment. Moreover, 3D (depth) sensors have enabled a human motion capture technique \cite{skele} that outputs the 3D joint positions of the human skeleton (see Fig. \ref{fig:skeshow}). In addition, an increasing number of 3D datasets dedicated to human action recognition have been created. For these reasons, the survey presented in this paper is a first attempt to fill the gap left by the lack of a complete description of 3D human action recognition.