To operate alongside and collaborate with humans successfully in daily life, intelligent mobile robots need to be able to learn about human movements and gain understanding from an observed scene. This work shows how intelligent mobile robots can observe human activities, understand them as patterns, and how this can be scaled for continual learning.
Turing Fellows Professor Anthony Cohn, who has a worldwide reputation for his work in spatial and temporal representation and reasoning, and Professor David Hogg, who specialises in computer vision and machine learning, worked to understand how a robot can learn human movements simply by observing them in an office environment.
To understand human activity, it’s important to look at the relationships between humans, the objects they interact with and the environment. This work set out to learn about human activities from long-term observation of a human-populated environment (an office) by an autonomous mobile robot.
A mobile robot wandered around the School of Computing at the University of Leeds, collecting data over time, and then observed the patterns in the spatial and temporal relationships between objects and humans. For example, if one person makes a coffee, the pattern would be similar for another person who makes a coffee.
“A human is detected on its RGB-D camera using off-the-shelf detectors, which are convolutional neural networks trained on thousands of human poses. The result is a frame-by-frame representation of a human as a set of 15 body joint locations. The qualitative spatial relations then encode how the hands of the human are interacting with nearby objects in the environment,” explains Professor Cohn. “The system then learns activity models in an unsupervised manner.”
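To make the idea of qualitative spatial relations concrete, the following is a minimal sketch in Python, not the project's actual pipeline and not the QSRlib API. It assumes a tracked 3D hand position per frame and a known object position, and maps the hand-to-object distance to one of three labels. The label names ("touch", "near", "far"), the distance thresholds and the toy coordinates are all hypothetical and chosen only for illustration.

```python
import math
from itertools import groupby

# Hypothetical distance thresholds in metres; not taken from the project.
TOUCH_MAX = 0.15
NEAR_MAX = 0.60


def distance_relation(hand, obj):
    """Map the Euclidean distance between two 3D points to a qualitative label."""
    d = math.dist(hand, obj)
    if d <= TOUCH_MAX:
        return "touch"
    if d <= NEAR_MAX:
        return "near"
    return "far"


def relation_sequence(hand_track, obj):
    """Label every frame, then merge runs of equal labels into
    (label, first_frame, last_frame) intervals."""
    labels = [distance_relation(h, obj) for h in hand_track]
    intervals = []
    frame = 0
    for label, run in groupby(labels):
        length = len(list(run))
        intervals.append((label, frame, frame + length - 1))
        frame += length
    return intervals


# Toy data: a right hand approaching a mug over five frames (made-up values).
mug = (1.0, 0.0, 0.8)
right_hand = [
    (0.0, 0.0, 0.8),
    (0.2, 0.0, 0.8),
    (0.6, 0.0, 0.8),
    (0.9, 0.0, 0.8),
    (0.95, 0.0, 0.8),
]

print(relation_sequence(right_hand, mug))
```

The resulting interval sequence discards exact coordinates and keeps only how the relation changes over time, which is why two different people making a coffee can produce similar patterns even when their precise movements differ.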
As well as human movement, the robot also needs to detect and track objects in real time. This is a challenging problem in itself, and doing it for arbitrary objects from a robotic platform poses additional difficulties. Therefore, to learn the position of interesting objects within an environment, the robot first had to build a 3D model of that environment.
This work enables a robot to learn about what humans do and how, contributing to its model of human intelligence. Whilst the performance of state-of-the-art robot perception is still far from human-level perception, this project demonstrates that it is possible for robots to learn consistent and meaningful patterns of detailed 3D human body pose sequences using unsupervised learning methods from multiple human observations in real-world environments.
“Projects before this have used an application of AI, but this is hard-core research into artificial intelligence and computer vision applied in a robot,” says Professor David Hogg.
The mobile robot dataset has been made openly accessible (http://doi.org/10.5518/86) and open source software has been developed (qsrlib.readthedocs.org) to simplify the process of extracting spatio-temporal relationships from video data.
This work will help other human activity analysis researchers to move away from standard offline approaches, where pre-processed visual datasets are used, towards this solution, which has been developed to generalise to the real-world environments that mobile robots actually inhabit. These solutions provide exciting opportunities for the evolution of mobile robotics research in the long term. Robots are becoming more and more widespread in everyday human-inhabited environments, rather than the traditional human-free factory settings where they have been deployed for many years. Techniques such as these will mean that robots can learn about and interact with the humans they encounter in these environments.
This work was funded by the EU under the STRANDS project (600623) and formed the PhD work of Paul Duckworth, who is now a postdoc at the University of Oxford. He works in the robotics group of Prof Nick Hawes, who directed the STRANDS project when he was at Birmingham University.
For more information visit: