Section: Scientific Foundations
Scene Understanding for Activity Recognition
Participants: Guillaume Charpiat, François Brémond, Sabine Moisan, Monique Thonnat.
Introduction
Our goal is to design a framework for the easy generation of autonomous and effective scene understanding systems for activity recognition. Scene understanding is a complex process in which information is abstracted through four levels: signal (e.g. pixel, sound), perceptual features, physical objects and events. The signal level is characterized by strong noise and by ambiguous, corrupted and missing data. Reaching a semantic level of abstraction therefore depends crucially on models and invariants. An open issue is whether these models and invariants should be given a priori or learned. The overall challenge is to organize all this knowledge so as to capitalize on experience, share it with others and update it as experiments proceed. More precisely, we work along the following research axes: perception (how to extract perceptual features from the signal), understanding (how to recognize a priori models of physical object activities from perceptual features) and learning (how to learn models for activity recognition).
Perception for Activity Recognition
We propose computer vision techniques for physical object detection, and control techniques for the supervision of a library of video processing programs.
First, for the real-time detection of physical objects from perceptual features, we design methods either by adapting existing algorithms or by proposing new ones. In particular, we work on information fusion to handle perceptual features coming from various sensors (several cameras covering a large-scale area, or heterogeneous sensors capturing more or less precise and rich information). To guarantee the long-term coherence of tracked objects, we are also adding a reasoning layer to a classical Bayesian framework that models the uncertainty of the tracked objects. This reasoning layer takes into account a priori knowledge of the scene for outlier elimination and long-term coherence checking. Moreover, we are working on fine and accurate models of human shape and gesture, extending our earlier work on human posture recognition, which matches 3D models against 2D silhouettes. We are also working on gesture recognition based on 2D feature point tracking and clustering.
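To make the idea of a reasoning layer on top of a Bayesian tracker more concrete, here is a minimal sketch. It is not the team's tracker: the `Track` class, the rectangular `WALKABLE` zone, the `MAX_SPEED` limit and the noise variances are all hypothetical values chosen for illustration. A Kalman-style Gaussian update is applied only when the detection is consistent with the assumed scene knowledge.

```python
from dataclasses import dataclass

@dataclass
class Track:
    x: float    # estimated ground-plane position in metres
    y: float
    var: float  # isotropic variance of the position estimate

# Hypothetical a priori scene knowledge (assumptions, not from the report).
WALKABLE = (0.0, 0.0, 10.0, 6.0)  # xmin, ymin, xmax, ymax in metres
MAX_SPEED = 3.0                   # metres per second

def is_plausible(track, det, dt):
    """Reasoning layer: reject detections that contradict scene knowledge."""
    xmin, ymin, xmax, ymax = WALKABLE
    inside = xmin <= det[0] <= xmax and ymin <= det[1] <= ymax
    dist = ((det[0] - track.x) ** 2 + (det[1] - track.y) ** 2) ** 0.5
    return inside and dist <= MAX_SPEED * dt

def update(track, det, dt, process_var=0.5, meas_var=0.25):
    """One predict/update step of a Gaussian (Kalman-style) position filter."""
    var = track.var + process_var * dt       # predict: uncertainty grows
    if not is_plausible(track, det, dt):
        return Track(track.x, track.y, var)  # outlier: keep the prediction only
    gain = var / (var + meas_var)            # Kalman gain
    return Track(track.x + gain * (det[0] - track.x),
                 track.y + gain * (det[1] - track.y),
                 (1.0 - gain) * var)
```

With these assumed settings, a detection half a metre away from the track is fused into the estimate, whereas a detection eight metres away after one second, or one outside the walkable zone, is treated as an outlier and only increases the track uncertainty.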
A second research direction is to manage a library of video processing programs. We are building a perception library by selecting robust algorithms for feature extraction, by ensuring that they work efficiently under real-time constraints, and by formalizing their conditions of use within a program supervision model. In the case of video cameras, at least two problems are still open: robust image segmentation and meaningful feature extraction. We are developing new learning techniques to address them.
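As an illustration of what formalizing conditions of use could look like, the sketch below attaches an applicability predicate and a processing cost to each program of a small library, then selects the programs that fit a given context and time budget. The `Operator` class, the three library entries and their costs are hypothetical; the actual program supervision model is richer than this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Operator:
    name: str
    cost_ms: float                      # assumed processing time per frame
    applicable: Callable[[Dict], bool]  # condition of use on the context

# Hypothetical library entries, for illustration only.
LIBRARY = [
    Operator("background_subtraction", 5.0, lambda c: c["camera"] == "fixed"),
    Operator("optical_flow_segmentation", 30.0, lambda c: True),
    Operator("thermal_threshold", 2.0, lambda c: c["sensor"] == "infrared"),
]

def select_operators(context: Dict, budget_ms: float) -> List[Operator]:
    """Keep operators whose conditions hold and whose cost fits the budget."""
    usable = [op for op in LIBRARY
              if op.applicable(context) and op.cost_ms <= budget_ms]
    return sorted(usable, key=lambda op: op.cost_ms)  # cheapest first
```

For a fixed visible-light camera and a 10 ms budget per frame, only the background subtraction entry is retained; relaxing the budget or changing the sensor type changes the selection without touching the programs themselves.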
Understanding for Activity Recognition
A second research axis is to recognize subjective activities of physical objects (i.e. human beings, animals, vehicles) based on a priori models and objective perceptual measures (e.g. robust and coherent object tracks).
To reach this goal, we have defined original activity recognition algorithms and activity models. Activity recognition algorithms include the computation of spatio-temporal relationships between physical objects. Any of the possible relationships may correspond to an activity of interest, so all of them have to be explored efficiently. The variety of these activities, generally called video events, is huge and depends on their spatial and temporal granularity, on the number of physical objects involved in the events, and on the event complexity (number of components constituting the event).
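A simple instance of a spatio-temporal relationship is "two objects stay close to each other for some time". The sketch below computes the frame intervals over which this holds for two tracks sampled at the same frames. The function name, the distance threshold and the minimum duration are assumptions made for the example, not the algorithms referred to above.

```python
from math import hypot

def close_intervals(track_a, track_b, max_dist, min_frames):
    """Return (start, end) frame intervals during which two tracks stay
    within max_dist of each other for at least min_frames frames."""
    intervals, start = [], None
    n = min(len(track_a), len(track_b))
    for t in range(n):
        (ax, ay), (bx, by) = track_a[t], track_b[t]
        near = hypot(ax - bx, ay - by) <= max_dist
        if near and start is None:
            start = t                         # a candidate interval opens
        elif not near and start is not None:
            if t - start >= min_frames:       # long enough to be kept
                intervals.append((start, t - 1))
            start = None
    if start is not None and n - start >= min_frames:
        intervals.append((start, n - 1))      # interval still open at the end
    return intervals
```

Running such a test over every pair of tracked objects grows quadratically with the number of objects, which gives a sense of why efficient exploration of the possible relationships matters.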
Concerning the modeling of activities, we are working in two directions: uncertainty management for expressing probability distributions, and knowledge acquisition facilities based on ontological engineering techniques. For the first direction, we are investigating classical statistical techniques and logical approaches. For the second, we have built a language for video event modeling and a visual concept ontology (including color, texture and spatial concepts), to be extended with temporal concepts (motion, trajectories, events ...) and other perceptual concepts (physiological sensor concepts ...).
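To give a feel for declarative event modeling, the sketch below describes an event by the physical objects it involves, its component sub-events and the constraints between them. This is a hypothetical Python data structure, not the syntax of the event modeling language mentioned above; the `EventModel` class, the `before` constraint and the example model are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]  # (start_frame, end_frame) of a detected sub-event

@dataclass
class EventModel:
    name: str
    physical_objects: List[str]  # roles involved, e.g. "person", "zone"
    components: List[str]        # names of the sub-events
    constraints: List[Callable[[Dict[str, Interval]], bool]] = field(
        default_factory=list)

    def matches(self, detected: Dict[str, Interval]) -> bool:
        """True if every component was detected and every constraint holds."""
        if any(name not in detected for name in self.components):
            return False
        return all(check(detected) for check in self.constraints)

def before(first: str, second: str):
    """Temporal constraint: `first` ends before `second` starts."""
    return lambda detected: detected[first][1] < detected[second][0]

# Hypothetical composite event built from two sub-events.
enters_then_stays = EventModel(
    name="enters_then_stays",
    physical_objects=["person", "zone"],
    components=["enters_zone", "stays_in_zone"],
    constraints=[before("enters_zone", "stays_in_zone")],
)
```

Keeping the model as data, separate from the recognition code, is what allows the same recognition algorithm to be reused across applications by swapping models.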
Learning for Activity Recognition
Given the difficulty of building an activity recognition system with a priori knowledge for a new application, we study how machine learning techniques can automate building or completing models at the perception level and at the understanding level.
At the perception level, to improve image segmentation, we are using program supervision techniques combined with learning techniques. For instance, given a sample set of images associated with ground truth data (manual region boundaries and semantic labels), an evaluation metric is combined with an optimization scheme (e.g. simplex algorithm or genetic algorithm) to select an image segmentation method and to tune its parameters. As another example, to handle illumination changes, clustering techniques are applied to intensity histograms to learn the different classes of illumination context, which are then used for dynamic parameter setting.
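The parameter tuning loop can be sketched as follows. This is a deliberately simplified stand-in: segmentation is reduced to a single intensity threshold, the evaluation metric is intersection over union against a ground-truth mask, and a plain exhaustive search over candidate values replaces the simplex or genetic optimizers mentioned above. All function names are assumptions.

```python
def segment(image, threshold):
    """Toy segmentation: foreground is every pixel at or above the threshold."""
    return [[pixel >= threshold for pixel in row] for row in image]

def iou(mask, truth):
    """Evaluation metric: intersection over union of two boolean masks."""
    inter = union = 0
    for mask_row, truth_row in zip(mask, truth):
        for m, t in zip(mask_row, truth_row):
            inter += int(m and t)
            union += int(m or t)
    return inter / union if union else 1.0

def tune_threshold(samples, candidates):
    """Return the candidate with the best mean IoU over (image, truth) pairs."""
    def mean_score(threshold):
        scores = [iou(segment(image, threshold), truth)
                  for image, truth in samples]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)
```

The same structure carries over to real segmentation methods with several parameters, where exhaustive search becomes too costly and an optimizer such as those named in the text is needed.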
At the understanding level, we are learning primitive event detectors. This can be done, for example, by learning visual concept detectors using SVMs (Support Vector Machines) trained on perceptual feature samples. An open question is how far we can go in weakly supervised learning for each type of perceptual concept (i.e. in reducing the human annotation effort). A second direction is the learning of typical composite event models for frequent activities using trajectory clustering or data mining techniques. We call a particular combination of several primitive events a composite event.
Coupling learning techniques with a priori knowledge techniques is a promising way to recognize meaningful semantic activities.
The newly proposed techniques for activity recognition systems (first research axis) in turn help to specify the requirements for new software architectures (second research axis).
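As a rough illustration of learning a visual concept detector with an SVM, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. The training procedure, the hyperparameters and the two-dimensional feature vectors are assumptions chosen to keep the example self-contained; the text does not specify which SVM formulation or features are used.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Minimise the regularised hinge loss by subgradient descent.
    Labels must be +1 (concept present) or -1 (concept absent)."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * reg) for wi in w]  # regularisation shrinkage
            if margin < 1:                           # sample violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 if the concept is detected in feature vector x, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

In this toy setting each sample could be, for instance, a pair of hypothetical color features for an image region, labeled by hand as showing the concept or not; weakly supervised variants would aim to reduce how many such labels are needed.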