Section: New Results
Interactive search and personalisation
Database denoising and multiple visual queries
Participant : Sébastien Poullot.
One of IMEDIA's tasks within the SCARFACE project is to design and develop a character retrieval system. For this purpose, we take as input the person tracks computed by Thalès in video sequences and construct a database of profiles. A profile is a spatio-temporal volume: a bounding box that evolves along the time line. Two original contributions have been proposed for searching in this profile database.
The first consists in analysing the features of each profile with respect to all other profiles in order to keep only its relevant features, thus constructing a more representative, denoised database.
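One simple way to implement such cross-profile relevance weighting is an IDF-style criterion: features occurring in many profiles carry little discriminative information and can be dropped. The sketch below is a minimal illustration under that assumption; the `profiles` structure and the `max_df` threshold are hypothetical, not the actual SCARFACE pipeline.

```python
from collections import Counter

def denoise_profiles(profiles, max_df=0.5):
    """Drop visual words occurring in more than `max_df` of all profiles.

    `profiles` is assumed to map a profile id to the set of quantized
    local features (visual words) extracted from its frames.
    """
    n = len(profiles)
    # Document frequency of each visual word across the profile database.
    df = Counter(w for words in profiles.values() for w in words)
    frequent = {w for w, c in df.items() if c / n > max_df}
    # Keep only the discriminative words of each profile.
    return {pid: words - frequent for pid, words in profiles.items()}
```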
The second allows searching the database with a set of queries (several pictures of the same person). An a priori step can be applied to this query set in order to extract the relevant features (and discard the irrelevant ones). On the other side, an a posteriori step can perform late fusion of the results, depending on the specificities of each sub-query, as in the sketch below.
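As a generic illustration of the a posteriori side, the following sketch merges the ranked lists returned by the sub-queries with a simple score-based late fusion (sum of max-normalised scores). The function name and the normalisation choice are assumptions made for the example, not necessarily the fusion actually retained.

```python
def late_fusion(subquery_results):
    """Merge the ranked lists returned by several sub-queries.

    `subquery_results` is a list of dicts {database_item: similarity_score},
    one per query picture of the person. Scores are max-normalised per
    sub-query so that no single picture dominates, then summed.
    """
    fused = {}
    for results in subquery_results:
        if not results:
            continue
        top = max(results.values()) or 1.0
        for item, score in results.items():
            fused[item] = fused.get(item, 0.0) + score / top
    # Database items sorted by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)
```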
TRECVID Instance Search 2011
Before starting the developments for SCARFACE, we tested various algorithms on the TRECVID 2011 INS (instance search) task. This task is close to the SCARFACE one: given a set of captures of one object, one should find its occurrences in a set of video sequences. This work was done during the stay of Sébastien Poullot at NII (the National Institute of Informatics, Japan) in July and August 2011.
The differences with SCARFACE are:
- a high diversity in the types of objects (people, but also places, vehicles, animals, etc.),
- the fact that the location of the object in the database is not given.
Our approach obtained good results (above the median scores of all teams) and runs in a very short time, without any indexing system to speed up the process [15]. The choice of the method for SCARFACE partially depends on these results. We continue working on the INS task in order to achieve better scores (various descriptors and various post- and late-fusion strategies between sub-queries).
Query generative models
Moreover, in order to improve visual query results, we want to build visual query generative models. This is directly linked to the SCARFACE (a priori processing) and TRECVID work: given a set of images considered as queries, we extract what they have in common and what separates them, in order to construct artificial, more relevant queries. For now we work essentially on logo databases.
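In a bag-of-visual-words setting, "what gathers" a query set can be approximated by the visual words shared by most of the query images. The sketch below builds such an artificial query under that simplifying assumption; the `min_share` majority threshold is hypothetical.

```python
from collections import Counter

def split_query_features(query_images, min_share=0.6):
    """Separate what gathers a query set from what is image-specific.

    `query_images` is a list of sets of visual words, one per example
    image of the same object. Words present in at least `min_share` of
    the images form the artificial query; the rest are treated as noise.
    """
    n = len(query_images)
    counts = Counter(w for words in query_images for w in words)
    common = {w for w, c in counts.items() if c / n >= min_share}
    specific = set(counts) - common
    return common, specific
```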
Object-based Visual Query Suggestion
Participants : Amel Hamzaoui, Pierre Letessier, Alexis Joly, Nozha Boujemaa.
Following our work on shared-neighbours clustering methods in the multi-source case published in [10], we are now interested in the bipartite-graph case, which we apply to object-based visual query suggestion using a visual-word mining technique [16]. State-of-the-art visual search systems allow retrieving small rigid objects efficiently in very large datasets. They are usually based on the query-by-window paradigm: a user selects any image region containing an object of interest, and the system returns a ranked list of images that are likely to contain other instances of the query object. Users' perception of these tools is however affected by the fact that many submitted queries actually return nothing or only junk results (complex non-rigid objects, higher-level visual concepts, etc.). We address the problem of suggesting only the object queries that actually have relevant matches in the dataset. This requires first discovering accurate object clusters in the dataset (as an off-line process), and then selecting the most relevant objects according to the user's intent (as an online process). We therefore introduce a new object-instance clustering framework based on two main contributions: efficient object seed discovery with adaptive weighted sampling, and bipartite shared-neighbours clustering. Experiments show that this new method outperforms state-of-the-art object mining and retrieval results on the OxfordBuilding dataset. We finally describe two object-based visual query suggestion scenarios using the proposed framework and show examples of suggested object queries.
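The sketch below illustrates the adaptive-sampling idea behind the seed-discovery step: regions already covered by a discovered cluster see their sampling weight decreased, so that subsequent draws favour objects not yet found. The names, the multiplicative damping rule, and the `match_fn` interface are illustrative assumptions, not the published algorithm of [16].

```python
import random

def discover_object_clusters(regions, match_fn, n_draws=100, damping=0.1):
    """Seed discovery by adaptive weighted sampling.

    `regions` is a list of candidate image regions, and `match_fn(region)`
    is assumed to return the set of indices of regions matching it in the
    dataset. Seeds covered by an already-discovered cluster are
    down-weighted, so new draws favour not-yet-discovered objects.
    """
    weights = [1.0] * len(regions)
    clusters = []
    for _ in range(n_draws):
        i = random.choices(range(len(regions)), weights=weights, k=1)[0]
        matches = match_fn(regions[i])
        if matches:
            clusters.append(matches)
        # Damp the seed and all its matches.
        for j in matches | {i}:
            weights[j] *= damping
    return clusters
```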
Interpretable Visual Models for Human Perception-based Object Retrieval
Participants : Ahmed Rebai, Alexis Joly, Nozha Boujemaa.
Understanding the results returned by automatic visual concept detectors is often a tricky task, making users uncomfortable with these technologies. In this work we attempt to build humanly interpretable visual models, allowing the user to visually understand the underlying semantics. We therefore proposed a supervised multiple-instance learning algorithm that selects as few discriminant local features as possible for a given object category. The method finds its roots in lasso theory, where an L1-regularisation term is introduced in order to constrain the loss function and subsequently produce sparser solutions. Efficient resolution of the lasso path is achieved through a boosting-like procedure inspired by the BLasso algorithm. Quantitatively, the method achieves performance similar to the current state of the art; qualitatively, it allows users to build their own model from the original set of learned patches, thus enabling more compound semantic queries. This work is part of the PhD of Ahmed Rebai [8] and was published in the ICMR 2011 proceedings [17]. It was then extended to using geometrically-checked feature sets rather than single local features to describe the content of visual patches. We showed that this drastically reduces the number of selected visual words while improving their interpretability. A publication was submitted to the Pattern Recognition journal [11].
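As a rough analogue of the selection step, the sketch below fits an L1-regularised linear model over bag-of-patches descriptors and keeps the patches with non-zero coefficients. It relies on scikit-learn's generic Lasso solver rather than the boosting-like BLasso procedure actually proposed, and the data layout is an assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_discriminant_patches(X, y, alpha=0.05):
    """Sparse selection of discriminant local-feature dimensions.

    X: (n_images, n_patches) matrix of patch activations.
    y: +1 / -1 labels for the target object category.
    The L1 penalty drives most coefficients to exactly zero; the few
    surviving patches are the ones the interpretable model is built from.
    """
    model = Lasso(alpha=alpha, max_iter=10000)
    model.fit(X, y)
    selected = np.flatnonzero(model.coef_)
    return selected, model.coef_[selected]
```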
Relevance feedback on local features: application to plant annotation and identification
Participants : Wajih Ouertani, Michel Crucianu, Nozha Boujemaa.
As biological image databases grow rapidly, automated species identification based on digital data is of great interest for accelerating biodiversity assessment, research and monitoring. In this context, our work investigates computer vision techniques, more precisely object recognition and content-based image retrieval, to help botanists identify plants and organise their digital image collections. Since perception, recognition and decision are human skills, this work focuses on an interactive mechanism that extracts useful information from the user and, in return, helps him deal with large amounts of data. We adopted an explicit relevance feedback (RF) scheme and worked on extending it to capture local intent through local feature (LF) descriptions. This mechanism helps discovering and dynamically defining new concepts and interesting plant parts, and feeds the identification process interactively. Moreover, since it relies on content rather than on labels, one direct application is to fill the initially sparse annotation space with correct annotations, in reasonable time and with the contribution of one or several experts.

We recently explored and tested local feature matching involving higher-order features and non-rigid adaptation, attempting to structure the database through a pattern discovery stage. With such methods we expect to introduce high-level appearance information that goes beyond classical bags of features and histogram-based distances, at least from the semantic gap and interpretability points of view. This exploration is motivated by the fact that the initial search space can be exceedingly rich: by pre-structuring it, we can hope to obtain a smaller search space together with more reliable inference. Learning parts interactively from localised local features may also require a lot of interaction, since it needs a considerable number of examples. We designed a combination of machine learning and prior mining of matches, which we are currently improving.
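As a generic illustration of one explicit relevance-feedback round (not the plant-specific mechanism described above), the classical Rocchio update moves the query descriptor towards the features the user marked as relevant and away from those marked as irrelevant:

```python
import numpy as np

def rocchio_update(query, relevant, irrelevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """One round of explicit relevance feedback (Rocchio formulation).

    `query` is the current query descriptor; `relevant` and `irrelevant`
    are the lists of descriptors the user marked as positive / negative
    during this feedback round.
    """
    q = alpha * np.asarray(query, dtype=float)
    if relevant:
        q += beta * np.mean(relevant, axis=0)
    if irrelevant:
        q -= gamma * np.mean(irrelevant, axis=0)
    return q
```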