Section: Software
Alpage's linguistic workbench, including Sx Pipe
Participants: Benoît Sagot [correspondent], Rosa Stern, Marion Baranes, Damien Nouvel, Virginie Mouilleron, Pierre Boullier, Éric Villemonte de La Clergerie.
See also the web page http://lingwb.gforge.inria.fr/.
Alpage's linguistic workbench is a set of packages for corpus processing and parsing. Among these packages, the Sx Pipe package is of particular importance.
Sx Pipe [97] is a modular and customizable processing chain that applies a cascade of surface processing steps to raw corpora. It is used
Developed for French and for other languages, Sx Pipe includes, among others, various modules for named-entity recognition in raw text, a sentence segmenter and tokenizer, a spelling corrector and compound-word recognizer, and an original recognizer for context-free patterns, used by several specialized grammars (numbers, impersonal constructions, quotations...). In 2012, Sx Pipe received renewed attention in four directions:
Support for new languages, most notably German (although this is still at a very preliminary stage of development);
Analysis of unknown words, in particular in the context of the ANR project EDyLex and of the collaboration with viavoo; this involves (i) new tools for the automatic pre-classification of unknown words (acronyms, loan words...); (ii) new morphological analysis tools, most notably automatic tools for constructional morphology (both derivational and compositional), following the results of dedicated corpus-based studies;
Development of new local grammars for detecting new types of entities, such as chemical formulae or dimensions, in the context of the PACTE project.
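To make the idea of a cascade of surface processing steps with local pattern recognizers more concrete, the following is a minimal sketch in Python. It is not Sx Pipe's implementation or interface: the names `tokenize`, `tag_entities` and `run_pipeline`, the two regular expressions, and the `/DIMENSION` and `/FORMULA` tags are all hypothetical and chosen only for illustration. Regular expressions are also a simplification here, since the recognizer described above works with context-free patterns.

```python
import re
from typing import Callable, List

# Hypothetical step type: each step maps a token list to a token list.
Step = Callable[[List[str]], List[str]]

# Illustrative patterns only (assumptions, not Sx Pipe's grammars).
# A number followed by a unit, e.g. "10cm" or "5g".
DIMENSION = re.compile(r"^\d+(?:[.,]\d+)?(?:mm|cm|m|km|kg|g)$")
# Two or more element-like symbols with optional counts, e.g. "NaCl", "H2O".
FORMULA = re.compile(r"^(?:[A-Z][a-z]?\d*){2,}$")

PUNCT = ".,;:!?"


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer that strips surrounding punctuation."""
    tokens = []
    for raw in text.split():
        tok = raw.strip(PUNCT)
        if tok:
            tokens.append(tok)
    return tokens


def tag_entities(tokens: List[str]) -> List[str]:
    """Append a tag to tokens matched by one of the local patterns."""
    tagged = []
    for tok in tokens:
        if DIMENSION.match(tok):
            tagged.append(f"{tok}/DIMENSION")
        elif FORMULA.match(tok):
            tagged.append(f"{tok}/FORMULA")
        else:
            tagged.append(tok)
    return tagged


def run_pipeline(text: str, steps: List[Step]) -> List[str]:
    """Tokenize the raw text, then apply each step in order."""
    tokens = tokenize(text)
    for step in steps:
        tokens = step(tokens)
    return tokens


if __name__ == "__main__":
    print(run_pipeline("Add 5g of NaCl to H2O in a 10cm tube.", [tag_entities]))
```

Because each step shares the same signature, steps can be added, removed or reordered without touching the others, which is the sense in which such a chain is modular. The patterns are deliberately naive: for instance, an all-uppercase acronym would be wrongly tagged as a formula, which is exactly the kind of ambiguity a real local grammar has to handle.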