ROSANA Archive - Dr. Roland Stuckardt

ROSANA

ROSANA is a system for resolving anaphors in natural language text. The acronym ROSANA stands for robust syntax-based interpretation of anaphoric expressions. At present, the system handles occurrences of different types of pronouns (common pronouns, reflexives, and possessives), definite common noun phrases, and names. To narrow down the set of antecedent candidates, the system employs restrictions of the following kinds: morphosyntactic (agreement in person, number, and gender) and lexical; syntactic (coindexing restrictions derived from Chomsky's Government and Binding Theory); and discourse (cataphoric references confined to definite NPs). To select a candidate, the system performs a preference ranking that employs the following criteria: …
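The morphosyntactic agreement restriction described above can be illustrated with a minimal Python sketch. This is not ROSANA's actual implementation (which is written in Lisp); the `Mention` class, the `agrees` and `filter_by_agreement` functions, and the choice to treat an unknown gender as compatible are all assumptions made for illustration only. The syntactic and discourse restrictions and the preference ranking are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical record of a mention's agreement features."""
    text: str
    person: int              # 1, 2, or 3
    number: str              # "sg" or "pl"
    gender: Optional[str]    # "m", "f", "n", or None if unknown


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Return True if the candidate matches the anaphor in person, number, and gender."""
    if anaphor.person != candidate.person:
        return False
    if anaphor.number != candidate.number:
        return False
    # Assumption: an unknown gender on either side does not rule the candidate out.
    if anaphor.gender is not None and candidate.gender is not None:
        return anaphor.gender == candidate.gender
    return True


def filter_by_agreement(anaphor: Mention, candidates: List[Mention]) -> List[Mention]:
    """Keep only the antecedent candidates that pass the agreement check."""
    return [c for c in candidates if agrees(anaphor, c)]


# Made-up example mentions, not taken from the evaluation corpus.
he = Mention("he", person=3, number="sg", gender="m")
candidates = [
    Mention("the lawmaker", person=3, number="sg", gender=None),
    Mention("the legislators", person=3, number="pl", gender=None),
    Mention("the speech", person=3, number="sg", gender="n"),
]
remaining = filter_by_agreement(he, candidates)
print([c.text for c in remaining])
```

In this sketch only "the lawmaker" survives the filter: "the legislators" fails on number and "the speech" fails on gender. A real system would pass the surviving candidates on to the further restrictions and the preference ranking.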

Supported Languages

ROSANA is available for a wide class of languages:

ROSANA for English

This is the original version of ROSANA, the design, implementation, and evaluation of which constitute the central part of my Ph.D. project. ROSANA for English works on the syntactic analyses generated by the robust FDG (Functional Dependency Grammar) parser of Timo Järvinen and Pasi Tapanainen. Details regarding the design and empirical evaluation of this version of ROSANA are given in various research papers (available for download in the publication section).

ROSANA for German

The system ROSANA-Deutsch works on the syntactic analyses generated by Connexor Machinese Syntax for …

Results of Corpus-Based Evaluation

The empirical results given in this section pertain to the original ROSANA for English system. They were obtained by a formal evaluation of ROSANA on a collection of 66 news agency releases comprising 24712 words, 1093 sentences (on average 22.61 words/sentence), and 951 pronouns (on average 0.87 pronouns/sentence). The documents were analyzed as they are, i.e. without any orthographical or grammatical streamlining. The collection of texts was divided into two subsets of approximately the same size, the training corpus and the evaluation corpus. The following figures describe the performance of ROSANA on the evaluation corpus. Quality …
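The per-sentence averages quoted above follow directly from the corpus counts. The short Python check below recomputes them. The variable names are illustrative, and the words-per-document figure is derived here from the stated counts rather than quoted from the source.

```python
# Corpus counts as stated in the text.
documents = 66
words = 24712
sentences = 1093
pronouns = 951

# Averages, rounded as in the text.
words_per_sentence = round(words / sentences, 2)
pronouns_per_sentence = round(pronouns / sentences, 2)

# Derived figure (not stated in the source): average document length.
words_per_document = round(words / documents, 1)

print(words_per_sentence, pronouns_per_sentence, words_per_document)
```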

Evaluation Criteria

In the formal evaluation of ROSANA, three interpretation disciplines are distinguished:

OV: identification of occurrences ("referring" linguistic expressions)
KV: coreference class determination (similar to the MUC CO task)
PS: determination of non-pronominal lexical substitutes for pronouns (i.e. common nouns or names that belong to the same coreference class as the pronoun under consideration)

In the following sections, the underlying formal evaluation measures and the respective results of ROSANA are discussed in more detail.

The OV Task: Identifying Occurrences of Discourse Referents

The identification of linguistic expressions that specify the semantic entities the discourse is about constitutes the basic requirement for …
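To make the PS discipline concrete, here is a minimal Python sketch of what "determining non-pronominal lexical substitutes" amounts to once coreference classes are known. The data layout (tuples of mention id, text, and kind) and the function name `lexical_substitutes` are assumptions for illustration. This is not ROSANA's evaluation code, and it does not implement the formal evaluation measures themselves.

```python
from typing import Dict, List, Tuple

# Hypothetical mention representation: (mention id, surface text, kind),
# where kind is one of "pronoun", "noun", or "name".
Mention = Tuple[str, str, str]


def lexical_substitutes(classes: List[List[Mention]]) -> Dict[str, List[str]]:
    """Map each pronoun's id to the non-pronominal members of its coreference class."""
    result: Dict[str, List[str]] = {}
    for coref_class in classes:
        # Common nouns and names in this class can stand in for its pronouns.
        substitutes = [text for _, text, kind in coref_class if kind != "pronoun"]
        for mention_id, _, kind in coref_class:
            if kind == "pronoun":
                result[mention_id] = substitutes
    return result


# Made-up coreference classes for illustration.
classes = [
    [("m1", "Huang Chao-hui", "name"), ("m2", "the lawmaker", "noun"), ("m3", "he", "pronoun")],
    [("m4", "it", "pronoun")],  # a class with no non-pronominal member
]
subs = lexical_substitutes(classes)
print(subs)
```

A pronoun whose class contains no common noun or name, such as `m4` here, ends up with an empty substitute list.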

Example of Interpretation Results

A Newswire Text

Here is a sample news agency release similar to the documents that were used during the formal evaluation of the ROSANA system. The text is slightly shorter than the average document of the evaluation corpus, but it is well suited to illustrating the corpus's characteristics.

A Punch is Sometimes as Good as a Speech in Taiwanese Democracy. Taiwanese legislators are finding that good oratory is not the only skill needed to survive in Taiwan's blooming democracy: a powerful right-hook also helps. Opposition lawmaker Huang Chao-hui was back at work Wednesday after being felled by a punch in the …

Implementation

ROSANA for English

The core system of ROSANA was originally implemented in Allegro Common Lisp, Version 4.3 for Linux. It has also been shown to work under Xanalys LispWorks 4.2.0 for Linux. It is made up of around 9,500 lines of LISP source code. A separate software module, comprising 2556 lines of LISP code, implements the criteria for formal, corpus-based evaluation.

ROSANA for German

ROSANA-Deutsch, too, runs under Allegro Common Lisp for Linux as well as Xanalys LispWorks 4.2.0 for Linux. Its core system is made up of around 11,000 lines of LISP source code.

Speed of Processing

165 words per …

Distribution and Documentation

Non-commercial, non-profit research licenses of ROSANA and/or ROSANA-Deutsch are available upon request. As a prerequisite for obtaining access to one of the ROSANA distributions, two hardcopies of the respective license agreement form have to be completed, signed, and mailed in. It has to be kept in mind that the current distributions of ROSANA are rather experimental.

ROSANA for English

Details of the licensing conditions for original ROSANA are given in the License Agreement for ROSANA-ML and ROSANA [PDF]. A first impression can be obtained by studying the document Getting Started with ROSANA [PDF], which provides details of how to run …

Application Projects

ROSANA for German is employed for coreference and anaphor resolution in the BMBF SUMMaR project of the University of Potsdam, Department of Linguistics. This project is developing a robust, high-quality text summarization system that combines symbolic, linguistic methods with statistical approaches.

Background Information

A detailed description of ROSANA and the underlying methodology is provided in my dissertation (see Publications section) and, in particular, in the following article: Roland Stuckardt. Design and Enhanced Evaluation of a Robust Anaphor Resolution Algorithm. In: Computational Linguistics, Vol. 27, Number 4, December 2001, 479-506. [PDF]