Anaphor Resolution Archive - Page 2 of 2

Example of Interpretation Results

A Newswire Text
Here is a sample news agency release similar to the documents that were used during the formal evaluation of the ROSANA system. The text is slightly shorter than the average document of the evaluation corpus, but it is well suited to illustrate the characteristics of that corpus.
A Punch is Sometimes as Good as a Speech in Taiwanese Democracy. Taiwanese legislators are finding that good oratory is not the only skill needed to survive in Taiwan’s blooming democracy: a powerful right-hook also helps. Opposition lawmaker Huang Chao-hui was back at work Wednesday after being felled by a punch in the …

Experiments

Full details of the experimental variations are given in the book article [PDF] published in 2005. The most important stage of experimental variation concerns the determination of the empirically optimal set of attributes over which the decision trees are learned (i.e., the signature of the feature vectors). The attributes considered (synonymously referred to as features) are mainly based on syntactic, morphological, and surface information, thus meeting the requirements of knowledge poorness and robustness. A distinction is made between non-relational features (pertaining to individual anaphoric or candidate occurrences O) and relational features (pertaining to pairs of anaphors and antecedent candidates (A,C)). …
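The distinction between non-relational and relational features can be illustrated with a minimal Python sketch. It is not taken from ROSANA-ML (which is implemented in Common Lisp); the class `Occurrence`, the attribute names (`number`, `syn_function`, `sentence_distance`, `number_agreement`), and the example values are hypothetical and only show how a feature vector for a pair (A,C) might be assembled from both kinds of features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One anaphor or antecedent-candidate occurrence O (fields are illustrative)."""
    text: str
    sentence_index: int
    number: str        # morphological information, e.g. "sg" or "pl"
    syn_function: str  # syntactic information, e.g. "subject" or "object"


def non_relational_features(occurrence, prefix):
    """Features that depend on a single occurrence O only."""
    return {
        f"{prefix}_number": occurrence.number,
        f"{prefix}_syn_function": occurrence.syn_function,
    }


def relational_features(anaphor, candidate):
    """Features that depend on the pair (A, C)."""
    return {
        "sentence_distance": anaphor.sentence_index - candidate.sentence_index,
        "number_agreement": anaphor.number == candidate.number,
    }


def feature_vector(anaphor, candidate):
    """Combine both feature kinds; the set of keys plays the role of the signature."""
    vector = {}
    vector.update(non_relational_features(anaphor, "a"))
    vector.update(non_relational_features(candidate, "c"))
    vector.update(relational_features(anaphor, candidate))
    return vector


# Hypothetical pair: a pronoun and a candidate from the preceding sentence.
anaphor = Occurrence("he", 2, "sg", "subject")
candidate = Occurrence("Huang Chao-hui", 1, "sg", "subject")
vector = feature_vector(anaphor, candidate)
```

Varying the signature, as described above, would then amount to adding or removing keys from these dictionaries before the vectors are passed to a decision tree learner.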

Implementation

ROSANA for English
The core system of ROSANA was originally implemented in Allegro Common Lisp, Version 4.3 for Linux. It has also been shown to work under Xanalys LispWorks 4.2.0 for Linux. It is made up of around 9,500 lines of LISP source code. A separate software module, comprising 2,556 lines of LISP code, implements the criteria for formal, corpus-based evaluation.
ROSANA for German
ROSANA-Deutsch, too, runs under Allegro Common Lisp for Linux as well as Xanalys LispWorks 4.2.0 for Linux. Its core system is made up of around 11,000 lines of LISP source code.
Speed of Processing
165 words per …

Results of Corpus-Based Evaluation

Text Corpus
The training and evaluation of the ROSANA-ML system were performed on the same corpus that was employed for evaluating ROSANA, viz. a corpus of 66 news agency press releases, comprising 24,712 words, 1,093 sentences (on average 22.61 words/sentence), 406 third-person non-possessive pronouns (PER3), and 246 third-person possessive pronouns (POS3). For cross-validation, random partitions of this corpus were generated (see the details in the book article [PDF] published in 2005). In all experiments, the training data generation and the application of the trained system take place under conditions of potentially noisy data, i.e., without a priori intellectual correction …
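As a rough illustration, the corpus figures and the idea of random partitioning can be sketched in Python. The corpus counts are those quoted above; the number of partitions (`k=6`), the random seed, and the function `random_partitions` are assumptions made for this sketch and are not taken from the article, which should be consulted for the actual cross-validation setup.

```python
import random

# Corpus figures as quoted in the text.
DOCUMENTS = 66
WORDS = 24712
SENTENCES = 1093
PER3 = 406  # third-person non-possessive pronouns
POS3 = 246  # third-person possessive pronouns

average_sentence_length = WORDS / SENTENCES  # words per sentence
total_pronouns = PER3 + POS3


def random_partitions(n_items, k, seed):
    """Shuffle the document indices and deal them into k disjoint partitions."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# k and seed are illustrative assumptions, not values from the source.
folds = random_partitions(DOCUMENTS, k=6, seed=0)

# In each round, one partition is held out for evaluation and the
# remaining partitions are used for training.
held_out = folds[0]
training = [doc for fold in folds[1:] for doc in fold]
```

With 66 documents and six partitions, each partition holds 11 documents, so each round would train on 55 documents and evaluate on the remaining 11.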

Distribution and Documentation

Non-commercial, non-profit research licenses for ROSANA and/or ROSANA-Deutsch are available upon request. As a prerequisite for obtaining access to one of the ROSANA distributions, two hardcopies of the respective license agreement form have to be completed, signed, and mailed in. Please keep in mind that the current distributions of ROSANA are rather experimental.
ROSANA for English
Details of the licensing conditions for the original ROSANA are given in the License Agreement for ROSANA-ML and ROSANA [PDF]. A first impression can be obtained by studying the document Getting Started with ROSANA [PDF], which provides details of how to run …

Implementation

The core system of ROSANA-ML is implemented in Common Lisp and has been run under Allegro Common Lisp, Version 4.3 for Linux as well as under Xanalys LispWorks 4.2.0 for Linux. It is made up of 12,461 lines of LISP source code (including a basic graphical user interface for Xanalys LispWorks). The architecture of the system is modular, i.e., it supports adaptation to different syntactic analysis frontends. A separate software module, comprising 2,846 lines of LISP code, implements the formal measures for the corpus-based evaluation. At present, the core system of ROSANA-ML is neither technically coupled with the …

Application Projects

ROSANA for German is employed for coreference and anaphor resolution in the BMBF SUMMaR project of the University of Potsdam, Department of Linguistics. In this project, a robust, high-quality text summarization system is being developed that combines symbolic, linguistic methods with statistical approaches.

Distribution and Documentation

A non-commercial, non-profit research license for ROSANA-ML is available upon request. Details of the licensing conditions are given in the License Agreement for ROSANA-ML and ROSANA [PDF]. As a prerequisite for obtaining access to the ROSANA-ML distribution, two hardcopies of this form have to be completed, signed, and mailed in. Please keep in mind that the current distribution of ROSANA-ML is rather experimental. A first impression can be gained by studying the document Getting Started with ROSANA-ML [PDF], which provides details of how to run and test ROSANA-ML on the corpus included in the distribution.

Background Information

A detailed description of ROSANA and the underlying methodology is provided in my dissertation (see Publications section) and, in particular, in the following article: Roland Stuckardt. Design and Enhanced Evaluation of a Robust Anaphor Resolution Algorithm. In: Computational Linguistics, Vol. 27, Number 4, December 2001, 479-506. [PDF]

Background Information

A detailed description of ROSANA-ML and the underlying methodology is provided in the following article: Roland Stuckardt. A Machine Learning Approach to Preference Strategies for Anaphor Resolution. In: António Branco, Tony McEnery, Ruslan Mitkov (Eds.), Anaphora Processing: Linguistic, Cognitive, and Computational Modelling. John Benjamins, January 2005. [PDF]