Robust Anaphor Resolution: ROSANA, ROSANA-ML

Two systems for the resolution of anaphoric expressions (coreference resolution) have been developed. Software distributions of both are available for non-commercial, non-profit research purposes:

ROSANA: robust syntax-based anaphor resolution with manually designed resolution strategies.

ROSANA-ML: robust syntax-based anaphor resolution that employs machine-learned C4.5 decision trees for antecedent preference decisions.

Detailed descriptions of the systems and the license terms can be found under the corresponding menu items. Background information …

Bibliography: Anaphor Resolution

Hiyan Alshawi. Memory and Context for Language Interpretation. Cambridge University Press, 1987.

Chinatsu Aone and Scott William Bennett. Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies. In: Proceedings of the 33rd Annual Meeting of the ACL, Cambridge, Massachusetts, 1995, 122-129.

Chinatsu Aone and Scott William Bennett. Applying Machine Learning to Anaphora Resolution. In: S. Wermter, E. Riloff, and G. Scheler (eds.). Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. Springer Verlag, Berlin, 1996, 302-314.

Nicholas Asher and Hajime Wada. A Computational Account of Syntactic, Semantic and Discourse Principles for Anaphor Resolution. Journal of Semantics …

ROSANA

ROSANA is a system for resolving anaphors in natural language text. The acronym ROSANA stands for robust syntax-based interpretation of anaphoric expressions. Currently, the system handles different types of pronouns (common pronouns, reflexives, and possessives), definite common noun phrases, and names.

To narrow down the set of antecedent candidates, the system applies restrictions of the following kinds: morphosyntactic (agreement in person, number, and gender) / lexical; syntactic (coindexing restrictions derived from Chomsky's Government and Binding Theory); and discourse (cataphoric references confined to definite NP).

To select a candidate, a preference ranking is performed which employs the following criteria: …
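
The morphosyntactic agreement restriction named above can be illustrated with a minimal sketch. It is not ROSANA's implementation: the `Mention` class, its field names and value encodings, and the decision to treat an unknown gender as compatible are all assumptions made for illustration. The syntactic and discourse restrictions and the preference ranking are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    """Hypothetical description of an occurrence (not ROSANA's data model)."""
    text: str
    person: int   # 1, 2, or 3
    number: str   # "sg" or "pl"
    gender: str   # "m", "f", "n", or "unknown"


def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Check agreement in person, number, and gender.

    Assumption: an unknown gender on either side does not rule out the pair.
    """
    gender_ok = (
        "unknown" in (anaphor.gender, candidate.gender)
        or anaphor.gender == candidate.gender
    )
    return (
        anaphor.person == candidate.person
        and anaphor.number == candidate.number
        and gender_ok
    )


def filter_candidates(anaphor: Mention, candidates: List[Mention]) -> List[Mention]:
    """Keep only the candidates that agree with the anaphor."""
    return [c for c in candidates if agrees(anaphor, c)]


she = Mention("she", 3, "sg", "f")
pool = [
    Mention("Mary", 3, "sg", "f"),
    Mention("John", 3, "sg", "m"),
    Mention("the reporters", 3, "pl", "unknown"),
]
remaining = filter_candidates(she, pool)  # only "Mary" survives the filter
```

A filter of this kind only removes impossible antecedents; choosing among the remaining candidates is the job of the preference ranking.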

ROSANA-ML

ROSANA-ML is a system for the resolution of anaphors in natural language text based on machine-learned decision trees. The acronym ROSANA-ML stands for robust syntax-based anaphor interpretation employing machine-learned decision trees. Currently, the system focuses on the resolution of third-person non-possessive and possessive pronouns. The implementation and evaluation of ROSANA-ML investigate what may be gained by employing machine-learned preference strategies within a robust anaphor resolution approach that follows the Lappin & Leass (1994) paradigm, in which the antecedent filtering strategies are manually designed. The manually crafted algorithm ROSANA is taken as the starting point. Empirical …

Supported Languages

ROSANA is available for a wide class of languages:

ROSANA for English

This is the original version of ROSANA, the design, implementation, and evaluation of which constitute the central part of my Ph.D. project. ROSANA for English works on the syntactic analyses generated by the robust FDG (Functional Dependency Grammar) parser of Timo Järvinen and Pasi Tapanainen. Details regarding the design and empirical evaluation of this version of ROSANA are given in various research papers (available for download in the publication section).

ROSANA for German

The system ROSANA-Deutsch works on the syntactic analyses generated by Connexor Machinese Syntax for …

Methodology

The figure below outlines the machine learning approach to anaphor resolution followed by ROSANA-ML. It distinguishes between the training phase, shown in the upper part of the figure, and the application (anaphor resolution) phase, sketched in the lower part.

Training Phase

During the training phase, a set of feature vectors is generated from a training text corpus. It consists of feature tuples derived from the (anaphor, antecedent candidate) pairs that are considered during the antecedent selection phase of the anaphor resolution algorithm ROSANA. This output is written to …

Results of Corpus-Based Evaluation

The empirical results given in this section pertain to the original ROSANA for English system. They were obtained by a formal evaluation of ROSANA on a collection of 66 news agency releases comprising 24712 words, 1093 sentences (on average 22.61 words/sentence), and 951 pronouns (on average 0.87 pronouns/sentence). The documents were analyzed as they are, i.e., without any orthographical or grammatical streamlining. The collection of texts was divided into two subsets of approximately the same size, the training corpus and the evaluation corpus. The following figures describe the performance of ROSANA on the evaluation corpus. Quality …
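
The two per-sentence averages quoted above follow directly from the corpus counts. The short check below recomputes them; the variable names are chosen for illustration only.

```python
# Corpus counts as stated in the text.
words = 24712
sentences = 1093
pronouns = 951

# Averages per sentence, rounded to two decimals as in the text.
words_per_sentence = round(words / sentences, 2)
pronouns_per_sentence = round(pronouns / sentences, 2)

print(words_per_sentence)     # 22.61
print(pronouns_per_sentence)  # 0.87
```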

Algorithms

Training Data Generation

Step 1 (antecedent filtering), in which different kinds of restrictions for eliminating impossible antecedents (in particular, agreement in person/number/gender and syntactic disjoint reference) are applied, is taken over directly from the original ROSANA algorithm. In step 2, however, no salience ranking of the remaining antecedent candidates is performed. Rather, each remaining anaphor-candidate pair (A,C) is mapped to a feature vector fv(A,C), the attributes f1,…,fk of which comprise individual and relational features derived from the descriptions of the occurrences A and C. The signature of the feature vectors, i.e. the inventory of features to be taken …
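
The two steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ROSANA-ML code: the dictionary keys, the four features in `feature_vector`, the number-only filter, and the boolean class label attached to each vector are all hypothetical. The actual feature signature is not reproduced here.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Occurrence = Dict[str, Any]            # hypothetical occurrence description
Pair = Tuple[Occurrence, Occurrence]   # (anaphor A, candidate C)


def feature_vector(a: Occurrence, c: Occurrence) -> Tuple[Any, ...]:
    """Map a pair (A, C) to fv(A, C). The features are illustrative only."""
    return (
        a["type"],                      # individual feature of A
        c["type"],                      # individual feature of C
        a["sentence"] - c["sentence"],  # relational: sentence distance
        a["role"] == c["role"],         # relational: same syntactic role
    )


def training_rows(
    pairs: Iterable[Pair],
    passes_filters: Callable[[Occurrence, Occurrence], bool],
    is_antecedent: Callable[[Occurrence, Occurrence], bool],
) -> List[Tuple[Tuple[Any, ...], bool]]:
    """Step 1: drop pairs ruled out by the filters.
    Step 2: map every remaining pair to a feature vector (no ranking).
    Assumption: each vector is paired with a label from annotated data."""
    rows = []
    for a, c in pairs:
        if passes_filters(a, c):
            rows.append((feature_vector(a, c), is_antecedent(a, c)))
    return rows


she = {"id": 3, "type": "PER3", "sentence": 2, "role": "subj", "number": "sg"}
mary = {"id": 1, "type": "NAME", "sentence": 1, "role": "subj", "number": "sg"}
talks = {"id": 2, "type": "NP", "sentence": 1, "role": "obj", "number": "pl"}

rows = training_rows(
    pairs=[(she, mary), (she, talks)],
    passes_filters=lambda a, c: a["number"] == c["number"],  # stand-in filter
    is_antecedent=lambda a, c: c["id"] == 1,                 # stand-in annotation
)
```

Passing the filter and the annotation lookup in as functions keeps the sketch independent of any particular parser output or corpus format.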

Evaluation Criteria

In the formal evaluation of ROSANA, three interpretation disciplines are distinguished:

OV: identification of occurrences ("referring" linguistic expressions)

KV: coreference class determination (similar to the MUC CO task)

PS: determination of non-pronominal lexical substitutes for pronouns (i.e. common nouns or names that belong to the same coreference class as the pronoun under consideration)

In the following sections, the underlying formal evaluation measures and the respective results of ROSANA are discussed in more detail.

The OV Task: Identifying Occurrences of Discourse Referents

The identification of linguistic expressions that specify the semantic entities the discourse is about constitutes the basic requirement for …

Languages

Because of the currently employed syntactic analysis frontend (Timo Järvinen's and Pasi Tapanainen's FDG (Functional Dependency Grammar) parser for English), the current version of ROSANA-ML processes texts in English. The core algorithm of ROSANA-ML, however, is applicable to a wide class of languages.
