Results of Corpus-Based Evaluation

The empirical results given in this section pertain to the original ROSANA system for English.

The results described below were obtained by a formal evaluation of ROSANA on a collection of 66 news agency releases comprising 24712 words, 1093 sentences (on average 22.61 words/sentence), and 951 pronouns (on average 0.87 pronouns/sentence). The documents were analyzed as they are, i.e. without any orthographical or grammatical streamlining. The collection of texts was divided into two subsets of approximately the same size, the training corpus and the evaluation corpus. The following figures describe the performance of ROSANA on the evaluation corpus.
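The per-sentence averages quoted above follow directly from the corpus counts. A minimal sketch of that arithmetic, in which the variable names are illustrative only and not part of ROSANA:

```python
# Corpus counts as stated in the text above.
words = 24712
sentences = 1093
pronouns = 951

# Averages per sentence, rounded to two decimals as in the text.
words_per_sentence = round(words / sentences, 2)
pronouns_per_sentence = round(pronouns / sentences, 2)

print(words_per_sentence, pronouns_per_sentence)
```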

Quality of anaphor interpretation results

Three different evaluation disciplines have been considered. The results are described by figures that distinguish between precision (P) and recall (R).

  • Identification of occurrences ("referring" linguistic expressions): (P,R) = (0.94,0.96)
  • Coreference class determination (similar to the MUC CO task): (P,R) = (0.81,0.68)
  • Determination of non-pronominal lexical substitutes for pronouns (i.e. common nouns or names that belong to the same coreference class as the pronoun under consideration): (P,R) = (0.7,0.65)
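To illustrate how a (P,R) pair of this kind is read, the following minimal sketch computes precision and recall by comparing a set of system responses with a set of annotated key items. It is an assumption for illustration only: the function name, the item encoding, and the toy data are hypothetical, and this simple set comparison is not necessarily the scoring scheme used in the ROSANA evaluation (the MUC coreference task, for instance, uses its own class-based scoring).

```python
def precision_recall(key, response):
    """Return (precision, recall) of a response set against a key set.

    precision = correct responses / all responses
    recall    = correct responses / all key items
    """
    key, response = set(key), set(response)
    correct = len(key & response)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    return precision, recall


# Hypothetical toy data: occurrences encoded as "text@position".
key_items = {"Clinton@1", "he@3", "the company@5", "it@7"}
system_items = {"Clinton@1", "he@3", "it@7", "they@9", "this@11"}

p, r = precision_recall(key_items, system_items)
print(f"(P,R) = ({p:.2f},{r:.2f})")
```

In this toy case three of the five system items are correct and three of the four key items are found, so precision is lower than recall; a system that proposes fewer but safer items would shift the balance the other way.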

A more fine-grained description of the results and of the underlying evaluation methodology follows below, together with an example walk-through of the processing of a text drawn from the news agency release corpus.

