ROSANA is a system for resolving anaphors in natural language text. The acronym ROSANA stands for
robust syntax-based interpretation of anaphoric expressions.
Currently, the system handles occurrences of different types of pronouns (common pronouns, reflexives, and possessives), definite common noun phrases, and names. To narrow down the set of antecedent candidates, the system employs restrictions of the following kinds:
- morphosyntactic (agreement in person, number, and gender) / lexical,
- syntactic (coindexing restrictions derived from Chomsky’s Government and Binding Theory),
- discourse (cataphoric references confined to definite NP).
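As an illustration of the first kind of restriction, the following is a minimal sketch of an agreement filter. It is not taken from ROSANA; the `Mention` record, the feature encoding, and the function names are assumptions made for this example. Unknown features are treated as compatible with anything, so that underspecified candidates are not discarded prematurely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical record for a referring expression (not ROSANA's format)."""
    text: str
    person: Optional[int] = None   # None means "unknown / underspecified"
    number: Optional[str] = None   # "sg" or "pl"
    gender: Optional[str] = None   # "m", "f", or "n"


def features_compatible(a, b):
    # An unknown value never causes a clash.
    return a is None or b is None or a == b


def agrees(anaphor, candidate):
    """True if person, number, and gender do not clash."""
    return (features_compatible(anaphor.person, candidate.person)
            and features_compatible(anaphor.number, candidate.number)
            and features_compatible(anaphor.gender, candidate.gender))


def filter_by_agreement(anaphor, candidates):
    """Keep only the candidates that agree with the anaphor."""
    return [c for c in candidates if agrees(anaphor, c)]


she = Mention("she", person=3, number="sg", gender="f")
candidates = [
    Mention("Mary", 3, "sg", "f"),
    Mention("the reports", 3, "pl", "n"),
    Mention("the doctor", 3, "sg", None),   # gender unknown
]
survivors = filter_by_agreement(she, candidates)
```

Here "the reports" is removed because of the number clash, while "the doctor" survives because its gender is unspecified.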
For selecting a candidate, a preference ranking is performed which employs the following criteria:
- inertia of thematic or syntactic role,
- hierarchy of syntactic functions / subject preference,
- cataphor penalty.
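The three criteria above can be combined into a single score, sketched below. The numeric weights, the `ROLE_RANK` table, and all names are hypothetical and chosen only for illustration; the source does not state how ROSANA weights its preferences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical antecedent candidate."""
    text: str
    role: str       # syntactic function, e.g. "subj", "obj"
    position: int   # linear position in the text

# Assumed hierarchy of syntactic functions: subjects are preferred.
ROLE_RANK = {"subj": 2, "obj": 1}


def score(anaphor_role, anaphor_position, cand):
    s = 0
    if cand.role == anaphor_role:
        s += 2                         # role inertia: same role as the anaphor
    s += ROLE_RANK.get(cand.role, 0)   # hierarchy of syntactic functions
    if cand.position > anaphor_position:
        s -= 4                         # cataphor penalty: candidate follows the anaphor
    return s


def rank(anaphor_role, anaphor_position, candidates):
    """Order candidates from most to least preferred."""
    return sorted(candidates,
                  key=lambda c: score(anaphor_role, anaphor_position, c),
                  reverse=True)


candidates = [
    Candidate("the manager", "subj", 1),
    Candidate("the report", "obj", 4),
    Candidate("the auditor", "subj", 12),   # occurs after the anaphor
]
ranked = rank("subj", 8, candidates)
```

For a subject pronoun at position 8, "the manager" ranks first. "The auditor" shares the subject role but is pushed to the end by the cataphor penalty.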
As input, the system relies on descriptions of discourse referent occurrences (i.e. "referring" linguistic expressions) that comprise the information necessary for applying the above constraints and preferences: morphological features, lemma, role, and a pointer to the location in the syntax tree.
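One possible shape for such an occurrence description is sketched below. The field names and the `TreeNode` class are assumptions for illustration only; the actual input format of ROSANA is not specified here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TreeNode:
    """Hypothetical syntax tree node."""
    label: str
    children: list = field(default_factory=list)
    # Excluded from repr/compare to avoid infinite recursion via parent links.
    parent: Optional["TreeNode"] = field(default=None, repr=False, compare=False)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


@dataclass
class ReferentOccurrence:
    """Hypothetical description of one discourse referent occurrence."""
    surface: str     # the expression as it appears in the text
    lemma: str
    kind: str        # "pronoun", "reflexive", "possessive", "definite_np", "name"
    features: dict   # morphological features: person, number, gender
    role: str        # syntactic or thematic role
    node: TreeNode   # pointer to the location in the syntax tree


s = TreeNode("S")
np = s.add(TreeNode("NP"))
occ = ReferentOccurrence(
    surface="herself",
    lemma="herself",
    kind="reflexive",
    features={"person": 3, "number": "sg", "gender": "f"},
    role="obj",
    node=np,
)
```

The `node` pointer is what lets structural constraints walk from the occurrence up through the tree.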
Since the system applies binding restrictions that operate on phrase structure, it includes a submodule (the surface structure tree constructor) that maps dependency trees to a phrase structure representation exhibiting the structural diversity needed for binding constraint application. This enables ROSANA to work on dependency-syntactic descriptions.
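To give a rough idea of what a dependency-to-phrase-structure mapping involves, here is a deliberately naive sketch: every head with dependents is projected to a phrase that spans the head and its dependents in linear order. This is not the surface structure tree constructor described above, which produces a richer structure; `DepNode`, `project`, and the category labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DepNode:
    """Hypothetical dependency tree node."""
    word: str
    pos: str      # coarse category: "N", "V", "D", ...
    index: int    # linear position in the sentence
    deps: list = field(default_factory=list)


def project(node):
    """Map a dependency subtree to a nested (label, children) phrase.

    Words without dependents stay leaves of the form (pos, word); a head
    with dependents becomes an XP containing the head and the projections
    of its dependents, ordered by position.
    """
    if not node.deps:
        return (node.pos, node.word)
    ordered = sorted(node.deps + [node], key=lambda n: n.index)
    children = [(n.pos, n.word) if n is node else project(n) for n in ordered]
    return (node.pos + "P", children)


# "the manager praised herself"
the = DepNode("the", "D", 1)
manager = DepNode("manager", "N", 2, [the])
herself = DepNode("herself", "N", 4)
praised = DepNode("praised", "V", 3, [manager, herself])

tree = project(praised)
```

The result is flat at the clause level (subject and object are sisters of the verb), which is exactly the kind of structural poverty a real constructor has to overcome before binding constraints can be checked.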
The implementation of the configurational coindexing restrictions derived from Binding Theory is the heart of the system. The implementation strategy is sensitive to interdependencies between individual antecedent decisions. In this way, a theoretically adequate implementation of the syntactic coindexing constraints is achieved that avoids the exponential time complexity of Chomsky’s original suggestion, the free indexing rule.
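The point about decision interdependencies can be illustrated with a heavily simplified sketch. In "John said that he likes him", each pronoun may individually refer to John, but not both at once, since they would then be coindexed with each other inside one clause. The code below is an assumption-laden illustration, not ROSANA's algorithm: clause membership stands in for the binding domain, c-command is ignored, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occ:
    """Hypothetical occurrence; `clause` approximates the binding domain."""
    name: str
    clause: int
    kind: str   # "pronoun", "reflexive", or "np"


def locally_allowed(anaphor, antecedent):
    same_domain = anaphor.clause == antecedent.clause
    if anaphor.kind == "reflexive":
        return same_domain        # simplified Principle A: bound locally
    if anaphor.kind == "pronoun":
        return not same_domain    # simplified Principle B: free locally
    return True


def consistent(anaphor, antecedent, decisions):
    """Check a candidate against the earlier decisions as well.

    `decisions` maps already resolved occurrences to their antecedents.
    Choosing the same antecedent as an earlier occurrence coindexes the
    two occurrences, so that pair must also pass the local check.
    """
    if not locally_allowed(anaphor, antecedent):
        return False
    for other, other_antecedent in decisions.items():
        if other_antecedent == antecedent and not locally_allowed(anaphor, other):
            return False
    return True


john = Occ("John", 1, "np")
he = Occ("he", 2, "pronoun")
him = Occ("him", 2, "pronoun")
himself = Occ("himself", 2, "reflexive")

alone = consistent(him, john, {})               # "him" = John is fine in isolation
after_he = consistent(him, john, {he: john})    # blocked once "he" = John
```

Checking each new decision against the decisions already taken keeps the work polynomial in the number of occurrences, in contrast to enumerating all index assignments as free indexing would.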
The central issue addressed by ROSANA is the robust processing of so-called fragmentary surface-syntactic descriptions, which result from limitations of the processing resources or from deficient (i.e. ungrammatical or misspelled) textual input. In particular, the techniques developed for ROSANA address the central problem of handling fragmentary syntax trees resulting from structural ambiguity (PP, relative clause, and adverbial clause attachment) during the verification of the binding-theoretic constraints, which, in their original statement, rely on the availability of a unique, complete phrase structure tree for each analyzed sentence. A specific rule base enables ROSANA to verify the syntactic coindexing restrictions non-heuristically even in a large class of cases of non-trivial syntactic fragmentation.
Further robustness techniques are applied at various other stages of processing.
At present, ROSANA has been configured to work on the potentially fragmentary syntactic descriptions produced by Connexor Machinese Syntax and by its ancestor system, viz. Timo Järvinen’s and Pasi Tapanainen’s FDG (Functional Dependency Grammar) Parser for English. Since these parsers fulfil all conditions of robust text processing, the overall system belongs to the small class of linguistic text analysis software suitable for the robust processing of application-relevant bulk data.