ROSANA-ML is a system for resolving anaphors in natural language text by means of machine-learned decision trees. The acronym ROSANA-ML stands for "robust syntax-based anaphor interpretation employing machine-learned decision trees".
At present, the system focuses on the resolution of third-person non-possessive and possessive pronouns.
The implementation and evaluation of ROSANA-ML investigate what may be gained by employing machine-learned preference strategies within a robust anaphor resolution approach that follows the Lappin & Leass (1994) paradigm, in which the antecedent filtering strategies are manually designed. The manually crafted algorithm ROSANA is taken as the starting point. Empirical studies have shown that, to achieve optimal interpretation results, the antecedent preference strategies, which take the form of sets of weighted salience factors, should be designed for each genre specifically, since text genres seem to differ in the characteristic properties of their typical coherence structures. Hence, no single design of preference heuristics is optimal for all genres. Consequently, antecedent preference strategies are ideal targets for applying machine learning techniques.
Thus, the work explores what may be gained by combining the best of two worlds: designing the universally valid antecedent filtering strategies manually, once and for all, and deriving the genre-specific antecedent preference strategies automatically by applying machine learning techniques. ROSANA-ML, an anaphor resolution system that follows this paradigm, has been designed and implemented.
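
The division of labour described above can be illustrated with a minimal Python sketch. It is not the ROSANA-ML implementation: the Mention features, the agreement and precedence checks in passes_filters, and the scores in toy_tree_preference are all hypothetical and chosen only for illustration. In particular, toy_tree_preference is a hand-written stand-in for a decision tree that would in practice be learned from genre-specific training data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Mention:
    """A pronoun or antecedent candidate (hypothetical feature encoding)."""
    text: str
    gender: str       # "m", "f", or "n"
    number: str       # "sg" or "pl"
    sentence: int     # index of the sentence containing the mention
    position: int     # token offset in the document
    is_subject: bool


def passes_filters(pronoun: Mention, candidate: Mention) -> bool:
    """Manually designed, genre-independent filter (assumed for illustration):
    the candidate must agree in gender and number and precede the pronoun."""
    return (
        candidate.gender == pronoun.gender
        and candidate.number == pronoun.number
        and candidate.position < pronoun.position
    )


def toy_tree_preference(pronoun: Mention, candidate: Mention) -> float:
    """Stand-in for a machine-learned decision tree. The branches and
    scores are invented; a learned tree would replace this function."""
    distance = pronoun.sentence - candidate.sentence
    if distance == 0:
        return 0.9 if candidate.is_subject else 0.6
    if distance == 1:
        return 0.7 if candidate.is_subject else 0.4
    return 0.1


def resolve(
    pronoun: Mention,
    candidates: List[Mention],
    preference: Callable[[Mention, Mention], float],
) -> Optional[Mention]:
    """Filter first (fixed, hand-written), then rank with the preference
    function (exchangeable, e.g. one learned model per genre)."""
    admissible = [c for c in candidates if passes_filters(pronoun, c)]
    if not admissible:
        return None
    return max(admissible, key=lambda c: preference(pronoun, c))


# "John introduced Peter to Mary. He smiled."
john = Mention("John", "m", "sg", sentence=0, position=0, is_subject=True)
peter = Mention("Peter", "m", "sg", sentence=0, position=2, is_subject=False)
mary = Mention("Mary", "f", "sg", sentence=0, position=4, is_subject=False)
he = Mention("He", "m", "sg", sentence=1, position=6, is_subject=True)

antecedent = resolve(he, [john, peter, mary], toy_tree_preference)
print(antecedent.text if antecedent else None)
```

In this sketch the filter removes "Mary" on agreement grounds, and the preference function then ranks the remaining candidates. Because the preference function is passed in as a parameter, a different model could be supplied for each genre while the filter stays unchanged.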