Full details of the experimental variations are given in the book article published in 2005. The most important part of the experimental variation concerns the determination of the empirically optimal set of attributes over which the decision trees are learned (i.e., the signature of the feature vectors). The attributes considered (also referred to as features) are mainly based on syntactic, morphological, and surface information, thus meeting the requirements of knowledge poorness and robustness. A distinction is made between non-relational features, which pertain to individual anaphor or candidate occurrences O, and relational features, which pertain to pairs (A,C) of anaphors A and antecedent candidates C. The following description covers the complete inventory of features; some of the experiments employ only subsets of it.
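A feature-vector signature can be thought of as an ordered subset of the full attribute inventory, onto which the attribute values of each (A, C) pair are projected before the decision tree is trained. The following is a minimal sketch under that assumption; the helper names `full_inventory` and `make_vector`, the attribute-name strings, and the "?" placeholder for missing values are hypothetical, and the surface-context attributes are omitted for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the inventory: non-relational features are computed
# separately for the anaphor A and the candidate C; relational ones for (A, C).
# The surface-context features (neighbouring tokens) are left out for brevity.
NON_RELATIONAL = ["type", "synfun", "synlevel", "number", "gender"]
RELATIONAL = ["dist", "dir", "synpar", "syndom"]


def full_inventory() -> List[str]:
    """Return the attribute names of the (simplified) complete inventory."""
    names = [f"{f}(A)" for f in NON_RELATIONAL]
    names += [f"{f}(C)" for f in NON_RELATIONAL]
    names += [f"{f}(A,C)" for f in RELATIONAL]
    return names


def make_vector(signature: List[str], values: Dict[str, str]) -> Tuple[str, ...]:
    """Project the attribute values of one (A, C) pair onto a signature.

    Attributes absent from `values` are encoded as the placeholder '?'.
    """
    return tuple(values.get(name, "?") for name in signature)


# Usage: an experiment restricted to a subset of the inventory.
signature = ["type(A)", "dist(A,C)", "gender(C)"]
vector = make_vector(signature, {"type(A)": "PER3", "dist(A,C)": "same"})
```

Varying `signature` over different subsets of `full_inventory()` corresponds to the experimental variation described above; how the best subset is actually searched for is not specified here.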
The following non-relational attributes were considered in the experiments. type(O) denotes the type of the occurrence O, in particular PER3/POS3 (third-person non-possessive/possessive pronouns), VNOM (ordinary noun phrases), and NAME (proper names); for the anaphor (O = A), the choice is restricted to PER3 and POS3 in the current experiments. The feature synfun(O) describes the syntactic function of O. synlevel(O) captures a coarse notion of (non-relational) syntactic prominence, measured by counting the number of principal categories occurring on the path between O and the root of the respective parse fragment. The features number(O) and gender(O) capture the respective morphological characteristics of anaphor A and candidate C. Furthermore, surface context information about the three neighbours to the left and to the right of A and C is taken into account, comprising the syntactic category (syncateg(O)) and, again, the syntactic function (synfun(O)) of the respective tokens.
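Two of these attributes lend themselves to a short illustration: synlevel(O), computed by walking from O towards the root of its parse fragment, and the three-token surface context on either side of an occurrence. The sketch below assumes a parse fragment given as nodes with parent pointers; the set of "principal" categories (`PRINCIPAL`), the choice to count the ancestors of O including the root but not O itself, and the `("NONE", "NONE")` padding at sentence boundaries are all hypothetical choices, since the text does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical set of "principal" categories; the source does not list them.
PRINCIPAL: Set[str] = {"S", "NP", "VP", "PP"}


@dataclass
class Node:
    label: str
    parent: Optional["Node"] = None


def synlevel(node: Node, principal: Set[str] = PRINCIPAL) -> int:
    """Count principal categories among the ancestors of `node` (root included)."""
    level = 0
    current = node.parent
    while current is not None:
        if current.label in principal:
            level += 1
        current = current.parent
    return level


@dataclass
class Token:
    syncateg: str  # syntactic category of the token
    synfun: str    # syntactic function of the token


def context_features(tokens: List[Token], i: int, width: int = 3) -> List[Tuple[str, str]]:
    """Return (syncateg, synfun) of the `width` left and right neighbours of position i.

    Positions outside the token sequence are padded with ("NONE", "NONE").
    """
    offsets = list(range(-width, 0)) + list(range(1, width + 1))
    feats = []
    for offset in offsets:
        j = i + offset
        if 0 <= j < len(tokens):
            feats.append((tokens[j].syncateg, tokens[j].synfun))
        else:
            feats.append(("NONE", "NONE"))
    return feats


# Usage: a subject NP directly under S is less deeply embedded than an object NP.
s = Node("S")
np_subj = Node("NP", s)
vp = Node("VP", s)
np_obj = Node("NP", vp)
```

Under these assumptions a lower synlevel value indicates a more prominent (less deeply embedded) occurrence, e.g. `synlevel(np_subj)` is smaller than `synlevel(np_obj)`.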
Four relational features are considered: dist(A,C) (sentence distance, distinguishing only three cases: same sentence, previous sentence, and two or more sentences away), dir(A,C) (whether C topologically precedes A or vice versa), synpar(A,C) (identity of syntactic function), and syndom(A,C) (relative syntactic position of the clauses of anaphor A and candidate C if they occur in the same sentence).
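The four relational features can be sketched as small functions over a pair of occurrences. This is an illustration under stated assumptions, not the original implementation: the `Occurrence` record, the value labels, the treatment of a candidate in the following sentence like one in the previous sentence (via the absolute distance), and the representation of clause structure as a hypothetical `parent` map from clause identifiers to their embedding clause are all introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Occurrence:
    sentence: int  # index of the containing sentence
    position: int  # token offset in the text, used for precedence
    synfun: str    # syntactic function, e.g. "subj", "obj"
    clause: str    # hypothetical identifier of the containing clause


def dist(a: Occurrence, c: Occurrence) -> str:
    """Sentence distance collapsed into three cases (absolute distance assumed)."""
    d = abs(a.sentence - c.sentence)
    if d == 0:
        return "same_sentence"
    if d == 1:
        return "previous_sentence"
    return "two_or_more"


def direction(a: Occurrence, c: Occurrence) -> str:
    """dir(A,C): whether C precedes A in the text or vice versa."""
    return "C_precedes_A" if c.position < a.position else "A_precedes_C"


def synpar(a: Occurrence, c: Occurrence) -> str:
    """Syntactic parallelism: identical syntactic function or not."""
    return "yes" if a.synfun == c.synfun else "no"


def _ancestors(clause: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """All clauses that (transitively) embed `clause`."""
    chain = []
    current = parent.get(clause)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def syndom(a: Occurrence, c: Occurrence, parent: Dict[str, Optional[str]]) -> str:
    """Relative position of the clauses of A and C; only defined within a sentence."""
    if a.sentence != c.sentence:
        return "n/a"
    if a.clause == c.clause:
        return "same_clause"
    if a.clause in _ancestors(c.clause, parent):
        return "A_clause_dominates"
    if c.clause in _ancestors(a.clause, parent):
        return "C_clause_dominates"
    return "none"


# Usage: pronoun A in a relative clause embedded in the main clause holding C.
parent = {"s1": None, "s1.rel": "s1"}
a = Occurrence(sentence=0, position=10, synfun="subj", clause="s1.rel")
c = Occurrence(sentence=0, position=2, synfun="subj", clause="s1")
```

Here the pair (a, c) would yield same_sentence, C_precedes_A, parallel syntactic function, and a candidate clause dominating the anaphor's clause.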