A Newswire Text
Here is a sample news agency release similar to the documents used in the formal evaluation of the ROSANA system. The text is slightly shorter than the average document of the evaluation corpus, but it is well suited to illustrate the characteristics of the corpus.
A Punch is Sometimes as Good as a Speech in Taiwanese Democracy.
Taiwanese legislators are finding that good oratory is not the
only skill needed to survive in Taiwan's blooming democracy:
a powerful right-hook also helps.
Opposition lawmaker Huang Chao-hui was back at work Wednesday
after being felled by a punch in the latest of the legislative
brawls that have marked the island's transition from virtual
dictatorship to democracy since 1987.
Huang, of the Democratic Progressive Party, had sought hospital
treatment for suspected concussion Tuesday after Lin Ming-yi of
the ruling Nationalists punched him during a debate.
Lin said he could not stomach Huang's taunts that Nationalist
lawmakers have recently started attending more legislative
sessions only to try to ensure victory in elections later
this year.
Lin later apologized for his violent outburst, and Huang was
released from a hospital with bruises on his head.
On Monday, more than 10 lawmakers traded punches during a brawl
started when opposition New Party legislator Ju Gau-jeng,
nicknamed ``Rambo'' by Taiwanese newspapers, jumped onto the
legislative speaker's desk.
Tensions are running high in the legislature because lawmakers
are debating a bill to govern a presidential election next March.
The vote is seen as a milestone in Taiwan's march to democracy
because it will mark the first time that the island's president
is elected by universal suffrage.
Interpretation Results for the OV Task
Aligning the discourse referent occurrences identified by ROSANA with those annotated in the key yields the following result. NUR ANA lists occurrences proposed only by ROSANA, and NUR KEY lists occurrences found only in the key:
ZUORDNUNG OKKURRENZEN:
- NUR ANA:
25866: good
26041: opposition
- NUR KEY:
25911: work
25921: latest
This means that there are only two precision errors (non-occurrences wrongly suggested by the anaphor resolution system) and two recall errors (true occurrences that the anaphor resolution system has not identified). All errors can be traced back to miscategorizations during the morphological and syntactic pre-analysis. For example, the first precision error, 25866: good, is due to an incorrect part-of-speech assignment: the word good is interpreted as a noun instead of as an adjective. Similarly, the recall error 25911: work is due to an incorrect parsing decision that interprets the word work as the non-head of a compound NP and, hence, as a non-occurrence.
Since 70 occurrences agree with the key annotations and there are two errors of each kind, precision and recall both amount to 70/(70+2) = 0.9722:
EVALUATIONSERGEBNIS OKKURRENZEN:
- NUR ANA: 2
- NUR KEY: 2
- ANA UND KEY: 70
=> PRECISION: 0.9722
=> RECALL: 0.9722
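The occurrence comparison above amounts to set operations on occurrence IDs. The following Python sketch reproduces the counts under two assumptions. First, occurrences are matched by identical IDs. Second, because the IDs of the 70 agreed occurrences are not listed, hypothetical placeholder IDs stand in for them. The function names are illustrative and are not part of ROSANA.

```python
def align_occurrences(response, key):
    """Split occurrence IDs into response-only, key-only and shared sets."""
    return {
        "ana_only": response - key,  # precision errors (NUR ANA)
        "key_only": key - response,  # recall errors (NUR KEY)
        "both": response & key,      # agreed occurrences (ANA UND KEY)
    }


def precision_recall(alignment):
    """Precision and recall of the response occurrences against the key."""
    both = len(alignment["both"])
    precision = both / (both + len(alignment["ana_only"]))
    recall = both / (both + len(alignment["key_only"]))
    return precision, recall


# Hypothetical placeholder IDs 1..70 stand in for the 70 agreed occurrences,
# whose real IDs are not given in the text.
shared = set(range(1, 71))
response = shared | {25866, 26041}  # 'good', 'opposition' (ROSANA only)
key = shared | {25911, 25921}       # 'work', 'latest' (key only)

alignment = align_occurrences(response, key)
p, r = precision_recall(alignment)
print(sorted(alignment["ana_only"]), sorted(alignment["key_only"]))
print(f"precision={p:.4f} recall={r:.4f}")
```

Because both error sets have size two, precision and recall coincide here. They diverge whenever ROSANA proposes more or fewer spurious occurrences than it misses.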
Interpretation Results for the KV Task
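The following table shows the individual antecedent decisions made by ROSANA. Each line gives the sentence number, occurrence ID and form of the anaphor, followed by those of the chosen antecedent: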
A.R.-ENTSCHEIDUNGEN
++ 2 25894: democracy ---> 1 25872: democracy
++ 3 25918: punch ---> 1 25862: punch
++ 3 25926: that ---> 3 25925: brawl
++ 3 25936: democracy ---> 2 25894: democracy
++ 4 25966: him ---> 4 25942: huang
++ 5 25975: he ---> 5 25973: lin
++ 5 25979: huang ---> 4 25942: huang
++ 6 26005: lin ---> 5 25973: lin
++ 6 26009: his ---> 6 26005: lin
++ 6 26014: huang ---> 5 25979: huang
+- 6 26023: his ---> 6 26009: his
++ 9 26095: taiwan ---> 2 25892: taiwan
+- 9 26096: march ---> 8 26084: march
++ 9 26098: democracy ---> 3 25936: democracy
++ 9 26100: it ---> 9 26088: vote
++ 9 26108: island ---> 3 25930: island
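Each decision links an anaphor to one antecedent, and the sets of cospecifying occurrences are the transitive closure of these links. The following Python sketch builds them with a standard union-find structure, assuming that this closure is how the response classes arise. The decision list is copied from the table above. The additional appositional decision, which is not listed there, is left out. The function name is hypothetical.

```python
# Antecedent decisions copied from the table above: (anaphor ID, antecedent ID).
# The appositional decision that is not shown in the table is left out.
decisions = [
    (25894, 25872), (25918, 25862), (25926, 25925), (25936, 25894),
    (25966, 25942), (25975, 25973), (25979, 25942), (26005, 25973),
    (26009, 26005), (26014, 25979), (26023, 26009), (26095, 25892),
    (26096, 26084), (26098, 25936), (26100, 26088), (26108, 25930),
]


def build_classes(links):
    """Merge linked occurrences into equivalence classes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for anaphor, antecedent in links:
        parent[find(anaphor)] = find(antecedent)

    groups = {}
    for occ in list(parent):
        groups.setdefault(find(occ), set()).add(occ)
    return sorted(sorted(g) for g in groups.values())


classes = build_classes(decisions)
for cls in classes:
    print(cls)
print("links:", sum(len(c) - 1 for c in classes))
```

The nine resulting classes allow 16 links in total, because a class of n occurrences allows n - 1 links. Adding the appositional link gives the 17 possible links in the MOEGLICH line of the A.R. result partitioning.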
The decisions are classified according to their correctness (first column: ++ = correct, +- = incorrect). There are only two incorrect decisions, plus an additional one (not shown in the table). This one results from a syntactic misinterpretation of two NPs as appositive, and the apposition was then wrongly taken to induce a coreference decision. Consequently, there are three precision errors in the KV task of identifying sets of cospecifying occurrences. In the equivalence-class-based computation of the evaluation measures, these errors show up as partitions of the ROSANA-generated classes by the key classes. Below each class, the number of cuts is given relative to the number of possible links in the class (its size minus one):
----------------------------
25942: huang
25966: him
25979: huang
26014: huang
------------------
25948: party
----------------------------
1 / 4
----------------------------
25973: lin
25975: he
26005: lin
26009: his
------------------
26023: his
----------------------------
1 / 4
----------------------------
26084: march
------------------
26096: march
----------------------------
1 / 1
Similarly, the five recall errors are identified by counting the partitions of the key classes relative to the response classes:
----------------------------
25862: Punch
25918: punch
------------------
25898: right-hook
----------------------------
1 / 2
----------------------------
25892: Taiwan's
26095: Taiwan's
------------------
25930: island's
26108: island's
----------------------------
1 / 3
----------------------------
25907: Chao-hui
------------------
25942: Huang
25966: him
25979: Huang's
26014: Huang
------------------
26023: his
----------------------------
2 / 5
----------------------------
25960: Ming-yi
------------------
25973: Lin
25975: he
26005: Lin
26009: his
----------------------------
1 / 4
This yields a precision of (17-3)/17 = 0.8235 and a recall of (19-5)/19 = 0.7368. Here 3 and 5 are the numbers of cuts (SCHNITTE), and 17 and 19 are the numbers of possible links (MOEGLICH):
A.R.-ERGEBNIS-PARTITIONIERUNG
- SCHNITTE: 3
- MOEGLICH: 17
=> PRECISION: 0.8235
KEY-PARTITIONIERUNG
- SCHNITTE: 5
- MOEGLICH: 19
=> RECALL: 0.7368
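The cut counts can be reproduced with a short Python sketch of the equivalence-class-based measure, which follows the link-based scheme familiar from the MUC coreference evaluations. A class of n occurrences allows n - 1 links, and it is cut once for every additional part it falls into under the classes of the other side. The sketch makes several assumptions:

- Occurrences that belong to no class on the other side count as singletons.
- The response classes are those of the decision table plus the appositional Party link.
- The four key classes with recall errors are copied from the listings above.
- The three error-free key classes (democracy, brawl/that, vote/it) are reconstructed from the correct (++) decisions. This reconstruction is consistent with the reported totals but is not stated in the text.

```python
def cuts_and_links(classes, other_classes):
    """Count cuts and possible links of `classes` relative to `other_classes`."""
    owner = {occ: i for i, cls in enumerate(other_classes) for occ in cls}
    cuts = links = 0
    for cls in classes:
        # Occurrences missing from every class on the other side form
        # their own singleton part.
        parts = {owner.get(occ, ("singleton", occ)) for occ in cls}
        cuts += len(parts) - 1
        links += len(cls) - 1
    return cuts, links


# ROSANA response classes (decision table plus the appositional Party link).
response = [
    {25862, 25918}, {25872, 25894, 25936, 26098}, {25892, 26095},
    {25925, 25926}, {25930, 26108}, {25942, 25948, 25966, 25979, 26014},
    {25973, 25975, 26005, 26009, 26023}, {26084, 26096}, {26088, 26100},
]

# Key classes: the first four are copied from the recall listing; the last
# three are assumed from the correct (++) decisions.
key = [
    {25862, 25898, 25918}, {25892, 25930, 26095, 26108},
    {25907, 25942, 25966, 25979, 26014, 26023},
    {25960, 25973, 25975, 26005, 26009},
    {25872, 25894, 25936, 26098}, {25925, 25926}, {26088, 26100},
]

p_cuts, p_links = cuts_and_links(response, key)
r_cuts, r_links = cuts_and_links(key, response)
precision = (p_links - p_cuts) / p_links
recall = (r_links - r_cuts) / r_links
print(p_cuts, p_links, f"{precision:.4f}")  # 3 17 0.8235
print(r_cuts, r_links, f"{recall:.4f}")     # 5 19 0.7368
```

Swapping the two arguments is all that distinguishes precision from recall. This mirrors the symmetry between the A.R.-ERGEBNIS-PARTITIONIERUNG and KEY-PARTITIONIERUNG listings.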
The following table breaks down the antecedent decisions by type of anaphoric expression:
ANKNUEPFUNGS-ENTSCHEIDUNGEN
     |      | PRECIS |  ++  |  +-  |  +?  |  +_  |  +*  |  ?+  |  ?_  |
-----+------+--------+------+------+------+------+------+------+------+
PRON | PE-3 | 1.0000 |    3 |    0 |    0 |    0 |    0 |    0 |    0 |
     | PE12 |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
     | PO-3 | 0.5000 |    1 |    1 |    0 |    0 |    0 |    0 |    0 |
     | PO12 |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
     | REFL |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
     | RELA | 1.0000 |    1 |    0 |    0 |    0 |    0 |    0 |    0 |
     +------+--------+------+------+------+------+------+------+------+
     |      | 0.8333 |    5 |    1 |    0 |    0 |    0 |    0 |    0 |
-----+------+--------+------+------+------+------+------+------+------+
NOMN | VNOM | 0.8333 |    5 |    1 |    0 |   43 |    1 |    0 |    2 |
     | NAME | 1.0000 |    4 |    0 |    0 |    9 |    0 |    0 |    0 |
     +------+--------+------+------+------+------+------+------+------+
     |      | 0.9000 |    9 |    1 |    0 |   52 |    1 |    0 |    2 |
-----+------+--------+------+------+------+------+------+------+------+
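Every figure in the PRECIS column is consistent with precision = ++ / (++ + +-). For example, VNOM gives 5/(5+1) = 0.8333 and the NOMN subtotal gives 9/(9+1) = 0.9000. The meaning of the remaining columns is not explained in this section. The Python sketch below therefore ignores them and only recomputes the PRECIS column from the ++ and +- counts, so it is a consistency check, not ROSANA's own code.

```python
# (correct, incorrect) counts copied from the ++ and +- columns above.
counts = {
    "PRON": {"PE-3": (3, 0), "PE12": (0, 0), "PO-3": (1, 1),
             "PO12": (0, 0), "REFL": (0, 0), "RELA": (1, 0)},
    "NOMN": {"VNOM": (5, 1), "NAME": (4, 0)},
}


def precision(correct, incorrect):
    """++ / (++ + +-); None stands for the table's '--' (no decisions)."""
    total = correct + incorrect
    return None if total == 0 else correct / total


def fmt(p):
    return "--" if p is None else f"{p:.4f}"


for group, types in counts.items():
    for name, (c, i) in types.items():
        print(group, name, fmt(precision(c, i)))
    c_sum = sum(c for c, _ in types.values())
    i_sum = sum(i for _, i in types.values())
    print(group, "total", fmt(precision(c_sum, i_sum)))
```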
Interpretation Results for the PS Task
The following table shows the non-pronominal substitutes that ROSANA suggests for the identified pronoun occurrences:
LEXIKALISCHE SUBSTITUTION
++ 3 25926: that == 3 25925: brawl
++ 4 25966: him == 4 25942: huang
++ 5 25975: he == 5 25973: lin
++ 6 26009: his == 6 26005: lin
+- 6 26023: his == 6 26005: lin
++ 9 26100: it == 9 26088: vote
Again, the decisions are classified according to their correctness (first column: ++ = correct, +- = incorrect). There is only one wrong decision. It can be traced directly to the incorrect antecedent choice for the possessive pronoun 6 26023: his.
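The substitutes in the table can be reproduced by following each pronoun's antecedent chain until a non-pronominal occurrence is reached. For 26023: his, the chain runs via 26009: his to 26005: lin, which is how the wrong antecedent choice carries over into the wrong substitute. The Python sketch below assumes this chain-following rule, which this section does not spell out as ROSANA's procedure. It also assumes the pronoun set from the word forms. The names are hypothetical.

```python
# Antecedent links for the pronouns, copied from the A.R. decision table.
antecedent = {
    25926: 25925, 25966: 25942, 25975: 25973,
    26009: 26005, 26023: 26009, 26100: 26088,
}
form = {
    25925: "brawl", 25926: "that", 25942: "huang", 25966: "him",
    25973: "lin", 25975: "he", 26005: "lin", 26009: "his",
    26023: "his", 26088: "vote", 26100: "it",
}
PRONOUNS = {"that", "him", "he", "his", "it"}  # assumed pronoun forms


def substitute(occ):
    """Follow antecedent links until a non-pronominal occurrence is reached."""
    seen = set()
    while form[occ] in PRONOUNS:
        if occ in seen or occ not in antecedent:
            return None  # no non-pronominal substitute reachable
        seen.add(occ)
        occ = antecedent[occ]
    return occ


for pron in sorted(antecedent):
    sub = substitute(pron)
    print(pron, form[pron], "==", sub, form[sub])
```

The `seen` set guards against cyclic pronoun chains, which would otherwise loop forever.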
Applying the precision and recall measures of the PS task evaluation yields the following figures:
     | PRECIS | RECALL |  ++  |  +-  |  +?  |  +_  |  +*  |  ?+  |  ?_  |
-----+--------+--------+------+------+------+------+------+------+------+
PE-3 | 1.0000 | 1.0000 |    3 |    0 |    0 |    0 |    0 |    0 |    0 |
PE12 |   --   |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
PO-3 | 0.5000 | 0.5000 |    1 |    1 |    0 |    0 |    0 |    0 |    0 |
PO12 |   --   |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
REFL |   --   |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
RELA | 1.0000 | 1.0000 |    1 |    0 |    0 |    0 |    0 |    0 |    0 |
-----+--------+--------+------+------+------+------+------+------+------+
DURCHSCHNITT JE PRONOMEN
=> PRECISION: 0.8333
=> RECALL: 0.8333
In total, the suggested non-pronominal antecedent is correct for 5/6 = 0.8333 of all pronominal occurrences.