Example of Interpretation Results

A Newswire Text

Here is a sample news agency release similar to the documents used in the formal evaluation of the ROSANA system. The text is slightly shorter than the average document of the evaluation corpus, but it is well suited to illustrating the characteristics of that corpus.

A Punch is Sometimes as Good as a Speech in Taiwanese Democracy.
Taiwanese legislators are finding that good oratory is not the
only skill needed to survive in Taiwan's blooming democracy:
a powerful right-hook also helps.
Opposition lawmaker Huang Chao-hui was back at work Wednesday
after being felled by a punch in the latest of the legislative
brawls that have marked the island's transition from virtual
dictatorship to democracy since 1987.
Huang, of the Democratic Progressive Party, had sought hospital
treatment for suspected concussion Tuesday after Lin Ming-yi of
the ruling Nationalists punched him during a debate.
Lin said he could not stomach Huang's taunts that Nationalist
lawmakers have recently started attending more legislative
sessions only to try to ensure victory in elections later
this year.
Lin later apologized for his violent outburst, and Huang was
released from a hospital with bruises on his head.
On Monday, more than 10 lawmakers traded punches during a brawl
started when opposition New Party legislator Ju Gau-jeng,
nicknamed ``Rambo'' by Taiwanese newspapers, jumped onto the
legislative speaker's desk.
Tensions are running high in the legislature because lawmakers
are debating a bill to govern a presidential election next March.
The vote is seen as a milestone in Taiwan's march to democracy
because it will mark the first time that the island's president
is elected by universal suffrage.

Interpretation Results for the OV Task

The alignment of the discourse referent occurrences identified by ROSANA with those annotated in the key yields the following result (NUR ANA: occurrences found only by ROSANA; NUR KEY: occurrences present only in the key):

ZUORDNUNG OKKURRENZEN:

- NUR ANA:
25866: good
26041: opposition

- NUR KEY:
25911: work
25921: latest

This means that there are only two precision errors (non-occurrences wrongly proposed by the anaphor resolution system) and two recall errors (true occurrences that the system failed to identify). All errors can be traced back to miscategorizations during the morphological and syntactic pre-analysis. For example, the first precision error, 25866: good, is due to an incorrect part-of-speech assignment (the word good is interpreted as a noun instead of as an adverb). Similarly, the recall error 25911: work is due to an incorrect parsing decision that treats the word work as the non-head of a compound NP and hence as a non-occurrence.

Since 70 occurrences have been identified in accordance with the key annotations, and there are two errors of each kind, precision and recall both amount to 70/(70+2) = 0.9722:

EVALUATIONSERGEBNIS OKKURRENZEN:
- NUR ANA: 2
- NUR KEY: 2
- ANA UND KEY: 70
=> PRECISION: 0.9722
=> RECALL: 0.9722
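
The arithmetic behind these two figures can be reproduced with a few lines of Python. This is only an illustrative sketch, not ROSANA's own evaluation code; the function name and parameter names are hypothetical and merely mirror the three counts reported above.

```python
def precision_recall(both, only_response, only_key):
    """Occurrence-level precision and recall from the three alignment counts."""
    # Precision: share of the system's occurrences that are also in the key.
    precision = both / (both + only_response)
    # Recall: share of the key's occurrences that the system also found.
    recall = both / (both + only_key)
    return precision, recall


# Counts taken from the output above: ANA UND KEY, NUR ANA, NUR KEY.
p, r = precision_recall(both=70, only_response=2, only_key=2)
print(f"precision = {p:.4f}, recall = {r:.4f}")
```

The two values coincide here only because the number of spurious occurrences happens to equal the number of missed ones.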

Interpretation Results for the KV Task

The following table shows the individual antecedent decisions made by ROSANA:

 A.R.-ENTSCHEIDUNGEN
++ 2 25894: democracy ---> 1 25872: democracy
++ 3 25918: punch ---> 1 25862: punch
++ 3 25926: that ---> 3 25925: brawl
++ 3 25936: democracy ---> 2 25894: democracy
++ 4 25966: him ---> 4 25942: huang
++ 5 25975: he ---> 5 25973: lin
++ 5 25979: huang ---> 4 25942: huang
++ 6 26005: lin ---> 5 25973: lin
++ 6 26009: his ---> 6 26005: lin
++ 6 26014: huang ---> 5 25979: huang
+- 6 26023: his ---> 6 26009: his
++ 9 26095: taiwan ---> 2 25892: taiwan
+- 9 26096: march ---> 8 26084: march
++ 9 26098: democracy ---> 3 25936: democracy
++ 9 26100: it ---> 9 26088: vote
++ 9 26108: island ---> 3 25930: island

The decisions are classified according to their correctness (first column: ++ = correct, +- = incorrect). There are only two incorrect decisions, plus an additional one (not shown in the table) that stems from a syntactic misinterpretation: two NPs are wrongly analyzed as standing in an appositional relationship, which in turn triggers a coreference decision. Consequently, there are three precision errors in the KV task of identifying sets of cospecifying occurrences. In the equivalence-class-centered computation of the evaluation measures, these errors show up as partitions of the classes generated by ROSANA by the key classes (each block closes with the number of cuts relative to the number of possible cuts):

  ----------------------------
25942: huang
25966: him
25979: huang
26014: huang
------------------
25948: party
----------------------------
1 / 4

----------------------------
25973: lin
25975: he
26005: lin
26009: his
------------------
26023: his
----------------------------
1 / 4

----------------------------
26084: march
------------------
26096: march
----------------------------
1 / 1

In a similar way, five recall errors are identified by counting the partitions of the key classes relative to the response classes:

  ----------------------------
25862: Punch
25918: punch
------------------
25898: right-hook
----------------------------
1 / 2

----------------------------
25892: Taiwan's
26095: Taiwan's
------------------
25930: island's
26108: island's
----------------------------
1 / 3

----------------------------
25907: Chao-hui
------------------
25942: Huang
25966: him
25979: Huang's
26014: Huang
------------------
26023: his
----------------------------
2 / 5

----------------------------
25960: Ming-yi
------------------
25973: Lin
25975: he
26005: Lin
26009: his
----------------------------
1 / 4

This yields precision and recall figures of (17-3)/17 = 0.8235 and (19-5)/19 = 0.7368, respectively:

 A.R.-ERGEBNIS-PARTITIONIERUNG
- SCHNITTE: 3
- MOEGLICH: 17
=> PRECISION: 0.8235

KEY-PARTITIONIERUNG
- SCHNITTE: 5
- MOEGLICH: 19
=> RECALL: 0.7368
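
The partition counts can be retraced with a short Python sketch. It is not ROSANA's evaluation code; the function `partition_score` is a hypothetical helper written for this illustration. The response classes are assembled from the antecedent decisions listed above plus the spurious apposition link to 25948: party. The key is not printed in full, so the last three key classes (democracy, brawl/that, vote/it) are assumptions inferred from the decisions marked ++.

```python
def partition_score(classes, reference):
    """Return (cuts, possible) when each class is split by the reference classes."""
    # Map every occurrence to the index of the reference class that contains it.
    owner = {occ: i for i, cls in enumerate(reference) for occ in cls}
    cuts = possible = 0
    for cls in classes:
        # Occurrences missing from every reference class form singleton parts.
        parts = {owner.get(occ, ("unaligned", occ)) for occ in cls}
        cuts += len(parts) - 1      # a class split into k parts needs k-1 cuts
        possible += len(cls) - 1    # at most n-1 cuts for a class of n members
    return cuts, possible


# Response classes, derived from the decision table (token numbers as above).
response = [
    {25872, 25894, 25936, 26098},         # democracy
    {25862, 25918},                       # punch
    {25925, 25926},                       # brawl / that
    {25942, 25948, 25966, 25979, 26014},  # huang, plus the spurious "party"
    {25973, 25975, 26005, 26009, 26023},  # lin
    {25892, 26095},                       # taiwan
    {26084, 26096},                       # march
    {26088, 26100},                       # vote / it
    {25930, 26108},                       # island
]

# Key classes: the first four are printed above, the last three are ASSUMED.
key = [
    {25862, 25898, 25918},                       # Punch, right-hook, punch
    {25892, 25930, 26095, 26108},                # Taiwan's / island's
    {25907, 25942, 25966, 25979, 26014, 26023},  # Huang Chao-hui
    {25960, 25973, 25975, 26005, 26009},         # Lin Ming-yi
    {25872, 25894, 25936, 26098},                # democracy (assumed)
    {25925, 25926},                              # brawl / that (assumed)
    {26088, 26100},                              # vote / it (assumed)
]

p_cuts, p_possible = partition_score(response, key)
r_cuts, r_possible = partition_score(key, response)
print(f"precision = {(p_possible - p_cuts) / p_possible:.4f}")
print(f"recall    = {(r_possible - r_cuts) / r_possible:.4f}")
```

Under these assumptions the sketch arrives at the same cut and link totals as the output above (3 of 17 and 5 of 19), which suggests that the reconstructed key classes are at least consistent with the reported figures.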

The following table breaks down the performance by type of anaphoric expression:

 ANKNUEPFUNGS-ENTSCHEIDUNGEN

     |      | PRECIS |  ++  |  +-  |  +?  |  +_  |  +*  |  ?+  |  ?_  |
-----+------+--------+------+------+------+------+------+------+------+
PRON | PE-3 | 1.0000 |    3 |    0 |    0 |    0 |    0 |    0 |    0 |
     | PE12 |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
     | PO-3 | 0.5000 |    1 |    1 |    0 |    0 |    0 |    0 |    0 |
     | PO12 |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
     | REFL |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
     | RELA | 1.0000 |    1 |    0 |    0 |    0 |    0 |    0 |    0 |
     +------+--------+------+------+------+------+------+------+------+
     |      | 0.8333 |    5 |    1 |    0 |    0 |    0 |    0 |    0 |
-----+------+--------+------+------+------+------+------+------+------+
NOMN | VNOM | 0.8333 |    5 |    1 |    0 |   43 |    1 |    0 |    2 |
     | NAME | 1.0000 |    4 |    0 |    0 |    9 |    0 |    0 |    0 |
     +------+--------+------+------+------+------+------+------+------+
     |      | 0.9000 |    9 |    1 |    0 |   52 |    1 |    0 |    2 |
-----+------+--------+------+------+------+------+------+------+------+

Interpretation Results for the PS Task

The following table shows the nonpronominal substitutes that ROSANA suggests for the set of identified pronoun occurrences:

 LEXIKALISCHE SUBSTITUTION
++ 3 25926: that == 3 25925: brawl
++ 4 25966: him == 4 25942: huang
++ 5 25975: he == 5 25973: lin
++ 6 26009: his == 6 26005: lin
+- 6 26023: his == 6 26005: lin
++ 9 26100: it == 9 26088: vote

Again, the decisions are classified according to their correctness (first column: ++ = correct, +- = incorrect). There is only one wrong decision, which can be traced back directly to the corresponding incorrect antecedent choice for the possessive pronoun 6 26023: his.

By applying the precision and recall measures applicable in the PS task evaluation, the following figures are obtained:

     | PRECIS | RECALL |  ++  |  +-  |  +?  |  +_  |  +*  |  ?+  |  ?_  |
-----+--------+--------+------+------+------+------+------+------+------+
PE-3 | 1.0000 | 1.0000 |    3 |    0 |    0 |    0 |    0 |    0 |    0 |
PE12 |   --   |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
PO-3 | 0.5000 | 0.5000 |    1 |    1 |    0 |    0 |    0 |    0 |    0 |
PO12 |   --   |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
REFL |   --   |   --   |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
RELA | 1.0000 | 1.0000 |    1 |    0 |    0 |    0 |    0 |    0 |    0 |
-----+--------+--------+------+------+------+------+------+------+------+

DURCHSCHNITT JE PRONOMEN
=> PRECISION: 0.8333
=> RECALL: 0.8333

In total, for 5/6 = 0.8333 of all pronominal occurrences, the suggested non-pronominal antecedent is correct.
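
As a rough cross-check, the per-type and overall figures can be recomputed from the six substitutions listed above. The sketch below is illustrative only. Assigning each pronoun to a row label (PE-3 for he, him, it; PO-3 for his; RELA for that) is an assumption read off the table, and `precision_by_type` is a hypothetical helper, not part of ROSANA.

```python
from collections import Counter

# (assumed type label, occurrence, judged correct?) for each substitution.
decisions = [
    ("RELA", "25926: that", True),
    ("PE-3", "25966: him", True),
    ("PE-3", "25975: he", True),
    ("PO-3", "26009: his", True),
    ("PO-3", "26023: his", False),
    ("PE-3", "26100: it", True),
]


def precision_by_type(decisions):
    """Share of correct substitutions per pronoun type."""
    correct, total = Counter(), Counter()
    for kind, _occurrence, ok in decisions:
        total[kind] += 1
        correct[kind] += ok  # True counts as 1, False as 0
    # Types without any decision never enter `total`, matching the "--" rows.
    return {kind: correct[kind] / total[kind] for kind in total}


per_type = precision_by_type(decisions)
overall = sum(ok for _, _, ok in decisions) / len(decisions)
print(per_type)
print(f"overall = {overall:.4f}")
```

Because every identified pronoun receives exactly one substitute in this example, precision and recall coincide, as they do in the table.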
