Towards Multilingual Coreference Resolution

Zhekova, Desislava

Citation link (URN)

https://nbn-resolving.de/urn:nbn:de:gbv:46-00103541-11

Publication date

2013-12-20

Abstract

The current work investigates the problems that arise when coreference resolution is treated as a multilingual task. We assess the issues that emerge when the resolution process uses a framework built on the mention-pair coreference resolution model and memory-based learning. Along the way, we revise three essential subtasks of coreference resolution: mention detection, mention head detection and feature selection. For each of these aspects we propose several multilingual solutions, including heuristic, rule-based and machine learning methods. We carry out a detailed analysis covering eight languages (Arabic, Catalan, Chinese, Dutch, English, German, Italian and Spanish). The datasets for these languages were provided by the only two multilingual shared tasks on coreference resolution held so far: SemEval-2 and CoNLL-2012. Our investigation shows that coreference resolution, although complex, can be approached in a multilingual and even language-independent way. For each subtask affected by the transition to multiple languages, we propose machine learning methods, evaluate them, and compare them to rule-based and heuristic approaches. Our results confirm that machine learning provides the flexibility the multilingual task requires. They also show that the minimal requirement for a language-independent system is a part-of-speech annotation layer for each of the languages. We further show that system performance can be improved by adding other layers of linguistic annotation, such as syntactic parses (constituency or dependency), named entity information and predicate-argument structure. Finally, we discuss the problems of the proposed approaches and suggest possibilities for their improvement.
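The mention-pair model with memory-based learning, as named in the abstract, can be illustrated with a minimal Python sketch. This is not the implementation used in the dissertation. The following elements are all hypothetical choices made only for illustration:

- The toy features: string match, head part-of-speech tags and capped sentence distance.
- Training on every mention pair without instance sampling.
- A 1-nearest-neighbour classifier with the overlap metric, in the style of IB1.
- Closest-first linking.
- All function and variable names.

```python
from itertools import combinations

# Hypothetical mention representation: (surface text, head POS tag, sentence index).

def pair_features(antecedent, anaphor):
    """Toy feature vector for a candidate antecedent/anaphor pair (illustrative only)."""
    a_text, a_pos, a_sent = antecedent
    b_text, b_pos, b_sent = anaphor
    return (
        a_text.lower() == b_text.lower(),  # exact string match, case-insensitive
        a_pos,                             # POS tag of the antecedent head
        b_pos,                             # POS tag of the anaphor head
        min(b_sent - a_sent, 3),           # sentence distance, capped at 3
    )

def overlap_distance(x, y):
    """Overlap metric: number of feature positions whose values differ."""
    return sum(xi != yi for xi, yi in zip(x, y))

def knn_classify(instance, memory, k=1):
    """Memory-based learning: majority label among the k nearest stored instances.
    sorted() is stable, so distance ties are broken by storage order."""
    nearest = sorted(memory, key=lambda item: overlap_distance(instance, item[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def build_memory(mentions, gold_cluster_ids):
    """Store every mention pair with a coreferent/non-coreferent label (no sampling)."""
    return [
        (pair_features(mentions[i], mentions[j]), gold_cluster_ids[i] == gold_cluster_ids[j])
        for i, j in combinations(range(len(mentions)), 2)
    ]

def resolve(mentions, memory, k=1):
    """Closest-first linking: attach each mention to the nearest preceding
    mention that the classifier labels as coreferent."""
    cluster_of = list(range(len(mentions)))  # every mention starts as its own cluster
    for j in range(1, len(mentions)):
        for i in range(j - 1, -1, -1):       # scan candidate antecedents right to left
            if knn_classify(pair_features(mentions[i], mentions[j]), memory, k):
                cluster_of[j] = cluster_of[i]
                break
    return cluster_of

train = [("Maria", "NNP", 0), ("she", "PRP", 0), ("the book", "NN", 0), ("it", "PRP", 1)]
memory = build_memory(train, [0, 0, 1, 1])
new_doc = [("John", "NNP", 0), ("he", "PRP", 0), ("the car", "NN", 0), ("it", "PRP", 1)]
print(resolve(new_doc, memory))  # [0, 0, 2, 2]
```

In this toy setting, "he" is linked to "John" and "it" to "the car". The only features are surface matching and part-of-speech tags, which loosely mirrors the abstract's point that a part-of-speech layer is the minimal requirement. Real systems use far richer feature sets, instance sampling and larger values of k.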

Keywords

Coreference Resolution; Anaphora; Machine Learning; Natural Language Processing

Institution

Universität Bremen

Department

Fachbereich 10: Sprach- und Literaturwissenschaften (FB 10; Faculty of Linguistics and Literary Studies)

Document type

Dissertation

Secondary publication

No

Language

English

Files

Name

00103541-1.pdf

Size

4.13 MB

Format

Adobe PDF

Checksum

MD5: 53bb96ce94d1376f62fa240a4e9f2515