Adopting ISO 24617-8 for Discourse Relations Annotation in Polish: Challenges and Future Directions
Publisher
NOVA CLUNL
Abstract
This paper describes a discourse relations annotation project carried out under the CLARIN-PL initiative and based on the ISO 24617-8 standard. The goal is to improve research interoperability and support multilingual research. A team of three linguist-annotators annotated a Polish-language corpus spanning several genres, including literature and press articles. The work was guided by a project expert and external linguists from the CLARIN-PL language technology research infrastructure. Several significant challenges emerged during the process: ambiguities within the ISO standard’s relation categories, imprecise definitions of certain relation categories, and the difficulty of identifying and annotating implicit discourse relations, which lack explicit discourse connectives or other signaling devices. To address these problems, we introduced regular team meetings, collaborative annotation forms, and preliminary revisions to the annotation scheme. This paper presents the project and the annotation process, and offers initial annotation data on the discourse relations and connectives identified in the corpus. Looking forward, we discuss potential enhancements to the process, including additional revisions to the guidelines, and conclude with an overview of the project’s contributions and our future development plans.
Keywords
discourse markers, discourse relations, discourse annotation, ISO 24617-8, Polish
Citation
Proceedings of the 4th Conference on Language, Data and Knowledge, 12–15 September 2023, Vienna, Austria, pp. 482–492
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Poland