Adopting ISO 24617-8 for Discourse Relations Annotation in Polish: Challenges and Future Directions

dc.contributor.author: Żurowski, Sebastian
dc.contributor.author: Ziembicki, Daniel
dc.contributor.author: Tomaszewska, Aleksandra
dc.contributor.author: Ogrodniczuk, Maciej
dc.contributor.author: Drozd, Agata
dc.date.accessioned: 2023-10-30T06:46:21Z
dc.date.available: 2023-10-30T06:46:21Z
dc.date.issued: 2023-08
dc.description.abstract: This paper explores a discourse relations annotation project carried out under the CLARIN-PL initiative, leveraging the ISO 24617-8 standard. The goal is to boost research interoperability and foster multilingual research. Our team of three linguist-annotators annotated a Polish-language corpus spanning several genres, including literature and press articles. This effort was guided by a project expert and external linguists from the CLARIN-PL language technology research infrastructure. Several significant challenges emerged during the process. Key issues included ambiguities within the ISO standard's relation categories, vague definitions of certain relation categories, and the difficulty of identifying and annotating implicit discourse relations, which lack explicit discourse connectives or signaling devices. To overcome these problems, we implemented strategies such as regular team meetings, collaborative annotation forms, and preliminary revisions to the annotation scheme. This paper presents the project and the annotation process, and offers initial annotation data on the discourse relations and connectives identified within the corpus. Looking forward, we discuss potential enhancements to the process, including additional revisions to the guidelines, and conclude with an overview of the project's contributions and a discussion of our future development plans. [pl]
dc.description.sponsorship: The work was financed by the European Regional Development Fund as a part of the 2014–2020 Smart Growth Operational Programme, CLARIN (Common Language Resources and Technology Infrastructure), project no. POIR.04.02.00–00C002/19, and by the Polish Ministry of Education and Science grant 2022/WK/09. [pl]
dc.identifier.citation: Proceedings of the 4th Conference on Language, Data and Knowledge, 12–15 September 2023, Vienna, Austria, pp. 482–492 [pl]
dc.identifier.isbn: 978-989-54081-5-3
dc.identifier.uri: http://repozytorium.umk.pl/handle/item/6928
dc.language.iso: eng [pl]
dc.publisher: NOVA CLUNL [pl]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Poland [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/pl/ [*]
dc.subject: discourse markers [pl]
dc.subject: discourse relations [pl]
dc.subject: discourse annotation [pl]
dc.subject: ISO 24617-8 [pl]
dc.subject: Polish [pl]
dc.title: Adopting ISO 24617-8 for Discourse Relations Annotation in Polish: Challenges and Future Directions [pl]
dc.type: info:eu-repo/semantics/bookPart [pl]

Files

Original bundle

Name: s_zurowski_061.pdf
Size: 595.83 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.34 KB
Format: Item-specific license agreed upon to submission