Adopting ISO 24617-8 for Discourse Relations Annotation in Polish: Challenges and Future Directions

Nicolaus Copernicus University Repository (Repozytorium Uniwersytetu Mikołaja Kopernika)

dc.contributor.author Żurowski, Sebastian
dc.contributor.author Ziembicki, Daniel
dc.contributor.author Tomaszewska, Aleksandra
dc.contributor.author Ogrodniczuk, Maciej
dc.contributor.author Drozd, Agata
dc.date.accessioned 2023-10-30T06:46:21Z
dc.date.available 2023-10-30T06:46:21Z
dc.date.issued 2023-08
dc.identifier.citation Proceedings of the 4th Conference on Language, Data and Knowledge, 12–15 September 2023, Vienna, Austria, pp. 482–492
dc.identifier.isbn 978-989-54081-5-3
dc.identifier.uri http://repozytorium.umk.pl/handle/item/6928
dc.description.abstract This paper explores a discourse relations annotation project carried out under the CLARIN-PL initiative, leveraging the ISO 24617-8 standard. The goal is to boost research interoperability and foster multilingual research. Our team of three linguist-annotators annotated a Polish-language corpus spanning several genres, including literature and press articles. This effort was guided by a project expert and external linguists from the CLARIN-PL language technology research infrastructure. Several significant challenges emerged during the process. Key issues included ambiguities within the ISO standard’s relation categories, poorly formulated definitions for certain relation categories, and the difficulty of identifying and annotating implicit discourse relations, which lack explicit discourse connectives or signaling devices. To overcome these problems, we implemented strategies such as regular team meetings, collaborative annotation forms, and preliminary revisions to the annotation scheme. This paper presents the project and the annotation process, and offers initial annotation data on the discourse relations and connectives identified within the corpus. Looking forward, we discuss potential enhancements to the process, including additional revisions to the guidelines, and conclude with an overview of the project’s contributions and a discussion of our future development plans.
dc.description.sponsorship The work was financed by the European Regional Development Fund as a part of the 2014–2020 Smart Growth Operational Programme, CLARIN — Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00–00C002/19 and the Polish Ministry of Education and Science grant 2022/WK/09.
dc.language.iso eng
dc.publisher NOVA CLUNL
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Poland
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/pl/
dc.subject discourse markers
dc.subject discourse relations
dc.subject discourse annotation
dc.subject ISO 24617-8
dc.subject Polish
dc.title Adopting ISO 24617-8 for Discourse Relations Annotation in Polish: Challenges and Future Directions
dc.type info:eu-repo/semantics/bookPart



This item is made available under the Attribution-NonCommercial-NoDerivs 3.0 Poland license.