dc.contributor.author |
Żurowski, Sebastian |
dc.contributor.author |
Ziembicki, Daniel |
dc.contributor.author |
Tomaszewska, Aleksandra |
dc.contributor.author |
Ogrodniczuk, Maciej |
dc.contributor.author |
Drozd, Agata |
dc.date.accessioned |
2023-10-30T06:46:21Z |
dc.date.available |
2023-10-30T06:46:21Z |
dc.date.issued |
2023-08 |
dc.identifier.citation |
Proceedings of the 4th Conference on Language, Data and Knowledge, 12–15 September 2023, Vienna, Austria, pp. 482–492 |
dc.identifier.isbn |
978-989-54081-5-3 |
dc.identifier.uri |
http://repozytorium.umk.pl/handle/item/6928 |
dc.description.abstract |
This paper explores a discourse relations annotation project carried out under the CLARIN-PL initiative, leveraging the ISO 24617-8 standard. The goal is to boost research interoperability and foster multilingual research. Our team of three linguist-annotators annotated a Polish-language corpus spanning several genres, including literature and press articles. This effort was guided by a project expert and external linguists from the CLARIN-PL language technology research infrastructure. Several significant challenges emerged during the process: ambiguities within the ISO standard’s relation categories, poorly worded definitions for certain relation categories, and the difficulty of identifying and annotating implicit discourse relations, which lack explicit discourse connectives or signaling devices. To overcome these problems, we implemented strategies such as regular team meetings, collaborative annotation forms, and preliminary revisions to the annotation scheme. This paper presents the project and the annotation process, and offers initial annotation data on the discourse relations and connectives identified within the corpus. Looking forward, we discuss potential enhancements to the process, including additional revisions to the guidelines, and conclude with an overview of the project’s contributions and a discussion of our future development plans. |
dc.description.sponsorship |
The work was financed by the European Regional Development Fund as a part of the 2014–2020 Smart Growth Operational Programme, CLARIN — Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00–00C002/19 and the Polish Ministry of Education and Science grant 2022/WK/09. |
dc.language.iso |
eng |
dc.publisher |
NOVA CLUNL |
dc.rights |
Attribution-NonCommercial-NoDerivs 3.0 Poland |
dc.rights.uri |
http://creativecommons.org/licenses/by-nc-nd/3.0/pl/ |
dc.subject |
discourse markers |
dc.subject |
discourse relations |
dc.subject |
discourse annotation |
dc.subject |
ISO 24617-8 |
dc.subject |
Polish |
dc.title |
Adopting ISO 24617-8 for Discourse Relations Annotation in Polish: Challenges and Future Directions |
dc.type |
info:eu-repo/semantics/bookPart |