In this paper, we describe the Nordic Dialect Corpus, which has recently been completed.

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

The Norwegian Dependency Treebank is a new syntactic treebank for Norwegian Bokmål and Nynorsk with manual syntactic and morphological annotation, developed at the National Library of Norway in collaboration with the University of Oslo. It is the first publicly available treebank for Norwegian. This paper presents the core principles behind the syntactic annotation and how these principles were employed in certain specific cases. We then present the selection of texts and distribution between genres, as well as the annotation process and an evaluation of the inter-annotator agreement. Finally, we present the first results of data-driven dependency parsing of Norwegian, contrasting four state-of-the-art dependency parsers trained on the treebank. The consistency and the parsability of this treebank are shown to be comparable to other large treebank initiatives.

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

Constructing a Norwegian Academic Wordlist

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

A modernised version of the Glossa corpus search system

The LIA Treebank of Spoken Norwegian Dialects

This paper describes an evaluation of five data-driven part-of-speech (PoS) taggers for spoken Norwegian. The taggers all rely on different machine learning mechanisms: decision trees, hidden Markov models (HMMs), conditional random fields (CRFs), long short-term memory networks (LSTMs), and convolutional neural networks (CNNs). We go into some of the challenges posed by the task of tagging spoken, as opposed to written, language, and in particular a wide range of dialects as is found in the recordings of the LIA (Language Infrastructure made Accessible) project. The results show that the taggers based on either conditional random fields or neural networks perform much better than the rest, with the LSTM tagger getting the highest score.

Proceedings of the Twelfth Language Resources and Evaluation Conference

2020

Comparing Methods for Measuring Dialect Similarity in Norwegian

This paper presents the NDC Treebank of spoken Norwegian dialects in the Bokmål variety of Norwegian. It consists of dialect recordings made between 20 which have been digitised, segmented, transcribed and subsequently annotated with morphological and syntactic analysis. The nature of the spoken data gives rise to various challenges both in segmentation and annotation. We follow earlier efforts for Norwegian, in particular the LIA Treebank of spoken dialects transcribed in the Nynorsk variety of Norwegian, in the annotation principles to ensure interusability of the resources. We have developed a spoken language parser on the basis of the annotated material and report on its accuracy both on a test set across the dialects and by holding out single dialects.
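The NDC Treebank abstract mentions evaluating the parser by holding out single dialects. As a rough illustration of what such a split can look like, here is a minimal Python sketch. It is not the authors' setup: the function name `leave_one_dialect_out`, the `(dialect, sentence)` pair layout, and the placeholder dialect labels are all assumptions made for this example.

```python
from collections import defaultdict


def leave_one_dialect_out(sentences):
    """Yield (held_out_dialect, train, test) splits, one per dialect.

    `sentences` is a list of (dialect, sentence) pairs. This layout is a
    hypothetical one chosen for the sketch, not the treebank's own format.
    """
    # Group the sentences by their dialect label.
    by_dialect = defaultdict(list)
    for dialect, sentence in sentences:
        by_dialect[dialect].append(sentence)

    # Each dialect takes one turn as the test set; the rest form the training set.
    for held_out in sorted(by_dialect):
        train = [
            sentence
            for dialect, group in by_dialect.items()
            if dialect != held_out
            for sentence in group
        ]
        test = list(by_dialect[held_out])
        yield held_out, train, test


if __name__ == "__main__":
    # Placeholder labels and sentences, purely illustrative.
    data = [
        ("dialect_A", "sentence 1"),
        ("dialect_B", "sentence 2"),
        ("dialect_A", "sentence 3"),
        ("dialect_C", "sentence 4"),
    ]
    for held_out, train, test in leave_one_dialect_out(data):
        print(held_out, len(train), len(test))
```

A parser would be trained on `train` and scored on `test` for each split, so the score for a split reflects a dialect that was absent from training.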