IMCS UL participates in the LREC 2022 conference
LREC 2022 (13th Conference on Language Resources and Evaluation) is taking place in Marseille on June 20-25, 2022. LREC is the major event on Language Resources and Evaluation for Human Language Technologies. The Institute of Mathematics and Computer Science, University of Latvia, participates in the conference with three presentations.
On June 21, there will be a report on the Latvian Language Learner Corpus LaVA, which includes more than 1000 essays written by language learners studying at Latvian universities (corpus size: 190k words). A set of self-assessment exercises was created by analyzing the learners' mistakes marked in the texts.
A poster presentation on the Latvian National Corpora Collection (LNCC), a diverse collection of corpora representing both written and spoken language, is planned for Midsummer's Day, June 23. All LNCC corpora are annotated with a uniform morpho-syntactic annotation scheme, enabling federated search and consistent linguistic analysis across more than 20 corpora (1.3B tokens).
An online hands-on workshop on the korpuss.lv platform and on searching the corpora is organized by the Institute of Mathematics and Computer Science of the University of Latvia in cooperation with CLARIN Latvia. Instructors: Ilze Auziņa and Baiba Saulīte.
The project "Research on Modern Latvian Language and Development of Language Technology" is implemented within the framework of the National Research Programme "Letonika – Fostering a Latvian and European Society".
Project No: VPP-LETONIKA-2021/1-0006
Implementation period: 20.12.2021–19.12.2024
Project funding: 1 068 000 EUR
Funded by: Latvian Council of Science of the Ministry of Education and Science
Project partners: Institute of Mathematics and Computer Science of the University of Latvia (leading partner), University of Latvia (Latvian Language Institute of the University of Latvia and Faculty of Humanities of the University of Latvia), Institute of Literature, Folklore and Art of the University of Latvia, Liepāja University
Project leader: Ilze Auziņa
The aim of the project is to advance research on the grammatical, lexical-semantic, phonetic and phonological system of the modern Latvian language and on Latvian sign language using data-driven methods, as well as to develop sustainable Latvian language resources and tools. To achieve this goal, the Latvian speech corpus and the pilot corpus of Latvian sign language will be developed, and Tezaurs.lv and the "Dictionary of Contemporary Latvian" will be improved. Based on Latvian grammar studies, the "Latvian Treebank" will be enhanced. These resources will be integrated into a single Latvian language research infrastructure, as well as into the CLARIN-LV repository. During the project, the LATE platform for speech transcription and subtitling will be created.
Project's research group: 10 lead participants and 33 participants (including 14 student-participants).