DIGITAL HUMANITIES IN LATVIA

A unique Latvian language learners' corpus is being created at the Artificial Intelligence Laboratory

December 7, 2020 at 8:21 am


Since September 2018, the Artificial Intelligence Laboratory of the Institute of Mathematics and Computer Science, University of Latvia (LU MII AiLab) has been developing the Latvian Language Learners' Corpus (LaVA; http://lava.korpuss.lv/). The corpus will provide a new basis for studying how Latvian is acquired and for quantitative and qualitative analysis of the mistakes learners make. Methodological materials for language teaching will also be developed, taking into account learners' mistakes and the influence of their mother tongue.


LaVA comprises texts written by international students at Latvian higher education institutions who are learning Latvian as a foreign language in their first or second semester. The texts were produced in the course of their studies and were obtained from Rīga Stradiņš University, the University of Latvia, Liepāja University, the Rēzekne Academy of Technologies and the Latvian Academy of Culture. The corpus is expected to contain approximately 1,000 student texts and 100,000 words.

Project manager and leading researcher Ilze Auziņa: “Over the last 15–20 years, language learner corpora have become very popular. Researchers use them to study the impact of the mother tongue on foreign language learning and the language learning process in general, and they also help in planning teaching. At present the field is dominated by corpora of learner English, but corpora for learners of other languages, such as German, Portuguese and Russian, are also being built. LaVA is now being created, and its data will be used to develop online exercises and self-tests.”

A language corpus is a structured collection of texts or speech transcripts intended for linguistic analysis and the development of language technologies. It contains authentic language material that reflects how the language is actually used. A language learner corpus contains systematically collected data produced by language learners: texts and/or transcribed audio recordings, usually with the learners' mistakes annotated.
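As a rough illustration of what such an error-annotated record might look like and how mistakes could be counted, here is a minimal Python sketch. It is not LaVA's actual data format or tooling: the class names, the `error_type` label, the native languages and the two sample sentences are all invented for this example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ErrorAnnotation:
    """One marked mistake: a character span, a label and a suggested fix."""
    start: int          # index of the first character of the mistake
    end: int            # index one past the last character
    error_type: str     # hypothetical label, not an official tag set
    correction: str


@dataclass
class LearnerText:
    """One learner text plus metadata and its error annotations."""
    text: str
    native_language: str
    errors: list[ErrorAnnotation] = field(default_factory=list)


def errors_by_type(corpus):
    """Count annotated mistakes per error type across all texts."""
    return Counter(e.error_type for doc in corpus for e in doc.errors)


def errors_by_native_language(corpus):
    """Count annotated mistakes per native language of the writer."""
    counts = Counter()
    for doc in corpus:
        counts[doc.native_language] += len(doc.errors)
    return counts


# Invented sample data, for illustration only.
sample = [
    LearnerText(
        text="Es dzīvo Rīgā.",
        native_language="German",
        errors=[ErrorAnnotation(3, 8, "verb_form", "dzīvoju")],
    ),
    LearnerText(
        text="Es ir students.",
        native_language="Hindi",
        errors=[ErrorAnnotation(3, 5, "verb_form", "esmu")],
    ),
]

if __name__ == "__main__":
    print(errors_by_type(sample))
    print(errors_by_native_language(sample))
```

Storing each mistake as a character span next to the untouched original text keeps the learner's wording intact, so the same record can serve both qualitative reading and quantitative counts.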

The Latvian language learners' corpus is being built within the Fundamental and Applied Research project “Development of the Latvian language learners' corpus: methods, tools and use” (No. lzp-2018/1-0527).

LU MII AiLab is one of Latvia's leading organisations in computational linguistics and language technologies and has been doing research in these fields for 28 years. The laboratory conducts research in various areas of automated natural language processing and machine learning, and develops machine-readable dictionaries (the most popular of which is Tēzaurs.lv) and machine-readable speech and text corpora (Korpuss.lv).

The information was prepared by Kristīne Pokratniece, AiLab

digitalhumanities.lv 
supported by the project "Towards Development of Open and FAIR Digital Humanities Ecosystem in Latvia" (No. VPP-IZM-DH-2022/1-0002)