DIGITAL HUMANITIES IN LATVIA

A unique corpus of Latvian language learners' texts is being created at the Artificial Intelligence Laboratory

December 7, 2020 at 8:21 am


Since September 2018, the Artificial Intelligence Laboratory of the Institute of Mathematics and Informatics of the University of Latvia (LU MII AiLab) has been developing the Latvian Language Learners' Corpus (LaVA; http://lava.korpuss.lv/). The corpus will provide a new basis for studying the peculiarities of Latvian language acquisition and for quantitative and qualitative analysis of the errors learners make. Methodological materials for language learning will also be developed, taking into account learners' errors and the influence of their mother tongue.


LaVA includes texts written by international students at Latvian higher education institutions who are learning Latvian as a foreign language in their first or second semester. The texts were produced during their studies and were obtained from Rīga Stradiņš University, the University of Latvia, Liepāja University, the Rēzekne Academy of Technologies and the Latvian Academy of Culture. The corpus is expected to comprise approximately 1,000 student papers and about 100,000 words.

Project manager and leading researcher Ilze Auziņa: “In the last 15–20 years, language learner corpora have become very popular. Researchers use them to study the impact of the mother tongue on foreign language learning, as well as the language learning process in general, and they also help in planning the learning process. At present, the field of learner corpora is dominated by English; however, learner corpora for other languages are also being created, such as German, Portuguese and Russian. LaVA, whose data will be used to develop online assignments and self-tests, is now being set up.”

A language corpus is a structured set of texts or speech transcripts intended for linguistic analysis and the development of language technologies. It includes authentic language material that reflects the actual use of the language. A language learner corpus contains systematic data on language learners: texts and/or transcribed audio recordings, in which the errors made by the learners are usually also annotated.
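As a rough illustration of what an error-annotated learner corpus makes possible, the following minimal Python sketch stores a few learner texts with their annotations and computes two simple quantitative summaries. It is not LaVA's actual data format or tag set: the class names, field names, error labels and the three sample sentences are all hypothetical and invented for this example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ErrorAnnotation:
    start: int        # index of the first affected token
    end: int          # index one past the last affected token
    error_type: str   # hypothetical label, not an actual LaVA tag
    correction: str   # suggested target form


@dataclass
class LearnerText:
    text_id: str
    mother_tongue: str   # learner's first language (hypothetical codes)
    tokens: list[str]
    errors: list[ErrorAnnotation] = field(default_factory=list)


def errors_by_type(corpus):
    """Count annotated errors per error type across all texts."""
    return Counter(e.error_type for t in corpus for e in t.errors)


def errors_per_100_tokens(corpus):
    """Error rate per 100 tokens, grouped by the learner's mother tongue."""
    tokens, errors = Counter(), Counter()
    for t in corpus:
        tokens[t.mother_tongue] += len(t.tokens)
        errors[t.mother_tongue] += len(t.errors)
    # Skip groups with no tokens to avoid division by zero.
    return {l1: 100 * errors[l1] / n for l1, n in tokens.items() if n}


# Invented sample data, for illustration only.
corpus = [
    LearnerText("t1", "de", ["Es", "dzīvot", "Rīgā", "."],
                [ErrorAnnotation(1, 2, "verb_form", "dzīvoju")]),
    LearnerText("t2", "en", ["Man", "ir", "grāmatu", "."],
                [ErrorAnnotation(2, 3, "case", "grāmata")]),
    LearnerText("t3", "en", ["Es", "esmu", "students", "."]),
]

print(errors_by_type(corpus))
print(errors_per_100_tokens(corpus))
```

Grouping by mother tongue mirrors the kind of question the corpus is meant to support, namely how a learner's first language relates to the errors they make; a real study would of course need far more data and a carefully defined annotation scheme.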

The Latvian language learners' corpus is being created within the Fundamental and Applied Research project “Development of the Latvian language learners' corpus: methods, tools and use” (No. lzp-2018/1-0527).

LU MII AiLab is one of the leading organisations in Latvia in computational linguistics and language technologies, a field in which it has been conducting research for 28 years. The laboratory conducts research in various areas of automated natural language processing and machine learning, and develops machine-readable dictionaries (the most popular of which is Tēzaurs.lv) as well as machine-readable speech and text corpora (Korpuss.lv).

The information was prepared by Kristīne Pokratniece, AiLab

digitalhumanities.lv 
supported by the project “University of Latvia and Institutes in the European Research Area – Excellence, Activity, Mobility, Capacity” (No. 1.1.1.5/3/25/I/011).