Digital Humanities and Small Languages: Expectations and Reality

Liina Lindström 

Kristel Uiboaed

July 17, 9.50 - 11.20 (1 hr 30 min)

SUMMARY: Digital humanities are gradually becoming an inseparable part of all branches of humanities research. Methods used in other fields are increasingly implemented in the humanities, and new methods are constantly being developed. On the one hand, this is a natural development in interdisciplinary research, but on the other hand it is an inevitable outcome of the availability of massive amounts of humanities data already in digital form. Using digital data presupposes digital methods and the skills to handle this data. This creates new opportunities for more versatile research and opens the humanities to greater collaboration with other fields. Objectively, this should suffice to reinforce these positive trends in humanities research. However, there are persistent obstacles. Technical skills and know-how for working with data are still generally not part of humanities curricula. Methods and tools developed in other fields cannot always be directly transferred to research questions in the humanities and often need to be adapted before implementation. Additional problems may arise when the data comes from a poorly resourced and less studied language. Tools developed for English, for example, cannot be directly transferred to languages with more complex morphology. Furthermore, there are fewer people working in the field and the workload cannot always be distributed as needed, i.e. the researcher often has to do everything alone. This restricts the research topics that can be studied exhaustively.

In this presentation we discuss problems that are common to digital humanities research, concentrating on the characteristics of research on small and complex languages like Estonian. We introduce the data and tools we have implemented in our DH projects at the University of Tartu and give an overview of the obstacles and problems we have faced in this process. As our background is in linguistics, we mostly discuss linguistic data and language technology tools. Finally, we summarize the main problems and propose some solutions to improve DH research and mutually beneficial interdisciplinary collaboration.

Liina Lindström is an associate professor of Estonian language at the University of Tartu. She has studied Estonian word order, the syntax of spoken Estonian and the syntax of Estonian dialects, using linguistic corpora. She has led the compilation and annotation of the Corpus of Estonian Dialects, a large collection of spoken dialect data in digital form. More recently, she has been in charge of developing opportunities to teach and learn digital humanities at the University of Tartu.

Geography and the Pope: Exploring Digital Tools and the Papal Documents 2013–2016

Fredrik Norén

July 17, 11.35 – 12.35 (1 hr)
SUMMARY: This is a method-oriented presentation of a work in progress that seeks to explore what an atlas of the popes would look like. How can geography and the Vatican be studied together through texts? Can one study places in texts to map latent meanings of political and religious ambitions, and anticipate evolving trends? Is spatial analysis a way to better understand a closed institution such as the Holy See? The Vatican is often associated with conservative stability. The papacy has, after all, managed to prevail while states and supranational organizations have come and gone. At the same time, the Vatican has shown remarkable capacity to gradually adapt to scientific paradigms as well as a changing world. What happens when a new pope is elected to the Office of Peter? How does the perception of places change? Does the relationship between cities, countries, and regions constitute fixed patterns, or are these geographical structures evolving as a new pope is elected?
The basis of the analysis is all papal documents in English translation from Benedict XVI (2005–2013) and Francis (2013–), retrieved from the Vatican web page. The presentation emphasizes the methodological process as a way to explore geography in text. More specifically, it uses different digital methods and tools to study geographical dimensions of the papal collection. For this purpose, the presentation will discuss methods such as named entity recognition, network clustering, topic modeling, and word embedding.

Since 2014, Fredrik Norén has been a doctoral student in media and communication studies, focusing on media history and digital humanities. He has an academic background in film studies and political science from Lund University (2004-2011). After graduation, Norén was employed as a trainee in a program for future leaders and special advisors for the municipal sector, after which he worked as a project manager for the municipality of Klippan (2011-2014).

GIS application for understanding of historical space: a case study of the Dubingiai micro-region in Lithuania

Rimvydas Laužikas

July 17, 12.35 – 13.35 (1 hr)
SUMMARY: Understanding space and time is the basic framework for investigating past societies, but our modern geographical knowledge, geographical stereotypes, scientific meta-theories, and historical narratives are powerful firewalls that can get in the way of the retrieval, interpretation and reuse of spatial information from historical documents. The paper will introduce the possibilities of GIS applications for understanding the basic concepts of mapping medieval and Renaissance territories, borders, micro-regions, and administrative regions, as well as theoretical approaches to symbolic-iconographic interpretations of medieval and Renaissance maps. The paper is based on an interdisciplinary case study of investigations of historical spatiality in the Dubingiai micro-region (near Vilnius, Lithuania) as part of the research project “The Beginnings of Lithuanian Statehood According to the Exploration of Dubingiai micro-region (1st to 15th centuries)”.

Rimvydas Laužikas is a Professor of digital Social Science and Humanities and the Head of Department of Museology in the Faculty of Communication of Vilnius University. His education is in the interdisciplinary SSH fields of educational sciences, archaeology and communication and information sciences. Rimvydas’s research interests cover medieval and early modern archaeology, digital SSH, information and communication of cultural heritage, and the history of gastronomy. He has been actively involved in national projects in the fields of his interests and has also participated in several international projects, and been active in international organizations, networks and working groups (such as COST ARKWORK, Digital preservation Europe, Connecting Archaeology and Architecture to Europeana, Local content in a Europeana cloud, Europeana Food and Drink). He has written numerous articles on the archaeology of XV-XVIII century Lithuanian church and manors, using computers in SSH, digitisation, information and communication of cultural heritage, standardization, museology, and history of gastronomy.

Data journalism: USA’s experience

Kārlis Dagilis

July 18, 15.20 - 16.50 (1 hr 30 min)

SUMMARY: Some say that there are no bigger lies than statistics. However, for journalists, whose main task is to seek the truth, numbers are everything. Diving deep into complex data sets can help find a story or reveal a problem that would otherwise stay well hidden. While investigative journalists in the United States have been using open data sets for decades, now more than ever this method plays a significant role in producing groundbreaking stories about our life in the digital world. Moreover, it has moved beyond the borders of journalism and communication science. Nowadays it is a highly interdisciplinary field, impossible to imagine without computer science.

This lecture will focus on the academic experience of data journalism at the University of Maryland, give a broader look at data journalism in the United States today, and share the experience of working with The Washington Post to uncover Russia's interference in the U.S. presidential elections.

Kārlis Dagilis is a lecturer in Radio Journalism at the University of Latvia. For more than 17 years he has been working as a journalist for the Latvian national broadcasters Latvijas Radio (LR) and Latvijas Televīzija (LTV). Kārlis is also a founder of the multimedia youth radio station Pieci.lv. During his Hubert H. Humphrey fellowship program in 2016/2017, he studied data journalism extensively at the University of Maryland, which has one of the leading journalism colleges in the United States. Furthermore, he collaborated with the Pulitzer Prize winning journalist Dana Priest from The Washington Post to shed light on Russia's interference in the U.S. presidential elections. In fall 2018, Kārlis will start his PhD studies at the Philip Merrill College of Journalism.

Analysing Online Content: How Can We Tell if it is the Real Thing?

Elīna Lange-Ionatamišvili

July 20, 15.30 - 17.00 (1 hr 30 min)
SUMMARY: The authenticity of online content is one of the most discussed topics these days, in particular due to rapid technological developments. New technology allows for cheap and widely accessible ways of creating fake discourse and channelling it to audiences without geographic limitations. During this lecture you will get an insight into the work of the NATO Strategic Communications Centre of Excellence on detecting human and robotic activity in creating fake discourse online, including methods to identify organised human trolling and robotic activity mimicking human activity, as well as experiments with new possibilities to manipulate video material for the purpose of impersonation.

Elīna Lange-Ionatamišvili is a Senior Expert at the NATO Strategic Communications Centre of Excellence (NATO StratCom COE) in Riga, Latvia. She holds an MA in Communications Science (2006) and has spent a large part of her career working for the Ministry of Defence of Latvia and NATO. A large part of Elīna’s work at the NATO StratCom COE is related to the analysis of Russia’s information confrontation in the context of the new generation warfare and to strategic communications terminology. She is also a Trainer on Behavioural Dynamics Institute’s Target Audience Analysis Methodology.


Workshop 1. Data visualisation basics

Nika Aleksejeva

July 17, 14.20 - 18.20 (3 hr 45 min)

SUMMARY: The workshop aims to share practical skills for visualizing data in the most common cases. This will be achieved by mixing theoretical input with practical assignments that build up to the final project of the day. The workshop will map out the diverse world of visual communication and pinpoint the place of data visualization in it. The main emphasis of the workshop is on the skill of choosing the most appropriate data visualization format or chart type for a given case. This largely depends on the communication goal and the nature of the data set; therefore, basic skills in working with data tables are required. The workshop will also share some tips and tricks for making a successful data-driven story. The practical work during the workshop will focus on data visualization decision-making based on a data set and the intended communication goal for a target audience. Depending on participants’ preferences, the actual data visualization can be done with one of the digital tools suggested by the trainer or on paper. The expected outcome of the workshop is an evaluation of the final data visualization projects and of the lessons learned during the workshop.

Nika Aleksejeva is a data visualization and data journalism trainer. Her workshops cover data storytelling, data visualization and functional information design. Nika has over 10 years of experience in mass media and communication. Her professional background is in journalism. After working at the Latvian business magazine Kapitāls, she joined Infogram, a popular data visualization service that empowers non-designers to create beautiful data visualizations. In 2014, she launched the international Infogram Ambassador Network, which united about 100 data enthusiasts from all over the world. In 2016, she became a School of Data fellow, with a mission to share data literacy and the skills to present data-driven stories with European journalists. Currently she works to empower Latvian society with basic data literacy and data visualisation skills at School of Data Latvia.

Workshop 2. Introduction to Corpus Linguistics

Ilze Auziņa 

Baiba Saulīte

July 18, 9.30 - 15.00 (4 hr 30 min)
SUMMARY: This course will introduce summer school participants to language corpora and demonstrate the use of corpus linguistics in the humanities.
The following topics will be discussed:
  • The history of corpus linguistics 
  • Definition and content of a corpus
  • Types of text corpora (general, specialized, monolingual, parallel, text, speech etc.)
  • Quantitative data
  • Corpora and computational linguistics
We will develop practical skills for extracting, annotating, and analysing corpus data of various kinds, including:
  • The use of corpora for different purposes
  • Corpus queries and regular expressions
  • Freely available corpus tools
  • Morphologically annotated corpora (part of speech analysis and tagging of a corpus)
  • Searching in morphologically annotated corpora
The course includes both theory and practice of corpus-based linguistic research.
In order to analyze a corpus and search for certain words or phrases, dedicated corpus software is necessary. Therefore, the students will be introduced to several freely available corpus tools (concordancers), for example AntConc.
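To give an idea of what a concordancer does with a regular-expression query, here is a minimal sketch in Python. It is not how AntConc or any other tool mentioned here is implemented; the function name `concordance`, the `width` parameter and the sample sentence are assumptions made purely for illustration.

```python
import re


def concordance(text, pattern, width=30):
    """Return keyword-in-context (KWIC) lines for every regex match in text."""
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Slice out the context on both sides of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Hypothetical sample text, not taken from any real corpus.
sample = (
    "The corpus contains spoken and written texts. "
    "Several corpora were compiled for this study."
)

# The query matches both the singular and the plural form of the word.
for line in concordance(sample, r"\bcorp(?:us|ora)\b", width=20):
    print(line)
```

Each output line shows one hit in square brackets with its left and right context, which is the view a concordancer typically offers for reading many occurrences of a word side by side.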

Ilze Auziņa, PhD, is a leading researcher at the Institute of Mathematics and Computer Science, University of Latvia. Ilze is a Latvian linguist who defended her PhD thesis on computational phonology, investigating the syllable structure, grapheme-phoneme correspondences and phonotactics of Latvian. She co-authored the chapters on phonetics and phonology of “The grammar on modern Latvian”. Ilze has more than 20 years of experience in the phonological and phonetic analysis of Latvian. She also has experience in speech data processing and analysis and in the development of speech synthesis and automatic phonetic transcription systems. She has carried out several specialized corpus development projects (as the project coordinator and the leading researcher), for example, The Corpus of the Transcripts of the Saeima’s (Parliament of Latvia) Sessions, An annotated longitudinal Latvian children's speech corpus, and The Latvian Speech Recognition Corpus.

Baiba Saulīte, PhD, is a leading researcher at the Institute of Mathematics and Computer Science, University of Latvia. Baiba is a Latvian linguist who defended her PhD thesis on word order and information structure in Latvian. She co-authored the chapters on syntax and information structure of “The grammar on modern Latvian”. Baiba has over 10 years of experience in the morphological, syntactic and semantic analysis of Latvian. Her research in computational linguistics focuses on multi-layered semantically annotated language resources for Latvian (anchored in widely acknowledged multilingual representations like AMR, PropBank, FrameNet, Universal Dependencies, etc.) needed for natural language processing. She is currently working on the analysis of deverbal derivatives in Latvian.

Workshops 3/4. Stylometry Day in Riga

Jan Rybicki

July 19, 9.30 - 18.15 (7 hr 30 min)
SUMMARY: “Stylometry Day in Riga” is aimed at introducing participants to the field of computational stylistics. An introductory lecture in the morning will show the main tenets and methods of the field, together with examples of research in authorship attribution and distant reading. In the following hands-on workshop, the participants will be acquainted with stylo, a package for the statistical programming environment R; this package is a way to avoid R’s steep learning curve so that humanists can easily perform advanced quantitative analyses of texts. While stylo has its own built-in visualization tools, the second part of the workshop will also introduce gephi, a piece of network analysis software. In the afternoon session, the participants will be able to perform their own first analyses on their own collections of texts or on those provided for them, starting with the input of electronic texts and continuing through tokenization, distance measure calculation and cluster analysis, all the way to various modes of visualization. No programming skills are required!
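For readers curious about what happens between tokenization and cluster analysis, the following is a rough sketch in Python of one common distance measure, a Burrows-style Delta over the most frequent words. It is not the stylo package and does not reproduce its defaults; the function names, the `n_words` parameter, the use of the population standard deviation and the toy texts are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    return {word: n / len(tokens) for word, n in counts.items()}


def delta_matrix(texts, n_words=100):
    """Return a Burrows-style Delta for every pair of labels in texts.

    texts maps a label (e.g. a file name) to the raw text.
    """
    tokens = {label: tokenize(raw) for label, raw in texts.items()}
    freqs = {label: relative_frequencies(toks) for label, toks in tokens.items()}

    # Pick the most frequent words of the whole collection.
    overall = Counter()
    for toks in tokens.values():
        overall.update(toks)
    most_frequent = [word for word, _ in overall.most_common(n_words)]

    # Standardize each word's frequency across the texts (z-scores).
    z = {label: [] for label in texts}
    for word in most_frequent:
        column = [freqs[label].get(word, 0.0) for label in texts]
        mu, sigma = mean(column), pstdev(column)
        for label, value in zip(texts, column):
            z[label].append((value - mu) / sigma if sigma > 0 else 0.0)

    # Delta is the mean absolute difference between two z-score profiles.
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for a in texts
        for b in texts
    }


# Toy texts, far too short for real stylometry; they only show the mechanics.
toy = {
    "text_a": "the cat sat on the mat and the dog sat too",
    "text_b": "the dog ran and the cat ran and the bird sang",
    "text_c": "a bird sang on a branch while a cat slept",
}
distances = delta_matrix(toy, n_words=10)
print(distances[("text_a", "text_b")])
```

A table of pairwise distances like this one is the kind of input a cluster analysis or a network visualization would then work from.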

If the participants wish to work on their own computers, they are strongly advised to download and install R and gephi (and to check that they are functioning correctly on their computers).

  1. R: cran.r-project.org/
  2. Gephi: gephi.org/
  3. Download link for sample text collection for first analysis: https://1drv.ms

If the participants plan to try out the new methods on their own texts, these should be in plain text (.txt) format, UTF-8 encoded. Preferably, the file names should follow the pattern: author_title_date.txt (keep the underscores). It makes sense to bring texts by at least five authors, at least two texts each (from short story to novel or full piece of drama) – but the more the merrier.
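Participants who want to check their collection in advance could use something like the following Python sketch. It assumes one particular reading of the naming pattern (exactly three fields separated by underscores, followed by .txt); the function names and the example file name are hypothetical and are not part of the workshop materials.

```python
import re
from pathlib import Path

# Assumed reading of the pattern: author_title_date.txt with exactly three fields.
FILENAME_PATTERN = re.compile(
    r"^(?P<author>[^_]+)_(?P<title>[^_]+)_(?P<date>[^_]+)\.txt$"
)


def parse_filename(name):
    """Return the author, title and date fields, or None if the name does not fit."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


def is_utf8(path):
    """Return True if the file decodes cleanly as UTF-8."""
    try:
        Path(path).read_text(encoding="utf-8")
        return True
    except UnicodeDecodeError:
        return False


def check_collection(folder):
    """Print one status line per .txt file found in folder."""
    for path in sorted(Path(folder).glob("*.txt")):
        name_ok = parse_filename(path.name) is not None
        encoding_ok = is_utf8(path)
        print(f"{path.name}: name ok={name_ok}, UTF-8 ok={encoding_ok}")


# Hypothetical file name used only to show the parsing step.
print(parse_filename("Smith_Example_1900.txt"))
```

Calling `check_collection` on the folder that holds the texts lists every file whose name or encoding may need attention before the workshop.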

Dr. Jan Rybicki is Assistant Professor at the Institute of English Studies of the Jagiellonian University in Kraków, Poland. With a background in literary translation and comparative literature, he now mostly deals with distant reading, stylometry and authorship attribution, and some of his studies in the latter field, namely on Harper Lee and Elena Ferrante, have made headlines in the media. His latest publications range from a distant reading of Polish literature: "Pierwszy rzut oka na stylometryczną mapę literatury polskiej," (Teksty Drugie, 2014) and "A Second Glance at a Stylometric Map of Polish Literature" (Forum poetyki, 2017); through a new outlook on the sermons of the American Great Awakening: "Is God Really Angry at Sinners? A Stylometric Study of Jonathan Edwards's Representations of God," (2017, with Michał Choiński); to a study of gender in English literature: "Vive la différence: Tracing the (Authorial) Gender Signal by Multivariate Analysis of Word Frequencies" (Digital Scholarship in the Humanities, 2016). Together with Maciej Eder and Mike Kestemont, he is a co-author of stylo, the stylometric package for R ("Stylometry with R: A Package for Computational Text Analysis," R Journal, 2016).

Workshop 5. nodegoat Data Modelling Workshop

Pim van Bree 

Geert Kessels (LAB1100)

July 20, 9.30 - 15.00 (4 hr 30 min)

SUMMARY: A well thought-out database for digital history projects allows for various modes of analysis, visualisation, and interconnectivity. Each database with historical data requires a thorough understanding of the underlying conceptual data model and logical data model. Moreover, the interface at hand has to be scrutinised as well. This workshop will deal with the following three distinct levels of any data modelling process:
1. Creating a conceptual data model
What are the types of information that can be identified in the research process, and how do they relate to one another?
2. Creating a logical data model
How will different kinds of information be stored and how to deal with vague / ambiguous / uncertain / contradictory / unique / irregular data?
3. Using a database application
Which options does the database application offer and how can the conceptualised data model best be implemented?
During the workshop, we will first focus on getting a good understanding of these three distinct levels and explore how these levels inform each other. After this, participants will be able to create/refine a data model of their own and learn how to implement this in nodegoat.
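As a rough illustration of how a conceptual model (which types of information exist and how they relate) can turn into a logical model (how vague or uncertain values are stored), here is a small Python sketch. It does not use nodegoat and is not its data format; the types `Person`, `Letter` and `UncertainDate`, their fields and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UncertainDate:
    """A date stored as a range of years, so vagueness stays explicit in the data."""
    earliest: int  # earliest possible year
    latest: int    # latest possible year
    note: str = ""  # free-text remark on why the date is uncertain

    def is_exact(self):
        return self.earliest == self.latest


@dataclass
class Person:
    name: str
    born: Optional[UncertainDate] = None  # None means the date is unknown


@dataclass
class Letter:
    # The relationships of the conceptual model: a letter links two persons.
    sender: Person
    recipient: Person
    written: Optional[UncertainDate] = None
    sources: List[str] = field(default_factory=list)  # where each claim comes from


# Hypothetical records, invented for this example.
person_a = Person("Person A", born=UncertainDate(1810, 1815, "baptism record missing"))
person_b = Person("Person B")
letter = Letter(person_a, person_b, written=UncertainDate(1840, 1840), sources=["Archive X"])

print(letter.written.is_exact(), person_a.born.is_exact())
```

Storing a range and a note instead of a single year is one possible design choice for uncertain data; contradictory sources could be handled in a similar way by keeping several dated statements side by side.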
No prior knowledge is required to attend this workshop. Participants are required to bring their own laptop to the workshop. No new software has to be installed, as you only need to use a (modern) web-browser.

LAB1100 is a research and development firm established in 2011 by Pim van Bree and Geert Kessels. Their joint skill set in new media, history, and software development allows them to conceptualise and develop complex software applications. Working together with universities, research institutes, and museums, LAB1100 has built the digital research platform nodegoat and produces interactive data visualisations.

Pim van Bree received his MA in New Media and Digital Culture at the University of Amsterdam. He graduated with a thesis on the actor network of transnational online dating, investigating the crossroads between the local, national, global, and the online assemblage. His work experience in the field of new media includes positions as digital strategist at Tribal DDB Amsterdam and software developer at KIWA.

Geert Kessels received his BA in History from Radboud University Nijmegen and completed the research master program in History at the University of Amsterdam. He graduated with a thesis on the influences of German Idealism on the Slovak romantic intellectual Ľudovít Štúr. During his studies he completed an internship at the Study Platform on Interlocking Nationalisms and worked as a project manager for EUROCLIO - The European Association of History Educators.