In this introductory session, students will become acquainted with the versatile Jupyter Notebooks platform for Python programming. We will begin by exploring Python's fundamental data structures, including lists, dictionaries, and tuples. Next, we will delve into text analysis by learning how to read, filter, and convert text files, as well as work with folders. By introducing the Natural Language Toolkit (NLTK) library, students will gain hands-on experience in processing and analyzing textual data. This comprehensive first day will provide a strong foundation in Python and set the stage for more advanced topics on Day 2.
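The core data structures named above can be combined in a few lines. The following is a minimal sketch (the sample text and counts are invented for illustration) of the kind of word-frequency exercise the session builds towards:

```python
# Count word frequencies in a short text using lists, dictionaries, and tuples.
text = "the cat sat on the mat and the cat slept"

words = text.split()                 # list of tokens
freq = {}                            # dictionary: word -> count
for word in words:
    freq[word] = freq.get(word, 0) + 1

# Sorting the (word, count) tuples by count gives the most frequent words first.
top = sorted(freq.items(), key=lambda pair: pair[1], reverse=True)
print(top[0])   # -> ('the', 3)
```

NLTK adds proper tokenization, stop-word lists, and stemmers on top of this basic pattern.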
Uldis Bojārs is a computer scientist interested in the fields of Semantic Web, Open Data and Digital Libraries. He has a Ph.D. in Computer Science from the National University of Ireland, Galway, focusing on the Semantic Web and its applications. Uldis is an Associate Professor at the Faculty of Computing, University of Latvia, where he shares his knowledge and expertise with students, and a Data Semantics Development Manager at the National Library of Latvia, where he works on library linked data projects, enhancing information accessibility and resource sharing. At the University of Latvia, Uldis' teaching activities include the Python programming language, emphasizing its practical applications in diverse areas, including natural language processing.
Building upon the foundation established on Day 1, this session will focus on data manipulation and visualization using the powerful libraries Pandas and Matplotlib. Students will learn how to load, filter, and transform data using Pandas DataFrames, as well as perform basic statistical analysis. We will then explore data visualization techniques, such as bar charts, scatter plots, and line graphs, by leveraging the capabilities of the Matplotlib library. By the end of Day 2, participants will have gained valuable skills in Python programming, enabling them to analyze and visualize real-world data sets with confidence and ease.
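As a taste of the Day 2 material, here is a minimal sketch of loading and filtering a Pandas DataFrame; the dataset and column names are invented for illustration:

```python
import pandas as pd

# Toy dataset: yearly counts of digitised documents (illustrative values).
df = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022],
    "documents": [120, 150, 90, 210],
})

# Filtering rows and computing a basic statistic.
recent = df[df["year"] >= 2021]
print(recent["documents"].mean())    # -> 150.0

# A bar chart of the same data is one line with Matplotlib:
# import matplotlib.pyplot as plt
# df.plot(x="year", y="documents", kind="bar"); plt.show()
```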
On the first day, participants will be introduced to the fundamentals of text content analysis, focusing on techniques to preprocess and analyze textual data. The session will cover text cleaning, tokenization, stemming, and lemmatization using libraries such as Gensim and Pandas. Attendees will learn how to convert raw text into structured data, suitable for further analysis. The day will conclude with an introduction to text feature extraction methods, including Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), to prepare participants for Day 2's machine learning applications.
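To make the TF-IDF idea concrete, here is a minimal pure-Python sketch of the weighting scheme (the toy documents are invented; in the workshop a library implementation would be used instead):

```python
import math

# Two toy "documents" after cleaning and tokenisation.
docs = [["folk", "songs", "archive"],
        ["folk", "tales", "archive", "tales"]]

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)          # term frequency in this document
    df = sum(term in d for d in corpus)      # number of documents containing the term
    idf = math.log(len(corpus) / df)         # rarer terms get a higher weight
    return tf * idf

# "tales" is distinctive for the second document, so it gets a positive weight;
# "folk" occurs in every document, so its idf (and weight) is zero.
print(tf_idf("tales", docs[1], docs))
print(tf_idf("folk", docs[1], docs))   # -> 0.0
```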
Participants of the workshop should have basic knowledge of Python: strings, lists, dictionaries, and control structures. Experience with Jupyter Notebooks through Google Colab, Visual Studio Code, or PyCharm is helpful but not required.
Building on Day 1's foundation, the second day will delve into advanced discourse analysis techniques and machine learning applications, utilizing Scikit-learn and Gensim. Participants will explore various algorithms for topic modeling, such as Latent Dirichlet Allocation (LDA), and sentiment analysis, including Naïve Bayes and Support Vector Machines (SVM). Attendees will learn how to evaluate and fine-tune their models for optimal performance. The workshop will culminate in the creation of interactive visualizations using the Plotly library, empowering participants to effectively communicate their findings and insights to a broader audience.
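As an illustration of the kind of sentiment pipeline the session covers, here is a minimal Naïve Bayes sketch with Scikit-learn; the training texts and labels are invented toy data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set (texts and labels are invented).
texts = ["great wonderful book", "awful boring read",
         "wonderful story", "boring and awful"]
labels = ["pos", "neg", "pos", "neg"]

# Bag-of-Words features feeding a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["a wonderful read"]))   # -> ['pos']
```

A real application would evaluate the model on held-out data and tune it, as discussed in the session.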
Collecting Web Data for Humanities Research
In order to analyse large amounts of data, we first need to collect and store it. This workshop will focus on collecting data from the Web using several methods: web scraping with Python, APIs that provide direct access to data, and various tools that help automate the process. During the workshop, we will extract data from websites and news articles.
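As a small illustration of the scraping idea, the sketch below extracts a headline from an HTML snippet using only Python's standard library; in real scraping the page would first be downloaded (e.g. with the requests library), and the HTML here is an invented stand-in:

```python
from html.parser import HTMLParser

# Stand-in for a downloaded news page (normally requests.get(url).text).
html = "<html><body><h1>Headline</h1><p>First paragraph.</p></body></html>"

class HeadlineParser(HTMLParser):
    """Collects the text of every <h1> element."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headlines.append(data)

parser = HeadlineParser()
parser.feed(html)
print(parser.headlines)   # -> ['Headline']
```

Libraries such as BeautifulSoup wrap this kind of parsing in a much more convenient interface.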
Marija Isupova is a software engineer at NATO StratCom COE, where she is researching methods of understanding the information environment landscape in online and social media. She has a statistics and data analytics background using Python and R. She holds a Master's degree in Computer Science from the University of Latvia.
Students’ Poster Session
The Students’ Poster Session will give BSSDH 2023 students an opportunity to present their work, projects, and research ideas and to receive feedback from the BSSDH organisers. Students are encouraged to present ideas and work in progress where feedback is most urgently needed, although completed projects may be presented as well. Each poster presentation should fit on one slide and take up to 5 minutes, followed by 10 minutes of feedback from the organisers and questions from the audience.
Haralds Matulis is a Digital Humanities (DH) projects coordinator at the Institute of Literature, Folklore and Art of the University of Latvia (ILFA). With a background in philosophy, social anthropology, and literature, he now works with computational methods and their application to data from the humanities and culture domain, as well as advancing and developing the DH tools and resources infrastructure in Latvia. He is currently studying digital humanities in the University of Helsinki's master's programme and researching diaries with computational methods.
The session aims to showcase the basic functionality of two free tools: Tableau Public for tabular data visualization and Gephi for network visualization. First, the session will guide participants step by step through each tool using particular data sets. Then participants will work independently on similar data sets, aiming to replicate the demonstrated approaches and refine their visualizations. The session is suitable for participants who have not worked with these tools before; more advanced participants will benefit from feedback on their individual work. Both applications should be pre-installed on participants' laptops. Tablet computers and netbooks are not suitable devices for the session. Install Tableau Public. Install Gephi.
In her upcoming talk, Maija Kāle will explore how the language we use to talk about food influences our perceptions of it through the lens of food computing, which uses computational methods to analyse food-related data. Maija will discuss the obstacles that the global community and the environment face in establishing sustainable food systems, and how food computing can provide a comprehensive infrastructure for food that benefits all stakeholders. Maija's talk will focus on the future of food computing and the challenges of creating tailored solutions for complex food systems. With high levels of misinformation and disinformation around food and nutrition, trust is a crucial element in finding solutions to health problems. Food computing can provide personalised solutions that can dispel myths and falsehoods or detect inaccuracies and fraud in food systems. Maija will outline her vision for the future of food computing to address these challenges and build more trustworthy food systems.
Maija Kāle is a Latvian researcher who recently received her PhD from the Faculty of Computer Science at the University of Latvia. Her main research interests are in the area of food computing, which includes the analysis of food systems, big data related to food, and the language used to describe food. Maija's work aims to improve the understanding of food-related data and its impact on society, both from an environmental and a personal health perspective. In her day-to-day work, Maija is a sustainability and digitalisation advisor at the Nordic Council of Ministers office in Latvia, working on topics such as the future of urban agriculture, the bio-economy, and digital inclusion.
Up until recently, reading practices have been hard to map, especially if one is interested in investigating larger patterns. Robert Darnton once famously claimed: “Reading remains the most difficult stage to study in the circuit followed by books.” With the emergence of digital reading data, however, this is about to change. In this talk, I will show how data from the commercial book streaming platform Storytel can be used to track audiobook reading practices simultaneously at scale (~430,000 readers) and in detail (per hour). By investigating nearly 75 million logged sessions of book streaming during one year, I analyse how the current boom in streamed audiobooks is affecting our contemporary book and reading culture.
Karl Berglund is an assistant professor in literature at Uppsala University, Sweden. He has engaged mainly in computational methods for large-scale research on book and reading culture, and in merging such methods with perspectives derived from the longer traditions of the sociology of literature and book history. Berglund has published three books in Swedish, and articles in, e.g., PMLA (forthcoming), LOGOS, Cultural Analytics, and the European Journal of Cultural Studies. His forthcoming book Reading Audio Readers: Book Consumption in the Streaming Age will be published by Bloomsbury in late 2023/early 2024.
More meaning or less? Fitting computational methods in humanities research goals
Many Digital Humanities studies rely on a limited number of established, widely used computational tools, such as word embeddings, topic modelling, or sentiment analysis. While the statistical performance of these methods is reliably evaluated from a computational perspective, it is often overlooked how well the methods fit the research goals of the humanities or social sciences. This can make the results difficult to interpret and evaluate. The keynote discusses, with examples, how novel computational methods can be fitted into a research workflow to overcome the limitations of generic computational tools.
Antti Kanner is a long-standing member of the team behind the Digital Humanities Hackathon in Helsinki and teaches courses in digital methods at the University of Helsinki. His dissertation “Meaning in Distributions – A study on computational methods in lexical semantics” (2022) builds on experience from collaboration with computer scientists, historians, political scientists, and media scholars, and examines the locus of linguistic semantic theories in common Digital Humanities workflows. He is currently involved in the projects FILTER, which studies Finnic oral poetry with digital methods, and RETOSTRA, which studies rhetorical strategies on social media platforms.
This presentation will discuss using the metadata of documents, such as title, genre, author, and publication date, to visualise and interpret literary transformation, using the example of the Polish Literary Bibliography. One of the major claims concerning Polish literary life is that after 1989 it underwent a sudden de-centralisation and dispersion, followed in the mid-1990s by a subsequent re-centralisation of the capitalist book market. We may capture this evolution through basic statistics, such as the number of book publishers and the books they publish per year, or the number of literary journals. I will discuss different ways of mapping this process through basic statistics and network analysis of literary worlds, based on bibliographical relationships.
Dr Maciej Maryl is an assistant professor and founding Director of the Digital Humanities Centre at the Institute of Literary Research of the Polish Academy of Sciences. He is involved in European research infrastructures for digital humanities as a member of OPERAS Executive Assembly, SSH Open Cluster Governing Board, and co-chair of DARIAH Digital Methods and Practices Observatory WG. His interests include data science applied to cultural data, innovation in scholarly communication and meta-research on digital practices in the humanities.