Jantina Tammes School of Digital Society, Technology and AI
Digital prosperity for all

Language and AI Colloquium I: 'Building and Evaluating Language Models: From Data to Benchmarks' by Bram Vanroy

When: Fri 11-10-2024 14:00 - 15:00
Where: Collaboratory A, Harmonie Building 1313.0125

This initiative aims to bring together everyone interested in or working at the intersection of natural language and Artificial Intelligence.

The Language and AI colloquia will take place from October 2024 to May 2025. Each colloquium takes place on a Friday, from 14:00 to 15:00. Those interested can arrange 1-on-1 meetings with the speakers.

Bram Vanroy

The speaker for the first Language and AI Colloquium is Bram Vanroy from KU Leuven, with the talk 'Building and Evaluating Language Models: From Data to Benchmarks'.

'Large language models' (LLMs) and 'AI' are the buzzwords of the day, so much so that even "transformer" has made its way into the odd casual conversation. But what actually goes into these models? And how do we know if one model is truly better than the last?

This talk focuses on what happens before and after the training phase of LLMs: the craft of creating reliable datasets and of assessing a model’s performance. We'll dive into the data pipelines responsible for creating high-quality datasets for the key stages of model development: pretraining (next-word prediction), supervised finetuning (chat/instruction), and preference tuning (alignment). We will discuss techniques such as web crawling, quality filtering, and synthetic data generation and scoring. Such data processing is gaining more and more attention; after all, if we put garbage into a model, it will spit it back out. You’ll also learn how model performance is evaluated across a variety of benchmarks, from straightforward question-answering to assessments of "emotional intelligence" and crowd-sourced user evaluation.

Full calendar

11 October 2024

I: Building and Evaluating Language Models: From Data to Benchmarks
Bram Vanroy, KU Leuven
(Collaboratory room A - Faculty of Arts)

29 November 2024

II: Identification of clinical disease trajectories in neurodegenerative disorders with natural language processing
Inge Holtman, UMCG Groningen
(House of Connections - 1st floor)

24 January 2025

III: Shall AI Compare Thee to a Summer’s Day? An Exploration of Creative Mechanisms in Large Language Models
Tim van de Cruys, KU Leuven
(House of Connections - 1st floor)

28 March 2025

IV: Experiments on the intersection of texts and structured data: combining language technology and semantic web for digital humanities research
Marieke van Erp, KNAW Humanities Cluster
(Collaboratory room A - Faculty of Arts)

23 May 2025

V: Towards Evidence-Based Fact-Checking for Real-World Claims
Max Glockner, TU Darmstadt
(House of Connections - 1st floor)
