Language and AI Colloquium I: 'Building and Evaluating Language Models: From Data to Benchmarks' by Bram Vanroy
When: Friday 11 October 2024, 14:00 - 15:00
Where: Collaboratory A, Harmonie Building 1313.0125
This initiative aims to bring together everyone who is interested in, or working on, the intersection of natural language and Artificial Intelligence.
The Language and AI colloquia will take place from October 2024 to May 2025. Each colloquium is held on a Friday from 14:00 to 15:00. Those interested can arrange one-on-one meetings with the speakers.
Bram Vanroy
The speaker for the first Language and AI Colloquium is Bram Vanroy from KU Leuven, who will present 'Building and Evaluating Language Models: From Data to Benchmarks'.
'Large language models' (LLMs) and 'AI' are the buzzwords of the day, so much so that even "transformer" has made its way into the odd casual conversation. But what actually goes into these models? And how do we know if one model is truly better than the last?
This talk focuses on what happens before and after the training phase of LLMs: the craft of reliable dataset creation and the assessment of model performance. We'll dive into the data pipelines responsible for creating high-quality datasets for the key stages of model development: pretraining (next-word prediction), supervised finetuning (chat/instruction), and preference tuning (alignment). We will discuss techniques such as web crawling, quality filtering, and synthetic data generation and scoring. Such data processing is gaining more and more attention; after all, if we put garbage into a model, it will spit garbage back out. You'll also learn how model performance is evaluated across a variety of benchmarks, from straightforward question answering to assessments of "emotional intelligence" and crowd-sourced user evaluation.
Full calendar
11 October 2024 | I: Building and Evaluating Language Models: From Data to Benchmarks
29 November 2024 | II: Identification of clinical disease trajectories in neurodegenerative disorders with natural language processing
24 January 2025 | III: Shall AI Compare Thee to a Summer’s Day? An Exploration of Creative Mechanisms in Large Language Models
28 March 2025 | IV: Experiments on the intersection of texts and structured data: combining language technology and semantic web for digital humanities research
23 May 2025 | V: Towards Evidence-Based Fact-Checking for Real-World Claims