SERG Seminar

Speakers: Jonathan Katzy, Andreea Costea
When: March 19, 2025, 13:45 - 14:45
Where: Social Data Lab, B28

In this edition of our weekly SERG seminar we will have two speakers:

Jonathan Katzy will present “The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models”. The recent rise in the popularity of large language models has spurred the development of extensive code datasets needed to train them. This has left limited code available for collection and use in the downstream investigation of specific behaviors, or in the evaluation of large language models without suffering from data contamination. To address this problem, we release The Heap, a large multilingual dataset covering 57 programming languages that has been deduplicated with respect to other open datasets of code, enabling researchers to conduct fair evaluations of large language models without significant data cleaning overhead.

The second speaker is Andreea Costea, who recently joined TU Delft as an Assistant Professor in the Programming Languages group, where she focuses on program verification, automated programming, and program repair. Her technical interests revolve around what makes software trustworthy, while at the same time exploring ways to make education and work more fun.

Abstract:
With the rise of LLMs in automatic programming, there is growing interest in ensuring the trustworthiness of their output. However, guaranteeing correctness is challenging, especially since detailed specifications of intended behavior are often missing. This talk explores how to bridge that gap by aligning LLM-generated code with automatically derived formal specifications (from natural language via LLMs) and tests. While this approach doesn’t offer absolute guarantees, it enhances trust by establishing conformance between programs, specifications, and tests—helping to uncover the likely intended behavior.

If you are interested in giving a 15 min + 10 min talk or hosting a 25 min discussion session in one of our next meetings, please contact Carolin Brandt via Mattermost or email.