The Stack Overflow Podcast / How do you evaluate an LLM? Try an LLM.

Summary

On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the other needs and tradeoffs involved in selecting and fine-tuning LLMs.

Duration
00:32:55
Publishing date
2024-04-16 04:20
Link
https://stackoverflow.blog/podcast/
Contributors
  Ben Popper, Ryan Donovan, Michael Geden
author  
Enclosures
https://chrt.fm/track/G8F1AF/injector.simplecastaudio.com/6fa1d34c-502b-4abf-bd82-483804006e0b/episodes/5fdcebc8-e177-4d07-a7b1-1d1ec35e7d92/audio/128/default.mp3?aid=rss_feed&awCollectionId=6fa1d34c-502b-4abf-bd82-483804006e0b&awEpisodeId=5fdcebc8-e177
audio/mpeg

Shownotes

Connect with Michael on LinkedIn

Shoutout to user1083266, who earned a Stellar Question badge with "How to store image in SQLite database."