29 August 2025

Cat and Mouse: Challenges in Adversarial Web Scraping

This is the talk I gave at ElixirConf 2025, titled “Cat and Mouse: Challenges in Adversarial Web Scraping.”

Description:

The time comes in every developer’s career when they need to scrape a web page. If you’re lucky, a simple HTTP request gets you what you need, or maybe you have to spoof some browser headers. But if that’s not enough, what can you do?

And from the other side, as a site operator, how can you prevent your site from being scraped by any script kiddie who knows what a user agent is?

In this talk, we’ll explore the dark art of scraping the web from both perspectives: the bots and the services that try to confound them. We’ll look at a number of techniques for detecting non-human traffic, and show how a respectful, ethical scraper might get around them. (Hint: You can’t use OTP’s built-in HTTP stack for this!) We’ll also look at the gold standard for bot detection, and test the limits of what sites can do to prevent automated access.
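To make the simplest round of the game concrete, here is a minimal sketch in Python of both sides: a scraper that sends browser-like headers, and a naive site-side check that only inspects the user agent. It is not taken from the talk. The names `BROWSER_HEADERS`, `SCRIPT_MARKERS`, `build_request`, and `looks_like_script` are hypothetical, the header values are illustrative, and no request is actually sent.

```python
from urllib.request import Request

# Hypothetical header set imitating a desktop browser (illustrative values).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}

# Assumed substrings found in the default user agents of common HTTP tools.
SCRIPT_MARKERS = ("python-urllib", "curl", "wget")


def build_request(url, spoof=True):
    """Scraper side: build (but do not send) a request, optionally with browser-like headers."""
    headers = dict(BROWSER_HEADERS) if spoof else {"User-Agent": "Python-urllib/3.12"}
    return Request(url, headers=headers)


def looks_like_script(request):
    """Site side: a deliberately naive check that only looks at the user agent."""
    # urllib stores header names capitalized as "User-agent".
    user_agent = (request.get_header("User-agent") or "").lower()
    return user_agent == "" or any(marker in user_agent for marker in SCRIPT_MARKERS)


if __name__ == "__main__":
    plain = build_request("https://example.com/", spoof=False)
    spoofed = build_request("https://example.com/")
    print(looks_like_script(plain))
    print(looks_like_script(spoofed))
```

A check like this is defeated by changing a single header, which is why detection has to look at signals beyond the user agent.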