Insights & Engineering

Deep dives into web data infrastructure, extraction techniques, and the future of structured data at scale.

All Guides Science of Web Scraping Use Cases Engineering Other

Guides

Scraping with Cheerio: How to Easily Collect Data from Web Pages

With Cheerio you can start collecting the data in a matter of minutes. No hassle, no learning curve required.

Raluca Penciuc7 min readApr 27, 2026

Read Article

How to Scrape Redfin: Python Guide to Property Data

TL;DR: Redfin exposes hidden API endpoints that return structured JSON for property listings, making it possible to skip fragile HTML parsing entirely. This guide walks you through building a Python scraper that extracts rental and sale data, searches by location, monitors new listings via XML sitemaps, and exports clean results to CSV or JSON.

Suciu Dan11 min read

Apr 27, 2026

Guides

XPath Web Scraping: A Hands-On Guide with Python Examples

TL;DR: XPath is a query language for navigating HTML/XML trees by path, attribute, or text content. This guide covers XPath syntax, axes, and functions, then shows working Python scrapers with lxml and Selenium. You will also get a consolidated cheat sheet and a troubleshooting section for the most common XPath mistakes.

Suciu Dan9 min read

Apr 29, 2026

Science of Web Scraping

HTTP Response Headers in cURL: Every Flag, Technique, and Scripting Recipe

TL;DR: cURL hides response headers by default. Use -i to see headers alongside the body, -I for a HEAD request that returns headers only, -v for full request/response debugging, and -D to save headers to a file. For modern scripting, cURL 7.83+ lets you extract individual headers or dump all of them as JSON with the -w write-out option.

Suciu Dan11 min read

Apr 29, 2026

Science of Web Scraping

What Is a Headless Browser? Architecture, Use Cases, and Top Tools

TL;DR: A headless browser is a web browser that runs without a visible graphical interface, controlled entirely through code or command-line instructions. Developers use headless browsers for automated testing, web scraping, performance monitoring, and increasingly to power AI agents. This guide covers how they work internally, when to choose one over a regular browser, and which frameworks are worth your time.

Suciu Dan12 min read

Apr 29, 2026

Guides

Scrapy-Playwright Tutorial: Scrape JS-Heavy Sites

TL;DR: Scrapy-Playwright lets you render JavaScript-heavy pages directly inside Scrapy spiders by controlling real Chromium, Firefox, or WebKit browsers through Playwright. This tutorial walks you through installation, configuration, page interactions, AJAX interception, anti-detection, and a production-ready project structure so you can scrape dynamic sites without leaving the Scrapy ecosystem.

Raluca Penciuc17 min read

Apr 28, 2026

Guides

How to Scrape Expedia with Python: Hotels, Prices & Ratings (2026 Guide)

Scrape Expedia hotel listings with Python using JS rendering, proxies, CSS selectors, and pagination, then clean and export data to CSV.

Mihai Maxim11 min read

Apr 27, 2026

2 326 27 28