Insights & Engineering

Deep dives into web data infrastructure, extraction techniques, and the future of structured data at scale.

Latest Articles

How to Scrape Redfin: Python Guide to Property Data

TL;DR: Redfin exposes hidden API endpoints that return structured JSON for property listings, making it possible to skip fragile HTML parsing entirely. This guide walks you through building a Python scraper that extracts rental and sale data, searches by location, monitors new listings via XML sitemaps, and exports clean results to CSV or JSON.

Suciu Dan · 11 min read
Apr 27, 2026

XPath Web Scraping: A Hands-On Guide with Python Examples

TL;DR: XPath is a query language for navigating HTML/XML trees by path, attribute, or text content. This guide covers XPath syntax, axes, and functions, then shows working Python scrapers with lxml and Selenium. You will also get a consolidated cheat sheet and a troubleshooting section for the most common XPath mistakes.

Suciu Dan · 9 min read
Apr 29, 2026

What Is a Headless Browser? Architecture, Use Cases, and Top Tools

TL;DR: A headless browser is a web browser that runs without a visible graphical interface, controlled entirely through code or command-line instructions. Developers use headless browsers for automated testing, web scraping, performance monitoring, and increasingly to power AI agents. This guide covers how they work internally, when to choose one over a regular browser, and which frameworks are worth your time.

Suciu Dan · 12 min read
Apr 29, 2026

Scrapy-Playwright Tutorial: Scrape JS-Heavy Sites

TL;DR: Scrapy-Playwright lets you render JavaScript-heavy pages directly inside Scrapy spiders by controlling real Chromium, Firefox, or WebKit browsers through Playwright. This tutorial walks you through installation, configuration, page interactions, AJAX interception, anti-detection, and a production-ready project structure so you can scrape dynamic sites without leaving the Scrapy ecosystem.

Raluca Penciuc · 17 min read
Apr 28, 2026