Insights & Engineering

Deep dives into web data infrastructure, extraction techniques, and the future of structured data at scale.


Web Scraping with Node-Unblocker: A Practical Guide

TL;DR: Node-unblocker turns an Express app into a URL-prefix HTTP proxy you can hack on. This guide walks through installing it, wiring up request and response middlewares, rotating instances, deploying on Docker or Heroku, and recognizing the point where a managed scraping API is the saner answer.

Sorin-Gabriel Marica · 11 min read

May 1, 2026

Guides

Web Scraping in Ruby: The Ultimate Tutorial

What do you get when you take Ruby, a bunch of useful gems, and a few hours? The answer: a pretty good web scraper. Here's a step-by-step guide.

Raluca Penciuc · 9 min read

Apr 22, 2026

Guides

Web Scraping with PHP: Libraries, Code & Best Practices (2026)

TL;DR: PHP is a perfectly capable language for web scraping, thanks to built-in extensions like cURL and DOMDocument, plus a rich Composer ecosystem that includes Guzzle, Symfony DomCrawler, and Symfony Panther for headless browsing. This guide walks you through the full workflow: fetching pages, parsing HTML, storing results in CSV/JSON/MySQL, handling errors, and avoiding blocks.

Sorin-Gabriel Marica · 16 min read

Apr 28, 2026

Science of Web Scraping

What Are Rotating Proxies? Guide to IP Rotation for Web Scraping

TL;DR: So what are rotating proxies, in one line? Proxy servers that assign a different IP to each request from a managed pool, which is how scrapers slip past per-IP rate limits, CAPTCHAs, and geo-filters. This guide covers how rotation works, the four pool types, setup code in three languages, and how to pick a provider.

Raluca Penciuc · 10 min read

May 13, 2026

Guides

XPath Cheat Sheet for Web Scraping: Syntax, Axes, and Real Code

TL;DR: This XPath cheat sheet covers the syntax, predicates, axes, and functions you actually need for web scraping, plus a CSS-to-XPath translation table and runnable Puppeteer and Scrapy examples. Use it as a desk reference next time a CSS selector quietly breaks on a site you depend on.

Mihai Maxim · 13 min read

May 13, 2026

Science of Web Scraping

How to Build a Python Web Crawler: From Start to Scale

TL;DR: A Python web crawler automates the tedious work of following links across a website to discover and collect content. This guide walks you through building one from scratch with requests and BeautifulSoup, then graduating to Scrapy for concurrent crawling, item pipelines, and structured data exports. You will also learn how to crawl responsibly, rotate proxies to avoid blocks, and handle JavaScript-rendered pages.

Suciu Dan · 27 min read

Apr 30, 2026
