Insights & Engineering

Deep dives into web data infrastructure, extraction techniques, and the future of structured data at scale.

Latest Articles

7 Best SERP APIs in 2026: Pricing & Features Compared

TL;DR: There is no official Google SERP API, so third-party providers fill the gap. Pricing ranges from roughly $0.30 to $15 per thousand searches, and the right choice depends on your volume, budget, and the SERP features you need to extract. This guide compares the top providers side by side, breaks down true cost at scale, and gives you a decision framework to shortlist the best SERP API for your project.

Andrei Ogiolan · 17 min read
May 1, 2026
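Per-thousand pricing turns into monthly spend with simple arithmetic, and that is why volume matters so much when comparing providers. The sketch below is minimal. It assumes flat pricing per 1,000 searches, with no plan minimums, free tiers, volume discounts, or charges for failed requests. Real provider plans often differ. It uses the low and high ends of the range quoted above.

```python
def monthly_cost(searches_per_month: int, price_per_thousand: float) -> float:
    """Estimate monthly spend in USD under flat per-1,000-search pricing.

    Assumption: no plan minimums, free tiers, or volume discounts.
    """
    if searches_per_month < 0 or price_per_thousand < 0:
        raise ValueError("inputs must be non-negative")
    return searches_per_month / 1000 * price_per_thousand


# Compare the low ($0.30) and high ($15) ends of the quoted range at a few volumes.
for volume in (10_000, 100_000, 1_000_000):
    low = monthly_cost(volume, 0.30)
    high = monthly_cost(volume, 15.00)
    print(f"{volume:>9,} searches: ${low:,.2f} to ${high:,.2f}")
```

At 1,000,000 searches a month, the spread runs from $300 to $15,000. At high volume, the per-thousand rate usually outweighs every other factor.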

XPath vs CSS Selectors: Choosing the Right One

TL;DR: XPath and CSS selectors both locate DOM elements, but they solve different problems. CSS selectors are faster and more readable for straightforward selections. XPath wins when you need to traverse the DOM in any direction, match text content, or handle complex conditional logic. Most production projects benefit from using both strategically.

Mihai Maxim · 12 min read
May 1, 2026

How to Set Up Axios Proxy in Node.js: Auth, Rotation, SOCKS5

TL;DR: Axios routes requests through a proxy by accepting a proxy object with host, port, and optional auth fields. This guide covers how to set up Axios proxy configuration from scratch: basic wiring, authenticated proxies, HTTPS tunneling, a rotation system using interceptors, SOCKS5 via socks-proxy-agent, and diagnosing common errors. Every snippet is copy-pasteable Node.js code.

Suciu Dan · 10 min read
May 1, 2026

Puppeteer Download File: 4 Methods for Node.js

TL;DR: Downloading a file with Puppeteer comes down to four approaches. You can click a button and let Chrome write to a folder you control. You can run fetch() inside the page and pipe the bytes back to Node as base64. You can drive the Chrome DevTools Protocol and listen for download progress events. Or you can skip the browser and pull the URL with Axios, using cookies harvested from the Puppeteer session. Pick based on file size, authentication, and how the site exposes the link.

Mihnea-Octavian Manolache · 34 min read
May 2, 2026

How to Use a Proxy in Node-Fetch: A Practical Guide

TL;DR: Node-Fetch has no built-in proxy switch, so you wire an HTTP, HTTPS, or SOCKS5 agent into the request through its agent option. This guide walks through how to use a proxy in Node-Fetch end to end: authenticated HTTP and HTTPS proxies, SOCKS5, rotation, retries, TLS edge cases, troubleshooting, and the modern undici route for Node 18+ native fetch.

Mihnea-Octavian Manolache · 11 min read
May 1, 2026

Web Scraping JavaScript Tables in Python: From Hidden APIs to Playwright

TL;DR: Web scraping JavaScript tables in Python rarely needs a headless browser. Open DevTools, find the JSON endpoint that hydrates the grid, replay it with requests, paginate it, and fall back to Playwright only when the network call is signed, encrypted, or otherwise sealed shut.

Andrei Ogiolan · 11 min read
May 7, 2026
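The replay-and-paginate step described above can be sketched in a few lines. This is a minimal sketch with several assumptions. The endpoint URL is hypothetical. The `page` and `page_size` query parameters and the `{"rows": [...]}` response shape are also assumptions; copy the real names from the request you see in the DevTools Network tab. The sketch uses the standard library's `urllib` so it has no dependencies. A `requests`-based version follows the same loop.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace with the JSON URL that hydrates the grid.
API_URL = "https://example.com/api/table"


def fetch_page(page: int, page_size: int = 100) -> list[dict]:
    """Replay the grid's JSON request for one page.

    Assumption: the endpoint takes `page`/`page_size` and returns {"rows": [...]}.
    """
    query = urlencode({"page": page, "page_size": page_size})
    req = Request(f"{API_URL}?{query}", headers={"Accept": "application/json"})
    with urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("rows", [])


def iter_rows(fetch: Callable[[int], list[dict]], start_page: int = 1) -> Iterator[dict]:
    """Yield rows page by page, stopping at the first empty page."""
    page = start_page
    while True:
        rows = fetch(page)
        if not rows:
            break
        yield from rows
        page += 1
```

Usage: `rows = list(iter_rows(fetch_page))`. The fetcher is passed in as a function, so the pagination loop can be tested offline. The same loop also works for cursor-based endpoints once `fetch` tracks the cursor instead of a page number.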