Insights & Engineering

Deep dives into web data infrastructure, extraction techniques, and the future of structured data at scale.

Latest Articles

Python Extract Text From HTML

TL;DR: To extract text from HTML in Python, parse the markup with a real parser (BeautifulSoup, lxml.html, or html-text), strip scripts, styles, and site chrome, then normalize whitespace and Unicode before saving. This guide compares the main libraries, fixes the common cleanup traps, and ends with a runnable crawler that writes JSONL plus per-page .txt files.

Mihai Maxim · 22 min read
May 12, 2026

Web Scraping with Scrapy: 2026 Playbook

TL;DR: This is an opinionated, end-to-end guide to web scraping with Scrapy in 2026. You will install Scrapy, prototype selectors in the shell, build a multi-page e-commerce spider, clean items with Item Loaders, persist to a database, harden settings against bans, and bolt on Scrapy-Playwright for JavaScript-rendered pages.

Mihai Maxim · 15 min read
May 13, 2026

How to Execute JavaScript With Scrapy

Are you having trouble scraping dynamic websites with Scrapy? In this article, we explore several solutions for handling JavaScript rendering. Learn how to use plugins like Splash and Selenium to take your Scrapy project to the next level.

Mihai Maxim · 5 min read
Apr 22, 2026

Axios Set Headers in 2026: The Developer Playbook

TL;DR: You can set headers in Axios at five layers: per-request config, global defaults, axios.create() instances, request and response interceptors, and the response itself. This guide walks through each layer with runnable v1 snippets, then fixes the four bugs that bite everyone: multipart boundaries, CORS cookies, self-signed certs, and header casing.

Mihnea-Octavian Manolache · 15 min read
May 12, 2026

Best Rotating Residential Proxies In 2026 For Web Scraping

TL;DR: The best rotating residential proxies in 2026 are not the ones with the biggest advertised pool size. They are the ones whose session control, geo-targeting, ethical sourcing, and per-GB economics actually match the targets you scrape. This guide gives you a vendor-neutral evaluation framework, a comparison table of 12 providers, and a use-case map so you can shortlist two or three before you ever touch a credit card.

Anda Miuțescu · 35 min read
May 14, 2026