Proxies, at their core, are meant to mask your real IP address. They're also crucial for accessing geo-restricted content, since websites see your requests as coming from the proxy's location rather than your own. Streaming sites are the best-known case. The Office, for example, is available on Netflix only in the UK and Ireland, but with a UK proxy, you can watch the show from anywhere.
In web scraping, the proxy pool is one of the most critical components. Proxies make it look like the bot's requests come from different locations and at different times. This is your first line of defense against IP blocks, and if it fails, the site bans the proxy's IP, not your real one.
Web scrapers, and bots in general, can send huge numbers of requests very quickly. That's what makes them so useful for data gathering. But this speed is often their downfall. Websites can tell whether requests come from a real person or a bot by how they behave. For example, a human will never request 25 pages in less than a second.
Add a proxy in the middle, or better yet, a whole proxy pool, and suddenly you have more options. Distributing requests across several proxies changes what the website sees: instead of one IP sending 100 requests at once, it sees 10 IPs sending 10 requests each. Ideally, each request goes through a different proxy.
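To make the arithmetic concrete, here is a minimal Python sketch of round-robin distribution, one simple way to spread requests across a pool. The proxy addresses (from the 203.0.113.0/24 documentation range), the URLs, and the `assign_round_robin` helper are hypothetical placeholders. Nothing is actually sent; the code only plans which proxy would carry each request.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(urls, proxies):
    """Pair each URL with a proxy, cycling through the pool in order."""
    pool = cycle(proxies)
    return [(url, next(pool)) for url in urls]


# Hypothetical pool: 10 proxies from the 203.0.113.0/24 documentation range.
proxies = [f"http://203.0.113.{i}:8080" for i in range(1, 11)]
urls = [f"https://example.com/page/{n}" for n in range(100)]

plan = assign_round_robin(urls, proxies)
requests_per_proxy = Counter(proxy for _, proxy in plan)

print(len(requests_per_proxy))           # 10 distinct IPs
print(set(requests_per_proxy.values()))  # {10}: each IP sends 10 requests
```

With 100 proxies in the list, the same function would give every request its own IP, which is the ideal case mentioned above.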
You may think that constantly switching IPs is a huge chore, and you'd be right. That's why proxy providers came up with proxy rotation: an automated system that changes IPs for you. Rotating proxies is the best way to make sure you're using all your IPs to their full potential.
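Providers usually run rotation on their own infrastructure, but the idea can be approximated on the client side. The sketch below uses a hypothetical `ProxyRotator` class (not part of any library) that cycles through a pool and retires proxies you flag as blocked, for example after an HTTP 403 or 429 response; which responses actually signal a block varies by site. The dict returned by `as_requests_proxies()` has the shape accepted by the `proxies=` argument of the `requests` library, though the sketch itself sends nothing.

```python
from collections import deque


class ProxyRotator:
    """Cycle through a proxy pool, skipping proxies that have been blocked.

    Hypothetical client-side helper for illustration only.
    """

    def __init__(self, proxies):
        self._pool = deque(proxies)

    def __len__(self):
        return len(self._pool)

    def next_proxy(self):
        """Return the next proxy and move it to the back of the queue."""
        if not self._pool:
            raise RuntimeError("proxy pool exhausted")
        proxy = self._pool[0]
        self._pool.rotate(-1)  # the proxy just handed out goes to the back
        return proxy

    def mark_blocked(self, proxy):
        """Retire a proxy, e.g. after the site starts answering 403 or 429."""
        if proxy in self._pool:
            self._pool.remove(proxy)

    def as_requests_proxies(self):
        """Next proxy as a mapping usable as requests.get(url, proxies=...)."""
        proxy = self.next_proxy()
        return {"http": proxy, "https": proxy}


rotator = ProxyRotator([
    "http://203.0.113.1:8080",
    "http://203.0.113.2:8080",
    "http://203.0.113.3:8080",
])

first = rotator.next_proxy()          # http://203.0.113.1:8080
rotator.mark_blocked(first)           # pretend the site banned this IP
print(len(rotator))                   # 2
print(rotator.as_requests_proxies())  # uses http://203.0.113.2:8080 next
```

A `deque` keeps rotation cheap: `rotate(-1)` moves the front element to the back without rebuilding the list.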
In fact, scraping at scale is nearly impossible without a large, rotating proxy pool. Pacing matters too: knowing how many requests a site allows in a given time window can make or break your progress. The larger your proxy pool, the more requests you can send without drawing suspicion, and the less likely you are to be blocked.
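Pacing can be sketched the same way. Assuming a hypothetical per-IP limit of `max_per_minute` requests (real sites rarely publish their thresholds), each proxy must wait `60 / max_per_minute` seconds between uses, so a pool of P proxies can sustain roughly P * max_per_minute requests per minute. The `PacedPool` class below is an illustrative scheduler, not a standard API: it always picks the proxy that becomes available soonest and reports when the request may be sent.

```python
import heapq


class PacedPool:
    """Hand out proxies so that none exceeds `max_per_minute` requests.

    Hypothetical scheduler for illustration. Times are floats in seconds,
    so it can be driven by time.monotonic() or by fixed values in a test.
    """

    def __init__(self, proxies, max_per_minute):
        self.interval = 60.0 / max_per_minute  # minimum gap between uses of one proxy
        # Min-heap of (earliest time the proxy may be used again, proxy).
        self._heap = [(0.0, proxy) for proxy in proxies]
        heapq.heapify(self._heap)

    def acquire(self, now):
        """Return (proxy, send_at) for the proxy that becomes free soonest."""
        ready_at, proxy = heapq.heappop(self._heap)
        send_at = max(now, ready_at)
        heapq.heappush(self._heap, (send_at + self.interval, proxy))
        return proxy, send_at


# Assumed limit: 6 requests per minute per IP, i.e. one request every 10 s.
pool = PacedPool(["203.0.113.1", "203.0.113.2", "203.0.113.3"], max_per_minute=6)

# Nine requests all queued at t=0: the three proxies fire at 0 s, 10 s and 20 s.
schedule = [pool.acquire(now=0.0) for _ in range(9)]
for proxy, send_at in schedule:
    print(f"{send_at:5.1f}s  {proxy}")
```

In a live scraper you would call `pool.acquire(time.monotonic())` and then `time.sleep(max(0.0, send_at - time.monotonic()))` before sending. With six proxies instead of three, the same nine requests would all go out by the 10-second mark instead of 20, which is the larger-pool advantage described above.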