First things first, let’s create a new directory that will hold our files. Next, open the project in your favorite IDE (mine is Visual Studio Code) and open a new terminal. To open a new terminal from within VSCode, go to Terminal > New terminal. We’ll create a new virtual environment inside the project and activate it:
~ » python3 -m venv env && source env/bin/activate
In your project, let’s create a new ‘scraper.py’ file and add some code to it. The basic structure of a scraper with Selenium, from a functional programming perspective, is:
from selenium import webdriver

def scrape_page(url):
    driver = webdriver.Chrome()
    driver.get(url)
    return driver.page_source
And that is it. In 5 lines of code:
- We’re firing up an automated browser
- We’re accessing our target
- And we’re collecting its resources.
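The string that scrape_page() returns is raw HTML, so in practice you’ll still need to parse it. As a quick illustration (using only Python’s built-in html.parser, no Selenium required), here is a minimal title extractor; the sample page_source below is made up for the example:

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside the <title> tag."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# A hypothetical page_source, standing in for scrape_page()'s return value
page_source = "<html><head><title>Example Domain</title></head><body></body></html>"
parser = TitleParser()
parser.feed(page_source)
print(parser.title)  # Example Domain
```

For anything beyond trivial extraction you’d reach for a dedicated parser like Beautiful Soup, but the idea is the same: Selenium fetches, something else parses.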
But remember, we want to use ISP proxies with Selenium so that our browser, while not the stealthiest, is at least harder to detect. Luckily, things are quite simple in Python (and that’s why I love it). Here is how we introduce proxies in Selenium:
from selenium import webdriver

def scrape_page(url, proxy):
    options = webdriver.ChromeOptions()
    options.add_argument('--proxy-server=%s' % proxy)
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    return driver.page_source

print(scrape_page('http://httpbin.org/ip', '75.89.101.60:80'))
We only added a few lines inside the function, plus a final line that calls it. If you run the script now, the request should originate from 75.89.101.60 (assuming the free proxy is still up). For the purpose of this example, I’ve used a free proxy server from here. But if you want to build a real scraper, I suggest you look into more reliable sources, preferably proxy providers that offer ISP proxies as well.
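Because free proxies die frequently, a real scraper usually rotates through a pool rather than relying on a single address. Here is a minimal round-robin sketch using only the standard library; the addresses below are hypothetical placeholders, and you would pass each one to the scrape_page() function defined above:

```python
from itertools import cycle

# Hypothetical proxy pool -- replace with addresses from your provider
PROXIES = [
    "75.89.101.60:80",
    "203.0.113.10:8080",
    "198.51.100.7:3128",
]

proxy_pool = cycle(PROXIES)  # endless round-robin iterator over the pool

def next_proxy():
    """Return the next proxy address in the rotation."""
    return next(proxy_pool)

# Each call hands out the next address, wrapping around at the end:
print(next_proxy())  # 75.89.101.60:80
print(next_proxy())  # 203.0.113.10:8080
```

In a loop over target URLs you would call scrape_page(url, next_proxy()), spreading requests across the pool so no single address draws all the traffic.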