Find out how to use cURL with Proxy
Andrei Ogiolan on Dec 05 2022
What is cURL?
To learn how to use cURL with a proxy, we first need to introduce cURL itself. cURL, short for Client URL, is an easy-to-use command-line tool that developers use to fetch data from a server.
How to use cURL?
As mentioned above, using cURL is pretty straightforward: a one-line command is enough to extract information. First, open a terminal and type curl followed by a website link, for example:
$ curl 'https://www.webscrapingapi.com/'
Congratulations, you have made your first request using cURL. This simple command requests information from the server just like a traditional browser does and returns the HTML of the page. Not every website will give you back HTML; some endpoints send back data as a JSON object. Take this example:
$ curl 'https://jsonplaceholder.typicode.com/todos/3'
Type this command in your terminal and you should get back this response:
"title": "fugiat veniam minus",
Most APIs will give you back either HTML or JSON when you run cURL commands against them. This is not all cURL can do for us, though. It is a very sophisticated tool that takes a lot of time to master. If you want to learn more about cURL, I strongly recommend taking a look at the cURL documentation for a better understanding of its parameters. Alternatively, you can run the following command:
$ curl --help
This will show you some of the options you can pass to cURL:
Usage: curl [options...] <url>
-d, --data <data> HTTP POST data
-f, --fail Fail silently (no output at all) on HTTP errors
-h, --help <category> Get help for commands
-i, --include Include protocol response headers in the output
-o, --output <file> Write to file instead of stdout
-O, --remote-name Write output to a file named as the remote file
-s, --silent Silent mode
-T, --upload-file <file> Transfer local FILE to destination
-u, --user <user:password> Server user and password
-A, --user-agent <name> Send User-Agent <name> to server
-v, --verbose Make the operation more talkative
-V, --version Show version number and quit
This is not the full help, this menu is stripped into categories.
Use "--help category" to get an overview of all categories.
For all options use the manual or "--help all".
As you can see, these are not even all the options cURL accepts; the help menu is divided into categories. To list every option, run:
$ curl --help all
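As a rough sketch of how a few of the options listed above combine, the snippet below wraps three common invocations in small shell functions. The function names (fetch_quiet, fetch_with_headers, save_page) are made up for this illustration; only the flags come from the help output.

```shell
# Hypothetical helper names; only the curl flags come from the help text above.

# -s hides the progress meter, -f makes curl exit with a non-zero status
# when the server answers with an HTTP error (400 or above).
fetch_quiet() {
  curl -s -f "$1"
}

# -i prints the response headers before the body.
fetch_with_headers() {
  curl -s -i "$1"
}

# -o writes the body to the given file instead of stdout.
save_page() {
  curl -s -o "$2" "$1"
}

# Example usage (needs network access):
#   fetch_quiet 'https://jsonplaceholder.typicode.com/todos/3'
#   fetch_with_headers 'https://jsonplaceholder.typicode.com/todos/3'
#   save_page 'https://www.webscrapingapi.com/' page.html
```

Wrapping the calls in functions is only a convenience for reuse in scripts; each body can just as well be typed directly in the terminal.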
However, using cURL alone limits the servers we can fetch data from. For example, some servers use geolocation and refuse to give us the data we are looking for because of our location. This is when we need a proxy, which acts as a middleman between us and the target server.
What is a proxy?
The concept of a proxy server is not hard to understand. As mentioned above, a proxy server is an intermediary between a client requesting a resource and the server providing that resource. Proxies let us get data from anywhere. To understand the concept better, let's assume that a server called Bob has some data we are interested in, but Bob provides that data only to clients in Europe, and we are in the United States.
How do we deal with that? Instead of sending our request to Bob, we send it to a proxy server located in Europe and tell the proxy that we want some data from Bob. The proxy sends the request to Bob, and Bob returns the data to the proxy because the proxy is in Europe. The proxy server then sends Bob's data back to us.
This is the main flow of how proxies work. Another great use case for a proxy is getting data with prices shown in a specific currency, which avoids confusion. For a further understanding of proxies, I strongly recommend having a look at Wikipedia.
To use a proxy, you will most likely need a host, a port, a user, a password, and a target URL to get data from. For this example, I will use a proxy provided by WebScrapingAPI, which you can find more information about here. WebScrapingAPI is not a dedicated proxy provider; it is a web scraping service that also provides proxies. In our examples, the setup will be the following:
- Proxy hostname: proxy.webscrapingapi.com
- Proxy port: 80
- Proxy username: webscrapingapi.proxy_type=datacenter.device=desktop
- Proxy password: <YOUR-API-KEY-HERE> // you can get one by registering here
- Target URL: http://httpbin.org/get
Please note that some proxy providers may require a different authentication scheme.
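For use in scripts, the setup above could be kept in shell variables, roughly as sketched here. The variable names are illustrative, and WSA_API_KEY is a made-up environment variable name, not something WebScrapingAPI defines.

```shell
# Values from the setup above; the variable names are illustrative.
PROXY_HOST="proxy.webscrapingapi.com"
PROXY_PORT="80"
PROXY_USER="webscrapingapi.proxy_type=datacenter.device=desktop"
TARGET_URL="http://httpbin.org/get"

# Read the API key from the environment so it is not hard-coded in the script.
# WSA_API_KEY is a hypothetical name; the placeholder is used when it is unset.
PROXY_PASS="${WSA_API_KEY:-<YOUR-API-KEY-HERE>}"

# host:port form of the proxy address
PROXY_ADDR="${PROXY_HOST}:${PROXY_PORT}"

# Print the non-secret parts of the setup as a sanity check.
echo "Proxy:  ${PROXY_ADDR}"
echo "User:   ${PROXY_USER}"
echo "Target: ${TARGET_URL}"
```

These variables can then be handed to cURL as -U "$PROXY_USER:$PROXY_PASS" --proxy "$PROXY_ADDR" "$TARGET_URL".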
How to use cURL with a proxy?
Now that we have covered cURL and proxies, we are ready to combine them and make requests through a proxy, which is a pretty straightforward process. We first need to authenticate, and then we can use the proxy.
Proxy authentication in cURL
Proxy authentication in cURL is pretty simple and can be done for our example from above as follows:
$ curl -U 'webscrapingapi.proxy_type=datacenter.device=desktop:<YOUR-API-KEY>' --proxy proxy.webscrapingapi.com:80 http://httpbin.org/get
When we run that command, httpbin returns the IP address the request came from, along with some other properties:
"Accept-Encoding": "gzip, deflate, br",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5173.0 Safari/537.36",
As you can see, the origin you receive back is not your IP address but the address of the proxy server. You can also run the command without revealing your password in the terminal. This can be done as follows:
$ curl -U webscrapingapi.proxy_type=datacenter.device=desktop --proxy proxy.webscrapingapi.com:80 http://httpbin.org/get
And then you will get a prompt to enter your password:
Enter proxy password for user 'webscrapingapi.proxy_type=datacenter.device=desktop':
Now you can type your API key there without exposing it in the terminal, which makes the whole process more secure. Still, typing your credentials, host, and port every single time you want to run a cURL command via a proxy is not ideal, especially when you run many commands through the same proxy provider.
Of course, you can store your credentials in a separate file on your machine and copy and paste them every time, but a more natural approach is to pass them via environment variables, which we will talk about below.
Using cURL with a proxy via environment variables
An environment variable is a named value kept in the environment of your shell session that the programs you start from it can read. In this particular case, we can give cURL a variable called http_proxy or https_proxy that contains our proxy details, so we do not need to specify them on every run of the command. You can do that by running this command:
$ export http_proxy="http://webscrapingapi.proxy_type=datacenter.device=desktop:<YOUR-API-KEY>@proxy.webscrapingapi.com:80"
Please note that you must name your variable http_proxy or https_proxy for cURL to recognize it. That is it: you no longer need to pass your credentials on every run, and you can run cURL as simply as this:
$ curl http://httpbin.org/get
That will give us the following output:
"Accept-Encoding": "gzip, deflate, br",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
As you can see, the IP address is the proxy's address, which confirms that you have done a great job setting up your proxy. At this point we can run any cURL command without specifying the proxy details; cURL will take care of that for us.
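To check the reported address from a script rather than by eye, something like the following sketch could be used. extract_origin is a hypothetical helper written for this article, and the sample reply is made up (203.0.113.7 belongs to an address range reserved for documentation).

```shell
# extract_origin is a hypothetical helper: it reads httpbin's JSON reply on
# stdin and prints the value of the "origin" field.
extract_origin() {
  sed -n 's/.*"origin": *"\([^"]*\)".*/\1/p'
}

# Offline demonstration with a made-up reply.
sample='{ "origin": "203.0.113.7", "url": "http://httpbin.org/get" }'
printf '%s\n' "$sample" | extract_origin

# Real usage, with http_proxy exported as shown earlier (needs network access):
#   curl -s http://httpbin.org/get | extract_origin
```

The sed pattern is a deliberately simple stand-in for a real JSON parser; it only assumes that the reply contains a quoted "origin" value on a single line.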
Disabling the proxy for a specific command
However, if you need to send a specific request without a proxy, you do not need to worry about deleting the value of the http_proxy variable. Being a sophisticated tool with a lot of options, cURL can take care of that with its --noproxy option; passing "*" as its value tells cURL not to use a proxy for any host when making the request. It can be done as follows:
$ curl --noproxy "*" http://httpbin.org/get
And that will give us back our own IP address, not the proxy's.
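The --noproxy option also accepts a comma-separated list of hosts instead of "*", and cURL reads the same kind of list from the no_proxy environment variable. A minimal sketch, assuming you only want to bypass the proxy for local addresses (the localhost URL and port are placeholders):

```shell
# Skip the proxy only for selected hosts (comma-separated list):
#   curl --noproxy "localhost,127.0.0.1" http://localhost:8080/
#
# The same list can be set once through the no_proxy environment variable.
export no_proxy="localhost,127.0.0.1"
echo "no_proxy=${no_proxy}"

# To stop using the proxy for the rest of the session, remove the variables.
unset http_proxy https_proxy
```

After the unset, cURL goes back to connecting directly until the variables are exported again.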
In conclusion, using cURL with a proxy is a great way to bypass geolocation filters. It extends the number of resources we can fetch from web servers and is a good starting point for topics such as web scraping, where we need proxies to get certain data or to receive it in the format we want. I hope you found this article useful for learning how to use cURL with a proxy, and that you will play around with it and build your own scripts that extract data from servers that use geolocation filters.