Find out how to use cURL with a Proxy
Andrei Ogiolan on Dec 05 2022
What is cURL?
To reach the goal of this article, which is learning how to use cURL with a proxy, we first need to introduce cURL itself. cURL (short for Client URL) is an easy-to-use command-line tool designed for developers to fetch data from a server.
How to use cURL?
As mentioned above, using cURL is pretty straightforward: you can extract information with a one-line command. First, open a terminal and type curl followed by a website link, for example:
$ curl 'https://www.webscrapingapi.com/'
Congratulations, you have made your first request using cURL. This simple command requests information from the server just like a traditional browser does and returns the HTML of the page. Not every website will give you back HTML; some endpoints send back data as a JSON object. Take this example:
$ curl 'https://jsonplaceholder.typicode.com/todos/3'
Type this command in your terminal and you should get back this response:
"title": "fugiat veniam minus",
Most APIs will give you back either HTML or JSON when you run cURL commands against them. But this is not everything cURL can do for us. In reality, it is a very sophisticated tool that takes a lot of time to master. If you want to learn more about cURL, I strongly recommend taking a look at the cURL documentation for a better understanding of its parameters. Alternatively, you can run the following command:
$ curl --help
This will show you some of the options you can pass to cURL:
Usage: curl [options...] <url>
-d, --data <data> HTTP POST data
-f, --fail Fail silently (no output at all) on HTTP errors
-h, --help <category> Get help for commands
-i, --include Include protocol response headers in the output
-o, --output <file> Write to file instead of stdout
-O, --remote-name Write output to a file named as the remote file
-s, --silent Silent mode
-T, --upload-file <file> Transfer local FILE to destination
-u, --user <user:password> Server user and password
-A, --user-agent <name> Send User-Agent <name> to server
-v, --verbose Make the operation more talkative
-V, --version Show version number and quit
This is not the full help, this menu is stripped into categories.
Use "--help category" to get an overview of all categories.
For all options use the manual or "--help all".
As you can see, these are not even all the options you can pass to cURL; the menu is divided into categories. As you probably guessed, to get all the options you would run:
$ curl --help all
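Several of the options listed in the help output can be combined in a single call. The sketch below wraps one such combination in a shell function. The function name save_page and the User-Agent value are hypothetical, chosen only for illustration.

```shell
# save_page URL FILE
# Fetches URL and writes the response body to FILE.
#   --silent      hide the progress meter
#   --fail        exit with a non-zero status on HTTP errors
#   --user-agent  send a custom User-Agent header (the value here is made up)
#   --output      write the body to FILE instead of stdout
save_page() {
  curl --silent --fail --user-agent "my-script/1.0" --output "$2" "$1"
}
```

It could then be called as save_page 'https://www.webscrapingapi.com/' page.html.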
However, using cURL alone has some limitations regarding the servers we can fetch data from. For example, some servers use geolocation and refuse to give us the data we are looking for because of our location. This is when we need a proxy, which acts as a middleman between us and the target server.
What is a proxy?
The concept of a proxy server is not hard to understand. As mentioned above, a proxy server is an intermediary between a client requesting a resource and the server providing that resource. Proxies are designed to let us get data from anywhere. To understand this concept better, let’s assume that we have a server called Bob which has some data we are interested in, but Bob provides that data only to clients in Europe, and we are in the United States.
How do we deal with that? Instead of sending our request to Bob, we send it to a proxy server located in Europe and tell the proxy that we want some data from Bob. The proxy itself sends the request to Bob, and Bob returns the data to the proxy server, since the proxy is in Europe. The proxy server then sends Bob’s data back to us.
This is the main flow of how proxies work. Another great use case for a proxy is when we want the data to contain prices in a specific currency, which avoids confusion. For a deeper understanding of proxies, I strongly recommend having a look at Wikipedia.
To use a proxy, you will most likely need a host, a port, a user, a password and a target URL you want to get data from. For this example, I will use a proxy provided by WebScrapingAPI, which you can find more information about here. WebScrapingAPI is not a proxy provider; it is a web scraping service that also provides proxies. In our examples, the setup will be the following:
- Proxy hostname: proxy.webscrapingapi.com
- Proxy port: 80
- Proxy username: webscrapingapi.proxy_type=datacenter.device=desktop
- Proxy password: <YOUR-API-KEY-HERE> // you can get one by registering here
- Target URL: http://httpbin.org/get
Please note that some proxy providers may require a different authentication scheme.
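Many tools, cURL included, accept these values combined into a single proxy URL of the form http://USER:PASSWORD@HOST:PORT. Here is a minimal sketch of how the pieces fit together. The helper name proxy_url is hypothetical, and it assumes the credentials contain no characters that would need percent-encoding.

```shell
# proxy_url USER PASSWORD HOST PORT
# Prints the proxy settings as one URL: http://USER:PASSWORD@HOST:PORT
# Assumes the credentials contain no characters that need
# percent-encoding (such as "@", ":" or "/").
proxy_url() {
  printf 'http://%s:%s@%s:%s\n' "$1" "$2" "$3" "$4"
}

# Values from the setup above; the API key is still a placeholder.
proxy_url 'webscrapingapi.proxy_type=datacenter.device=desktop' \
  '<YOUR-API-KEY-HERE>' 'proxy.webscrapingapi.com' 80
# prints: http://webscrapingapi.proxy_type=datacenter.device=desktop:<YOUR-API-KEY-HERE>@proxy.webscrapingapi.com:80
```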
How to use cURL with a proxy?
Now that we have covered cURL and proxies, we are ready to combine them and make requests through a proxy, which is a pretty straightforward process. We first need to authenticate, and then we can use the proxy.
Proxy authentication in cURL
Proxy authentication in cURL is pretty simple. For our example above, it can be done as follows:
$ curl -U 'webscrapingapi.proxy_type=datacenter.device=desktop:<YOUR-API-KEY>' --proxy proxy.webscrapingapi.com:80 http://httpbin.org/get
When we run that command, httpbin gives us back the IP address the request came from, along with some other properties:
"Accept-Encoding": "gzip, deflate, br",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5173.0 Safari/537.36",
As you can probably see, the origin you receive back is not your IP address but the address of the proxy server. Furthermore, you can run the command without revealing your password in the terminal. This can be done as follows:
$ curl -U webscrapingapi.proxy_type=datacenter.device=desktop --proxy proxy.webscrapingapi.com:80 http://httpbin.org/get
And then you will get a prompt to enter your password:
Enter proxy password for user 'webscrapingapi.proxy_type=datacenter.device=desktop':
Now you can type your API key there without exposing it in the terminal, which makes the whole process more secure. Still, typing your credentials, host and port every single time you want to run a cURL command through a proxy is not ideal, especially when you want to run many commands and you are using the same proxy provider.
Of course, you can store your credentials in a separate file on your machine and copy and paste them every time, but there is a more natural approach: passing them via environment variables, which we will talk about below.
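If you do keep the credentials in a file, cURL can also read them directly through its -K (--config) option, which avoids the copy and paste. The following is a minimal sketch, not a recommended setup. The helper name write_proxy_config is hypothetical, and the API key is still a placeholder you would need to replace.

```shell
# write_proxy_config FILE
# Writes the proxy settings from this article to FILE in cURL's
# config-file syntax (long option names without the leading dashes).
write_proxy_config() {
  (
    umask 077   # a newly created file is readable by the current user only
    cat > "$1" <<'EOF'
proxy = "http://proxy.webscrapingapi.com:80"
proxy-user = "webscrapingapi.proxy_type=datacenter.device=desktop:<YOUR-API-KEY>"
EOF
  )
}
```

After running write_proxy_config with a path of your choice, for example ~/.curl-proxy.conf, a request would look like curl -K ~/.curl-proxy.conf http://httpbin.org/get.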
Using cURL with a proxy via environment variables
An environment variable is a named value kept by the shell that one or more programs can read. In this particular case, we can give cURL a variable called http_proxy or https_proxy which contains our proxy details, so we do not need to specify them on every run of the command. You can set it by running this command:
$ export http_proxy="http://webscrapingapi.proxy_type=datacenter.device=desktop:<YOUR-API-KEY>@proxy.webscrapingapi.com:80"
Please note that you have to name your variable http_proxy (used for http:// URLs) or https_proxy (used for https:// URLs) for cURL to pick it up. That is it: you no longer need to pass your credentials on every run, and you can run cURL as simply as this:
$ curl http://httpbin.org/get
That will give us the following output:
"Accept-Encoding": "gzip, deflate, br",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
As you can see, the IP address is the proxy’s address, which confirms that you have done a great job setting up your proxy. At this point we can run any cURL command without specifying the proxy details; cURL will take care of that for us.
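Exporting the variable affects every later command in the same shell session. If the proxy should apply to only some commands, the variables can instead be set for a single invocation. A minimal sketch, assuming a hypothetical helper called with_proxy and a hypothetical variable PROXY_URL holding the same proxy URL as above:

```shell
# Hypothetical variable holding the proxy URL used in this article.
PROXY_URL="http://webscrapingapi.proxy_type=datacenter.device=desktop:<YOUR-API-KEY>@proxy.webscrapingapi.com:80"

# with_proxy COMMAND [ARGS...]
# Runs COMMAND with http_proxy and https_proxy set for that process only;
# the variables of the current shell are left untouched.
with_proxy() {
  env http_proxy="$PROXY_URL" https_proxy="$PROXY_URL" "$@"
}
```

For example, with_proxy curl http://httpbin.org/get would go through the proxy, while a plain curl call in the same session would not, unless the variables were exported earlier.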
Disabling the proxy for a specific command
However, if you need to send a specific request without a proxy, you do not need to worry about deleting the value of the http_proxy variable. Being a sophisticated tool with a lot of options, cURL can take care of that through its --noproxy option, which takes a list of hosts that should bypass the proxy; passing "*" tells it not to use a proxy for any host. It can be done as follows:
$ curl --noproxy "*" http://httpbin.org/get
And that will give us back our own IP address and not the proxy’s.
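The same exclusion can be made persistent through the no_proxy environment variable, which cURL reads as a comma-separated list of hosts to contact directly. The sketch below is only an illustration: the host list is an example, and curl_direct is a hypothetical helper name.

```shell
# Hosts that cURL should contact directly even while http_proxy is set
# (the list is only an example).
export no_proxy="localhost,127.0.0.1"

# curl_direct ARGS...
# Hypothetical shortcut: runs cURL with the proxy disabled for every host.
curl_direct() {
  curl --noproxy "*" "$@"
}
```

With this in place, curl_direct http://httpbin.org/get behaves like the command above.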
In conclusion, using cURL with a proxy is a great way to bypass geolocation filters and extend the range of resources we can fetch from web servers. It is also a good starting point for getting into topics such as web scraping, where we need proxies in order to get certain data or to receive it in the format we want. I hope you found this article useful for learning how to use cURL with a proxy, and that you will play around with it and build your own scripts that extract data from servers which use geolocation filters.