Find out how to use cURL in Python

Andrei Ogiolan on Dec 12 2022

What is cURL?

Since the goal of this article is learning how to use cURL in Python, we first need to introduce cURL. cURL (Client URL) is an easy-to-use command-line tool that lets developers fetch data from a server.

How to use cURL?

As I already mentioned above, using cURL is pretty straightforward: you can extract information with just a one-line command. First, open a terminal and type curl followed by a website link, for example:

$ curl 'https://www.webscrapingapi.com/'

Congratulations, you made your first request using cURL. This simple command requests information from the server just like a traditional browser does and returns the HTML of the page. Not every website will give you back HTML, though; some endpoints send back data as a JSON object. Take this example:

$ curl 'https://jsonplaceholder.typicode.com/todos/1'

Type this command in your terminal and you should get back this response:

{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}

Most APIs will give you back either HTML or JSON when you run cURL commands against them. Still, this is not everything cURL can do for us; in reality it is a very sophisticated tool. If you want to learn more about cURL, I strongly recommend reading the cURL documentation for a better understanding of its parameters. Alternatively, you can run the following command:

$ curl --help

This will show you some of the options you can pass to cURL:

Usage: curl [options...] <url>
 -d, --data <data>   HTTP POST data
 -f, --fail          Fail silently (no output at all) on HTTP errors
 -h, --help <category>  Get help for commands
 -i, --include       Include protocol response headers in the output
 -o, --output <file>  Write to file instead of stdout
 -O, --remote-name   Write output to a file named as the remote file
 -s, --silent        Silent mode
 -T, --upload-file <file>  Transfer local FILE to destination
 -u, --user <user:password>  Server user and password
 -A, --user-agent <name>  Send User-Agent <name> to server
 -v, --verbose       Make the operation more talkative
 -V, --version       Show version number and quit

This is not the full help, this menu is stripped into categories.
Use "--help category" to get an overview of all categories.
For all options use the manual or "--help all".

As you can see, these are not all the options you can pass to cURL; the menu is divided into categories. As you probably guessed, to list every option you can run:

$ curl --help all

How to use cURL in Python?

For this step, there are 2 prerequisites. The first one is pretty obvious: you should install Python on your machine. You can do that by navigating to Python’s official website and installing the proper version for your operating system. Make sure it is a recent version, as older ones may not include pip, which is required to install most of the packages we will use. Run the following command afterwards:

$ pip --version

Upon successful installation, this should give you back the version of pip you have installed.
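If you prefer to run the same check from Python itself, here is a minimal sketch using only the standard library. The helper names `pip_available` and `pip_on_path` are hypothetical: the first reports whether the current interpreter can import the pip module, the second whether a `pip` executable is on your PATH.

```python
import importlib.util
import shutil


def pip_available() -> bool:
    """Return True if the current interpreter can import the pip module."""
    return importlib.util.find_spec("pip") is not None


def pip_on_path() -> bool:
    """Return True if a `pip` executable can be found on PATH."""
    return shutil.which("pip") is not None


if __name__ == "__main__":
    print("pip module importable:", pip_available())
    print("pip on PATH:", pip_on_path())
```

If the module is importable but the executable is not on your PATH, running `python -m pip --version` instead of `pip --version` usually works.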

Otherwise, on Windows you will most likely get this message:

'pip' is not recognized as an internal or external command, operable program or batch file.

It is very important for you to have pip installed in order to proceed as you will need it to install packages.

The second prerequisite is that you should be familiar with Python’s syntax, or at least have beginner-level experience with another programming language.

Why use cURL in Python?

You are probably wondering: is it not enough to just use cURL from the command line? We run a single command and the API gives us back the information. That is true, but in the real world we will want to process the data we receive from the server somehow, and this is why we need a programming language. That is where Python comes into play.

Why use Python?

Python is a high-level, general-purpose programming language. Its simple syntax makes it very easy for beginners to pick up. On top of that, it has a huge community ready to help you, so if you encounter any issues you should not hesitate to ask a question. One great place to ask is Stack Overflow, where someone will very likely reach out to help you.

How to integrate cURL in Python

Just like before, using cURL in Python is very straightforward. Remember when we wrote a one-line command to get data from the server? The difference is that now you need to write 2 lines of code for this simple call, for example:

import os
# Run the curl command through the system shell; cURL prints the page to the console
os.system('curl "https://www.webscrapingapi.com/product/"')

Now let's see the real advantage of using a programming language. We can build a function that takes a custom URL and runs the curl command against it:

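import os

def cURL(url):
    # os.system runs the command in a shell: cURL prints the response,
    # and the function returns only the command's exit status (0 on success)
    return os.system(f'curl "{url}"')

cURL('https://www.webscrapingapi.com/blog/')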

You can replace https://www.webscrapingapi.com/blog/ with any other website you want to fetch data from. Congratulations, at this point you have created a script which takes a URL, runs the cURL command and displays the result in the console. You can run Python directly in your terminal, but for a better programming experience I strongly recommend using an Integrated Development Environment (IDE). There are many choices, but for Python I recommend PyCharm, which you can download from the JetBrains website.
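`os.system` hands the whole string to a shell and returns only the command's exit status, so the page is printed but never reaches your script. As a sketch (not the approach used in this article), the same idea can be written with `subprocess.run`, passing the arguments as a list so no shell quoting is involved and capturing the output as text. The names `build_get_command` and `curl_get` are hypothetical, and the code assumes the `curl` executable is installed and on your PATH.

```python
import subprocess


def build_get_command(url: str) -> list:
    # -s hides cURL's progress meter; the URL is passed as a single argument,
    # so quotes or spaces inside it cannot be misread by a shell.
    return ["curl", "-s", url]


def curl_get(url: str) -> str:
    # capture_output=True collects stdout/stderr instead of printing them,
    # text=True decodes the bytes to str, and check=True raises
    # CalledProcessError if curl itself exits with a non-zero status
    # (for example, when the host cannot be resolved).
    result = subprocess.run(
        build_get_command(url), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `print(curl_get('https://www.webscrapingapi.com/blog/')[:200])` would print the first 200 characters of the page, which your script can now store or process.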

As I already mentioned above, cURL is by no means limited to that. It can do a huge number of things apart from sending GET requests: it can also download files or send POST, PUT or DELETE requests. Here is an example of a Python function which sends a POST request to https://httpbin.org/post:

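import os

def cURL(method, url, data):
    # -X sets the HTTP method and --data sets the request body.
    # Note: {data} is not quoted, so a POSIX shell strips the double quotes
    # from the JSON string before cURL receives it.
    return os.system(f'curl -X "{method}" --url "{url}" --data {data}')

data = '{"foo":"bar"}'

cURL('POST', 'https://httpbin.org/post', data)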

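As you probably noticed, the command changes a little when sending a POST request. In the previous request you did not have to use the -X and --data parameters because cURL's default method is GET. (Strictly speaking, --data on its own already switches cURL to POST; -X is what lets the same function send PUT or DELETE as well.) Because {data} is not quoted in the command, a POSIX shell strips the double quotes from the JSON, so cURL actually sends {foo:bar} as form-encoded data. After running this command you should get back from the httpbin API a response containing your request, IP address, params and the body you submitted, something like this: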

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "{foo:bar}": ""
  }, 
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "9", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "curl/X.XX.X", 
    "X-Amzn-Trace-Id": "Root=X-XXXXX-XXXXXX"
  }, 
  "json": null, 
  "origin": "0.0.0.0", 
  "url": "https://httpbin.org/post"
}
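In the response above, the body ended up under "form" rather than "json". To have httpbin receive real JSON, the body has to reach cURL with its quotes intact and the server has to be told it is JSON. A minimal sketch, assuming `curl` is on your PATH: `build_curl_command` and `curl_request` are hypothetical helpers that pass every argument as a separate list item (so no shell strips the quotes) and add a `Content-Type: application/json` header with cURL's `-H` option.

```python
import json
import subprocess
from typing import Optional


def build_curl_command(method: str, url: str, data: Optional[dict] = None) -> list:
    # -s silences the progress meter; -X sets the HTTP method explicitly.
    cmd = ["curl", "-s", "-X", method, url]
    if data is not None:
        # json.dumps produces the body; as a list item it reaches curl unchanged.
        cmd += ["-H", "Content-Type: application/json", "--data", json.dumps(data)]
    return cmd


def curl_request(method: str, url: str, data: Optional[dict] = None) -> str:
    # Run curl without a shell and return whatever it wrote to stdout.
    result = subprocess.run(
        build_curl_command(method, url, data),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With this, `print(curl_request('POST', 'https://httpbin.org/post', {'foo': 'bar'}))` should make httpbin echo the body under "json" instead of "form".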

At this point you have already built a simple tool which can send GET, POST, PUT, PATCH, DELETE, etc. requests to fetch data from an API. We can now set the URL, method and body directly from our Python script instead of typing them manually. However, this is a minor benefit of using cURL in Python. The major benefit is that we can now process the data the way we want, even if the API does not give us an option to get it that way. For instance, let’s say that we are given a list of all the users, but we want to split them into 2 groups and get the users from the second group.

We can do that in Python by storing the response in a variable called users, which we convert into a list of Python dictionaries (one per JSON object) with the json.loads() method from Python's standard json module. After that we can iterate through the users list and display only the users whose id is greater than half the length of the list, that is, the users in the second half. For a better understanding, this is how it translates into code:

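import subprocess
import json

def cURL(url):
    # check_output returns what curl writes to stdout, as bytes;
    # -s hides the progress meter curl would otherwise print to stderr
    return subprocess.check_output(['curl', '-s', url])

# json.loads accepts bytes and turns the JSON array into a list of dicts
users = json.loads(cURL('https://jsonplaceholder.typicode.com/users'))

# Print only the users whose id falls in the second half of the list
for user in users:
    if user['id'] > len(users) / 2:
        print(user)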

You probably noticed that instead of os, we are now using the subprocess and json libraries. We use subprocess because we want to save the output of the command: subprocess.check_output() returns what cURL prints, while os.system() only runs the command and returns its exit status. The code makes a simple GET request to an API which gives us back a list of users in the form of JSON objects. The output should be:

{'id': 6, 'name': 'Mrs. Dennis Schulist', 'username': 'Leopoldo_Corkery', 'email': 'Karley_Dach@jasper.info', 'address': {'street': 'Norberto Crossing', 'suite': 'Apt. 950', 'city': 'South Christy', 'zipcode': '23505-1337', 'geo': {'lat': '-71.4197', 'lng': '71.7478'}}, 'phone': '1-477-935-8478 x6430', 'website': 'ola.org', 'company': {'name': 'Considine-Lockman', 'catchPhrase': 'Synchronised bottom-line interface', 'bs': 'e-enable innovative applications'}}

{'id': 7, 'name': 'Kurtis Weissnat', 'username': 'Elwyn.Skiles', 'email': 'Telly.Hoeger@billy.biz', 'address': {'street': 'Rex Trail', 'suite': 'Suite 280', 'city': 'Howemouth', 'zipcode': '58804-1099', 'geo': {'lat': '24.8918', 'lng': '21.8984'}}, 'phone': '210.067.6132', 'website': 'elvis.io', 'company': {'name': 'Johns Group', 'catchPhrase': 'Configurable multimedia task-force', 'bs': 'generate enterprise e-tailers'}}

{'id': 8, 'name': 'Nicholas Runolfsdottir V', 'username': 'Maxime_Nienow', 'email': 'Sherwood@rosamond.me', 'address': {'street': 'Ellsworth Summit', 'suite': 'Suite 729', 'city': 'Aliyaview', 'zipcode': '45169', 'geo': {'lat': '-14.3990', 'lng': '-120.7677'}}, 'phone': '586.493.6943 x140', 'website': 'jacynthe.com', 'company': {'name': 'Abernathy Group', 'catchPhrase': 'Implemented secondary concept', 'bs': 'e-enable extensible e-tailers'}}

{'id': 9, 'name': 'Glenna Reichert', 'username': 'Delphine', 'email': 'Chaim_McDermott@dana.io', 'address': {'street': 'Dayna Park', 'suite': 'Suite 449', 'city': 'Bartholomebury', 'zipcode': '76495-3109', 'geo': {'lat': '24.6463', 'lng': '-168.8889'}}, 'phone': '(775)976-6794 x41206', 'website': 'conrad.com', 'company': {'name': 'Yost and Sons', 'catchPhrase': 'Switchable contextually-based project', 'bs': 'aggregate real-time technologies'}}

{'id': 10, 'name': 'Clementina DuBuque', 'username': 'Moriah.Stanton', 'email': 'Rey.Padberg@karina.biz', 'address': {'street': 'Kattie Turnpike', 'suite': 'Suite 198', 'city': 'Lebsackbury', 'zipcode': '31428-2261', 'geo': {'lat': '-38.2386', 'lng': '57.2232'}}, 'phone': '024-648-3804', 'website': 'ambrose.net', 'company': {'name': 'Hoeger LLC', 'catchPhrase': 'Centralized empowering task-force', 'bs': 'target end-to-end models'}}
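The filter above relies on the ids running from 1 to the length of the list, which happens to hold for this API. If that is not guaranteed, a sketch that splits by position instead of by id could look like this. `split_in_half` is a hypothetical helper, and the sample records are stand-ins trimmed to two fields rather than the real API response.

```python
def split_in_half(items: list) -> tuple:
    # Integer division puts the extra element (for odd lengths) in the second half.
    middle = len(items) // 2
    return items[:middle], items[middle:]


# Stand-in data shaped like the users list, with ids 1..10
users = [{"id": i, "name": f"user {i}"} for i in range(1, 11)]

first_group, second_group = split_in_half(users)
for user in second_group:
    print(user["id"], user["name"])
```

Slicing works for any list regardless of how the ids are numbered or ordered.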

This is only one of the uses of cURL in Python. We no longer depend on the API to send data back in the shape we want; we can process it in many different ways, such as checking whether a person is in a response list based on their name, email, phone number or any other unique property.

For example, let us say that we would like to check whether a user with a specific name exists in a list. For that we can create a function which takes the response we converted from JSON into a Python list, along with the name to look for:

import subprocess
import json

def cURL(url):
    # Capture curl's stdout (-s hides the progress meter)
    return subprocess.check_output(['curl', '-s', url])

users = json.loads(cURL('https://jsonplaceholder.typicode.com/users'))

def check_if_user_exists(users, name):
    # Print a message for every user whose name matches exactly
    for user in users:
        if user['name'] == name:
            print(f'A user called {name} exists in the list and has the id of {user["id"]}')

check_if_user_exists(users, 'Clementina DuBuque')

This code block will then give us back this output:

A user called Clementina DuBuque exists in the list and has the id of 10
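check_if_user_exists only prints a message. To look up users by any unique field, such as email or phone number, here is a sketch of a more general helper that returns the matching record (or None) so the rest of your script can use it. `find_user` is a hypothetical name, and the sample list is a trimmed stand-in for the parsed API response.

```python
from typing import Optional


def find_user(users: list, field: str, value) -> Optional[dict]:
    # Return the first user whose `field` equals `value`, or None if none match.
    # .get() avoids a KeyError when a record lacks the field.
    return next((user for user in users if user.get(field) == value), None)


users = [
    {"id": 9, "name": "Glenna Reichert", "email": "Chaim_McDermott@dana.io"},
    {"id": 10, "name": "Clementina DuBuque", "email": "Rey.Padberg@karina.biz"},
]

match = find_user(users, "email", "Rey.Padberg@karina.biz")
if match is not None:
    print(f'A user with that email exists and has the id of {match["id"]}')
```

Returning the record instead of printing keeps the lookup reusable, for example to fetch more details about that user afterwards.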

Congratulations. You made a script which fetches data from a server, stores it and parses it afterwards. The advantages of using Python do not stop here: Python even has a dedicated interface for cURL, called PycURL, which we are going to discuss now.

What is PycURL?

PycURL is a Python interface to libcurl, the library behind the curl command, which lets us use cURL more naturally from Python code. One great advantage is that PycURL is optimized for speed and supports concurrent transfers (through its CurlMulti interface), which generally makes it faster than the popular Python requests library. On the other hand, PycURL is not as easy to use as what we have seen so far, being a tool targeted at advanced developers. However, you should not feel intimidated by that, because in the end you will gain a deeper understanding of networking and become more comfortable with Python.

How to install it?

Just like any other package, you can install it with pip:

$ pip install pycurl

You will also want to install certifi when using pycurl, for security purposes. Certifi provides Mozilla's curated collection of root certificates, which pycurl can use to validate SSL certificates and verify the identity of TLS hosts. To learn more about certifi, I strongly suggest checking its docs. You can install it in the same way:

$ pip install certifi

You can now verify the installation by running the following script:

import certifi
print(certifi.where())

The output should be the path to the certificate bundle (cacert.pem) inside the installed package, for example:

/usr/local/lib/python3.10/site-packages/certifi/cacert.pem

How to use pycURL?

The difference from what we did before is that we now write only Python code instead of running shell commands. In short, we create an instance of the pycurl.Curl() class, fetch the data, and write it into a buffer, which we later decode in order to read the data we received:

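import pycurl
import certifi
from io import BytesIO

buffer = BytesIO()                      # in-memory buffer for the response body
c = pycurl.Curl()
c.setopt(c.URL, 'https://docs.webscrapingapi.com/')
c.setopt(c.WRITEDATA, buffer)           # write the body into the buffer
c.setopt(c.CAINFO, certifi.where())     # CA bundle used to verify the TLS certificate
c.perform()                             # send the request
c.close()

body = buffer.getvalue()                # raw bytes
# iso-8859-1 maps every byte to a character, so decoding never fails
print(body.decode('iso-8859-1'))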

As you probably guessed, just like our previous examples, this fetches the HTML content of the WebScrapingAPI documentation page and prints it in the terminal.
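Decoding with iso-8859-1 is a safe default for inspecting a page, but it can garble UTF-8 text. A sketch, modelled loosely on the header-handling pattern in the PycURL documentation: a `HeaderCollector` object (hypothetical name) is registered through `HEADERFUNCTION`, receives each raw header line, and the charset from the `Content-Type` header is used for decoding when present. `pycurl` and `certifi` are imported inside the hypothetical `fetch_text` function so the header parsing can be tried without them.

```python
import re
from io import BytesIO


class HeaderCollector:
    """Collects response headers passed line by line by pycurl's HEADERFUNCTION."""

    def __init__(self) -> None:
        self.headers = {}

    def __call__(self, header_line: bytes) -> None:
        # HTTP header bytes are decoded as ISO-8859-1, as in the PycURL docs.
        line = header_line.decode("iso-8859-1")
        # Status lines ("HTTP/1.1 200 OK") and the final blank line have no colon.
        if ":" not in line:
            return
        name, value = line.split(":", 1)
        self.headers[name.strip().lower()] = value.strip()

    def charset(self, default: str = "iso-8859-1") -> str:
        # Pull e.g. "utf-8" out of "text/html; charset=UTF-8".
        content_type = self.headers.get("content-type", "")
        match = re.search(r'charset="?([\w-]+)', content_type, re.IGNORECASE)
        return match.group(1).lower() if match else default


def fetch_text(url: str) -> str:
    import certifi
    import pycurl  # requires `pip install pycurl`

    buffer = BytesIO()
    headers = HeaderCollector()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, buffer)
    c.setopt(c.HEADERFUNCTION, headers)  # called once per header line
    c.setopt(c.CAINFO, certifi.where())
    c.perform()
    c.close()
    return buffer.getvalue().decode(headers.charset())
```

Calling `print(fetch_text('https://docs.webscrapingapi.com/'))` should print the same page, decoded with whatever charset the server declares, falling back to iso-8859-1 otherwise.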

A POST request is not much different, except that you need to tell your pycurl.Curl() instance that you are going to use the POST method, and set your body and, if needed, your headers. This is how it looks:

import pycurl
import certifi
import json
from io import BytesIO

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://httpbin.org/post')
# Tell the server the body is JSON and that we expect JSON back
c.setopt(pycurl.HTTPHEADER, ['Content-Type: application/json', 'Accept: application/json'])
data = json.dumps({"foo": "bar"})

c.setopt(pycurl.POST, 1)              # use the POST method
c.setopt(pycurl.POSTFIELDS, data)     # request body
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.CAINFO, certifi.where())

c.perform()
c.close()

body = buffer.getvalue()

print(body.decode('iso-8859-1'))

And we should get back, just like before, a response echoing your request, IP address, params and body. Because this time the body is sent as JSON with a matching Content-Type header, httpbin reports it under "json" rather than "form", so the response should look something like this:

{
  "args": {}, 
  "data": "{\"foo\": \"bar\"}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "application/json", 
    "Content-Length": "14", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "User-Agent": "PycURL/X.XX.X libcurl/X.XX.X", 
    "X-Amzn-Trace-Id": "Root=X-XXXXX-XXXXXX"
  }, 
  "json": {
    "foo": "bar"
  }, 
  "origin": "0.0.0.0", 
  "url": "https://httpbin.org/post"
}
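Since httpbin echoes the request back as JSON, you can also let Python confirm what the server received instead of reading the printout. A sketch, assuming the `body` bytes from the script above: `echoed_json` is a hypothetical helper that returns the parsed "json" field of the echo (None if the body was not sent as JSON). To also read the HTTP status code, you would call `c.getinfo(pycurl.RESPONSE_CODE)` after `c.perform()` and before `c.close()`.

```python
import json


def echoed_json(body: bytes):
    # httpbin's /post endpoint puts a JSON request body under the "json" key.
    return json.loads(body.decode("utf-8")).get("json")


# Stand-in for buffer.getvalue(), trimmed to the relevant keys.
sample_body = b'{"data": "{\\"foo\\": \\"bar\\"}", "form": {}, "json": {"foo": "bar"}}'
print(echoed_json(sample_body))
```

Checking the echoed value in code is handy when you later turn these snippets into automated scripts.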

As I already mentioned, these are just some of the basic things pycURL can do for us. It is a very complex and sophisticated tool that we could write many articles about. If you want to explore more of what it can do, I strongly encourage you to check its docs.

Summary

In conclusion, using cURL in Python is very effective, saves a lot of time, and can be a starting point for interesting projects in areas such as data analysis or web scraping. The approach I recommend is to first get comfortable with cURL and Python, and then move on to pycURL. I hope you found this resource useful for learning how to use cURL in Python, and that you will play around with it and build some scripts.
