Scraping Web Pages in Node with Cheerio: How to Do It?
In this section, you will learn how to scrape a web page with Cheerio. Before you scrape any site, make sure you have permission to do so. Otherwise, you might find yourself violating privacy, breaching copyright, or breaking the site's terms of service.
The example scrapes the ISO 3166-1 alpha-3 codes for all countries and other jurisdictions. You will find the country data under the codes section of the ISO 3166-1 alpha-3 page. So now, let's get started!
Step 1: Make a Working Directory
In this step, you will make a directory for the project by running the command "mkdir learn-cheerio" in the terminal. This command creates a directory called "learn-cheerio." You can give it a different name if you wish.
After the "mkdir learn-cheerio" command runs successfully, you will see a folder named "learn-cheerio." Once the directory is created, open it in a text editor so you can initialize the project.
Step 2: Initializing the Project
To use Cheerio in this project, navigate to the project directory and initialize it. Open the directory in the text editor of your choice and run the "npm init -y" command. Once the command completes, a "package.json" file is created at the root of the project directory.
Step 3: Install the Dependencies
In this section, you will install the project dependencies by running the "npm i axios cheerio pretty" command.
The command can take a little while to finish, so please be patient. Once it completes, the three dependencies are registered in the package.json file under the "dependencies" field.
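To confirm that the install step recorded everything, a small check along these lines can help. This is only a minimal sketch: the helper name missingDependencies is hypothetical (it is not part of npm or any package), and the usage comment assumes the script is run from the project root.

```javascript
const fs = require("fs");

// Hypothetical helper: returns the names from `expected` that are NOT listed
// under "dependencies" in the given package.json file.
function missingDependencies(packageJsonPath, expected) {
  const pkg = JSON.parse(fs.readFileSync(packageJsonPath, "utf8"));
  const installed = pkg.dependencies || {};
  return expected.filter((name) => !(name in installed));
}

// Example usage (assumes package.json is in the current working directory):
// const missing = missingDependencies("package.json", ["axios", "cheerio", "pretty"]);
// console.log(missing.length === 0 ? "All dependencies registered" : missing);
```

An empty array means all three packages are listed; any name that comes back still needs to be installed.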
The first dependency is "axios," the second is "cheerio," and the third is "pretty." Axios is a well-known HTTP client that works in both the browser and Node. You need it because Cheerio is only a markup parser; it does not fetch pages itself.
For Cheerio to parse the markup and scrape the data you need, you must first use Axios to fetch the markup from the site. You can use a different HTTP client to fetch the markup if you wish. It doesn't necessarily have to be Axios.
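Since the HTTP client is interchangeable, here is a minimal sketch that fetches markup with the fetch function built into Node 18 and later instead of Axios. The fetchMarkup helper and its fetchImpl parameter are hypothetical names introduced for illustration; the injectable parameter exists only so the function can be tried without network access.

```javascript
// Hypothetical helper: fetches a page and resolves with its markup as a string.
// `fetchImpl` defaults to the global fetch available in Node 18+ (assumption:
// you are on such a version); pass a stub to try it without the network.
async function fetchMarkup(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Example usage (not executed here because it needs network access):
// fetchMarkup("https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3")
//   .then((html) => console.log(html.length));
```

Whichever client you pick, the result handed to Cheerio is the same: a plain string of HTML.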
"Pretty," on the other hand, is an npm package to beautify the markup so that it's completely readable when it's printed on the terminal.
Step 4: Inspect the Website Page You Wish to Scrape
Before you scrape data from a web page, it's vital to understand the HTML structure of that page. On Wikipedia, go to the ISO 3166-1 alpha-3 page. Under the "Current codes" section, you will find a list of countries and their codes.
Now open DevTools by pressing the key combination "CTRL + SHIFT + I." Alternatively, you can right-click the page and choose the "Inspect" option. DevTools then shows how the list is marked up.
Step 5: Write the Code to Scrape Out the Data
Now you can write the code that scrapes the data. To begin, run the "touch app.js" command to create the app.js file. If the command runs successfully, the app.js file appears in the project directory.
Just like with any other Node package, you have to require axios, cheerio, and pretty before you can use them. To do so, add the following code:
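const axios = require("axios"); // HTTP client used to fetch the page markup
const cheerio = require("cheerio"); // parses markup and lets you query it
const pretty = require("pretty"); // beautifies markup before printing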
Add these lines at the top of the app.js file. Before scraping real data, it helps to get familiar with Cheerio. You can parse a piece of markup and then manipulate the resulting data structure. Doing so will help you learn the Cheerio syntax and its most common methods. Here is the markup of a ul element that contains two li elements:
const markup = `
<ul class="fruits">
  <li class="fruits__mango"> Mango </li>
  <li class="fruits__apple"> Apple </li>
</ul>
`;
Add this variable declaration to the app.js file.