Raluca Penciuc · Feb 23, 2023 · 9 min read

The Ultimate Guide to Web Scraping Walmart


Prerequisites

Before we begin, let's make sure we have the necessary tools in place.

First, download and install Node.js from the official website, making sure to use the Long-Term Support (LTS) version. This will also automatically install Node Package Manager (NPM) which we will use to install further dependencies.

For this tutorial, we will be using Visual Studio Code as our Integrated Development Environment (IDE), but you can use any other IDE of your choice. Create a new folder for your project, open a terminal in it, and run the following command to set up a new Node.js project:

npm init -y

This will create a package.json file in your project directory, which will store information about your project and its dependencies.

Next, we need to install TypeScript and the type definitions for Node.js. TypeScript adds optional static typing, which helps prevent errors in the code. To install both, run:

npm install typescript @types/node --save-dev

You can verify the installation by running:

npx tsc --version

TypeScript uses a configuration file called tsconfig.json to store compiler options and other settings. To create this file in your project, run the following command:

npx tsc --init

Open the generated file, uncomment the “outDir” option, and set its value to “./dist”. This way, the compiled JavaScript files are kept separate from the TypeScript sources. You can find more information about this file and its properties in the official TypeScript documentation.

Now, create an “src” directory in your project with a new “index.ts” file inside it. This is where we will keep the scraping code. TypeScript code has to be compiled before it can be executed, so to make sure we don’t forget this extra step, we can define a custom command.

Head over to the “package.json” file and edit the “scripts” section like this:

"scripts": {
    "test": "npx tsc && node dist/index.js"
}

Now, to compile and run the script, you just have to type “npm run test” in your terminal.

Finally, to scrape the data from the website we will be using Puppeteer, a headless browser library for Node.js that allows you to control a web browser and interact with websites programmatically. To install it, run this command in the terminal:

npm install puppeteer

Puppeteer is highly recommended when you need complete data, since many websites today generate their content dynamically with JavaScript. If you’re curious, check out the Puppeteer documentation before continuing to see everything it’s capable of.

Locating the data

Now that you have your environment set up, we can start looking at extracting the data. For this article, I chose to scrape data from this product page: https://www.walmart.com/ip/Keter-Adirondack-Chair-Resin-Outdoor-Furniture-Teal/673656371.

We’re going to extract the following data:

  • the product name;
  • the product rating number;
  • the product reviews count;
  • the product price;
  • the product images;
  • the product details.

You can see all this information highlighted in the screenshot below:

Walmart product page for an outdoor Adirondack chair, with red boxes highlighting the image gallery, product title and price, and the description section

By inspecting each of these elements with the Developer Tools, you will notice the CSS selectors we will use to locate the HTML elements. If you’re fairly new to how CSS selectors work, feel free to check out this beginner guide.

Data extraction

Before writing our script, let’s verify that Puppeteer was installed correctly:

import puppeteer from 'puppeteer';

async function scrapeWalmartData(walmart_url: string): Promise<void> {
    // Launch Puppeteer
    const browser = await puppeteer.launch({
        headless: false,
        args: ['--start-maximized'],
        defaultViewport: null
    })

    // Create a new page
    const page = await browser.newPage()

    // Navigate to the target URL
    await page.goto(walmart_url)

    // Close the browser
    await browser.close()
}

scrapeWalmartData("https://www.walmart.com/ip/Keter-Adirondack-Chair-Resin-Outdoor-Furniture-Teal/673656371")

Here we open a browser window, create a new page, navigate to our target URL, and then close the browser. For the sake of simplicity and visual debugging, I open the browser window maximized in non-headless mode.

Now, let’s take a look at the website’s structure:

Walmart product page alongside browser inspector, highlighting the product title, price, and corresponding HTML elements

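To get the product name, we target the “h1” element whose “itemprop” attribute is set to “name”, and read its text content. This snippet, like the following ones, goes inside the scrapeWalmartData function, between the page.goto() and browser.close() calls.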

// Extract product name
const product_name = await page.evaluate(() => {
    const name = document.querySelector('h1[itemprop="name"]')
    return name ? name.textContent : ''
})
console.log(product_name)

For the rating number, the most reliable option we found is the “span” element whose class attribute ends with “rating-number”.

// Extract product rating number
const product_rating = await page.evaluate(() => {
    const rating = document.querySelector('span[class$="rating-number"]')
    return rating ? rating.textContent : ''
})
console.log(product_rating)
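Keep in mind that the `$=` attribute selector compares the end of the whole `class` attribute string, not individual class names, so it only matches while the class ending in “rating-number” is listed last. The sketch below emulates both `[class$="…"]` and the token-based `[class~="…"]` (equivalent to `.name`) in plain TypeScript to show the difference; the function names and the class strings in the example are made up for illustration, not copied from Walmart’s markup.

```typescript
// Emulates the CSS attribute selector [class$="suffix"]:
// true when the whole attribute value ends with the suffix.
// Per the Selectors spec, an empty suffix never matches.
function matchesSuffixSelector(classAttr: string, suffix: string): boolean {
    return suffix !== '' && classAttr.endsWith(suffix)
}

// Emulates [class~="token"] (equivalent to .token):
// true when one of the whitespace-separated class names equals the token exactly.
function matchesClassToken(classAttr: string, token: string): boolean {
    if (token === '' || /\s/.test(token)) return false
    return classAttr.split(/\s+/).indexOf(token) !== -1
}

// Hypothetical class attribute values, for illustration only
const examples = [
    'f7 rating-number',        // suffix: true,  token: true
    'rating-number ml1',       // suffix: false, token: true
    'f7 w_iUH7-rating-number', // suffix: true,  token: false
]

for (const cls of examples) {
    console.log(cls, matchesSuffixSelector(cls, 'rating-number'), matchesClassToken(cls, 'rating-number'))
}
```

In other words, the suffix selector tolerates a generated prefix on the class name but breaks if another class is appended after it, which is worth remembering if the selector suddenly stops returning results.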

Finally, for the review count and the product price in the highlighted section, we rely on the “itemprop” attribute again, just like for the product name.

// Extract product reviews count
const product_reviews = await page.evaluate(() => {
    const reviews = document.querySelector('a[itemprop="ratingCount"]')
    return reviews ? reviews.textContent : ''
})
console.log(product_reviews)

// Extract product price
const product_price = await page.evaluate(() => {
    const price = document.querySelector('span[itemprop="price"]')
    return price ? price.textContent : ''
})
console.log(product_price)

Moving on to the product images, we navigate further in the HTML document:

Walmart product page with the image thumbnail carousel highlighted, and browser inspector showing the selected image element

This one is slightly trickier, but not impossible. We cannot uniquely identify the images by themselves, so this time we go through their parent elements: we select the “img” children of the “div” elements whose “data-testid” attribute is set to “media-thumbnail”.

Then, we convert the resulting NodeList to a JavaScript array, so we can map each element to its “src” attribute.

// Extract product images
const product_images = await page.evaluate(() => {
    // Select the <img> children of the thumbnail containers
    const images = document.querySelectorAll('div[data-testid="media-thumbnail"] > img')
    // querySelectorAll always returns a NodeList (possibly empty), so no null check is needed
    return Array.from(images).map(img => img.getAttribute('src'))
})
console.log(product_images)
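The URLs collected this way point to 80x80 thumbnails: they carry `odnHeight=80` and `odnWidth=80` query parameters. If you want larger versions, one option is to rewrite those parameters with the standard `URL` API. This is a sketch under an assumption we have not verified, namely that Walmart’s image CDN honors other values for these parameters. The helper names `resizeImageUrl` and `cleanImageUrls` are hypothetical; the second one also drops `null` entries (since `getAttribute` returns `null` when the attribute is missing) and removes duplicates.

```typescript
// Hypothetical helper: rewrites the size parameters seen in the thumbnail URLs.
// Assumption: the CDN serves a larger rendition when odnHeight/odnWidth change.
function resizeImageUrl(src: string, size: number): string {
    const url = new URL(src)
    // set() replaces the existing value in place, keeping parameter order
    url.searchParams.set('odnHeight', String(size))
    url.searchParams.set('odnWidth', String(size))
    return url.toString()
}

// Drops missing or empty src values, resizes the rest, and removes duplicates
function cleanImageUrls(srcs: (string | null)[], size: number): string[] {
    const valid = srcs.filter((s): s is string => s !== null && s !== '')
    return Array.from(new Set(valid.map(s => resizeImageUrl(s, size))))
}

// Usage example with one of the thumbnail URLs from this article
const thumbnails = [
    'https://i5.walmartimages.com/asr/51fc64d9-6f1f-46b7-9b41-8880763f6845.483f270a12a6f1cbc9db5a37ae7c86f0.jpeg?odnHeight=80&odnWidth=80&odnBg=FFFFFF',
    null,
]
console.log(cleanImageUrls(thumbnails, 600))
```

You could apply it directly to `product_images`; if the larger URLs turn out not to work, simply keep the originals.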

And last but not least, we scroll down the page to inspect the product details:

Walmart product details section highlighted, with browser inspector showing the HTML for the description and bullet list content

We apply the same logic as for the images, this time selecting the “div” elements with the class name “dangerous-html”.

// Extract product details
const product_details = await page.evaluate(() => {
    const details = document.querySelectorAll('div.dangerous-html')
    // An empty NodeList simply yields an empty array
    return Array.from(details).map(d => d.textContent)
})
console.log(product_details)
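Because `textContent` concatenates the text of all nested elements, a bulleted list of specifications comes back as one long string. In the sample output below, the individual items appear separated by runs of two or more spaces. Here is a minimal sketch that splits on that pattern, assuming it holds for your pages; `splitDetailItems` is a hypothetical name. If the pattern turns out to be unreliable, selecting the list items themselves with a more specific selector would be the sturdier route.

```typescript
// Hypothetical helper: splits a flattened details string into separate items.
// Assumption: items are separated by two or more whitespace characters,
// as in the sample output; single spaces inside an item are kept.
function splitDetailItems(text: string): string[] {
    return text
        .split(/\s{2,}/)
        .map(item => item.trim())
        .filter(item => item.length > 0)
}

// Usage example with a shortened excerpt of the scraped details
const detailsSample = 'Made from an all-weather resistant resin for ultimate durability  Quick and easy assembly  350 lbs. capacity '
console.log(splitDetailItems(detailsSample))
```

Apply it to each non-null entry of `product_details`; entries without double spaces, like the first paragraph, simply come back as a one-item array.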

The final result should look like this:

Keter Adirondack Chair, Resin Outdoor Furniture, Teal
(4.1)
269 reviews
Now $59.99
[
  'https://i5.walmartimages.com/asr/51fc64d9-6f1f-46b7-9b41-8880763f6845.483f270a12a6f1cbc9db5a37ae7c86f0.jpeg?odnHeight=80&odnWidth=80&odnBg=FFFFFF',
  'https://i5.walmartimages.com/asr/80977b5b-15c5-435e-a7d6-65f14b2ee9c9.d1deed7ca4216d8251b55aa45eb47a8f.jpeg?odnHeight=80&odnWidth=80&odnBg=FFFFFF',
  'https://i5.walmartimages.com/asr/80c1f563-91a9-4bff-bda5-387de56bd8f5.5844e885d77ece99713d9b72b0f0d539.jpeg?odnHeight=80&odnWidth=80&odnBg=FFFFFF',
  'https://i5.walmartimages.com/asr/fd73d8f2-7073-4650-86a3-4e809d09286e.b9b1277761dec07caf0e7354abb301fc.jpeg?odnHeight=80&odnWidth=80&odnBg=FFFFFF',
  'https://i5.walmartimages.com/asr/103f1a31-fbc5-4ad6-9b9a-a298ff67f90f.dd3d0b75b3c42edc01d44bc9910d22d5.jpeg?odnHeight=80&odnWidth=80&odnBg=FFFFFF',
  'https://i5.walmartimages.com/asr/120121cd-a80a-4586-9ffb-dfe386545332.a90f37e11f600f88128938be3c68dca5.jpeg?odnHeight=80&odnWidth=80&odnBg=FFFFFF',
  'https://i5.walmartimages.com/asr/47b8397f-f011-4782-bbb7-44bfac6f3fcf.bb12c15a0146107aa2dcd4cefba48c38.jpeg?odnHeight=80&odnWidth=80&odnBg=FFFFFF'
]
[
  'The Keter Adirondack chair lets you experience the easy-living comfort of the popular chair but with none of the worries of wood. Combining traditional styling and the look and feel of wood with durable and maintenance-free materials, this chair will find',
  'Keter Adirondack Chair, Resin Outdoor Furniture, Gray:   Made from an all-weather resistant resin for ultimate durability  Weather-resistant polypropylene construction prevents fading, rusting, peeling, and denting - unlike real wood  Quick and easy assembly  Rotating cup holder  Classic comfort redefined  Ergonomic design  Durable and weather-resistant  Worry-free relaxation  Dimensions: 31.9" L x 31.5" W x 38" H  Seat height is 15.4 in. for a deep bucket seat and tall backrest  Chair Weighs 22 lbs. - heavy enough to not blow over in the wind, yet light enough to easily rearrange your patio space  350 lbs. capacity '
]
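Notice that the rating, review count, and price all come back as raw text, such as “(4.1)”, “269 reviews”, and “Now $59.99”. If you plan to store or compare them, it helps to convert them into numbers. The sketch below does this with a regular expression; the `ProductSummary` type and the `firstNumber` and `parseSummary` helpers are hypothetical names, and the pattern assumes the text keeps the format shown above (US-style prices with a dot as decimal separator, and only one number per field).

```typescript
// Hypothetical shape for the cleaned-up values
interface ProductSummary {
    rating: number | null
    reviews: number | null
    price: number | null
}

// Returns the first decimal number found in the text, ignoring thousands separators.
// Returns null when the text is empty, missing, or contains no digits.
function firstNumber(text: string | null): number | null {
    if (!text) return null
    const match = text.replace(/,/g, '').match(/\d+(\.\d+)?/)
    return match ? Number(match[0]) : null
}

function parseSummary(rating: string | null, reviews: string | null, price: string | null): ProductSummary {
    return {
        rating: firstNumber(rating),   // "(4.1)" -> 4.1
        reviews: firstNumber(reviews), // "269 reviews" -> 269
        price: firstNumber(price),     // "Now $59.99" -> 59.99
    }
}

// Usage example with the values from the output above
console.log(parseSummary('(4.1)', '269 reviews', 'Now $59.99'))
```

Taking the first number keeps the parser simple, but it would pick the old price if a label such as “Was … Now …” ever appeared in the same element, so check the text before relying on it.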

Bypass bot detection

While scraping Walmart may seem easy at first, the process becomes more complex as you scale up your project. The retailer implements various techniques to detect and prevent automated traffic, so a scaled-up scraper quickly starts getting blocked.
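Pacing alone will not get you past Walmart’s fingerprinting, but firing requests as fast as possible is the quickest way to get blocked once you scale up. A common mitigation is to retry failed requests with exponential backoff and random jitter. Below is a minimal, library-agnostic sketch; `withRetry` and its parameters are hypothetical names, and the task you pass in could wrap `page.goto()` or any other request.

```typescript
// Hypothetical helper: retries an async task with exponential backoff and jitter.
// After a failed attempt n (0-based), it waits a random delay in [0, baseDelayMs * 2^n).
async function withRetry<T>(
    task: () => Promise<T>,
    maxAttempts = 4,
    baseDelayMs = 1000,
): Promise<T> {
    let lastError: unknown
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
            return await task()
        } catch (error) {
            lastError = error
            // No point in waiting after the final attempt
            if (attempt === maxAttempts - 1) break
            // "Full jitter": spreads retries out so they don't arrive in bursts
            const delay = Math.random() * baseDelayMs * 2 ** attempt
            await new Promise(resolve => setTimeout(resolve, delay))
        }
    }
    throw lastError
}

// Usage example with a fake task that fails twice before succeeding
let calls = 0
withRetry(async () => {
    calls++
    if (calls < 3) throw new Error(`attempt ${calls} failed`)
    return 'ok'
}, 4, 10).then(result => console.log(result, 'after', calls, 'calls'))
```

The random jitter matters as much as the exponential growth: without it, many parallel workers that failed together would all retry at the same moment.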

Walmart uses the “Press & Hold” CAPTCHA offered by PerimeterX, which is notoriously difficult to solve programmatically. Besides this, the website also uses protections from Akamai and ThreatMetrix, and collects a wide range of browser data to generate a unique fingerprint and associate it with you.

Among the collected browser data we find:

  • properties from the Navigator object (deviceMemory, hardwareConcurrency, languages, platform, userAgent, webdriver, etc.)
  • canvas fingerprinting
  • timing and performance checks
  • plugin and voice enumeration
  • web workers
  • screen dimensions checks
  • and many more
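To see why this data is so effective at singling out a visitor, consider the basic idea behind a fingerprint: combine many individually common attributes into one canonical string and hash it. The sketch below is a simplified illustration using a hand-rolled 32-bit FNV-1a hash; it is not how PerimeterX, Akamai, or ThreatMetrix actually compute their fingerprints, and the `BrowserSignals` fields and values are a made-up subset of the signals listed above.

```typescript
// Simplified, hypothetical subset of the signals listed above
interface BrowserSignals {
    userAgent: string
    platform: string
    languages: string[]
    hardwareConcurrency: number
    deviceMemory: number
    webdriver: boolean
    screen: string
}

// 32-bit FNV-1a over the UTF-16 code units of a string
function fnv1a(input: string): number {
    let hash = 0x811c9dc5
    for (let i = 0; i < input.length; i++) {
        hash ^= input.charCodeAt(i)
        hash = Math.imul(hash, 0x01000193)
    }
    return hash >>> 0
}

// Serializes the signals in a fixed key order so equal signals always hash the same
function fingerprint(signals: BrowserSignals): string {
    const keys = (Object.keys(signals) as (keyof BrowserSignals)[]).sort()
    const canonical = keys.map(key => `${key}=${JSON.stringify(signals[key])}`).join('|')
    return ('00000000' + fnv1a(canonical).toString(16)).slice(-8)
}

// Usage example: two profiles that differ only in navigator.webdriver
const human: BrowserSignals = {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    platform: 'Win32',
    languages: ['en-US', 'en'],
    hardwareConcurrency: 8,
    deviceMemory: 8,
    webdriver: false,
    screen: '1920x1080',
}
const automated: BrowserSignals = { ...human, webdriver: true }
console.log(fingerprint(human), fingerprint(automated))
```

A single changed property produces a completely different identifier, while an unchanged setup produces the same one on every visit. That is why randomizing these values consistently, rather than just rotating IP addresses, is part of avoiding detection.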

One way to overcome these challenges and continue scraping at a large scale is to use a scraping API. These kinds of services provide a simple and reliable way to access data from websites like walmart.com, without the need to build and maintain your own scraper.

WebScrapingAPI is an example of such a product. Its proxy rotation mechanism is designed to avoid CAPTCHAs altogether, and its extensive knowledge base makes it possible to randomize browser data so that requests look like they come from a real user.

The setup is quick and easy: register an account to receive your API key. You can find it in your dashboard, and it is used to authenticate the requests you send.

WebScrapingAPI dashboard welcome screen showing a three-step quickstart guide with API key, API playground, and documentation links

Since you have already set up your Node.js environment, you can use the corresponding SDK. Run the following command to add it to your project dependencies:

npm install webscrapingapi

Now all that’s left to do is to adapt the previous CSS selectors to the API. Its extraction rules feature lets you parse the data without significant modifications to them.

import webScrapingApiClient from 'webscrapingapi';

const client = new webScrapingApiClient("YOUR_API_KEY");

async function exampleUsage() {
    const api_params = {
        'render_js': 1,
        'proxy_type': 'residential',
        'timeout': 60000,
        'extract_rules': JSON.stringify({
            name: {
                selector: 'h1[itemprop="name"]',
                output: 'text',
            },
            rating: {
                selector: 'span[class$="rating-number"]',
                output: 'text',
            },
            reviews: {
                selector: 'a[itemprop="ratingCount"]',
                output: 'text',
            },
            price: {
                selector: 'span[itemprop="price"]',
                output: 'text',
            },
            images: {
                selector: 'div[data-testid="media-thumbnail"] > img',
                output: '@src',
                all: '1'
            },
            details: {
                selector: 'div.dangerous-html',
                output: 'text',
                all: '1'
            }
        })
    }

    const URL = "https://www.walmart.com/ip/Keter-Adirondack-Chair-Resin-Outdoor-Furniture-Teal/673656371"
    const response = await client.get(URL, api_params)

    if (response.success) {
        console.log(response.response.data)
    } else {
        console.log(response.error.response.data)
    }
}

exampleUsage();
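Since the extraction rules reuse the exact selectors from the Puppeteer version, you may prefer to keep them in one place and generate the `extract_rules` value from it. The sketch below builds the same JSON structure used above (`selector`, `output`, and `all` keys, with `all` set to `'1'`); the `FieldSelector` type, the `SELECTORS` map, and `buildExtractRules` are hypothetical names, and the code only produces the string, it does not call the API.

```typescript
// Hypothetical shared selector map, reusable by both the Puppeteer and API versions
interface FieldSelector {
    selector: string
    output: 'text' | '@src' // only the output values used in this article
    all?: boolean           // true when every match should be returned
}

const SELECTORS: Record<string, FieldSelector> = {
    name: { selector: 'h1[itemprop="name"]', output: 'text' },
    rating: { selector: 'span[class$="rating-number"]', output: 'text' },
    reviews: { selector: 'a[itemprop="ratingCount"]', output: 'text' },
    price: { selector: 'span[itemprop="price"]', output: 'text' },
    images: { selector: 'div[data-testid="media-thumbnail"] > img', output: '@src', all: true },
    details: { selector: 'div.dangerous-html', output: 'text', all: true },
}

// Converts the map into the extract_rules JSON string used in the API example
function buildExtractRules(fields: Record<string, FieldSelector>): string {
    const rules: Record<string, { selector: string; output: string; all?: string }> = {}
    for (const key of Object.keys(fields)) {
        const field = fields[key]
        rules[key] = { selector: field.selector, output: field.output }
        if (field.all) rules[key].all = '1'
    }
    return JSON.stringify(rules)
}

console.log(buildExtractRules(SELECTORS))
```

You could then pass `buildExtractRules(SELECTORS)` as the `extract_rules` parameter, so a selector fix only has to be made once.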

Conclusion

This article has presented you with an overview of web scraping Walmart using TypeScript along with Puppeteer. We have discussed the process of setting up the necessary environment, identifying and extracting the data, and provided code snippets and examples to help guide you through the process.

Scraping Walmart's data can give you valuable insights into consumer behavior and market trends, and supports use cases such as price monitoring.

Furthermore, opting for a professional scraping service can be a more efficient solution, as it automates the whole process and handles the bot detection techniques you may encounter.

By harnessing the power of Walmart's data, you can drive your business to success and stay ahead of the competition. Remember to always respect the website's terms of service and not to scrape too aggressively in order to avoid being blocked.

About the Author
Raluca Penciuc, Full-Stack Developer @ WebScrapingAPI

Raluca Penciuc is a Full Stack Developer at WebScrapingAPI, building scrapers, improving evasions, and finding reliable ways to reduce detection across target websites.
