Raluca Penciuc · Last updated on May 8, 2026 · 13 min read

Web Scraping Booking.com: Hotels, Prices, and Reviews (2026 Guide)

TL;DR: This guide walks through web scraping Booking.com end to end in Python: pulling search listings, hotel pages, nightly prices, and guest reviews. You get two complementary methods: a Selenium Wire workflow for JS-rendered pages and a faster path that calls Booking.com's internal /dml/graphql endpoint directly, plus an anti-block playbook, currency handling, and a workaround for the roughly 1,000-result paging cap.

Booking.com is the kind of dataset travel and hospitality teams keep coming back to: live nightly rates, competitor positioning, supply by neighborhood, guest sentiment by property. The catch is that none of it is exposed through an open API for the general public, so if you want it programmatically you end up doing some form of web scraping Booking.com yourself. This tutorial shows two practical Python paths and ties them together with the production concerns that usually bite people on the second week.

At the time of writing, Booking.com is one of the largest accommodation platforms on the web, with millions of bookable properties across hotels, resorts, and short stays. (We'll keep specific listing counts approximate; the company's public numbers move around.) The platform is heavily JavaScript-driven and ships real anti-bot defenses, so naive requests.get scripts tend to fail before they get useful.

You'll see how to run a Selenium-based scraper for search results, how to reverse-engineer the same data out of the internal GraphQL endpoint, how to pull hotel detail pages, prices, and reviews, and how to scale past the result cap with sitemaps and query partitioning. Code is Python 3.10+ and assumes you're comfortable with DevTools and CSS selectors.

Why web scraping Booking.com is worth the effort

There are a handful of use cases where web scraping Booking.com pays for itself almost immediately. Rate intelligence teams compare nightly prices across competitor hotels in real time. Revenue managers track availability and discount patterns to time their own promotions. Market research and travel analytics teams use review volume, scores, and amenity coverage to benchmark a destination. And anyone building a metasearch or AI travel agent needs structured property data that the public site only renders inside JavaScript.

Across this guide we'll pull five concrete entity types: search-result listings (hotel cards on a query page), hotel detail pages (description, address, amenities, geolocation), per-night pricing and availability, guest reviews, and sitemap-based hotel inventory for bulk discovery. Each has its own quirks, and mixing them is what gives you a real dataset rather than a single screenshot of a SERP.

Picking a scraping approach: browser automation vs hidden API

There are two reasonable ways to do web scraping Booking.com at any kind of volume, and they're complementary rather than competing.

Selenium with Selenium Wire drives a real Chrome instance, executes the page's JavaScript, and lets you read the rendered DOM. It is the lowest-friction option when you don't yet know the page's hidden requests, and it tolerates layout drift well because you query the same DOM a user sees. The price is speed and resource usage: each page is a full browser tab. For curated lists of a few thousand hotels, that is fine. For continuous monitoring it gets expensive.

Calling the internal /dml/graphql endpoint with httpx skips the browser entirely. Booking.com's own front end fetches search results from this endpoint, so once you mirror the request shape you get the same JSON the site does, ten to fifty times faster than Selenium and with a tiny memory footprint. The trade-off is fragility: payloads and required headers change, and you must keep them in sync.

A solid default: prototype with Selenium, lock in the GraphQL request once you understand the data, and use the API path for production.

Setting up your Python environment

Use Python 3.10 or newer in a fresh virtualenv so the dependencies stay isolated:

mkdir booking_scraper && cd booking_scraper
python -m venv .venv && source .venv/bin/activate
pip install selenium selenium-wire webdriver-manager httpx parsel
touch app.py

selenium-wire is a drop-in replacement for selenium that exposes the underlying network requests, which we'll need for pagination synchronization. webdriver-manager auto-downloads the matching chromedriver binary, so you don't have to babysit driver versions across machines. httpx gives us an HTTP/2-capable client for Method 2, and parsel provides Scrapy-style CSS and XPath selectors for parsing hotel HTML. Our step-by-step Selenium tutorial is a useful warm-up if you've never used Selenium for scraping before.

Method 1: Scraping search results with Selenium and Selenium Wire

This is the friendliest entry point into web scraping Booking.com: open a search URL in a real Chrome session, let JavaScript render the property cards, and walk the DOM. We use Selenium Wire rather than vanilla Selenium because the search page loads results through background XHR/fetch calls. Selenium Wire lets us inspect those individual requests and wait until a specific response actually returns, which matters for paginating without race conditions.

Loading the search page and isolating property cards

Always include explicit check-in and check-out dates in the URL. Without them, Booking.com falls back to default availability and your price column will not line up with what a user would see for a real booking window.

from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = ('https://www.booking.com/searchresults.html'
       '?ss=London&checkin=2026-05-10&checkout=2026-05-12&group_adults=2')
driver.get(url)

cards = driver.find_elements(By.CSS_SELECTOR, "div[data-testid='property-card']")
print(f'Found {len(cards)} property cards on page 1')

Booking.com is fairly consistent about using data-testid attributes on its result cards, which makes them more stable to target than class names that get auto-generated.

Extracting name, address, score, review count, price, and image

Each property card carries the same handful of data-testid hooks, so the per-card parser is mostly a small dictionary of selectors. CSS selectors are usually the right call here (concise and fast), but XPath is fine when you need a parent or sibling traversal. See our XPath vs CSS selectors guide if you're picking sides.

def parse_card(card):
    def text(sel):
        nodes = card.find_elements(By.CSS_SELECTOR, sel)
        return nodes[0].text.strip() if nodes else None

    def attr(sel, name):
        nodes = card.find_elements(By.CSS_SELECTOR, sel)
        return nodes[0].get_attribute(name) if nodes else None

    score_block = text("div[data-testid='review-score']") or ''
    score_lines = [s.strip() for s in score_block.split('\n') if s.strip()]
    score = score_lines[0] if score_lines else None
    review_count = next((l for l in score_lines if 'review' in l.lower()), None)

    return {
        'name':         text("div[data-testid='title']"),
        'url':          attr("a[data-testid='title-link']", 'href'),
        'address':      text("span[data-testid='address']"),
        'score':        score,
        'review_count': review_count,
        'price':        text("span[data-testid='price-and-discounted-price']"),
        'image':        attr("img[data-testid='image']", 'src'),
    }

listings = [parse_card(c) for c in cards]

Two things to flag about prices. First, the review-score block on Booking.com squashes the numeric score and the review-count text into one element, so we split it into lines and pick them out separately. Second, the price you scrape from a search card almost always excludes taxes and fees; the all-in total only appears once you progress further into the booking flow. Treat it as the headline rate, not the final charge, and document that downstream.
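Because that headline rate arrives as a display string (currency symbol, thousands separators, sometimes stray whitespace), it pays to normalize it immediately rather than downstream. A minimal sketch, assuming the English-locale display format where ',' is a thousands separator; locales that use ',' as the decimal mark need different handling:

```python
import re
from decimal import Decimal

def parse_price(raw):
    """Split a scraped display price like '€ 1,234' or 'US$256' into
    (currency_symbol, Decimal amount). Formats vary by locale, so this
    is a best-effort sketch, not a universal parser."""
    if not raw:
        return None, None
    # Grab the numeric run, tolerating thousands separators and spaces.
    m = re.search(r'([\d.,\s]+)', raw)
    if not m:
        return raw.strip(), None
    number = m.group(1).strip()
    symbol = raw.replace(m.group(1), '').strip() or None
    # Assumes ',' is a thousands separator (English-site display).
    amount = Decimal(number.replace(',', '').replace(' ', ''))
    return symbol, amount
```

Store the amount and the symbol in separate columns, and tag each row with the nightly pre-tax caveat from above.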

Clicking through pagination without race conditions

Each click on the next-page control fires a POST to /dml/graphql, and the page only re-renders once the JSON comes back. If you click and immediately scrape the DOM, you read the previous page. Selenium Wire fixes this by letting you block on the actual response.

from selenium.webdriver.common.by import By

def total_pages(driver):
    nums = driver.find_elements(By.CSS_SELECTOR, "div[data-testid='pagination'] li")
    return max((int(n.text) for n in nums if n.text.isdigit()), default=1)

pages = total_pages(driver)
all_listings = [parse_card(c) for c in cards]

for page in range(2, pages + 1):
    del driver.requests  # clear so the next wait does not match an old response
    next_btn = driver.find_element(
        By.CSS_SELECTOR, "button[aria-label='Next page']")
    next_btn.click()
    driver.wait_for_request(r'/dml/graphql', timeout=10)
    cards = driver.find_elements(
        By.CSS_SELECTOR, "div[data-testid='property-card']")
    all_listings.extend(parse_card(c) for c in cards)

del driver.requests is the important line. Without it, wait_for_request happily matches the previous page's GraphQL call and you advance before the new data arrives. Pull the total page count from the pagination control rather than hard-coding it; busy queries can paginate twenty pages deep, quiet ones two.

Method 2: Calling Booking.com's GraphQL search endpoint directly

Once Selenium has shown you that the search page is fed by /dml/graphql, the faster move is to call that endpoint yourself and skip the browser. This is where web scraping Booking.com becomes genuinely scalable.

The discovery process is the same one you'd use for any hidden JavaScript API: open DevTools (F12), switch to the Network tab, filter by Fetch/XHR, then trigger a real search and click into page two. You'll see a POST to /dml/graphql carrying a JSON body with an operationName, a variables object (with the destination, dates, guest count, and an offset), and a query or extensions field that pins the query hash. Right-click the request and choose Copy as cURL, and that's your starting point.

Re-verify the exact field names against your own DevTools capture before shipping; Booking.com renames GraphQL operations periodically, and the safest reference is whatever the front end is sending today.

import httpx

ENDPOINT = 'https://www.booking.com/dml/graphql'
HEADERS = {
    'content-type':    'application/json',
    'origin':          'https://www.booking.com',
    'referer':         'https://www.booking.com/searchresults.html',
    'user-agent':      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/124.0 Safari/537.36',
    'accept-language': 'en-US,en;q=0.9',
}

def search_page(client, payload, offset):
    import copy
    # Deep-copy the captured payload: a shallow {**payload} would share
    # the nested 'variables' dict, so the pagination write below would
    # mutate the caller's template across offsets.
    body = copy.deepcopy(payload)
    body['variables']['input']['pagination'] = {'offset': offset, 'rowsPerPage': 25}
    r = client.post(ENDPOINT, json=body, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()

def search_all(payload, max_results=1000):
    results = []
    with httpx.Client(http2=True) as client:
        for offset in range(0, max_results, 25):
            page = search_page(client, payload, offset)
            hits = (page.get('data', {})
                        .get('searchQueries', {})
                        .get('search', {})
                        .get('results', []))
            if not hits:
                break
            results.extend(hits)
    return results

Two details that trip people up. The endpoint returns 25 results per call, controlled by an offset variable that you bump in 25-result steps. And the request must look like it came from the site itself: origin and referer set to booking.com, content-type: application/json, and an accept-language that matches your IP region. Strip those headers and you'll get a generic 400 or a soft block within a few requests. Use HTTP/2 (httpx does this when you pass http2=True) because Booking.com's edge appears to fingerprint clients that still negotiate HTTP/1.1 only.

Scraping individual hotel pages for description, address, and amenities

Search-result cards are only one slice of web scraping Booking.com; they give you a name and a price, but they don't give you the rich hotel detail travel teams actually want. For that, scrape the hotel URL directly. Hotel pages are mostly server-rendered, so a plain GET plus parsel is enough, no browser required.

import httpx
from parsel import Selector

def scrape_hotel(url):
    # http2 is a Client option, not a top-level httpx.get() argument
    with httpx.Client(http2=True, follow_redirects=True, headers=HEADERS) as client:
        html = client.get(url).text
    sel = Selector(text=html)
    latlng = sel.css("a[data-atlas-latlng]::attr(data-atlas-latlng)").get()
    lat, lng = (latlng.split(',') + [None])[:2] if latlng else (None, None)
    return {
        'name':        sel.css('h2.pp-header__title::text').get(default='').strip(),
        'description': ' '.join(sel.css("div[data-testid='property-description'] *::text").getall()).strip(),
        'address':     sel.css("span[data-testid='address']::text").get(default='').strip(),
        'lat':         lat,
        'lng':         lng,
        'amenities':   [a.strip() for a in sel.css("div[data-testid='facility-list-most-popular'] li::text").getall() if a.strip()],
    }

The latitude and longitude are usually embedded in a data-atlas-latlng attribute on the map link, which is more reliable than parsing them out of inline scripts. Amenities are grouped into feature blocks; iterate the groups if you want them categorized rather than flattened.

Fetching nightly prices and availability

Per-night pricing is not in the hotel HTML; it lives behind a separate GraphQL query that returns a calendar-shaped response. Capture the request the same way as the search call: open the hotel page in DevTools, change the dates, and watch for the pricing/availability POST to /dml/graphql. The body includes the hotel identifiers (numeric hotel_id, country code, and currency) and a date range.

Hotel pages also embed a CSRF-style token in the HTML that the pricing query expects in the body or in a header. Extract it from the page once per hotel, then reuse it for each pricing call.
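Extracting the token usually comes down to a regex over the hotel HTML. The variable names below ('b_csrf_token', 'csrf_token') are common historical spellings, not guarantees — inspect a fresh page to see where the token actually lives today:

```python
import re

def extract_csrf(hotel_html):
    """Pull the CSRF-style token out of a hotel page's inline scripts.
    The exact variable name drifts over time; the patterns below are
    illustrative candidates, ordered by how often they've been seen."""
    patterns = [
        r"b_csrf_token:\s*'([^']+)'",
        r'"csrf_token"\s*:\s*"([^"]+)"',
    ]
    for pat in patterns:
        m = re.search(pat, hotel_html)
        if m:
            return m.group(1)
    return None  # token not found: re-check the page source in DevTools
```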

def scrape_pricing(client, hotel_id, csrf, checkin, checkout, currency='EUR'):
    payload = {
        'operationName': 'AvailabilityCalendar',  # verify in DevTools
        'variables': {
            'input': {
                'hotelId': hotel_id,
                'checkIn': checkin,
                'checkOut': checkout,
                'currency': currency,
            }
        },
        'extensions': {'csrf': csrf},
    }
    r = client.post(ENDPOINT, json=payload, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()

Pulling guest reviews from the hidden reviews endpoint

Guest reviews load through a separate XHR when you click the Reviews tab on a hotel page. Open DevTools, switch to Fetch/XHR, click the tab, and copy the request. It paginates through a skip (or offset) integer in batches of roughly 25, and returns review text, score, language, reviewer country, and date.

Once you have one working call, you can fan out concurrently by running batches in an httpx.AsyncClient:

import asyncio, httpx

async def fetch_reviews(client, hotel_id, skip):
    # review_payload(hotel_id, skip) builds the JSON body you captured
    # from DevTools, with the skip offset swapped in
    r = await client.post(ENDPOINT, json=review_payload(hotel_id, skip),
                          headers=HEADERS)
    r.raise_for_status()
    return r.json()

async def all_reviews(hotel_id, total):
    async with httpx.AsyncClient(http2=True) as c:
        tasks = [fetch_reviews(c, hotel_id, s) for s in range(0, total, 25)]
        return await asyncio.gather(*tasks)

Keep concurrency in single digits per hotel; reviews are aggressively rate-limited.
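The gather above fires every batch at once, which is exactly what the rate limiter punishes. A small semaphore wrapper caps in-flight requests without restructuring the calls — a sketch, with the limit as a tunable:

```python
import asyncio

async def bounded_gather(coros, limit=5):
    """Run coroutines with at most `limit` in flight, so the review
    fan-out stays in single digits per hotel."""
    sem = asyncio.Semaphore(limit)

    async def guarded(coro):
        async with sem:          # blocks when `limit` tasks are running
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))
```

Usage: swap the bare gather for `await bounded_gather([fetch_reviews(c, hotel_id, s) for s in range(0, total, 25)], limit=5)`.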

Discovering hotels through sitemaps and the location autocomplete API

For bulk web scraping Booking.com inventory, rather than one-query-at-a-time scraping, start at https://www.booking.com/robots.txt. Booking.com publishes its Sitemap: entries there, including hotel, attraction, and airport sitemap indexes. Each sitemap index points at sub-sitemaps capped at 50,000 URLs (per the sitemap protocol), which is why the hotel index is split across many files. Walking the index gives you tens of millions of hotel URLs, with duplicates, that you can deduplicate on the URL slug or a parsed hotel id. Our sitemap scraping guide has a reusable pattern for this.

For targeted searches, Booking.com's own location autocomplete endpoint resolves a city or neighborhood string into the destination identifiers the search GraphQL call expects, which beats hard-coding them by hand.

Avoiding blocks: headers, proxies, rate limiting, and captchas

Successful web scraping Booking.com at any volume comes down to looking like a normal browser and backing off when the site asks you to. As of 2026, Booking.com's anti-bot stack appears to fingerprint both TLS and HTTP/2 behavior, so the basics are non-negotiable: an HTTP/2-capable client (httpx with http2=True), a realistic header set including accept-language and sec-ch-ua-*, and a stable user-agent that matches a current Chrome version. (Re-verify HTTP/2 sensitivity periodically; this changes.)

Use residential or ISP proxies rather than datacenter ranges; datacenter IPs hitting Booking.com start tripping captchas within a few dozen requests. Keep concurrency conservative (5 to 10 per IP), add jitter, and back off exponentially on 429 and 403. WebScrapingAPI's residential proxy network and Scraper API both handle the rotation, retry, and TLS-fingerprint pieces if you'd rather not reinvent that infrastructure. Anti-detect browsers are a last resort for the hardest pages.

Handling currency, language, and the 1,000-result paging cap

Booking.com infers the displayed currency from the geolocation of your exit IP, so a US-based scraper will see USD and an EU-based one will see EUR by default. For consistent currency, route through a country-targeted proxy or pass a selected_currency query parameter on each request. (Re-test this behavior periodically; the parameter name and IP-inference logic are the kind of thing that quietly changes.)

The platform also caps any single search at approximately 1,000 results. To enumerate inventory in a busy city, partition the query: scrape London by neighborhood (Shoreditch, Camden, Kensington), then by star rating, then by price band, and union the results on hotel id.
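The partition-and-union step looks like this in outline. run_search is a placeholder for the Method 2 GraphQL search, and the filter keys are illustrative — map them onto whatever your captured payload actually accepts:

```python
from itertools import product

def partition_city(run_search, neighborhoods, star_ratings):
    """Slice a busy city into sub-queries that each fit under the
    ~1,000-result cap, then union the hits on hotel id.
    run_search(filters) -> list of {'hotel_id': ..., ...} dicts."""
    seen = {}
    for hood, stars in product(neighborhoods, star_ratings):
        for hit in run_search({'neighborhood': hood, 'stars': stars}):
            seen.setdefault(hit['hotel_id'], hit)   # first copy wins
    return list(seen.values())
```

If a single slice still returns ~1,000 hits, it's probably truncated too — split it further (by price band) before trusting the union.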

Wrapping up and next steps

For production runs, fold this code into Scrapy and let it handle retries, persistence, and distributed runs. Persist normalized output to Postgres or a columnar store, snapshot daily, and keep your scrapers honest with robots.txt and Booking.com's terms of service.

Key Takeaways

  • Web scraping Booking.com works best as two methods used together: Selenium Wire for prototyping and DOM stability, and the internal /dml/graphql endpoint via httpx for production speed.
  • Pull the full set of entities (search listings, hotel detail pages, nightly prices, and guest reviews), rather than just the search SERP, otherwise the dataset is too thin for rate intelligence.
  • Use data-testid selectors and wait_for_request on /dml/graphql to keep the search-page scraper resilient to layout drift and pagination race conditions.
  • Plan around platform constraints up front: residential proxies, HTTP/2 headers, IP-based currency selection, and the roughly 1,000-result paging cap that forces query partitioning.
  • Use sitemaps under /robots.txt for bulk hotel-URL discovery and the location autocomplete API for resolving destination identifiers.

FAQ

Is web scraping Booking.com legal?

In most jurisdictions, scraping publicly visible hotel listings, prices, and aggregated reviews is generally treated as permissible when done at respectful rates and without bypassing authentication. That said, terms of service, the EU Database Directive, and GDPR (for any reviewer-identifiable information) all matter. Have legal counsel review your specific use case before commercial deployment, and avoid storing personal data.

How do I control the currency Booking.com returns to my scraper?

Two reliable levers: route requests through a proxy located in the country whose currency you want (Booking.com infers currency from the exit IP), or pass a selected_currency=EUR-style query parameter on each request to override the inferred default. Combine both for consistency, since the override is occasionally ignored when the IP and parameter conflict for hotels priced in a fixed local currency.
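For the parameter lever, a small stdlib helper appends or overrides selected_currency on any Booking.com URL (the parameter name is the one observed at the time of writing — re-verify it periodically):

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def with_currency(url, currency='EUR'):
    """Append or override the selected_currency query parameter."""
    parts = urlparse(url)
    q = dict(parse_qsl(parts.query))
    q['selected_currency'] = currency
    return urlunparse(parts._replace(query=urlencode(q)))
```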

How can I extract more than 1,000 results for a busy city like London or New York?

Partition the query. Booking.com caps any single search at roughly 1,000 results, so the workaround is to slice the city into smaller subsets that each fit under the cap: by neighborhood, then by star rating, then by price band if needed. Union the resulting hotel ids and dedupe. For full inventory enumeration, fall back to walking the hotel sitemap index instead of the search interface.

Should I use Selenium or call Booking.com's GraphQL endpoint directly?

Use Selenium for discovery and small jobs; use the GraphQL endpoint for scale. Selenium is more forgiving when the front end changes because you query the rendered DOM. GraphQL is far faster and cheaper per request, but you have to keep request payloads and headers in sync with the live site. A common pattern is to maintain both and fail over from API to browser when the API breaks.
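That failover pattern is a few lines once both methods exist as callables; the function names here are placeholders for your own Method 1 and Method 2 wrappers:

```python
def with_failover(primary, fallback, query):
    """Run the GraphQL path first; drop back to the browser path when
    the payload or schema drifts. primary/fallback are the two search
    functions from this guide, passed in as callables."""
    try:
        return primary(query)
    except Exception:
        # schema drift usually surfaces as HTTP errors or KeyErrors
        # while parsing the response; the browser path is slower but
        # tracks the live front end automatically
        return fallback(query)
```

Log every failover event: a sudden spike is your early warning that the GraphQL payload needs re-capturing.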

Why do the prices my scraper sees differ from what I see in the browser?

Almost always one of three things: your check-in/check-out dates are not pinned in the URL, your exit IP changed the currency or applied a regional discount, or the search-card price excludes taxes and fees that the browser displays only on the next step. Pin the dates, fix the currency, and label scraped prices clearly as nightly pre-tax rates.

Putting it all together

Web scraping Booking.com is a tractable problem once you stop treating it like a single page and start treating it like an ecosystem of endpoints. Selenium Wire gives you a forgiving on-ramp for search results and pagination, the internal /dml/graphql endpoint gives you the speed needed for continuous monitoring, and dedicated calls for hotel detail pages, nightly pricing, and reviews round out the dataset. Layer on sitemap discovery, query partitioning, and explicit currency control, and you have a scraper that scales beyond the toy single-query example.

The pieces most teams underestimate are the infrastructure ones: TLS and HTTP/2 fingerprinting, residential proxy quality, retry and backoff logic, and the patience to keep selectors and GraphQL payloads in sync as the site evolves.

If you'd rather not maintain that anti-block layer yourself, our team at WebScrapingAPI offers a Scraper API that returns raw HTML through rotating residential IPs with CAPTCHAs and TLS handling managed for you, plus a Browser API for the multi-step interactions Selenium handles in this guide. Drop either one in front of the code above and you can focus on the parsing and the data model, which is the part that actually differentiates your product.

About the Author
Raluca Penciuc, Full-Stack Developer @ WebScrapingAPI

Raluca Penciuc is a Full Stack Developer at WebScrapingAPI, building scrapers, improving evasions, and finding reliable ways to reduce detection across target websites.
