How to Scrape Expedia with Python: Hotels, Prices & Ratings (2026 Guide)
WebscrapingAPI on Mar 25 2026
TL;DR: Expedia relies on JavaScript rendering and anti-bot protections, so a plain requests call won't return hotel listings. This guide covers identifying CSS selectors with DevTools, building a working scraper via a scraping API, paginating across result pages, and exporting clean CSV data.
Expedia scraping is the automated extraction of hotel prices, ratings, availability, and location data from Expedia's search results — useful for price-monitoring tools, travel comparison apps, and competitive benchmarking. If you've tried a basic HTTP client and gotten back an empty page, you already know the problem: Expedia loads its hotel listings dynamically, so the data isn't in the raw HTML response.
This guide is for Python developers and data engineers who want a working, maintainable solution. We'll cover why Expedia is hard to scrape, how to identify CSS selectors using browser DevTools, how to build a scraper that handles JavaScript rendering and proxy rotation, and how to paginate across multiple result pages — plus how to clean the extracted data before writing it to CSV.
What You Can Extract from Expedia and Why It Matters
When you scrape Expedia hotel search results, a well-built scraper can pull the following fields from each listing card:
- Hotel name — the property's display name as shown in search results
- Nightly price — the rate for your selected dates, including any promotional pricing
- Star rating — the official star classification (1–5)
- Guest review score — aggregated user rating (e.g., 8.4/10)
- Review count — the number of reviews behind the score
- Location / neighbourhood — useful for geo-filtering and mapping
The dataset can be extended to capture promotional badges or thumbnail photos for richer downstream analysis.
Real-world use cases include price monitoring (tracking rate changes across dates and destinations), travel comparison apps (aggregating listings from multiple OTAs), and competitor benchmarking (understanding how a property's pricing stacks up against nearby hotels). Scraped Expedia data can also feed recommendation engines and customer-facing travel tools.
Legal and Ethical Considerations Before You Start
Before writing a single line of code, run through this checklist:
- Check robots.txt — Visit https://www.expedia.com/robots.txt and respect disallowed paths. (Verify at publication time — directives change.)
- Review the Terms of Service — Expedia's ToS restricts automated access. Personal research sits in a different risk category than commercial resale. Consult a lawyer if unsure.
- Scrape only public data — Hotel listings shown to any anonymous visitor are public. Don't attempt to access account-gated content or submit forms automatically.
- Rate-limit your requests — Add deliberate delays (minimum 2–5 seconds between requests). Hammering a server is both ethically poor and a fast path to getting blocked.
- Store data responsibly — Retain only what you need and avoid republishing scraped content in ways that compete directly with Expedia's own products.
Why Expedia Is Difficult to Scrape
Expedia dynamically loads its hotel listings using JavaScript, which means a static scraper fetching raw HTML will miss the actual content. The server sends a mostly empty shell; the browser executes JavaScript to fetch and render the hotel cards. If you're not rendering that JavaScript, you're not seeing the data. JavaScript rendering is a fundamental challenge for web scraping, and Expedia is one of the more aggressive examples.
Beyond rendering, Expedia deploys IP blocking (repeated requests from the same IP trigger bans), browser fingerprinting (headless browsers are detectable via missing APIs and timing anomalies), and dynamic class names (CSS classes are generated at build time and change with each deployment, breaking hardcoded selectors without warning).
A plain requests fetch returns navigation and metadata but zero hotel listings. That gap is why you need either a headless browser or a scraping API.
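You can see this gap for yourself without hitting Expedia at all. The sketch below is illustrative: it checks a server-rendered HTML shell for the card attribute used later in this guide (the sample HTML is made up, not a real Expedia response).

```python
# Count hotel cards in raw HTML. A static fetch of a JS-rendered page
# returns a shell like the sample below, so the count comes back zero.
def count_hotel_cards(html: str) -> int:
    return html.count('data-stid="lodging-card-responsive"')

server_shell = """
<html><head><title>Hotel Search | Expedia</title></head>
<body><div id="app"><!-- cards injected by JavaScript at runtime --></div></body>
</html>
"""
print(count_hotel_cards(server_shell))  # 0: no listings without JS rendering
```

The same check against a rendered page (saved from DevTools) returns one match per listing, which is a quick way to confirm whether your fetch path is seeing the real content.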
Choosing Your Approach: DIY Headless Browser vs. Scraping API
The DIY route gives you maximum flexibility but requires you to set up a headless browser, manage a proxy pool, and maintain the environment as browser versions change. A scraping API abstracts all of that: you send a request with your target URL and extraction rules; the API handles rendering, proxy rotation, and retries.
For most Expedia scraping use cases — price monitoring, periodic data pulls, research — the API approach is faster to ship and cheaper to maintain over time. You avoid the overhead of keeping browser binaries up to date, sourcing reliable residential proxies, and debugging environment-specific rendering failures. The trade-off is that you depend on an external service, so evaluate uptime guarantees and pricing tiers before committing to either path.
Environment Setup and Prerequisites
You'll need Python 3.8 or later (python --version to check). Install the required libraries:
pip install webscrapingapi pandas
webscrapingapi is the official Python client for WebScrapingAPI — it wraps the HTTP request layer and handles authentication. pandas handles data cleaning and CSV export.
Grab your API key from the WebScrapingAPI dashboard and store it as an environment variable rather than hardcoding it in your script:
export WSAPI_KEY="your_api_key_here"
Then load it in Python with os.environ.get("WSAPI_KEY"). Keep your script file (e.g., expedia.py) in a dedicated project folder so relative paths for CSV export work consistently across runs. Python's built-in os module is all you need — no extra installation required. For a broader introduction to Python-based scraping patterns, see our Python web scraping guide.
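A small fail-fast helper (a sketch, not part of the client library) makes a missing key obvious at startup instead of surfacing later as an opaque 401 from the API:

```python
import os
import sys

def require_key(name="WSAPI_KEY"):
    """Return the API key from the environment, or exit with a clear message."""
    key = os.environ.get(name)
    if not key:
        sys.exit(f"{name} is not set. Run: export {name}=your_api_key_here")
    return key
```

Call `API_KEY = require_key()` at the top of expedia.py and the script refuses to run until the variable is exported.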
How to Identify the Right CSS Selectors on Expedia
This is the step most tutorials skip. Here's a concrete DevTools walkthrough.
- Open an Expedia search page and let it fully load.
- Right-click a hotel card → "Inspect" to open DevTools with the element highlighted.
- Identify the listing card container — the repeating <div> or <article> wrapping each hotel result. This is your root selector; it should appear once per listing.
- Drill into child elements — find the elements holding hotel name, price, rating, and review count. Right-click each → "Copy > Copy selector."
- Verify uniqueness — run document.querySelectorAll("YOUR_SELECTOR") in the DevTools console and confirm the count matches the number of hotel cards.
- Use relative selectors — child selectors should be relative to the card container, not absolute from the document root.
Important: CSS class names on Expedia are generated dynamically and change with site deployments. Always verify selectors against a live page before a production run. Our CSS selectors cheat sheet covers selector syntax and specificity in detail.
Building the Expedia Hotel Search Scraper
The core of the scraper is two dictionaries — extract_rules and js_scenario — passed as parameters to the API client. Together they tell the API what to extract and how to render the page before extraction begins. Getting these two objects right is the most important step in the entire Expedia scraping Python workflow, because every downstream result depends on them.
Defining Extraction Rules and JS Rendering Instructions
extract_rules tells the API which CSS selectors to use and what to return. js_scenario provides instructions to the built-in headless browser: wait pauses execution for a given number of milliseconds; evaluate runs custom JavaScript in the page context (for scrolling, clicking, etc.).
import os, json
import pandas as pd
import webscrapingapi

API_KEY = os.environ.get("WSAPI_KEY")
client = webscrapingapi.WebScrapingAPIClient(API_KEY)

# Verify these selectors against a live Expedia page before use
CARD_SELECTOR = "[data-stid='lodging-card-responsive']"

extract_rules = {
    "hotels": {
        "selector": CARD_SELECTOR,
        "type": "list",
        "output": {
            "name": {"selector": "[data-stid='content-hotel-title']", "output": "text"},
            "price": {"selector": "[data-stid='price-summary']", "output": "text"},
            "rating": {"selector": ".uitk-rating-medium", "output": "text"},
            "reviews": {"selector": "[data-stid='reviews-summary']", "output": "text"},
            "location": {"selector": "[data-stid='content-hotel-neighborhood']", "output": "text"},
        }
    }
}

# Wait 2 s → scroll to bottom → wait 2 s to trigger lazy-loaded cards
js_scenario = {"instructions": [
    {"wait": 2000},
    {"evaluate": "window.scrollTo(0, document.body.scrollHeight)"},
    {"wait": 2000}
]}
The two-phase wait pattern — pause before scrolling, then pause again after — is deliberate. Expedia uses JavaScript rendering to load hotel cards lazily as the viewport moves down the page. Skipping either wait risks returning an incomplete list of properties, especially on slower connections or when the destination has many results.
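For destinations with very long result pages, a single jump to the bottom can still skip cards. One variant worth trying is a stepped scroll, built here as a helper; the step count and pause lengths are illustrative, not values documented by the API:

```python
# Build a js_scenario that scrolls in several steps, pausing at each
# viewport position so its lazy-loaded cards have time to render.
def stepped_scroll_scenario(steps=4, pause_ms=1500):
    instructions = [{"wait": 2000}]  # initial settle before any scrolling
    for i in range(1, steps + 1):
        target = f"document.body.scrollHeight * {i} / {steps}"
        instructions.append({"evaluate": f"window.scrollTo(0, {target})"})
        instructions.append({"wait": pause_ms})
    return {"instructions": instructions}
```

Pass `json.dumps(stepped_scroll_scenario())` in place of the fixed three-step scenario when a destination keeps returning fewer cards than the page visibly shows.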
Making the API Request and Handling Responses
Key parameters: wait_for waits until a CSS selector appears before extracting; country_code sets the proxy exit country for price localisation; premium_proxy enables residential proxy rotation.
def scrape_expedia_hotels(destination, check_in, check_out, page=1):
    q = destination.replace(" ", "+")
    url = (f"https://www.expedia.com/Hotel-Search"
           f"?destination={q}&startDate={check_in}&endDate={check_out}&page={page}")
    try:
        response = client.get(url, params={
            "wait_for": CARD_SELECTOR,
            "extract_rules": json.dumps(extract_rules),
            "js_scenario": json.dumps(js_scenario),
            "country_code": "us",
            "premium_proxy": "true",
        })
    except Exception as e:
        print(f"Request failed: {e}")
        return []
    if response.status_code == 401:
        print("Invalid API key.")
        return []
    if response.status_code == 500:
        print(f"HTTP 500 on page {page} — retry with backoff.")
        return []
    if response.status_code != 200:
        print(f"Unexpected status {response.status_code}.")
        return []
    try:
        hotels = response.json().get("hotels", [])
    except ValueError:
        return []
    if not hotels:
        print(f"No results on page {page}. CSS selectors may have drifted.")
    return hotels
An empty hotels list with a 200 status almost always means your CSS selectors have drifted. HTTP 500 from Expedia is often transient — build retry logic with exponential backoff at the call site. Note that the page parameter is already threaded through the function signature, making it straightforward to call this function inside a pagination loop in the next section.
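A call-site retry wrapper might look like the sketch below. Note the trade-off it makes: it treats any empty result as retryable, which also re-fires on genuine end-of-results pages, so tune the condition for production use.

```python
import time

def with_backoff(fetch, retries=3, base_delay=5):
    """Retry fetch() with exponential backoff (5 s, then 10 s, then 20 s)."""
    for attempt in range(retries):
        hotels = fetch()
        if hotels:
            return hotels
        if attempt < retries - 1:
            time.sleep(base_delay * 2 ** attempt)
    return []
```

Usage: `with_backoff(lambda: scrape_expedia_hotels("Rome, Italy", "2026-10-05", "2026-10-10", page=1))`.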
Scraping Multiple Pages of Hotel Results
Expedia uses a predictable page query parameter, making pagination straightforward. The loop below iterates until it gets an empty result set or hits the page limit:
import time

def scrape_all_pages(destination, check_in, check_out, max_pages=5, delay=3):
    all_hotels = []
    for page in range(1, max_pages + 1):
        hotels = scrape_expedia_hotels(destination, check_in, check_out, page=page)
        if not hotels:
            print(f"No results on page {page}. Stopping.")
            break
        all_hotels.extend(hotels)
        print(f"Page {page}: {len(hotels)} hotels (total: {len(all_hotels)})")
        if page < max_pages:
            time.sleep(delay)  # Respect rate limits
    return all_hotels

results = scrape_all_pages("Rome, Italy", "2026-10-05", "2026-10-10", max_pages=5, delay=3)
The delay parameter matters. Rapid-fire requests reliably trigger IP blocks. A 3-second pause is a reasonable floor; randomise within a 2–5 second range for larger runs to avoid predictable timing patterns.
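A drop-in replacement for the fixed `time.sleep(delay)` in the loop above, sleeping a random interval instead of a constant one:

```python
import random
import time

def polite_sleep(lo=2.0, hi=5.0):
    """Sleep a random interval in [lo, hi] seconds and return it.
    Defaults match the 2-5 second range recommended in this guide."""
    delay = random.uniform(lo, hi)
    time.sleep(delay)
    return delay
```

Swap `time.sleep(delay)` for `polite_sleep()` in `scrape_all_pages` to break up the uniform request cadence.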
Expedia search results vary in depth by destination and date range. Rather than assuming a fixed page count, the loop's early-exit condition (if not hotels: break) handles termination cleanly — when the API returns an empty list, you've reached the end of the results.
Cleaning and Exporting Data to CSV
Raw text needs cleaning before export — prices, ratings, and review counts arrive as untyped strings. Normalise them first:
import re

def clean_price(raw):
    if not raw:
        return None
    try:
        return float(re.sub(r"[^\d.]", "", raw.split()[0]))
    except ValueError:
        return None

def clean_rating(raw):
    if not raw:
        return None
    m = re.search(r"(\d+\.?\d*)", raw)
    return float(m.group(1)) if m else None

def clean_review_count(raw):
    if not raw:
        return None
    d = re.sub(r"[^\d]", "", raw)
    return int(d) if d else None

def clean_and_export(hotels, filename="expedia_hotels.csv"):
    # (h.get("name") or "") guards against explicit None values, which
    # .get("name", "") alone would pass through and crash on .strip()
    df = pd.DataFrame([{
        "name": (h.get("name") or "").strip(),
        "price_usd": clean_price(h.get("price")),
        "rating": clean_rating(h.get("rating")),
        "review_count": clean_review_count(h.get("reviews")),
        "location": (h.get("location") or "").strip(),
    } for h in hotels])
    # Missing names arrive as empty strings, not NaN, so filter explicitly
    df = df[df["name"] != ""]
    df.to_csv(filename, index=False, encoding="utf-8")
    print(f"Exported {len(df)} hotels to {filename}")
    return df

df = clean_and_export(results)
The CSV columns are typed — price_usd (float), rating (float), review_count (int) — ready for analysis without manual post-processing.
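The cleaners are easy to spot-check against sample strings (the samples below are illustrative, not real Expedia output); they're restated here so the check runs standalone:

```python
import re

def clean_price(raw):
    if not raw:
        return None
    try:
        return float(re.sub(r"[^\d.]", "", raw.split()[0]))
    except ValueError:
        return None

def clean_rating(raw):
    if not raw:
        return None
    m = re.search(r"(\d+\.?\d*)", raw)
    return float(m.group(1)) if m else None

def clean_review_count(raw):
    if not raw:
        return None
    d = re.sub(r"[^\d]", "", raw)
    return int(d) if d else None

print(clean_price("$1,029 total"))          # 1029.0
print(clean_rating("8.4/10 Very Good"))     # 8.4
print(clean_review_count("1,204 reviews"))  # 1204
print(clean_price(None))                    # None
```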
Full Script Reference
All the functions above (scrape_expedia_hotels, scrape_all_pages, and clean_and_export) combine into a single file. Save it as expedia.py, set your WSAPI_KEY environment variable, and trigger the full run with the entry point below.
if __name__ == "__main__":
    results = scrape_all_pages("Rome, Italy", "2026-10-05", "2026-10-10", max_pages=5)
    clean_and_export(results)
Execute with python expedia.py. Results are written to expedia_hotels.csv in your working directory, clean and ready for immediate analysis.
Maintaining Your Scraper When Expedia Changes Its Layout
One of the most common reasons an Expedia scraper stops working is selector drift — Expedia regularly updates its frontend, class names change, element hierarchies shift, and selectors that worked last month silently stop returning data.
How to detect selector drift: Your scraper runs without errors but returns an empty list. None values appear across all cleaned fields simultaneously. These are reliable signals that selectors have changed.
The re-identification workflow:
- Open Expedia in a browser and run a fresh search.
- Right-click a hotel card → Inspect.
- Compare the current DOM to your extract_rules selectors. Find the same semantic element (hotel name heading, price container) even if class names have changed.
- Update CARD_SELECTOR and child selectors, then run a single-page test before re-enabling the full loop.
Lightweight monitoring: Schedule a daily canary run against a fixed destination. Alert on zero results for same-day visibility into selector drift. For more on how JavaScript-heavy sites affect selector stability, see our guide on how JavaScript affects web scraping.
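A canary run can be as small as the sketch below. `scrape_fn` is the `scrape_expedia_hotels` function from this guide; `alert_fn` is whatever notifier you use (email, Slack, a log line), so both are injected rather than assumed:

```python
# Daily canary: one fixed single-page search; alert on zero results.
def run_canary(scrape_fn, alert_fn,
               destination="Rome, Italy",
               check_in="2026-10-05", check_out="2026-10-06"):
    hotels = scrape_fn(destination, check_in, check_out, page=1)
    if not hotels:
        alert_fn(f"Expedia canary: 0 hotels for {destination}, check selectors.")
    return len(hotels)
```

Schedule it with cron and alert on a return value of zero; a healthy run costs a single API request per day.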
Scaling and Best Practices
- Throttle requests. A 3–5 second sleep between pages is a minimum; randomise the delay to avoid predictable timing patterns.
- Implement exponential backoff. On HTTP 500 or 429 responses, double the delay on each retry (5s, 10s, 20s).
- Rotate country_code. Match the exit country to your target market for accurate price localisation.
- Schedule recurring runs. Use cron, Airflow, or a cloud function for price monitoring. Store results with timestamps to track changes over time.
- Log per-page status codes and result counts. When something breaks, you'll want to know exactly which page and destination triggered the failure.
For more on avoiding IP blocks at scale, see our guide on getting rid of IP blocks when web scraping.
Key Takeaways
- JavaScript rendering is non-negotiable for Expedia. A static HTTP request won't return hotel listings — you need a headless browser or a scraping API that renders JS for you.
- CSS selectors drift. Expedia updates its frontend regularly. Build selector-drift detection into your pipeline and know how to re-identify selectors with DevTools when they break.
- Pagination requires a loop. Use Expedia's page query parameter and stop when you get an empty result set — don't assume a fixed number of pages.
- Clean your data before exporting. Strip currency symbols, parse numeric ratings, and convert review counts to integers at extraction time.
- Rate-limit and throttle. Deliberate delays between requests are both ethically correct and practically necessary to avoid blocks.
FAQ
How do I know when my Expedia scraper has broken due to an HTML structure change?
The clearest signal is an empty result list — the API call succeeds (HTTP 200) but returns zero hotel records. A secondary signal is None values across all cleaned fields. Set up a daily canary run against a fixed destination and alert on zero results.
What is the difference between scraping an Expedia search results page and a hotel detail page?
A search results page returns summary data — name, price, rating, location — across a paginated list. A hotel detail page contains richer data for one property: amenity lists, room-type breakdowns, cancellation policies, and review text. Selectors and rendering requirements differ between the two.
How do I avoid hitting Expedia's rate limits when scraping large datasets?
Use randomised delays rather than fixed intervals — a uniform gap is easier for anti-bot systems to fingerprint. Spread destination lists across multiple hours or days, and use exponential backoff on 429 and 500 responses.
Can I scrape Expedia reviews and ratings alongside pricing data in a single request?
Yes, if the review score and count appear on the search results page, add selectors for both fields to your extract_rules dictionary. Full review text lives on the hotel detail page and requires a separate request.
Conclusion
Scraping Expedia hotel data in Python is achievable, but it requires more than a basic HTTP request. You need JavaScript rendering to see the actual listings, reliable proxy rotation to avoid IP blocks, and a clear strategy for identifying and maintaining CSS selectors as Expedia's frontend evolves.
The approach in this guide — using a scraping API to handle the infrastructure layer, combined with explicit extract_rules and js_scenario parameters — gets you to a working scraper faster than setting up and maintaining a local headless browser stack. The pagination loop, data-cleaning functions, and selector-drift monitoring strategy make it production-ready rather than just a proof of concept.
If you want to skip the infrastructure overhead entirely, WebScrapingAPI's Scraper API handles JavaScript rendering, proxy rotation, and CAPTCHA solving behind a single endpoint — so you can focus on the data, not the plumbing. Explore our travel and hospitality scraping use cases for more OTA data collection patterns, or see our related guides on scraping Booking.com and scraping Airbnb listings.