Raluca Penciuc · Last updated on May 8, 2026 · 12 min read

How to Scrape Realtor.com: A Practical 2026 Guide

TL;DR: If you're working out how to scrape Realtor.com cleanly, three things matter most: stable selectors that survive their hashed class names, a request layer that survives Realtor's anti-bot stack, and code that walks both list pages and detail pages. This guide is the full Python build, with anti-block tactics and LLM-ready exports.

If you need property data at scale, learning how to scrape Realtor.com is one of the highest-leverage skills you can pick up. Realtor.com is a major U.S. real estate marketplace, listing homes for sale, rentals, and live housing-market information, and most of that data is rendered into HTML you can parse with Python.

The catch is that Realtor.com is a high-value target with a hardened anti-bot stack. Naive requests.get() calls return CAPTCHA HTML, hashed class names rotate without notice, and the richest fields hide inside embedded JSON blobs. The wrong toolchain can burn a week before producing a single clean row.

This guide walks the full Python build end to end: which fields you can actually pull, the selectors that survive Realtor.com's React rendering, how to route requests through a scraping API that handles proxies and CAPTCHAs for you, and how to extract detail-page data like agent contacts, amenities, and lat/long. We'll cover throttling, error handling, legal limits, and how to feed listings into an LLM for downstream analysis.

You'll leave with a working scraper, not a copy-pasted snippet that breaks the next time the front-end ships.

Why scrape Realtor.com and what teams use the data for

Knowing how to scrape Realtor.com unlocks a long list of practical use cases. Investment teams pull listings for comparable-property analyses and AVM training. Brokerages chase lead generation, spotting expired or relisted properties before competitors. Market researchers track inventory and price trends ZIP by ZIP. Proptech engineers ingest listings into LLM pipelines for enrichment. The common thread: Realtor.com listings refresh constantly, so manual collection breaks down past a handful of properties, and structured scraping is the only sane way to keep up.

Realtor.com data fields you can realistically extract

Most tutorials stop at six fields. You can pull a lot more by walking both search results and detail pages. From a search-results card, the reliable fields are listing price, full address, beds, baths, square footage, lot size, listing status, and the canonical detail-page URL.

Follow each detail URL and you can extract:

  • Property facts: year built, property type, HOA fees, parking, lot dimensions
  • Amenities: heating, cooling, flooring, appliances, exterior features
  • Geo: latitude and longitude (handy for mapping and ZIP-level joins)
  • Media: image URLs from the listing photo gallery
  • Agent: agent name, phone number, and brokered-by office
  • Listing metadata: MLS ID, days on market, price history

That's the dataset that powers comps, lead lists, and AVMs, not just a price-and-address dump.

Realtor.com's anti-bot landscape: what you'll actually hit

Before you write scraper code, understand what you're up against. Realtor.com runs a behavioural anti-bot stack that goes well beyond IP blocking. Reverse-engineering write-ups attribute the protection to a Kasada-class fingerprinting stack; whether or not Kasada is the current vendor, the detection signals are well documented in the bot-protection space.

Three layers matter when learning how to scrape Realtor.com at meaningful volume:

  1. TLS and JS fingerprinting. Plain Python requests ships a TLS handshake and header order that don't match real browsers, which Realtor.com's edge can flag on the first hit.
  2. Behavioural correlation. Mouse-entropy proxies, scroll patterns, and DOM-interaction signals are scored against human distributions. A scraper hammering pages from one IP looks nothing like a human.
  3. CAPTCHA pages. When your score crosses the threshold, the body becomes a challenge page rather than listings, often with a 200 status code, so a naive scraper doesn't even realise it failed.

Plan for all three from day one instead of bolting them on after the first batch of empty rows.

Choosing the right scraping approach for your project

There's no single right answer to how to scrape Realtor.com, but three tracks are worth knowing, and each fits a different use case.

  • requests + a scraping API. Best for backend pipelines and production data jobs; the fastest and cheapest option if your API handles JS rendering and proxies.
  • Selenium with undetected-chromedriver. Best for pages that need real browser interaction or login flows; higher cost per page, harder to scale, and still hits CAPTCHAs over time.
  • No-code visual scrapers. Best for analysts pulling a few hundred listings ad hoc; fragile when class names change and not great for scheduled pipelines.

Quick recommendation: if you're a Python team building a recurring data feed, use requests plus a scraping API and parse the HTML with BeautifulSoup, which is what this guide does. Reach for a headless browser only when you genuinely need to interact with a logged-in or heavily JS-driven view, and treat no-code visual scrapers as one-off helpers rather than production infrastructure.

Prerequisites and project setup

Before we get into how to scrape Realtor.com end to end, set up a clean environment. You'll need Python 3.9 or newer, a fresh virtualenv, and four packages:

python -m venv .venv && source .venv/bin/activate
pip install requests beautifulsoup4 lxml pandas

You'll also want a scraping API key. We'll use WebScrapingAPI as the request layer in this guide; sign up, copy your key from the dashboard, and store it as an environment variable so it never lands in git:

export WSA_KEY="your_key_here"

That's the whole setup. No browser drivers, no proxy lists to babysit.

How to scrape Realtor.com with Python, step by step

Now we'll build the actual scraper in five focused steps: inspect the DOM and pin down stable selectors, fetch search pages through the scraping API, parse listing cards with BeautifulSoup, walk paginated results, and export to JSON, CSV, or a pandas DataFrame. Code blocks below assume the setup from the previous section.

Inspect listings and map your CSS selectors

Open Realtor.com search results in Chrome, right-click a listing card, and choose Inspect. Each card is wrapped in an <li> with data-testid="result-card". Price, address, and meta fields live inside <div>s and <li>s carrying their own data-label attributes (pc-price, pc-address, pc-meta-beds, and so on).

Realtor.com also pins each card to a hashed class like BasePropertyCard_propertyCardWrap__abc123. Do not select on the hash. Those suffixes are regenerated on every front-end build and will break your scraper without warning. Anchor on data-testid and data-label attributes wherever possible. They're explicitly there for the front-end's own tests, which makes them the most stable hooks you have.
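
To make the contrast concrete, here's a minimal sketch of the two selector styles side by side; the card markup and hashed suffix are illustrative, not copied from a live page:

from bs4 import BeautifulSoup

# Illustrative card fragment; the hashed class suffix is made up.
sample = '<li data-testid="result-card" class="BasePropertyCard_propertyCardWrap__abc123"><div data-label="pc-price">$350,000</div></li>'
soup = BeautifulSoup(sample, "html.parser")

# Stable: anchored on the attribute hook the front-end's own tests use.
print(soup.select('li[data-testid="result-card"]'))  # finds the card

# Fragile: breaks the next time the hashed suffix regenerates. Avoid.
# soup.select("li.BasePropertyCard_propertyCardWrap__abc123")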

Fetch search pages through WebScrapingAPI

With selectors mapped, route the actual request through the scraping API. The API takes care of residential proxy rotation, JS rendering, and CAPTCHA handling, so your code stays focused on parsing.

import os, requests

WSA_KEY = os.environ["WSA_KEY"]
SEARCH_URL = "https://www.realtor.com/realestateandhomes-search/Cincinnati_OH"

def fetch(url: str) -> str:
    r = requests.get(
        "https://api.webscrapingapi.com/v2",
        params={
            "api_key": WSA_KEY,
            "url": url,
            "render_js": "1",
            "country": "us",
        },
        timeout=60,
    )
    r.raise_for_status()
    return r.text

html = fetch(SEARCH_URL)

Why route through an API on Realtor.com specifically? A vanilla requests.get will hit a CAPTCHA page within a handful of calls because the TLS fingerprint and header order don't match a real browser. The API normalises both for you and rotates the residential IP per request.

Parse listing cards with BeautifulSoup

With the HTML in hand, BeautifulSoup turns it into something you can query. Be defensive: not every card carries every field, so guard each lookup; a single missing lot-size shouldn't blow up the whole row.

from bs4 import BeautifulSoup

def parse_cards(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    cards = soup.select('li[data-testid="result-card"]')
    listings = []
    for card in cards:
        def text(sel):
            el = card.select_one(sel)
            return el.get_text(strip=True) if el else None
        link = card.select_one("a[data-testid='card-link']")
        listings.append({
            "price":    text('[data-label="pc-price"]'),
            "address":  text('[data-label="pc-address"]'),
            "beds":     text('[data-label="pc-meta-beds"] span'),
            "baths":    text('[data-label="pc-meta-baths"] span'),
            "sqft":     text('[data-label="pc-meta-sqft"] span'),
            "lot_size": text('[data-label="pc-meta-sqftlot"] span'),
            "url": link["href"] if link and link.has_attr("href") else None,
        })
    return listings

We'll use the url field shortly to follow each listing into its detail page. If you'd rather use BeautifulSoup more deeply, our Python BeautifulSoup parsing guide is a solid companion read.
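
One caveat worth hedging: card hrefs frequently come back as relative paths rather than absolute URLs. If that's what you see, join them against the site root before fetching; a minimal sketch:

from urllib.parse import urljoin

BASE = "https://www.realtor.com"

def absolutize(listings: list[dict]) -> list[dict]:
    # Turn relative detail links into absolute URLs; absolute ones pass through unchanged.
    for listing in listings:
        if listing["url"]:
            listing["url"] = urljoin(BASE, listing["url"])
    return listings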

Loop through paginated results

Realtor.com paginates with a /pg-N URL pattern, and each results page returns roughly 42 cards (the exact number drifts, so don't treat it as a stop condition). The most robust strategy combines a page counter with an empty-results check:

def scrape_search(base_url: str, max_pages: int = 10) -> list[dict]:
    all_listings = []
    for page in range(1, max_pages + 1):
        page_url = base_url if page == 1 else f"{base_url}/pg-{page}"
        cards = parse_cards(fetch(page_url))
        if not cards:
            break
        all_listings.extend(cards)
    return all_listings

Stopping when a page returns zero cards is far safer than trusting any fixed per-page count.

Export results to JSON, CSV, or a DataFrame

Once you've collected listings, push them through pandas to dedupe and export in any format you need. The detail-page URL is the most stable per-listing key.

import pandas as pd

df = pd.DataFrame(scrape_search(SEARCH_URL))
df = df.drop_duplicates(subset="url").reset_index(drop=True)

df.to_json("realtor_listings.json", orient="records", indent=2)
df.to_csv("realtor_listings.csv", index=False)

CSV is the simplest path into Excel and BI tools. JSON works better if downstream code expects nested objects, especially after we add detail-page fields. Either way, dedupe before exporting, since Realtor.com sometimes surfaces the same listing across adjacent search pages.

Scrape individual property detail pages

Detail pages carry the richer dataset: year built, amenities, agent contact, lat/long, photos. Realtor.com embeds two structured payloads:

  1. A <script type="application/ld+json"> block following the schema.org RealEstateListing vocabulary.
  2. A <script id="__NEXT_DATA__"> payload (the Next.js hydration blob) with the full property record.

Both are JSON:

import json
from bs4 import BeautifulSoup

def parse_detail(html: str) -> dict:
    soup = BeautifulSoup(html, "lxml")
    script = soup.find("script", id="__NEXT_DATA__")
    if script is None or not script.string:
        raise ValueError("__NEXT_DATA__ missing; likely a challenge page")
    prop  = json.loads(script.string)["props"]["pageProps"]["initialReduxState"]["propertyDetails"]
    desc  = prop.get("description") or {}
    coord = ((prop.get("location") or {}).get("address") or {}).get("coordinate") or {}
    adv   = (prop.get("advertisers") or [{}])[0]
    return {
        "year_built": desc.get("year_built"),
        "lat": coord.get("lat"), "lon": coord.get("lon"),
        "amenities": desc.get("features") or [],
        "agent_name":  adv.get("name"),
        "agent_phone": (adv.get("phones") or [{}])[0].get("number"),
        "brokered_by": (adv.get("office") or {}).get("name"),
        "photos": [p["href"] for p in (prop.get("photos") or []) if p.get("href")],
    }

That key path drifts; log the raw payload during development and re-pin when you see a KeyError. JSON-LD is more stable but exposes fewer fields.
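
Since the Next.js path is the brittle half, a sensible hedge is a JSON-LD fallback. The sketch below assumes the page carries a schema.org object with address and geo fields, which you should verify against a live listing before relying on it:

import json
from bs4 import BeautifulSoup

def parse_json_ld(html: str) -> dict:
    # Fallback: fewer fields, but the schema.org block moves far less often.
    soup = BeautifulSoup(html, "lxml")
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue
        for item in (data if isinstance(data, list) else [data]):
            if isinstance(item, dict) and "address" in item:
                geo = item.get("geo") or {}
                return {"lat": geo.get("latitude"), "lon": geo.get("longitude")}
    return {}

# Walk each listing from the search scrape and merge in detail fields,
# degrading to JSON-LD when the __NEXT_DATA__ key path has drifted.
rows = absolutize(scrape_search(SEARCH_URL))
for listing in rows:
    if not listing["url"]:
        continue
    page = fetch(listing["url"])
    try:
        listing.update(parse_detail(page))
    except (KeyError, ValueError):
        listing.update(parse_json_ld(page))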

Turn Realtor listings into LLM-ready data

Once you have clean listing JSON, an LLM is the fastest way to enrich it. Skip the "summarise this page" pattern; chunk by listing and use structured prompts.

Two pipelines that pay off:

  • Comp analysis. Feed five to ten listings from the same ZIP into one prompt with a JSON schema asking for adjusted price-per-sqft, dwelling-type matches, and outlier flags. You get a comp report in one round trip; a prompt-builder sketch follows this list.
  • Lead enrichment. Pass agent name, brokerage, and listing URL to a model that returns a normalised contact card with a confidence score.
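
Here's a minimal sketch of the comp-analysis prompt builder. The schema keys and wording are illustrative choices, not a fixed recipe; send the resulting string through whatever LLM client you already use:

import json

COMP_SCHEMA = {
    "adjusted_price_per_sqft": "number",
    "dwelling_type_match": "boolean",
    "outlier": "boolean",
    "outlier_reason": "string or null",
}

def build_comp_prompt(listings: list[dict]) -> str:
    # Five to ten listings from the same ZIP go into one structured prompt.
    return (
        "You are a real-estate analyst. For each listing below, return one "
        "JSON object, as an array, following this schema: "
        f"{json.dumps(COMP_SCHEMA)}. Flag outliers against the group's "
        "price-per-sqft distribution.\n\n" + json.dumps(listings, indent=2)
    )

with open("realtor_listings.json") as f:  # the export from the earlier step
    prompt = build_comp_prompt(json.load(f)[:10])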

Pulling pages as Markdown rather than HTML through your scraping API cuts token cost roughly in half.

Avoid blocks: proxies, headers, and throttling

Scrape Realtor.com from a datacenter IP with default headers and you will get blocked. Beyond what a scraping API handles for you, the mitigations that move the needle are:

  • Residential proxies. Datacenter IPs get flagged in bulk; residential rotation makes traffic look like real visitors.
  • Header rotation. Randomise User-Agent, Accept-Language, and Sec-Ch-Ua per request, but keep each set internally consistent (an iPhone UA with a Windows platform header is an instant tell).
  • Throttling. Roughly three to six seconds of jitter between requests keeps you below most behavioural baselines.
  • Exponential backoff. On 403 or 429, sleep 2 ** attempt + random.random() and rotate the session; never retry on the same connection (sketched below).
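
A minimal sketch of the jitter-plus-backoff pattern, wrapped around the fetch() helper from earlier; the thresholds are starting points to tune, not tested constants:

import random
import time
import requests

def fetch_polite(url: str, max_attempts: int = 4) -> str:
    # Three to six seconds of jitter keeps request timing under behavioural baselines.
    time.sleep(random.uniform(3, 6))
    for attempt in range(max_attempts):
        try:
            return fetch(url)  # raise_for_status() surfaces 403/429 as HTTPError
        except requests.HTTPError as exc:
            status = exc.response.status_code if exc.response is not None else None
            if status not in (403, 429) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter before the next (fresh) session.
            time.sleep(2 ** attempt + random.random())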

Building all of that yourself is a project; most teams delegate it.

Know the legal limits before you scale

Realtor.com listing pages are public, so collecting price, address, and structural fields for internal research is generally defensible, but it isn't a free pass. Read the Realtor.com Terms of Use and robots.txt before building a recurring pipeline. Treat agent names, phone numbers, and emails as personal data under California's CCPA and analogous regimes; collecting them for resale or unsolicited outreach pulls you into regulated territory.

Troubleshoot common Realtor scraper errors

Five failure modes account for most Realtor scraper bugs:

  • Empty listings list. You're looking at a CAPTCHA page that returned 200; check the page title, then turn JS rendering on at the API (a detection sketch follows this list).
  • Hashed class drift. Replace any BasePropertyCard_propertyCardWrap__* selector with a data-testid or data-label lookup.
  • 403 / 429 spikes. Lengthen backoff, rotate session, and only raise concurrency once your error rate settles.
  • Missing detail fields. The __NEXT_DATA__ key path moved; log the raw payload and re-pin keys.
  • Duplicate rows. Always dedupe on the listing URL before export.
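
For the first failure mode, a cheap pre-parse guard pays for itself. The marker strings below are illustrative guesses, so log a real challenge page once and pin them to what you actually see:

def looks_blocked(html: str) -> bool:
    # Challenge pages often come back as 200s with a telltale title or a tiny body.
    head = html[:2000].lower()
    markers = ("captcha", "access denied", "are you a human")  # illustrative
    return len(html) < 5000 or any(m in head for m in markers)

page = fetch(SEARCH_URL)
if looks_blocked(page):
    raise RuntimeError("Challenge page detected; retry with a fresh session")
listings = parse_cards(page)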

Key Takeaways

  • Selectors first, requests second. Anchor on data-testid and data-label attributes, never on hashed class names like BasePropertyCard_propertyCardWrap__*, which regenerate on every front-end build.
  • Route through a scraping API for production. Plain requests will hit CAPTCHA pages on Realtor.com within a handful of calls; an API layer handles proxies, JS rendering, and TLS fingerprinting in one hop.
  • Detail pages 10x your dataset. Year built, amenities, agent contacts, and lat/long live in the __NEXT_DATA__ blob and JSON-LD on each property page, not in the search-results card.
  • Throttle and back off, even with a good API. Roughly three to six seconds of jitter, residential rotation, and exponential backoff on 403/429 keep long-running pipelines stable.
  • Respect the legal envelope. Listing data is public; agent contacts are personal data. Read the terms before you scale, and avoid resale of regulated fields.

FAQ

Does Realtor.com offer an official API I can use instead of scraping?

No general-purpose public API exists for browsing listings. Realtor.com's parent company has historically offered RETS and RESO Web API access through MLS data feeds, but those are gated behind broker or partner agreements rather than open developer signups. For most independent developers and analysts, structured scraping of public listing pages remains the only practical route to bulk data.

Will Realtor.com block requests coming from AWS, GCP, or other datacenter IPs?

Yes, almost immediately. Realtor.com's anti-bot stack maintains lists of major cloud ASNs and rate-limits or challenges them aggressively. A scraper running on a vanilla EC2 or Cloud Run instance will see CAPTCHA pages within a handful of calls. You need either residential or mobile proxies, or a scraping API that handles IP rotation transparently.

How is scraping Realtor.com different from scraping Zillow or Redfin?

All three are React-rendered, paginated, and protected by behavioural anti-bot stacks, but the implementation details differ. Zillow leans heavily on its __NEXT_DATA__ payload and is aggressive about mobile fingerprint checks. Redfin exposes a richer internal JSON API used by its frontend. Realtor.com sits between them: stable data-testid hooks on cards plus a __NEXT_DATA__ blob on detail pages.

How do I keep the scraper working when Realtor.com changes its hashed class names?

Don't depend on hashed classes in the first place. Anchor every selector on data-testid and data-label attributes, which are part of the front-end's own test contract and rarely change. Add a smoke test that fetches one known listing each morning and asserts a non-empty price, then alert when it fails so you can re-pin selectors before the pipeline silently empties.

What's a safe request rate for a long-running Realtor.com scraper?

A useful starting band is roughly three to six seconds of jittered delay between requests per session, with one to three concurrent sessions, each on a different residential IP. Watch your 403/429 rate; if it climbs above one or two percent over an hour, slow down further or rotate proxies more aggressively. Burst traffic gets flagged faster than steady volume.

Conclusion

Knowing how to scrape Realtor.com end to end is mostly about layering the right pieces in the right order: pin stable selectors, route requests through something that handles the anti-bot stack for you, follow listing URLs into detail pages for the rest of the dataset, and dedupe before exporting. Get those right and you have a pipeline that survives the next front-end refactor instead of breaking on it.

The hard parts (residential rotation, JS rendering, CAPTCHA handling) are exactly what bog teams down for weeks when built in-house. They're also the parts that benefit most from being delegated to a managed request layer, so your engineers can stay focused on parsing, modelling, and downstream analytics.

If you'd rather not run a proxy fleet, header generator, and CAPTCHA solver yourself, our Scraper API at WebScrapingAPI is built for exactly this shape of job: it returns rendered HTML for any URL you point at it, handles retries on 403/429, and rotates residential IPs across 195 countries so your scraper looks like 200 different humans browsing Cincinnati listings. Wire it into the build above and the only code you maintain is the parsing layer.

About the Author
Raluca Penciuc, Full-Stack Developer @ WebScrapingAPI

Raluca Penciuc is a Full Stack Developer at WebScrapingAPI, building scrapers, improving evasions, and finding reliable ways to reduce detection across target websites.
