Suciu Dan · Last updated on May 13, 2026 · 11 min read

How to Scrape LinkedIn in 2026: A Python Guide

TL;DR: Scraping LinkedIn means working around an aggressive auth wall, behavioral tracking, and TLS fingerprinting. This guide gives you a method-by-page-type decision tree, working Python patterns for jobs, profiles, and companies (hidden API, JSON-LD, Selenium when needed), and a consolidated anti-block checklist for 2026.

If you have ever tried to figure out how to scrape LinkedIn, you have probably hit the same wall the rest of us have: an aggressive sign-in prompt that fires after only a handful of page views, then quiet 999 responses, then nothing useful at all. Scraping LinkedIn is the practice of extracting public data (profiles, companies, job listings, and search results) directly using HTTP clients, headless browsers, or hidden APIs, without logging in to a personal account. It is technically harder than scraping a typical e-commerce site, but it is far from impossible.

This guide is a code-first walkthrough for developers, data engineers, and growth-ops teams who need public LinkedIn data without burning accounts or rotating proxies blindly. We will start with what you can realistically pull, break down how LinkedIn detects scrapers, and walk through three Python methods (a hidden jobs API, JSON-LD parsing, and a headless browser fallback) with a decision tree so you pick the cheapest reliable path for each page type. The anti-block layer and the legal context come at the end, because they apply regardless of which method you choose.

What LinkedIn Data You Can Realistically Pull

Before we tackle how to scrape LinkedIn, it helps to be honest about what is reachable from outside the auth wall. Four page types are usable without a login: public profile pages, public company pages, individual job postings, and the /jobs/search results page. Everything else (Sales Navigator, the people search index, message graphs, the full employee list on a company page) sits behind authentication and a Terms-of-Service line this guide will not cross.

Within that public surface there is still real value. You get the headline fields most teams need for recruiting intel, sales prospecting, and labor-market research, as long as you accept that depth is capped and you may need to combine sources to fill gaps.

Public LinkedIn Fields by Page Type

The table below maps each public page type to its extractable fields and use case. Many "we want to scrape LinkedIn" requests collapse once a stakeholder sees what is available without logging in.

| Page type | Public fields (typical) | Useful for |
| --- | --- | --- |
| Profile (/in/...) | Name, title, headline, location, summary, profile URL, employer | Sales prospecting, recruiter shortlists |
| Company (/company/...) | Name, industry, HQ, followers, website, open-jobs count | Account research, ICP building |
| Job posting (/jobs/view/...) | Title, company, location, posted date, description, seniority | Labor-market analytics, job aggregation |
| Jobs search (/jobs/search?...) | Postings list plus pagination metadata | Bulk job collection at scale |

If a field is not in that table, assume the auth wall is in the way. A separate jobs-data overview is a reasonable companion read.

How LinkedIn Detects and Blocks Scrapers

Anyone learning how to scrape LinkedIn quickly discovers that the defenses are stacked, not picked from a menu. There are three layers running in parallel, and they all feed one internal signal: a per-visitor fraud score that decides whether your request is approved, soft-blocked behind a sign-in prompt, or dropped entirely.

The first layer is the authentication wall. Anonymous visitors are typically forced to log in after only three to five profile views, which means any scraper requesting dozens of profiles from the same identity is finished on the first run. The second layer is behavioral tracking. LinkedIn watches request timing, navigation flow, mouse activity, and referrer patterns. A human does not load 100 profiles a minute; an unthrottled scraper will, and that single signal is enough to flag the session. The third layer is request fingerprinting. LinkedIn inspects IP quality (residential versus datacenter), the JA3 hash from your TLS handshake, headers and cookies, and device attributes. Sending a default python-requests/2.x user agent from an AWS IP scores poorly on all three at once.

Treat these layers as additive, not alternative. Cleaning up one of them while ignoring the others rarely moves your fraud score enough to matter. A primer on avoiding scraper blocks is worth bookmarking before you scale up.

Choosing How to Scrape LinkedIn: A Method Decision Tree

When deciding how to scrape LinkedIn, default to the lightest tool that returns the data you need. LinkedIn is built as a single-page application, so its data flows through three mechanisms, and each one maps to its own ideal scraping method.

  1. Server-rendered HTML. Some pages return enough data in the initial HTML response to parse with Requests plus BeautifulSoup. Rare on LinkedIn today, but it still applies to a few company subpages and entity pages.
  2. JSON hydrated in <script> tags. Public profile and company pages embed a <script type="application/ld+json"> block that mirrors the fields the visible page renders. Parsing this is faster, cheaper, and far less brittle than chasing CSS selectors through the DOM.
  3. XHR / hidden APIs. The infinite-scroll experiences (jobs feed, company jobs list, search) call internal endpoints with predictable query parameters. Replaying those calls directly bypasses rendering entirely.

The rule of thumb: try JSON-LD first for profile and company pages, replay the hidden API for jobs and search, and reach for a headless browser only when both fail. Most teams figuring out how to scrape LinkedIn at scale overspend on Selenium for tasks a thirty-line Requests script can finish.
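To make that rule concrete, here is a minimal dispatch sketch; the URL substrings are illustrative shortcuts based on the public paths listed earlier, not an exhaustive router.

def choose_method(url: str) -> str:
    # Pick the lightest method per URL, per the rule of thumb above
    if "/jobs/search" in url or "/jobs-guest/" in url:
        return "hidden_api"        # Method 1: replay the jobs-guest endpoint
    if "/in/" in url or "/company/" in url:
        return "json_ld"           # Method 2: parse the embedded JSON-LD block
    return "headless_browser"      # Method 3: fallback for lazy-loaded surfaces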

Method 1: How to Scrape LinkedIn Jobs via the Hidden Jobs-Guest API

The jobs-search endpoint is the easiest LinkedIn surface to scrape: it is intentionally exposed to unauthenticated visitors and paginated by a single start query parameter. At the time of writing, the path is /jobs-guest/jobs/api/seeMoreJobPostings/search, and a response returns HTML job cards rather than JSON. LinkedIn rotates internal endpoints, so reconfirm the path in DevTools before a production run.

A minimal Python pattern, parsed with BeautifulSoup, looks like this:

import random
import time

import requests
from bs4 import BeautifulSoup

BASE = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch_page(keywords, location, start=0):
    """Fetch one page of job cards; `start` paginates in steps of 25."""
    params = {"keywords": keywords, "location": location, "start": start}
    r = requests.get(BASE, params=params, headers=HEADERS, timeout=20)
    r.raise_for_status()
    return r.text

def parse_cards(html):
    """Yield one dict per job card in the returned HTML fragment."""
    soup = BeautifulSoup(html, "html.parser")
    for card in soup.select("li"):
        title = card.select_one(".base-search-card__title")
        company = card.select_one(".base-search-card__subtitle")
        loc = card.select_one(".job-search-card__location")
        link = card.select_one("a.base-card__full-link")
        if title and link:
            yield {
                "title": title.get_text(strip=True),
                "company": company.get_text(strip=True) if company else None,
                "location": loc.get_text(strip=True) if loc else None,
                "url": link["href"].split("?")[0],  # strip tracking params
            }

jobs, start = [], 0
while True:
    html = fetch_page("python developer", "Berlin", start)
    batch = list(parse_cards(html))
    if not batch:  # an empty page means the results are exhausted
        break
    jobs.extend(batch)
    start += 25
    time.sleep(random.uniform(1, 5))  # jittered pacing, per the anti-block checklist
Page size is twenty-five cards. The stop condition is an empty result set, not a fixed page count, because LinkedIn trims results by geo and freshness. Pipe jobs into Python's csv module or a Pandas frame and you have a LinkedIn jobs feed without touching a browser. A BeautifulSoup tutorial covers the selector patterns if you need a refresher.
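If you take the csv route, a minimal sketch looks like this (the output filename is arbitrary, and the field names match the dicts yielded by parse_cards above):

import csv

# Persist the collected job cards to disk with the stdlib csv module
with open("linkedin_jobs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "company", "location", "url"])
    writer.writeheader()
    writer.writerows(jobs)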

Method 2: How to Scrape LinkedIn Profiles and Companies via JSON-LD

The information-gain move for profile and company pages is to skip CSS selectors entirely and parse the <script type="application/ld+json"> block LinkedIn injects during server-side rendering. JSON-LD is structured, stable, and changes far less often than the visible DOM. To find it on any public LinkedIn URL, open DevTools and search for //script[@type='application/ld+json'] in the Elements panel.

import json

import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9",
}

def scrape_ld(url):
    """Fetch a public LinkedIn page and return its parsed JSON-LD block, if any."""
    r = requests.get(url, headers=HEADERS, timeout=20)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "html.parser")
    blob = soup.find("script", {"type": "application/ld+json"})
    if not blob:
        return None  # no JSON-LD: likely an auth wall or a challenge page
    return json.loads(blob.string or blob.get_text())

# Public profile
profile = scrape_ld("https://www.linkedin.com/in/some-public-handle/")
# Public company
company = scrape_ld("https://www.linkedin.com/company/openai/")

For a public profile, expect fields such as name, jobTitle, worksFor, address, and sometimes alumniOf. For a company, expect name, description, url, numberOfEmployees, and an address block. Anything LinkedIn hides behind a login (full employee list, mutual connections, contact info) will not be present in the JSON-LD; that is not a parsing bug, it is the auth wall.
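One practical wrinkle: depending on the page, the relevant schema.org node may sit at the top level of the JSON-LD or inside an @graph array, so it is worth checking both shapes. A defensive sketch, assuming the scrape_ld helper above:

def extract_person(ld):
    """Locate the schema.org Person node; on some pages it sits inside an
    '@graph' array rather than at the top level, so check both shapes."""
    if not ld:
        return None
    nodes = ld.get("@graph", [ld]) if isinstance(ld, dict) else ld
    for node in nodes:
        if isinstance(node, dict) and node.get("@type") == "Person":
            return node
    return None

person = extract_person(profile)
if person:
    print(person.get("name"), "-", person.get("jobTitle"))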

Fall back to HTML parsing only for fields that JSON-LD does not expose, such as the "similar pages" carousel on company pages, and treat those selectors as the most fragile part of your pipeline. That is also where most maintenance time goes, since LinkedIn typically rewires its frontend markup every two to four weeks.

Method 3: Headless Browsers for Search and Lazy-Loaded Sections

Reach for Selenium or Playwright only when the first two methods come up short. The common cases are people search results, the lazy-loaded "Jobs at this company" tab, and any page where critical data appears after a scroll event. If you have not built one of these before, a headless browser primer is a useful prerequisite, and the Selenium-with-Python tutorial walks through driver setup if you need it.

The minimal workflow looks like this: spin up a Chromium driver, navigate with a realistic user agent, wait for the network to settle, scroll until the relevant block has hydrated, then either grab the DOM with a Selenium locator or pass driver.page_source to BeautifulSoup. Do not log in to a real LinkedIn account inside Selenium. That combination violates LinkedIn's Terms of Service and is the fastest way to get an account permanently banned.
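A minimal sketch of that workflow with Selenium 4 and headless Chromium follows. The target URL is illustrative, and the scroll-until-the-height-stabilizes loop is one common way to wait for hydration; tune the waits to your target page.

import random
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
# Use a current, realistic user-agent string, not the headless Chrome default
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")

driver = webdriver.Chrome(options=options)
try:
    # Illustrative target: a lazy-loaded company jobs tab
    driver.get("https://www.linkedin.com/company/openai/jobs/")
    time.sleep(random.uniform(2, 4))  # let the initial render settle

    # Scroll until the page height stops growing, i.e. the block has hydrated
    last_height = 0
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(random.uniform(1, 3))
        height = driver.execute_script("return document.body.scrollHeight")
        if height == last_height:
            break
        last_height = height

    soup = BeautifulSoup(driver.page_source, "html.parser")
    # ...parse the hydrated DOM with the same selector patterns as Methods 1 and 2
finally:
    driver.quit()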

Headless browsers are not an anti-bot solution by themselves. Vanilla Puppeteer and Playwright are easy to fingerprint at the JA3 and navigator object level, so you still need the proxy and pacing controls from the next section. If a job can be done without a browser, do it without a browser.

Anti-Block Checklist: Proxies, Headers, JA3, and Pacing

When teams figuring out how to scrape LinkedIn switch from a one-off script to a recurring crawl, the failure mode is almost always anti-bot, not parsing. Work this list in order.

  1. Use residential proxies, not datacenter. LinkedIn maintains an aggressive list of datacenter ASNs. Residential pools rotate real consumer ISP IPs and are much harder to flag. A guide on using proxies with Python Requests is the cleanest place to start if you have not wired up rotation before.
  2. Mind your JA3 fingerprint. Plain requests has a TLS fingerprint that does not match any real browser. Tools that wrap curl_cffi or replay a real browser's JA3 hash will pass this check; raw requests will not.
  3. Send a complete header set. At minimum: a current User-Agent, Accept, Accept-Language, Accept-Encoding, and a plausible Referer. Missing Accept-Language alone is a strong scraper tell.
  4. Pace yourself. Cap concurrency, jitter delays between one and five seconds, and never burst.
  5. Rotate identity, not just IP. Pair every IP rotation with a fresh user agent and cookie jar so LinkedIn cannot stitch sessions together (items 3 through 5 are sketched in code below).
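Items 3 through 5 translate directly into code. A minimal sketch with requests follows; the proxy endpoints and user-agent strings are placeholders, and note that plain requests still fails the JA3 check in item 2, so swap in a JA3-aware client such as curl_cffi for production runs.

import random
import time

import requests

# Placeholder residential gateway endpoints; substitute your provider's
PROXIES = [
    "http://user:pass@residential-gw-1.example.com:8000",
    "http://user:pass@residential-gw-2.example.com:8000",
]

# A small pool of current, realistic user-agent strings (examples only)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]

def fresh_identity():
    """Item 5: rotate proxy, user agent, and cookie jar together."""
    s = requests.Session()  # new Session = fresh cookie jar
    proxy = random.choice(PROXIES)
    s.proxies = {"http": proxy, "https": proxy}
    s.headers.update({
        # Item 3: a complete, plausible header set
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://www.google.com/",
    })
    return s

session = fresh_identity()
for url in ["https://www.linkedin.com/jobs/view/123/"]:  # your URL list here
    r = session.get(url, timeout=20)
    time.sleep(random.uniform(1, 5))  # item 4: jittered pacing, never burst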

If you are still soft-blocked after this list, the problem is volume, not configuration. Slow down.

Is Scraping LinkedIn Legal? The hiQ v. LinkedIn Backdrop

The headline U.S. case is hiQ Labs v. LinkedIn, in which courts ruled, broadly, that scraping publicly accessible LinkedIn data does not violate the Computer Fraud and Abuse Act. The litigation ran from roughly 2017 through 2022 and ended after the Ninth Circuit's published opinions; for the current status and exact scope of the holding, the EFF's case page for hiQ v. LinkedIn is a good primary reference. Two things that ruling does not do: it does not override LinkedIn's Terms of Service, which still prohibit automated access while logged in, and it does not apply outside U.S. jurisdiction. If you are scraping at commercial scale, treat this as background and consult counsel before you ship. A general primer on whether web scraping is legal is also worth a read.

Key Takeaways

  • Match the method to the page type, not your habit. JSON-LD wins for profile and company pages, the hidden jobs API wins for job listings and search, and a headless browser is the fallback, not the default.
  • Three defense layers, one fraud score. The auth wall, behavioral tracking, and TLS/header fingerprinting all feed the same internal score; cleaning up only one of them rarely changes the outcome.
  • Residential proxies plus JA3-aware HTTP clients are the baseline. Datacenter IPs alone will not get you to a working pipeline on LinkedIn.
  • Never log in from automation. It violates the Terms of Service and gets accounts permanently banned, regardless of how careful your selectors are.
  • Plan for breakage. LinkedIn typically rewires its frontend every couple of weeks; design selectors and JSON parsers you can swap out in a single file.

FAQ

Can I scrape LinkedIn without logging into an account?

Yes, but only the public surface. Public profile pages, company pages, individual job postings, and the /jobs/search endpoint are reachable without authentication. Sales Navigator, the people search index, mutual-connection data, and the full employee list on a company page are not. Anonymous scrapers also hit a sign-in prompt after roughly three to five profile views, so plan for IP and identity rotation from day one.

Should I use LinkedIn's official API instead of scraping?

Probably not for general data collection. The official LinkedIn API is heavily scoped: it is designed for partner integrations like applying to jobs, sharing posts, or marketing automation, and it does not return the kind of public profile or company data most scraping projects need. Most teams that evaluate the official API end up scraping the public site to cover what the API will not.

What kind of proxies work best for LinkedIn scraping, residential or datacenter?

Residential, with rotation. LinkedIn maintains aggressive blocklists of datacenter ASNs (AWS, GCP, OVH, and similar), so datacenter IPs get throttled or hit with a 999 response very quickly. Residential pools route through real consumer ISP IPs and look like ordinary user traffic. For low-volume one-off pulls, mobile proxies also work, but they are overkill and more expensive for most jobs.

How can I tell if my LinkedIn scraper is about to get blocked?

Watch for three early signals. First, a jump in response times (LinkedIn often delays before blocking). Second, an increase in pages that return a sign-in interstitial instead of content. Third, HTTP 999 responses, which are LinkedIn's specific "you have been flagged" code. If any of the three trend upward over an hour, pause the crawl and rotate identities before it escalates.
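A minimal sketch of those checks for a requests-based crawl; the authwall-detection heuristic and the latency threshold are illustrative assumptions, not LinkedIn-documented behavior.

def assess_response(r, elapsed_s):
    """Classify a response by the three early-warning signals above."""
    if r.status_code == 999:
        return "flagged"      # LinkedIn's block code: stop and rotate identity
    if "authwall" in r.url or "sign in" in r.text[:2000].lower():
        return "soft_block"   # sign-in interstitial served instead of content
    if elapsed_s > 10:
        return "slow"         # rising latency often precedes a hard block
    return "ok"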

How often does LinkedIn change its page structure and break scrapers?

Frequently. Frontend HTML and CSS selectors typically shift every two to four weeks, internal Voyager API endpoints rotate roughly every four to eight weeks, and JSON-LD structures tend to stay stable for several months. Anchor your scrapers on JSON-LD or hidden APIs where possible, isolate brittle CSS selectors into a single module, and budget for a small maintenance pass each month.

Wrapping Up

Figuring out how to scrape LinkedIn at scale is less about clever tricks and more about discipline. Pick the lightest method per page type, respect the auth wall, and treat the anti-bot layer as a first-class concern instead of an afterthought. JSON-LD will carry most of your profile and company work. The jobs-guest endpoint will carry most of your job-market work. Reserve Selenium for the genuinely dynamic surfaces, never run it logged in, and put your engineering time into proxies, pacing, and JA3 hygiene rather than into a fancier Selenium script.

Maintenance is the other half of the job. LinkedIn rewires its frontend on a cadence measured in weeks, so design parsers that fail loudly, log structural changes, and isolate selectors so a fix is a one-file change rather than a rewrite.
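In practice, "fail loudly" can be as simple as a required-selector helper that raises instead of silently yielding nulls. A minimal sketch:

class SelectorBroke(RuntimeError):
    """Raised when an expected element disappears after a frontend change."""

def require(soup, selector):
    node = soup.select_one(selector)
    if node is None:
        # Surface the structural change immediately instead of emitting nulls
        raise SelectorBroke(f"expected selector not found: {selector}")
    return node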

If you would rather skip the proxy, fingerprint, and CAPTCHA layer entirely and focus on the data itself, WebScrapingAPI's Scraper API handles the request side (IP rotation, JA3, headers, retries) behind a single endpoint and returns raw HTML you can parse with the same Requests-plus-BeautifulSoup code you already wrote above. The scraping logic stays yours; the unblocking is ours.

About the Author
Suciu Dan, Co-founder @ WebScrapingAPI

Suciu Dan is the co-founder of WebScrapingAPI and writes practical, developer-focused guides on Python web scraping, Ruby web scraping, and proxy infrastructure.
