Within our API, query parameters are used to customize the scraper based on your needs. Understanding how each parameter works will enable you to use the full power of our web scraper API. We keep up-to-date documentation of the API parameters here. However, we’re also going to dive into them in this section, to give you a better understanding of how query parameters work with Web Scraping API. That being said, there are three types of parameters: required, default, and optional. The required ones are quite simple:
- The `api_key` parameter that we’ve discussed above
- The `url` parameter, which represents the URL you want to scrape
Please note that the `url` parameter’s value should be a full, valid URL (not just a domain name), and it should ideally be URL encoded (e.g., https%3A%2F%2Fwebscrapingapi.com).
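To make the encoding step concrete, here is a minimal sketch that builds a request URL from the two required parameters using only the Python standard library. The endpoint `https://api.webscrapingapi.com/v1`, the helper name `build_request_url`, and the `YOUR_API_KEY` placeholder are assumptions made for illustration; check the documentation for the actual base URL.

```python
from urllib.parse import urlencode

# Assumed base URL, used for illustration only.
API_ENDPOINT = "https://api.webscrapingapi.com/v1"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return a request URL carrying the two required parameters."""
    # urlencode percent-encodes every value, so the target URL is
    # embedded safely as a single query-string value.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


request_url = build_request_url("YOUR_API_KEY", "https://webscrapingapi.com")
print(request_url)
```

Letting `urlencode` handle the escaping avoids hand-encoding mistakes, such as leaving a `?` or `&` from the target URL unescaped inside the query string.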
When it comes to default parameters, we’ve used historical data to increase our API’s (and implicitly your project’s) success rate. Internal data shows that the best configuration for web scraping is using an actual web browser paired with a residential IP address. Hence, our API’s default parameters are:
- `render_js=1` - to fire up an actual browser (not a basic HTTP client)
- `proxy_type=residential` - to access the target via a residential IP address (enabled only if your current plan supports residential proxies)
Of course, you may also override the values of these parameters, though we don’t encourage it. Scraping with a basic HTTP client and datacenter proxies usually leads to the targeted website picking up on the scraping activity and blocking access.
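As a rough illustration of how defaults and overrides combine, the sketch below merges the two default values with any caller-supplied ones. The helper `build_params` is hypothetical, and `render_js=0` is an assumed way of switching the browser off; the API applies its defaults on the server side, so sending them explicitly is only done here to make the effective configuration visible.

```python
# Defaults described above; the API applies them when they are omitted.
DEFAULT_PARAMS = {"render_js": 1, "proxy_type": "residential"}


def build_params(api_key: str, target_url: str, **overrides) -> dict:
    """Merge the required parameters, the defaults, and any overrides."""
    params = {"api_key": api_key, "url": target_url, **DEFAULT_PARAMS}
    # Caller-supplied values win over the defaults.
    params.update(overrides)
    return params


# Default configuration: real browser plus residential proxy.
default_config = build_params("YOUR_API_KEY", "https://webscrapingapi.com")

# Overridden configuration (not recommended): assumed values for a
# basic HTTP client and a datacenter proxy.
light_config = build_params(
    "YOUR_API_KEY",
    "https://webscrapingapi.com",
    render_js=0,
    proxy_type="datacenter",
)

print(default_config)
print(light_config)
```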
Moving forward, let’s look at the optional parameters. Since all parameters are covered in our Documentation, we’re only going to discuss the most used ones for now:
- `render_js`: By enabling this parameter, you access the targeted URL via an actual browser, which has the advantage of executing JavaScript. It’s a great choice for scraping JavaScript-heavy sites (those built with ReactJS, for example). Documentation: [here]
- `proxy_type`: Used to access the targeted URL via a residential or a datacenter IP address. Documentation: [here]
- `stealth_mode`: Web scraping is not an illegal activity. However, some websites tend to block automated software (including web scrapers). Our team has designed a set of tools that makes it almost impossible for anti-bot systems to detect our web scraper. You may enable these features with `stealth_mode=1`. Documentation: [here]
- `country`: Used to access your target from a specific geolocation. Check out the supported countries [here]. Documentation: [here]
- `timeout`: By default, we terminate a request after 10s (and do not charge for it if it failed). With certain targets, you may want to increase this value, up to a maximum of 60s. Documentation: [here]
- `device`: You can use this to make your scraper look like a 'desktop', 'tablet', or 'mobile' device. Documentation: [here]
- `wait_until`: In simple terms, once the scraper reaches the targeted URL, it pauses until a certain event happens. The concept we follow is best described [here]. Documentation: [here]
- `wait_for`: Pauses the scraper for a specified amount of time (which cannot exceed 60s). Documentation: [here]
- `wait_for_css`: Pauses the scraper until an element matching a certain CSS selector (e.g., a class or an ID) is visible on the page. Documentation: [here]
- `session`: Enables you to use the same proxy (IP address) across multiple requests. Documentation: [here]
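The optional parameters can be combined in a single request. The sketch below shows one way to assemble them, validating `device` against the three values listed above before building the URL. The endpoint, the helper `build_scrape_url`, and the sample values (`country="us"`, `wait_for_css=".content"`, `session=100`) are assumptions for illustration. `timeout` and `wait_for` are left out because the unit the API expects for their values is not stated here.

```python
from urllib.parse import urlencode

# Assumed base URL, used for illustration only.
API_ENDPOINT = "https://api.webscrapingapi.com/v1"

# Device values listed in the parameter description.
ALLOWED_DEVICES = {"desktop", "tablet", "mobile"}


def build_scrape_url(api_key, target_url, device=None, **optional):
    """Build a request URL with required and optional parameters."""
    if device is not None and device not in ALLOWED_DEVICES:
        raise ValueError(f"device must be one of {sorted(ALLOWED_DEVICES)}")

    params = {"api_key": api_key, "url": target_url}
    if device is not None:
        params["device"] = device
    # Skip optional parameters that were passed as None.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{API_ENDPOINT}?{urlencode(params)}"


example_url = build_scrape_url(
    "YOUR_API_KEY",
    "https://webscrapingapi.com",
    device="mobile",
    stealth_mode=1,
    country="us",             # hypothetical country code
    wait_for_css=".content",  # hypothetical CSS selector
    session=100,              # hypothetical session identifier
)
print(example_url)
```

Validating on the client side is optional, but it surfaces a mistyped value before a request is spent on it.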