First things first, we need to ‘visualize’ our form. On a website, all elements live in the page’s HTML, and most of them can be targeted with a CSS selector built from attributes such as `id` or `class`. Yet, you may come across websites whose elements don’t expose convenient attributes. In such scenarios, you can fall back on XPath expressions, for example. But that is a subject for another talk. Let’s focus on identifying elements in Puppeteer using CSS selectors.
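To make the idea of a CSS selector more concrete, here is a minimal sketch of two ways to address the same element by its `id`. The helpers `byId` and `byAttribute` are hypothetical names made up for this illustration. They are not part of Puppeteer and only build plain strings:

```javascript
// Hypothetical helpers (not part of Puppeteer): they only build selector strings.

// Shorthand id selector, e.g. "#email".
// Assumes the id needs no CSS escaping (for example, it does not start with a digit).
const byId = (id) => `#${id}`

// Attribute selector, e.g. 'input[id="email"]'.
const byAttribute = (tag, attribute, value) => `${tag}[${attribute}="${value}"]`

console.log(byId('email'))                       // prints: #email
console.log(byAttribute('input', 'id', 'email')) // prints: input[id="email"]
```

Either string can be passed wherever Puppeteer expects a selector, such as `page.type` or `page.click`. The attribute form is a little more verbose, but it also restricts the match to a given tag.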
To make this concrete, let’s say we want to automate the login action on Stack Overflow. So the target is https://stackoverflow.com/users/login. Open up your browser, navigate to the login page, and open Developer Tools (you can right-click on the page and select ‘Inspect’). You should see something like this:
On the left side, there is the rendered page. On the right side, there is the HTML structure. If you look closely at the right side, you will see our form. It mainly consists of two inputs and one button. These are the three elements we are targeting. And as you can see, all three elements have an `id` attribute that we can use in a CSS selector. Let’s translate what we’ve learned so far into code:
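import puppeteer, { executablePath } from 'puppeteer'

const scraper = async (target) => {
  // Launch a visible browser so you can watch the form being filled in
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: executablePath(),
  })
  try {
    const page = await browser.newPage()
    await page.goto(target.url, { waitUntil: 'networkidle0' })
    // Type the credentials into the two inputs
    await page.type(target.username.selector, target.username.value)
    await page.type(target.password.selector, target.password.value)
    // Click submit and wait for the resulting navigation before reading the page.
    // This assumes that submitting the form triggers a page navigation.
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle0' }),
      page.click(target.buttonSelector),
    ])
    return await page.content()
  } finally {
    // Close the browser even if one of the steps above throws
    await browser.close()
  }
}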
To keep the function reusable, I chose to pass it a single object as its parameter. This object consists of the targeted URL, the input selectors and values, and the selector for the submit button. So, to run the code, just create a new `TARGET` object that holds your data, and pass it to your `scraper` function:
const TARGET = {
  url: 'https://stackoverflow.com/users/login',
  username: {
    selector: 'input[id=email]',
    value: '<YOUR_USERNAME>'
  },
  password: {
    selector: 'input[id=password]',
    value: '<YOUR_PASSWORD>'
  },
  buttonSelector: 'button[id=submit-button]'
}

console.log(await scraper(TARGET))
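If you’d rather not hard-code credentials in the source file, one option is to build the same object from environment variables. The sketch below is only an illustration: `buildTarget` is a hypothetical helper, and the variable names `SO_USERNAME` and `SO_PASSWORD` are made up for this example. The selectors are the ones used above.

```javascript
// Hypothetical helper: builds the TARGET object from environment-style key/value pairs.
// SO_USERNAME and SO_PASSWORD are assumed names, chosen only for this example.
const buildTarget = (env) => {
  const { SO_USERNAME, SO_PASSWORD } = env

  // Fail early, before any browser is launched, if a credential is missing
  if (!SO_USERNAME || !SO_PASSWORD) {
    throw new Error('Set SO_USERNAME and SO_PASSWORD before running the scraper')
  }

  return {
    url: 'https://stackoverflow.com/users/login',
    username: { selector: 'input[id=email]', value: SO_USERNAME },
    password: { selector: 'input[id=password]', value: SO_PASSWORD },
    buttonSelector: 'button[id=submit-button]',
  }
}
```

With this in place, you could write `const TARGET = buildTarget(process.env)` and pass the result to `scraper` exactly as before. Taking the environment as a parameter, instead of reading `process.env` inside the helper, also makes it easy to try out with a plain object.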