Webscraping with RSelenium

Automate your browser actions

Etienne Bacher

LISER

February 22, 2023

Introduction


Do you really need scraping?

Before scraping: is there an API?

  • if yes, is there a package?

    • if yes, use the package (e.g. guardian, WDI; see the sketch below)

    • if no, build the API queries yourself with httr

  • if no, scrape (politely)
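
For instance, a minimal sketch with the WDI package (World Bank data); the indicator and countries below are arbitrary examples:

# Query the World Bank API through the WDI package instead of scraping
library(WDI)

gdp <- WDI(
  country   = c("LU", "FR"),
  indicator = "NY.GDP.PCAP.KD", # GDP per capita
  start     = 2010,
  end       = 2020
)
head(gdp)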

Introduction


Scraping can be divided into two steps:

  1. getting the HTML that contains the information
  2. cleaning the HTML to extract the information we want


These 2 steps don’t necessarily require the same tools, and shouldn’t be carried out at the same time.

Introduction


Why?


Webscraping takes time, and a lot of things can happen:

  • your Internet connection goes down;
  • the website goes down;
  • any other random reason


If this happens and you didn’t save your progress, you lose everything.

We don’t have plenty of time, but we have plenty of disk storage. Use it!
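
A minimal sketch of this idea for a static site (the URLs and folder are hypothetical): save every page to disk as soon as it is downloaded, so a crash doesn’t wipe your progress.

urls <- paste0("https://example.org/results?page=", 1:3) # hypothetical URLs

for (i in seq_along(urls)) {
  # save the raw HTML immediately; cleaning happens later, in another script
  download.file(urls[i], destfile = paste0("data/raw/page-", i, ".html"), quiet = TRUE)
  Sys.sleep(1) # be polite with the server
}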

Introduction




Here, we will focus mostly on how to obtain the HTML code you need on dynamic pages.


(And a bit on how to clean this HTML)

Static and dynamic pages

Static and dynamic pages



The web works with 3 languages:

  • HTML: content and structure of the page
  • CSS: style of the page
  • JavaScript: interactions with the page

Static vs dynamic


Static webpage:

  • all the information is loaded with the page;
  • changing a parameter modifies the URL

Examples: Wikipedia, IMDB.


Dynamic webpage: the website uses JavaScript to fetch data from its server and dynamically update the page.

Example: Premier League stats.

Why is it harder to do webscraping with dynamic pages?


Webscraping a static website can be quite simple:

  • you get a list of URLs;
  • download the HTML for each of them;
  • read and clean the HTML

and that’s it.


This is “easy” because you can identify two pages with different content just by looking at their URL.

Example: election results in Spain from the website of El Pais





Of course, static webscraping can be challenging because we have to write good loops, good error handling, the HTML itself can be hard to clean, etc.


But on dynamic pages, there’s no obvious way (e.g. from the URL) to see that the inputs are different.

Example: Premier League stats



So it seems that the only way to get the data is to go manually through all pages to get the HTML.


(R)Selenium

Idea

Idea: control the browser from the command line.


“I wish I could click on this button to open a modal”

remote_driver$
  findElement(using = "css", value = ".my-button")$
  clickElement()


“I wish I could fill in these inputs to log in automatically”

remote_driver$
  findElement(using = "id", value = "password")$
  sendKeysToElement(list("my_super_secret_password"))


Almost everything you can do “by hand” in a browser, you can reproduce with Selenium:

Action                                         Code
Open a browser                                 open() / navigate()
Click on something                             clickElement()
Enter values                                   sendKeysToElement()
Go to previous/next page                       goBack() / goForward()
Refresh the page                               refresh()
Get all the HTML that is currently displayed   getPageSource()
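
For example, a short sketch chaining a few of these methods (remote_driver is created in the next section):

remote_driver$navigate("https://www.r-project.org")
remote_driver$goBack()
remote_driver$goForward()
remote_driver$refresh()
html <- remote_driver$getPageSource()[[1]]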

Get started

Get started


In the beginning was rsDriver():

# if not already installed
# install.packages("RSelenium")
library(RSelenium)

driver <- rsDriver(browser = "firefox", chromever = NULL) # can also be "chrome"
remote_driver <- driver[["client"]]


If everything works fine, this will print a bunch of messages and open a “marionette browser”.

Installation issues


  1. Java not installed


If you have a message saying that “Java is not found” (or similar), you need to install Java:

  • Windows search bar -> “Software Center”
  • Install Java

Installation issues


  2. Firefox not installed/found

If you have a message saying “Could not open firefox browser”, two possible explanations:

  • if Firefox is not installed, install it the same way as Java on the previous slide.

  • if Firefox is installed but not found, it probably means that it wasn’t installed with admin rights, so you need to manually specify the location of the file:

driver <- rsDriver(
  browser = "firefox", 
  extraCapabilities = list(
    `moz:firefoxOptions` = list(
      binary = "C:\\Users\\<USERNAME>\\AppData\\Local\\Mozilla Firefox\\firefox.exe"
    )
  )
)

Installation issues


  3. “Error in if (file.access(phantompath, 1) < 0) { : argument is of length zero”


If you have this error, this is bad news because I don’t really know how to fix it since I never had it.


You can try to follow this StackOverflow answer.

Get started

From now on, the main thing is to call methods as remote_driver$<function>().


Closing Selenium


The clean way to close Selenium is to run driver$server$stop() (replace driver with the name you gave at the previous step).


If you close the browser by hand and try to re-run the script, you may have the following error:

"Error in wdman::selenium(port = port, verbose = verbose, version = version,  : 
  Selenium server signals port = 4567 is already in use."


To get rid of this error, you also need to run driver$server$stop().
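
For example (assuming the objects are named driver and remote_driver as above):

# Close the browser window, then stop the Selenium server
remote_driver$close()
driver$server$stop()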

Exercise 1

Exercise 1


Objective: get the list of core contributors to R from the R Project website.


How would you do it by hand?

  • open the browser;
  • go to https://r-project.org;
  • in the left sidebar, click on the link “Contributors”;

and voilà!


How can we do these steps programmatically?

Open the browser and navigate

remote_driver$navigate("https://r-project.org")

Click on “Contributors”


This requires two things:

  1. find the element
  2. click on it


How to find an element?

  • Humans -> eyes

  • Computers -> HTML/CSS

To find the element, we need to open the console to see the structure of the page:

  • right-click -> “Inspect”
  • or Ctrl + Shift + C


Then, hover over the element we’re interested in: the link “Contributors”.




How can we find this link with RSelenium?

?RSelenium::remoteDriver()


-> findElement

  • class name ❌
  • id ❌
  • name ❌
  • tag name ❌
  • css selector ✔️
  • link text ✔️
  • partial link text ✔️
  • xpath ✔️


We must make a distinction between two classes of objects: remoteDriver and webElement.


Think of remoteDriver as the browser in general: you can navigate between pages, search elements, find if an element is present on the page, etc.

  see the list of available methods with ?remoteDriver


Think of webElement as a particular element on the page: you can highlight it, click it, get its text, etc.

  see the list of available methods with ?webElement

Tip: You can check that you found the right element by highlighting it with highlightElement().
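
For instance, a quick visual check (a small sketch using the “Contributors” link from this exercise):

contributors_link <- remote_driver$findElement("link text", "Contributors")
contributors_link$highlightElement() # the element blinks in the browser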

All of these work:

remote_driver$
  findElement("link text", "Contributors")$
  clickElement()

remote_driver$
  findElement("partial link text", "Contributors")$
  clickElement()

remote_driver$
  findElement("xpath", "/html/body/div/div[1]/div[1]/div/div[1]/ul/li[3]/a")$
  clickElement()

remote_driver$
  findElement("css selector", "div.col-xs-6:nth-child(1) > ul:nth-child(6) > li:nth-child(3) > a:nth-child(1)")$
  clickElement()

We are now on the right page!



Last step: obtain the HTML of the page.

remote_driver$getPageSource()


To read it with the package rvest:

x <- remote_driver$getPageSource()[[1]]
rvest::read_html(x)


Do we read the HTML and extract the information in the same script?


No!

Instead, we save the HTML in an external file, and we will be able to access it in another script (and offline) to manipulate it as we want.

write(x, file = "contributors.html")
# Later and in another script
rvest::read_html("contributors.html")


See the appendix for the results.

Exercise 2: a harder & real-life example



The previous example was not a dynamic page: we could have used the link to the page and applied webscraping methods for static webpages.


rvest::read_html("https://www.r-project.org/contributors.html")


Let’s now dive into a more complex example, where RSelenium is the only way to obtain the data.

Before using RSelenium


Using Selenium is slower than using “classic” scraping methods, so it’s important to check all possibilities before using it.


Use Selenium if:

  • the HTML you want is not directly accessible, i.e. it requires some interaction (clicking on a button, logging in to a website…),

  • the URL doesn’t change with the inputs,

  • you can’t access the data directly in the “network” tab of the console and you can’t reproduce the POST request.

Example: Sao Paulo immigration museum


Open website


Steps:

  1. list all interactions we need to do
  2. check that we need Selenium
  3. make an example
  4. generalize and polish the code

List all interactions



  1. Open the website
  2. Enter “PORTUGUESA” in the input box
  3. Wait a bit for the page to load
  4. Open every modal “Ver Mais”

Check that we need Selenium


  1. Is there an API?

Not that I know of (and let’s assume that there isn’t one).

  2. Does the URL change when we enter inputs or click somewhere?

No.

  3. Can we get the data through the “Network” tab?

Yes but we still need RSelenium to change pages (and this is just training anyway).

Make an example

Initiate the remote driver and go to the website:

library(RSelenium)

link <- "http://www.inci.org.br/acervodigital/livros.php"

# Automatically go to the website
driver <- rsDriver(browser = c("firefox"))
remote_driver <- driver[["client"]]
remote_driver$navigate(link)

Make an example

Fill the field “NACIONALIDADE”:

library(RSelenium)

link <- "http://www.inci.org.br/acervodigital/livros.php"

# Automatically go to the website
driver <- rsDriver(browser = c("firefox"))
remote_driver <- driver[["client"]]
remote_driver$navigate(link)

# Fill the nationality field
remote_driver$
  findElement(using = "id", value = "nacionalidade")$
  sendKeysToElement(list("PORTUGUESA"))

Make an example

Find the button “Pesquisar” and click it:

library(RSelenium)

link <- "http://www.inci.org.br/acervodigital/livros.php"

# Automatically go the website
driver <- rsDriver(browser = c("firefox"))
remote_driver <- driver[["client"]]
remote_driver$navigate(link)

# Fill the nationality field
remote_driver$
  findElement(using = "id", value = "nacionalidade")$
  sendKeysToElement(list("PORTUGUESA"))

# Find the button "Pesquisar" and click it
remote_driver$
  findElement(using = 'name', value = "Reset2")$
  clickElement()

Make an example

Find the button “Ver Mais” and click it:

library(RSelenium)

link <- "http://www.inci.org.br/acervodigital/livros.php"

# Automatically go to the website
driver <- rsDriver(browser = c("firefox"))
remote_driver <- driver[["client"]]
remote_driver$navigate(link)

# Fill the nationality field
remote_driver$
  findElement(using = "id", value = "nacionalidade")$
  sendKeysToElement(list("PORTUGUESA"))

# Find the button "Pesquisar" and click it
remote_driver$
  findElement(using = 'name', value = "Reset2")$
  clickElement()

# Find the button "Ver Mais" and click it
remote_driver$
  findElement(using = 'id', value = "link_ver_detalhe")$
  clickElement()

Make an example

Get the HTML that is displayed:

library(RSelenium)

link <- "http://www.inci.org.br/acervodigital/livros.php"

# Automatically go to the website
driver <- rsDriver(browser = c("firefox"))
remote_driver <- driver[["client"]]
remote_driver$navigate(link)

# Fill the nationality field
remote_driver$
  findElement(using = "id", value = "nacionalidade")$
  sendKeysToElement(list("PORTUGUESA"))

# Find the button "Pesquisar" and click it
remote_driver$
  findElement(using = 'name', value = "Reset2")$
  clickElement()

# Find the button "Ver Mais" and click it
remote_driver$
  findElement(using = 'id', value = "link_ver_detalhe")$
  clickElement()

# Get the HTML that is displayed in the modal
x <- remote_driver$getPageSource()

Make an example

Exit the modal by pressing “Escape”:

library(RSelenium)

link <- "http://www.inci.org.br/acervodigital/livros.php"

# Automatically go to the website
driver <- rsDriver(browser = c("firefox"))
remote_driver <- driver[["client"]]
remote_driver$navigate(link)

# Fill the nationality field
remote_driver$
  findElement(using = "id", value = "nacionalidade")$
  sendKeysToElement(list("PORTUGUESA"))

# Find the button "Pesquisar" and click it
remote_driver$
  findElement(using = 'name', value = "Reset2")$
  clickElement()

# Find the button "Ver Mais" and click it
remote_driver$
  findElement(using = 'id', value = "link_ver_detalhe")$
  clickElement()

# Get the HTML that is displayed in the modal
x <- remote_driver$getPageSource()

# Exit the modal by pressing "Escape"
remote_driver$
  findElement(using = "xpath", value = "/html/body")$
  sendKeysToElement(list(key = "escape"))

Problem



We got the content of the first modal, that’s great!


Now we just need to replicate this for the other modals of the page.


How can we distinguish one button “Ver Mais” from another?

Problem


To find the button “Ver Mais”, we used the following code:

remote_driver$
  findElement(using = 'id', value = "link_ver_detalhe")$
  clickElement()


But all buttons share the same id, so this code only selects the first button, not the others.

Solution


Use findElements() (and not findElement()).


This returns a list of elements, and we can then apply some function on each of them in a loop:

# Make the list of elements
buttons <- remote_driver$
  findElements(using = 'id', value = "link_ver_detalhe")

# Highlight each button one by one
for (i in seq_along(buttons)) {
  buttons[[i]]$highlightElement()
  Sys.sleep(1)
}

Loop through modals


Now that we have a way to open each modal, we can make a loop to get the HTML for each one:

for (i in seq_along(buttons)) {
  
  # open the modal
  buttons[[i]]$clickElement()
  Sys.sleep(0.5)
  
  # get the HTML and save it
  tmp <- remote_driver$getPageSource()[[1]]
  write(tmp, file = paste0("data/modals/modal-", i, ".html"))
  
  # quit the modal (by pressing "Escape")
  remote_driver$
    findElement(using = "xpath", value = "/html/body")$
    sendKeysToElement(list(key = "escape"))
  
}

Generalize for each page


Find the button to go to the next page:


remote_driver$
  findElement("css", "#paginacao > div.btn:nth-child(4)")$
  highlightElement()

Nested loops


We know how to:

  • open the website
  • search for the right inputs
  • open each modal and get its content
  • go to the next page


Next step: compile all of this and make nested loops!

How many pages? 2348 (but just put 2-3 to avoid too many requests)


Pseudo-code:

for (page_index in 1:3) {
  
  # Find all buttons "Ver Mais" on the page 
  
  for (modal_index in seq_along(buttons)) {
    # open modal
    # get HTML and save it in an external file
    # leave modal
  }
  
  # Once all modals of a page have been scraped, go to the next page 
  # (except if we're on the last page)
  
}

How many pages? 2348 (but just put 2-3 to avoid too many requests)


Pseudo-code:

for (page_index in 1:3) {
  
  # Find all buttons "Ver Mais" on the page 
  buttons <- remote_driver$
    findElements(using = 'id', value = "link_ver_detalhe")
  
  for (modal_index in seq_along(buttons)) {
    # open modal
    # get HTML and save it in an external file
    # leave modal
  }
  
  # Once all modals of a page have been scraped, go to the next page 
  # (except if we're on the last page)
  
}

How many pages? 2348 (but just put 2-3 to avoid too many requests)


Make the “modal loop”:

for (page_index in 1:3) {

  # Find all buttons "Ver Mais" on the page 
  buttons <- remote_driver$
    findElements(using = 'id', value = "link_ver_detalhe")
  
  for (modal_index in seq_along(buttons)) {
    # open modal
    buttons[[modal_index]]$clickElement()

    # Get the HTML and save it
    tmp <- remote_driver$getPageSource()[[1]]
    write(tmp, file = paste0("data/modals/modal-", modal_index, ".html"))

    # Leave the modal
    remote_driver$
      findElement(using = "xpath", value = "/html/body")$
      sendKeysToElement(list(key = "escape"))

  }
  
  # Once all modals of a page have been scraped, go to the next page 
  # (except if we're on the last page)
  
}

How many pages? 2348 (but just put 2-3 to avoid too many requests)


Make the “page loop”:

for (page_index in 1:3) {
  
  # Find all buttons "Ver Mais" on the page 
  buttons <- remote_driver$
    findElements(using = 'id', value = "link_ver_detalhe")
    
  for (modal_index in seq_along(buttons)) {
    # open modal
    buttons[[modal_index]]$clickElement()

    # Get the HTML and save it
    tmp <- remote_driver$getPageSource()[[1]]
    write(tmp, file = paste0("data/modals/page-", page_index, 
                             "-modal-", modal_index, ".html"))

    # Leave the modal
    remote_driver$
      findElement(using = "xpath", value = "/html/body")$
      sendKeysToElement(list(key = "escape"))

  }
  
  # When we got all modals of one page, go to the next page (except if 
  # we're on the last one)
  if (page_index != 2348) {
    remote_driver$
      findElement("css", "#paginacao > div.btn:nth-child(4)")$
      clickElement()
  }
  
}





Great, let’s run everything!


  Not so fast

Error handling

1. Catching errors

Catching errors


The default behavior of errors is to stop the script, which leads to this kind of situation:

  • 7pm - “great, I can just run the code during the night and go home”
  • 7:02pm - error in the code
  • 8am - “Fu@?!”


  need to handle errors with tryCatch()

Catching errors





tryCatch = try to run an expression + catch the potential warnings/errors

Catching errors


Example: try to compute log("a").

log("a")
Error in log("a"): non-numeric argument to mathematical function


What if I want to return NA instead of an error?

Catching errors

tryCatch(
  # try to evaluate the expression
  {
    <EXPRESSION>
  },
  
  # what happens if there's a warning?
  warning = function(w) {
    <BEHAVIOR WHEN THERE IS A WARNING>
  },
  # what happens if there's an error?
  error = function(e) {
    <BEHAVIOR WHEN THERE IS AN ERROR>
  }
)

Catching errors

tryCatch(
  # try to evaluate the expression
  {
    log("a")
  },
  
  # what happens if there's a warning?
  warning = function(w) {
    print("There was a warning. Here's the message:")
    print(w)
  },
  
  # what happens if there's an error?
  error = function(e) {
    print("There was an error. Returning `NA`.")
    return(NA)
  }
)

Catching errors

tryCatch(
  # try to evaluate the expression
  {
    log("a")
  },
  
  # what happens if there's a warning?
  warning = function(w) {
    print("There was a warning. Here's the message:")
    print(w)
  },
  
  # what happens if there's an error?
  error = function(e) {
    print("There was an error. Returning `NA`.")
    return(NA)
  }
)
[1] "There was an error. Returning `NA`."
[1] NA

Example with a loop

Create a fake loop over i = 1:10 that fills a list with 2*i. Let’s say that there’s an error when i = 3:

x <- list()
for (i in 1:10) {
  
  if (i == 3) {
    stop("This is an error.")  # intentional error
  } else {
    x[[i]] <- 2*i
  }
  
  print(paste0("i = ", i, ". So far so good."))
}

Example with a loop

Create a fake loop over i = 1:10 that fills a list with 2*i. Let’s say that there’s an error when i = 3:

x <- list()
for (i in 1:10) {
  
  if (i == 3) {
    stop("This is an error.")  # intentional error
  } else {
    x[[i]] <- 2*i
  }
  
  print(paste0("i = ", i, ". So far so good."))
}
[1] "i = 1. So far so good."
[1] "i = 2. So far so good."
Error in eval(expr, envir, enclos): This is an error.

Example with a loop


print(x)
[[1]]
[1] 2

[[2]]
[1] 4


  We don’t have values for i >= 3 because there was an error that stopped the loop.

Catching the error


Now, let’s catch the error to avoid breaking the loop:

x <- list()
for (i in 1:10) {
  
  if (i == 3) {
    tryCatch(
      {
        stop("This is an error.") # intentional error
      },
      error = function(e) {
        print(paste0("Error for i = ", i, ". `x[[", i, "]]` will be NULL."))
        x[[i]] <- NULL
      }
    )
  } else {
    x[[i]] <- 2*i
  }
  
  print(paste0("i = ", i, ". So far so good."))
  
}

Catching the error

[1] "i = 1. So far so good."
[1] "i = 2. So far so good."
[1] "Error for i = 3. `x[[3]]` will be NULL."
[1] "i = 3. So far so good."
[1] "i = 4. So far so good."
[1] "i = 5. So far so good."
[1] "i = 6. So far so good."
[1] "i = 7. So far so good."
[1] "i = 8. So far so good."
[1] "i = 9. So far so good."
[1] "i = 10. So far so good."
[[1]]
[1] 2

[[2]]
[1] 4

[[3]]
NULL

[[4]]
[1] 8

[[5]]
[1] 10

[[6]]
[1] 12

[[7]]
[1] 14

[[8]]
[1] 16

[[9]]
[1] 18

[[10]]
[1] 20

Using tryCatch in our loop

We can now catch errors when we try to get the content of each modal:

tryCatch(
  {
    # open modal
    buttons[[modal_index]]$clickElement()
    
    Sys.sleep(1.5)
    
    # Get the HTML and save it
    tmp <- remote_driver$getPageSource()[[1]]
    write(tmp, file = paste0("data/modals/page-", page_index, "-modal-", modal_index, ".html"))
    
    # Leave the modal
    body <- remote_driver$findElement(using = "xpath", value = "/html/body")
    body$sendKeysToElement(list(key = "escape"))
    
    message(paste("  Scraped modal", modal_index))
  },
  error = function(e) {
    message(paste("  Failed to scrape modal", modal_index))
    message(paste("  The error was ", e))
  }
)

2. Loading times

Loading times


There are a few places where we need to wait a bit:

  • after clicking on “Pesquisar”
  • after clicking on “Ver Mais”
  • when we go to the next page


We must put some pauses between RSelenium actions. Otherwise it will error, e.g. if we try to click on a button that isn’t loaded on the page yet.


  one solution is to use Sys.sleep()

Loading times

for (page_index in 1:2348) {
  
  # Find all buttons "Ver Mais" on the page 
  buttons <- remote_driver$
    findElements(using = 'id', value = "link_ver_detalhe")
    
  for (modal_index in seq_along(buttons)) {
    # open modal
    buttons[[modal_index]]$clickElement()

    Sys.sleep(0.5)

    # Get the HTML and save it
    tmp <- remote_driver$getPageSource()[[1]]
    write(tmp, file = paste0("data/modals/page-", page_index, 
                             "-modal-", modal_index, ".html"))

    # Leave the modal
    remote_driver$
      findElement(using = "xpath", value = "/html/body")$
      sendKeysToElement(list(key = "escape"))

    Sys.sleep(0.5)
  }
  
  # When we got all modals of one page, go to the next page (except if 
  # we're on the last one)
  if (page_index != 2348) {
    remote_driver$
      findElement("css", "#paginacao > div.btn:nth-child(4)")$
      clickElement()

    Sys.sleep(3)
  }
  
}

Loading times


However, Sys.sleep() is not perfect because we pick an arbitrary duration, e.g. 5 seconds.


Problem: what if the Internet connection is so bad that the loading takes 10 seconds?


  we need a more robust solution using tryCatch()

Loading times


What we want is to check whether the loading is over, i.e. whether the buttons we want to click can be found on the page.


We can use a while() loop to check this.

Loading times

Quick reminder:

  • if condition: perform the inside only if the condition is true
if (x > 2) {
  # do something
}
  • for loop: perform the inside for all values
for (i in 1:10) {
  # do something with `i`
}
  • while loop: perform the inside while the condition is true
while (x > 2) {
  # do something 
  # need to update `x` here otherwise infinite loop!
}

Loading times



In our case, we want to check whether we can find the buttons “Ver Mais”.


Are the buttons loaded?

  • if no, wait 0.5 seconds (or whatever duration you want), and try again;
  • if yes, go to the next step


  Do this 20 times max

Loading times

# Try to find the buttons "Ver Mais"
all_buttons_loaded <- FALSE
iterations <- 0
while(!all_buttons_loaded & iterations < 20) {
  tryCatch(
    {
      test <- remote_driver$
        findElements(using = 'id', value = "link_ver_detalhe")
      
      # If the buttons are found, update our condition to quit the loop
      if (inherits(test, "list") && length(test) > 0)  {
        all_buttons_loaded <<- TRUE
      }
    },
    error = function(e) {
      iterations <<- iterations + 1 
      Sys.sleep(0.5)
    }
  )
}

This loop will run until the buttons can be found or until we reach 20 iterations.

Loading times

for (page_index in 1:2348) {

  # Try to find the buttons "Ver Mais"
  all_buttons_loaded <- FALSE
  iterations <- 0
  while(!all_buttons_loaded & iterations < 20) {
    tryCatch(
      {
        test <- remote_driver$
          findElements(using = 'id', value = "link_ver_detalhe")

        if (inherits(test, "list") && length(test) > 0)  {
          all_buttons_loaded <<- TRUE
        }
      },
      error = function(e) {
        iterations <<- iterations + 1 
        Sys.sleep(0.5)
      }
    )
  }
  
  if (!all_buttons_loaded & iterations == 20) {
    next
  }

  buttons <- remote_driver$
    findElements(using = 'id', value = "link_ver_detalhe")

  for (modal_index in seq_along(buttons)) {

    # open modal
    buttons[[modal_index]]$clickElement()

    Sys.sleep(1.5)

    # Get the HTML and save it
    tmp <- remote_driver$getPageSource()[[1]]
    write(tmp, file = paste0("data/modals/page-", page_index, "-modal-", modal_index, ".html"))

    # Leave the modal
    remote_driver$
      findElement(using = "xpath", value = "/html/body")$
      sendKeysToElement(list(key = "escape"))

    Sys.sleep(1.5)

  }

  # When we got all modals of one page, go to the next page (except if 
  # we're on the last one)
  if (page_index != 2348) {
    remote_driver$
      findElement("css", "#paginacao > div.btn:nth-child(4)")$
      clickElement()
  }

}

3. Display and save information

Display and save information


Webscraping takes time.


It is important to display and save information on how the webscraping is going, so that if something goes wrong we know where to look when debugging.


In our case:

  • show which page is being scraped;
  • show which modal of this page is being scraped;
  • show the status of this scraping (success/failure).

Display information

Use message() at several places in the loop to display information:

for (page_index in 1:2348) {
  
  message(paste("Start scraping of page", page_index))
   
  for (modal_index in buttons) {
    # open modal
    # get HTML and save it in an external file
    # leave modal
    
    message(paste("  Scraped modal", modal_index))
  }
  
  # Once all modals of a page have been scraped, go to the next page (except
  # if we're on the last page)
  
  message(paste("Finished scraping of page", page_index))
}

Save information



Problem: what if the R session crashes?


We lose all messages!


Solution: show these messages and save them in an external file at the same time.

Save information

Example using the package logger (there are also logging, futile.logger, etc.):

library(logger)

# save calls to message() in an external file
log_appender(appender_file("data/modals/00_logfile"))
log_messages()

for (page_index in 1:2348) {
  
  message(paste("Start scraping of page", page_index))
   
  for (modal_index in buttons) {
    # open modal
    # get HTML and save it in an external file
    # leave modal
    
    message(paste("  Scraped modal", modal_index))
  }
  
  # Once all modals of a page have been scraped, go to the next page (except
  # if we're on the last page)
  
  message(paste("Finished scraping of page", page_index))
}

Save information

What does the output look like?
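
You can open the log file in any text editor, or read it back from R (a minimal sketch):

# each message() call is now also appended to the log file
readLines("data/modals/00_logfile")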

Final loop

for (page_index in 1:3) {

  message(paste("Start scraping of page", page_index))

  # Try to find the buttons "Ver Mais"
  all_buttons_loaded <- FALSE
  iterations <- 0
  while(!all_buttons_loaded & iterations < 20) {
    tryCatch(
      {
        test <- remote_driver$
          findElements(using = 'id', value = "link_ver_detalhe")

        if (inherits(test, "list") && length(test) > 0)  {
          all_buttons_loaded <<- TRUE
        }
      },
      error = function(e) {
        iterations <<- iterations + 1 
        Sys.sleep(0.5)
      }
    )
  }
  
  if (!all_buttons_loaded & iterations == 20) {
    message(paste0("Couldn't find buttons on page ", page_index, ". Skipping."))
    next
  }

  buttons <- remote_driver$
    findElements(using = 'id', value = "link_ver_detalhe")

  for (modal_index in seq_along(buttons)) {

    tryCatch(
      {
        # open modal
        buttons[[modal_index]]$clickElement()
    
        Sys.sleep(1.5)
    
        # Get the HTML and save it
        tmp <- remote_driver$getPageSource()[[1]]
        write(tmp, file = paste0("data/modals/page-", page_index, "-modal-", modal_index, ".html"))
    
        # Leave the modal
        body <- remote_driver$findElement(using = "xpath", value = "/html/body")
        body$sendKeysToElement(list(key = "escape"))
    
        message(paste("  Scraped modal", modal_index))
      },
      error = function(e) {
        message(paste("  Failed to scrape modal", modal_index))
        message(paste("  The error was ", e))
      }
    )

    Sys.sleep(1.5)

  }

  # When we got all modals of one page, go to the next page (except if 
  # we're on the last one)
  if (page_index != 2348) {
    remote_driver$
      findElement("css", "#paginacao > div.btn:nth-child(4)")$
      clickElement()
  }

  message(paste("Finished scraping of page", page_index))
  
  # Wait a bit for page loading
  Sys.sleep(3)
}

Now what?



If everything went well, we now have a bunch of .html files in data/modals.


To clean them, we don’t need RSelenium or an internet connection: these are just text files, they are not “tied” to the website anymore.


It is also useful to keep them for reproducibility (same as when you keep the raw datasets in your project).

Make a function to clean the HTML. It returns a list containing a dataframe with the personal info, and a dataframe with the “network” of the individual.

extract_information <- function(raw_html) {
  
  # Extract the table "Registros relacionados"
  
  content <- raw_html %>%
    html_nodes("#detalhe_conteudo") %>%
    html_table() %>%
    purrr::pluck(1)
  
  relacionados <- content[16:nrow(content),] %>%
    mutate(
      across(
        .cols = everything(),
        .fns = ~ {ifelse(.x == "", NA, .x)}
      )
    )
  
  colnames(relacionados) <- c("Livro", "Pagina", "Familia", "Chegada",
                              "Sobrenome", "Nome", "Idade", "Sexo",
                              "Parentesco", "Nacionalidade",
                              "Vapor", "Est.Civil", "Religiao")
  
  
  # Extract text information from "registro de matricula" and create a
  # dataframe from it
  name_items <- raw_html %>%
    html_elements(xpath = '//*[@id="detalhe_conteudo"]/table[1]/tbody/tr/td/strong') %>%
    html_text2() %>%
    gsub("\\n", "", .) %>%
    strsplit(split = "\\t") %>%
    unlist()
  
  value_items <- raw_html %>%
    html_elements(xpath = '//*[@id="detalhe_conteudo"]/table[1]/tbody/tr/td/div') %>%
    html_text2()
  
  registro <- data.frame() %>%
    rbind(value_items) %>%
    as_tibble()
  
  colnames(registro) <- name_items
  
  return(
    list(
      main = registro,
      related = relacionados
    )
  )
  
}

Apply this function to all files:


library(tidyverse)
library(rvest)

# Get all paths to the html files
list_html_files <- list.files("data/modals", pattern = "page",
                              full.names = TRUE)

# Apply the previous function to each of those files
list_out <- lapply(list_html_files, function(x) {
  read_html(x) |> 
    extract_information() 
})

# Aggregate the results in two (single) datasets
main <- data.table::rbindlist(purrr::map(list_out, 1)) |> 
  as_tibble()

relations <- data.table::rbindlist(purrr::map(list_out, 2)) |> 
  as_tibble()

Summary


  1. Selenium in general is a very useful tool but should be used as a last resort:
  • APIs, packages
  • static webscraping
  • custom POST requests
  2. In my (limited) experience:
  • 1/4 of the time is spent on making a small example work;
  • 1/4 of the time is spent on generalising this example (loops, etc.)
  • 1/2 of the time is spent on debugging.


Catching errors and recording the scraping process IS important.

Parallelization


Use parallelization to open several browsers at the same time and scrape the data faster.


  I never tested this, and there could be some issues (browser crashes, etc.)


If you want to explore it:
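
Here is a minimal, untested sketch of the idea: one worker per browser, each Selenium server on its own port (the number of pages and the scraping body are placeholders).

library(parallel)

n_workers <- 2
cl <- makeCluster(n_workers)

# split the page indices across workers (1:10 is just an example)
chunks <- splitIndices(10, n_workers)
clusterExport(cl, "chunks")

res <- clusterApply(cl, seq_len(n_workers), function(w) {
  library(RSelenium)
  # each worker needs its own port, otherwise the Selenium servers collide
  driver <- rsDriver(browser = "firefox", port = 4567L + w, chromever = NULL)
  on.exit(driver$server$stop(), add = TRUE)
  remote_driver <- driver[["client"]]
  # ... scrape the pages in chunks[[w]] with remote_driver ...
})

stopCluster(cl)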

Ethics

Pay attention to a website’s Terms of Use/Service.


Some websites explicitly say that you are not allowed to programmatically access their resources.


Ethics



Be respectful: make the scraping slow enough not to overload the server.


Not every website can handle tens of thousands of requests very quickly.


Tip

For static webscraping, check out the package polite.
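
A minimal sketch of its workflow (bow to the site first, then scrape):

library(polite)
library(rvest)

session <- bow("https://www.r-project.org/") # reads robots.txt and sets a crawl delay
page <- scrape(session)
html_elements(page, "h1")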

Thanks!


Source code for slides and exercises:

https://github.com/etiennebacher/webscraping-teaching


Comments, typos, etc.:

https://github.com/etiennebacher/webscraping-teaching/issues

Good resources


Article from Appsilon:

https://appsilon.com/webscraping-dynamic-websites-with-r/


Article from Ivan Millanes:

https://ivanmillanes.netlify.app/post/2020-06-30-webscraping-with-rselenium-and-rvest/

Appendix

Appendix

For reference, here’s the code to extract the list of contributors:

library(rvest)

html <- read_html("contributors.html") 

bullet_points <- html %>% 
  html_elements(css = "div.col-xs-12 > ul > li") %>% 
  html_text()

blockquote <- html %>% 
  html_elements(css = "div.col-xs-12.col-sm-7 > blockquote") %>% 
  html_text() %>% 
  strsplit(., split = ", ")

blockquote <- blockquote[[1]] %>% 
  gsub("\\r|\\n|\\.|and", "", .)

others <- html %>% 
  html_elements(xpath = "/html/body/div/div[1]/div[2]/p[5]") %>% 
  html_text() %>% 
  strsplit(., split = ", ")

others <- others[[1]] %>% 
  gsub("\\r|\\n|\\.|and", "", .)

all_contributors <- c(bullet_points, blockquote, others)

Appendix

  [1] "Douglas Bates"          "John Chambers"          "Peter Dalgaard"        
  [4] "Robert Gentleman"       "Kurt Hornik"            "Ross Ihaka"            
  [7] "Tomas Kalibera"         "Michael Lawrence"       "Friedrich Leisch"      
 [10] "Uwe Ligges"             "Thomas Lumley"          "Martin Maechler"       
 [13] "Sebastian Meyer"        "Paul Murrell"           "Martyn Plummer"        
 [16] "Brian Ripley"           "Deepayan Sarkar"        "Duncan Temple Lang"    
 [19] "Luke Tierney"           "Simon Urbanek"          "Valerio Aimale"        
 [22] "Suharto Anggono"        "Thomas Baier"           "Gabe Becker"           
 [25] "Henrik Bengtsson"       "Roger Biv"              "Ben Bolker"            
 [28] "David Brahm"            "Göran Broström"         "Patrick Burns"         
 [31] "Vince Carey"            "Saikat DebRoy"          "Matt Dowle"            
 [34] "Brian D’Urso"           "Lyndon Drake"           "Dirk Eddelbuettel"     
 [37] "Claus Ekstrom"          "Sebastian Fischmeister" "John Fox"              
 [40] "Paul Gilbert"           "Yu Gong"                "Gabor Grothendieck"    
 [43] "Frank E Harrell Jr"     "Peter M Haverty"        "Torsten Hothorn"       
 [46] "Robert King"            "Kjetil Kjernsmo"        "Roger Koenker"         
 [49] "Philippe Lambert"       "Jan de Leeuw"           "Jim Lindsey"           
 [52] "Patrick Lindsey"        "Catherine Loader"       "Gordon Maclean"        
 [55] "Arni Magnusson"         "John Maindonald"        "David Meyer"           
 [58] "Ei-ji Nakama"           "Jens Oehlschägel"       "Steve Oncley"          
 [61] "Richard O’Keefe"        "Hubert Palme"           "Roger D Peng"          
 [64] "José C Pinheiro"        "Tony Plate"             "Anthony Rossini"       
 [67] "Jonathan Rougier"       "Petr Savicky"           "Günther Sawitzki"      
 [70] "Marc Schwartz"          "Arun Srinivasan"        "Detlef Steuer"         
 [73] "Bill Simpson"           "Gordon Smyth"           "Adrian Trapletti"      
 [76] "Terry Therneau"         "Rolf Turner"            "Bill Venables"         
 [79] "Gregory R Warnes"       "Andreas Weingessel"     "Morten Welinder"       
 [82] "James Wettenhall"       "Simon Wood"             " Achim Zeileis"        
 [85] "J D Beasley"            "David J Best"           "Richard Brent"         
 [88] "Kevin Buhr"             "Michael A Covington"    "Bill Clevel"           
 [91] "Robert Clevel,"         "G W Cran"               "C G Ding"              
 [94] "Ulrich Drepper"         "Paul Eggert"            "J O Evans"             
 [97] "David M Gay"            "H Frick"                "G W Hill"              
[100] "Richard H Jones"        "Eric Grosse"            "Shelby Haberman"       
[103] "Bruno Haible"           "John Hartigan"          "Andrew Harvey"         
[106] "Trevor Hastie"          "Min Long Lam"           "George Marsaglia"      
[109] "K J Martin"             "Gordon Matzigkeit"      "C R Mckenzie"          
[112] "Jean McRae"             "Cyrus Mehta"            "Fionn Murtagh"         
[115] "John C Nash"            "Finbarr O’Sullivan"     "R E Odeh"              
[118] "William Patefield"      "Nitin Patel"            "Alan Richardson"       
[121] "D E Roberts"            "Patrick Royston"        "Russell Lenth"         
[124] "Ming-Jen Shyu"          "Richard C Singleton"    "S G Springer"          
[127] "Supoj Sutanthavibul"    "Irma Terpenning"        "G E Thomas"            
[130] "Rob Tibshirani"         "Wai Wan Tsang"          "Berwin Turlach"        
[133] "Gary V Vaughan"         "Michael Wichura"        "Jingbo Wang"           
[136] "M A Wong"              


Appendix


Bonus: get the data from each modal on the museum website by performing the POST request yourself.


Go to the tab “Network” in the developer console, and then click on one of the “Ver Mais” buttons to display the modal.

Appendix

Clicking on this button triggers two POST requests:

  • one for the individual information;
  • one for the “network” of the individual.

Appendix


Clicking on one of the POST requests displays important information:

  • the request, what we send to the server;
  • the response, what the server sends back to us.

The request contains specific parameters needed to tell the server which data we need.

Appendix

Let’s rebuild this POST request from R using the package httr.

library(httr)

x <- POST(
  "http://www.arquivoestado.sp.gov.br/site/acervo/memoria_do_imigrante/getHospedariaDetalhe",
  body = list(
    id = "92276"
  ),
  encode = "multipart"
)

Appendix

Extract the data from the server response:

library(httr)
library(xml2)
library(jsonlite)

# make the POST request with the parameters needed
x <- POST(
  "http://www.arquivoestado.sp.gov.br/site/acervo/memoria_do_imigrante/getHospedariaDetalhe",
  body = list(
    id = "92276"
  ),
  encode = "multipart"
)

# convert output to a list
out <- as_list(content(x))

# convert output to a dataframe
fromJSON(unlist(out))$dados

Your turn


Do the same thing with the second POST request, which has 3 parameters instead of one.

Session information


─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Europe.utf8
 ctype    English_Europe.utf8
 tz       Europe/Paris
 date     2023-02-22
 pandoc   3.1 @ C:/Users/etienne/AppData/Local/Pandoc/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 cli           3.6.0      2023-01-09 [1] CRAN (R 4.2.2)
 digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
 evaluate      0.20       2023-01-17 [1] CRAN (R 4.2.2)
 fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
 htmltools     0.5.4      2022-12-07 [1] CRAN (R 4.2.2)
 httr          1.4.4      2022-08-17 [1] CRAN (R 4.2.1)
 jsonlite      1.8.4      2022-12-06 [1] CRAN (R 4.2.2)
 knitr         1.42       2023-01-25 [1] CRAN (R 4.2.2)
 lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.1)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
 rlang         1.0.6.9000 2023-02-20 [1] Github (r-lib/rlang@394204f)
 rmarkdown     2.20       2023-01-19 [1] CRAN (R 4.2.2)
 rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.1)
 rvest       * 1.0.3      2022-08-19 [1] CRAN (R 4.2.1)
 selectr       0.4-2      2019-11-20 [1] CRAN (R 4.2.0)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
 stringi       1.7.12     2023-01-11 [1] CRAN (R 4.2.2)
 stringr       1.5.0      2022-12-02 [1] CRAN (R 4.2.2)
 vctrs         0.5.2.9000 2023-02-20 [1] Github (r-lib/vctrs@303b5dd)
 xfun          0.37       2023-01-31 [1] CRAN (R 4.2.2)
 xml2          1.3.3      2021-11-30 [1] CRAN (R 4.2.0)
 yaml          2.3.7      2023-01-23 [1] CRAN (R 4.2.2)

 [1] C:/Users/etienne/AppData/Local/R/win-library/4.2
 [2] C:/R/library

──────────────────────────────────────────────────────────────────────────────