Python web automation with Selenium
Today we are going to discuss how to automate web-based tasks using Python. So far we have seen how to do web scraping with Beautiful Soup and pandas, and how to call APIs using the Requests library.
Why Selenium
This is wonderful, but what if we are supposed to do this daily?
Would you just keep running this every day in your notebook?
Or can we automate it to run on its own daily at a scheduled time?
Imagine your job is to check how a website performs in various browsers. How are we going to test that? And what if we don’t have an API but are asked to scrape data every day?
This is where Selenium is super useful. We can control not only the timing and the scraping, but we also launch an actual browser to get the data. This is crucial as more websites have started tracking visitor behavior and blocking automated requests.
Installation and setup
Before we walk through the Selenium workflow, there are a few setup steps to complete so that everything works properly. First, install the Python package:
pip install selenium
Drivers
Selenium needs additional, non-Python software to work with browsers. These programs are called drivers, and each browser has its own:
Chrome: https://sites.google.com/a/chromium.org/chromedriver/downloads
Firefox: https://github.com/mozilla/geckodriver/releases
Safari: https://webkit.org/blog/6900/webdriver-support-in-safari-10/
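Once you have extracted the drivers, you will need to add their folder to your PATH. If, like me, you are not sure how to do this, here is a quick reminder: on macOS or Linux this usually means adding the folder to the PATH variable in your shell profile, and on Windows it means editing the PATH environment variable in the system settings. Note that Selenium 4.6 and later ship with Selenium Manager, which can download a matching driver automatically, so the manual step may not be needed on recent versions.
Once this is done, you are set up for running automated tasks.
Here is a small code sample that opens Firefox and goes to the Flatiron School website, then to learn.co: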
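A quick way to confirm that the PATH step worked is to ask Python where it finds each driver. The snippet below is only a minimal sketch using the standard library; the helper names `find_drivers` and `report` are made up for illustration, and the executable names assume the default file names of the three drivers listed above.

```python
import shutil

# Default executable names of the drivers (an assumption: adjust if you renamed them).
DRIVER_NAMES = ("chromedriver", "geckodriver", "safaridriver")


def find_drivers(names=DRIVER_NAMES):
    """Map each driver name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def report(found):
    """Turn the mapping from find_drivers() into one readable line per driver."""
    lines = []
    for name, path in found.items():
        lines.append(f"{name}: {path or 'not found on PATH'}")
    return "\n".join(lines)
```

Calling `print(report(find_drivers()))` lists every driver with either its location or a "not found on PATH" note, so you only need the driver for the browser you plan to use to show a path.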
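# usual Selenium imports
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# import time so we can wait
import time

with webdriver.Firefox() as driver:
    # explicit-wait helper with a 25-second timeout (not used in this short sample)
    wait = WebDriverWait(driver, 25)
    driver.get("https://flatironschool.com")
    time.sleep(5)  # pause so the page stays visible for 5 seconds
    driver.get("https://learn.co")
    time.sleep(5)
    # explicit quit; leaving the with block would also close the browser
    driver.quit()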
This code opens a new Firefox window (assuming the driver was correctly added to your PATH). It goes to flatironschool.com, waits 5 seconds, goes to learn.co, waits another 5 seconds, and then quits.
Here is the code for Chrome:
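# same imports as the Firefox example
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import time

with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 25)
    driver.get("https://flatironschool.com")
    time.sleep(5)
    driver.get("https://learn.co")
    time.sleep(5)
    driver.quit()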
As you can see, this is the same code as above except that it calls webdriver.Chrome() instead of webdriver.Firefox().
Obviously this is very basic; we haven’t even scratched the surface of automated testing here. Instead of sleeping for a fixed time, we can wait until the page has loaded and certain elements are available. We can fill in form details from a source such as an Excel file, or check a web server for new files and copy any new ones. The options are, well, unlimited.
Thanks for reading!