Selenium Web Scrape



In this article I will show you how easy it is to scrape a web site using Selenium WebDriver. I will guide you through a sample project which is written in C# and uses WebDriver in conjunction with the Chrome browser to log in on the testing page and scrape the text from the private area of the website.

Downloading the WebDriver

First of all, we need to get the latest version of the Selenium Client & WebDriver Language Bindings and the ChromeDriver. Of course, you can download WebDriver bindings for any language (Java, C#, Python, Ruby), but within the scope of this sample project I will use the C# binding only. In the same manner, you can use any browser driver, but here I will use Chrome.

After downloading the libraries and the browser driver, we need to include them in our Visual Studio solution.

Creating the scraping program

In order to use the WebDriver in our program we need to add its namespaces:
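
With the standard Selenium .NET bindings the directives are roughly the following (System and System.IO are only needed later, for timeouts and for saving the scraped text to a file):

    using System;
    using System.IO;                // saving the scraped text to a file
    using OpenQA.Selenium;          // core WebDriver types: IWebDriver, IWebElement, By
    using OpenQA.Selenium.Chrome;   // ChromeDriver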

Then, in the main function, we need to initialize the Chrome Driver:
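
With the C# bindings this boils down to a single constructor call; a minimal sketch:

    // Starts chromedriver.exe and opens a new Chrome browser window.
    IWebDriver driver = new ChromeDriver();

Since IWebDriver inherits IDisposable, wrapping the driver in a using block (as in the complete listing below) is a convenient way to make sure the browser is closed when the program finishes.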

This piece of code searches for the chromedriver.exe file. If this file is located in a directory different from the one where our program is executed, then we need to specify its path explicitly in the ChromeDriver constructor.
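
For example (the directory below is only a placeholder for wherever you unpacked the driver):

    // Point WebDriver at the folder that contains chromedriver.exe.
    IWebDriver driver = new ChromeDriver(@"C:\tools\chromedriver");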

When an instance of ChromeDriver is created, a new Chrome browser will be started. Now we can control this browser via the driver variable. Let’s navigate to the target URL first:
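
A sketch of this step; the address below is only a placeholder for the real testing page:

    // Open the login page of the site we want to scrape.
    driver.Navigate().GoToUrl("https://example.com/login");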

Then we can find the web page elements we need in order to log in to the private area of the website:
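
A sketch using FindElement; the locators are assumptions, so inspect the real login form to find the actual ids or names:

    // Locate the login form fields (placeholder locators).
    IWebElement userNameField = driver.FindElement(By.Name("username"));
    IWebElement passwordField = driver.FindElement(By.Name("password"));
    IWebElement loginButton   = driver.FindElement(By.Name("login"));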

Here we search for the user name and password fields and the login button, and put them into the corresponding variables. After we have found them, we can type in the user name and the password and press the login button:
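
Roughly, with placeholder credentials:

    // Fill in the credentials and submit the login form.
    userNameField.SendKeys("myUserName");
    passwordField.SendKeys("myPassword");
    loginButton.Click();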

At this point the new page will be loaded into the browser, and once it has loaded we can scrape the text we need and save it into a file:
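
A sketch of this step; the implicit wait, the locator of the private area and the output file name are all assumptions:

    // Let FindElement wait a few seconds for the new page's elements to appear.
    driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(5);

    // Grab the text of the private area and write it to a text file.
    IWebElement privateArea = driver.FindElement(By.Id("private-content"));
    File.WriteAllText("scraped.txt", privateArea.Text);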

That’s it! At the end, I’d like to give you a bonus – saving a screenshot of the current page into a file:
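
With the .NET bindings a screenshot is taken through the ITakesScreenshot interface; in recent Selenium versions SaveAsFile writes a PNG by default:

    // Capture the current page and save it to disk.
    Screenshot screenshot = ((ITakesScreenshot)driver).GetScreenshot();
    screenshot.SaveAsFile("screenshot.png");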

The complete program listing
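
The sketch below stitches the snippets above together; every URL, locator, credential and file name in it is a placeholder that you will need to adapt to your own target site:

    using System;
    using System.IO;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;

    namespace SeleniumScrapeSample
    {
        class Program
        {
            static void Main()
            {
                // Start Chrome; pass the chromedriver folder to the constructor
                // if chromedriver.exe is not next to the executable.
                using (IWebDriver driver = new ChromeDriver())
                {
                    // Let FindElement wait up to 5 seconds for elements to appear.
                    driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(5);

                    // 1. Open the login page (placeholder URL).
                    driver.Navigate().GoToUrl("https://example.com/login");

                    // 2. Log in (placeholder locators and credentials).
                    driver.FindElement(By.Name("username")).SendKeys("myUserName");
                    driver.FindElement(By.Name("password")).SendKeys("myPassword");
                    driver.FindElement(By.Name("login")).Click();

                    // 3. Scrape the private area and save its text to a file.
                    IWebElement privateArea = driver.FindElement(By.Id("private-content"));
                    File.WriteAllText("scraped.txt", privateArea.Text);

                    // 4. Bonus: save a screenshot of the current page.
                    Screenshot screenshot = ((ITakesScreenshot)driver).GetScreenshot();
                    screenshot.SaveAsFile("screenshot.png");
                }   // disposing the driver closes the browser window
            }
        }
    }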

Get the whole project.

Conclusion

I hope you are impressed with how easy it is to scrape web pages using the WebDriver. You can naturally press keys and click buttons just as you would when working with the browser. You don’t even need to understand what kind of HTTP requests are sent and what cookies are stored; the browser does all this for you. This makes the WebDriver a wonderful tool in the hands of a web scraping specialist.