Using Selenium in Docker/Cloud/WSL2 with Python [2022]
Let’s learn how you can scrape dynamic websites using Selenium inside Docker
Introduction
If you are looking to use Selenium inside Docker or in the cloud to scrape dynamic websites, then you’re in the right place. For now, I’ll assume you want to run it inside Docker.
I’ll start by giving you some context for what we are going to do (I’m assuming you have some knowledge of Docker).
- We’ll use Docker Compose to create two services: one for browserless-chrome and the other for the scraper.
- Browserless-chrome runs a Chrome service inside a Docker container. The good thing is that it comes preconfigured, so we can use the image directly and connect to it with Selenium.
- The idea is to use Selenium’s Remote WebDriver connection to perform all the scraping, so the scraper service drives the browser running in the browserless-chrome service.
Let’s look at the docker-compose.yml file
Code
Now, let’s look at the code
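Pay attention to the command executor. It is the URL of the headless_chrome service defined in the docker-compose.yml file, in the format http://{service_name}:{port}/webdriver. Docker Compose puts both services on the same default network, so the service name resolves as a hostname from inside the scraper container.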
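Here is a minimal sketch of that connection (not the exact code from the full project), assuming Selenium 4, the service name `headless_chrome`, and browserless’s default port 3000; check the port your docker-compose.yml actually exposes. The helper names `executor_url` and `make_remote_driver` are hypothetical, and the two Chrome flags are common choices for containerised Chrome rather than required settings.

```python
def executor_url(service_name: str, port: int) -> str:
    """Build the Remote WebDriver URL for a docker-compose service.

    Follows the http://{service_name}:{port}/webdriver format used for browserless.
    """
    return f"http://{service_name}:{port}/webdriver"


def make_remote_driver(url: str):
    """Open a Selenium session on the remote (browserless) Chrome at `url`."""
    # Imported inside the function so the URL helper above can be used
    # (and tested) even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    # Selenium 4 style: pass options instead of desired_capabilities
    return webdriver.Remote(command_executor=url, options=options)
```

Inside the scraper container you would call `make_remote_driver(executor_url("headless_chrome", 3000))`, then use the returned driver as usual and call `driver.quit()` in a `finally` block so the browserless session is released.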
You’ll see that I have added a sleep call. I believe browserless-chrome takes a few seconds to start its services, so if you remove this delay, you’ll probably get an error.
If you are facing connection issues, you can try lowering the version in the docker-compose file. Earlier I was using version 3.8, but I was getting connection issues with it.
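Instead of a fixed sleep, one option is to poll until the browserless port accepts TCP connections. This is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and an open port is a good but not perfect signal that Chrome is ready to accept sessions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection or `timeout` seconds pass.

    Returns True as soon as a connection succeeds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (refused, unreachable, or DNS failure while the
            # other container is still starting) until something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("headless_chrome", 3000)` before creating the Remote driver and raise an error if it returns False, so the scraper starts as soon as Chrome is reachable instead of always waiting a fixed time.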
Let’s look at the rest of the code
The code is very simple: I'm scraping a test website, but the same approach can be adapted to any website.
Further improvements (and some cautions)
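A minimal sketch of the scraping step, assuming `driver` is a Selenium `webdriver.Remote` session connected to browserless; the function name, URL, and CSS selector are placeholders. It sets an implicit wait so `find_elements` gives JavaScript-rendered elements time to appear, and it passes the string `"css selector"` (the value of `By.CSS_SELECTOR`) so the function needs no selenium import. For finer control, `WebDriverWait` with expected conditions is the usual alternative.

```python
from typing import Any, List

# Same string as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS_SELECTOR = "css selector"


def scrape_texts(driver: Any, url: str, css_selector: str, timeout: float = 10.0) -> List[str]:
    """Load `url` and return the non-empty text of elements matching `css_selector`."""
    # With an implicit wait set, find_elements keeps polling until at least one
    # element matches or `timeout` seconds pass (then it returns an empty list).
    driver.implicitly_wait(timeout)
    driver.get(url)
    elements = driver.find_elements(CSS_SELECTOR, css_selector)
    texts = [element.text.strip() for element in elements]
    return [text for text in texts if text]
```

Pointing it at another site is mostly a matter of changing the URL and the selector, for example `scrape_texts(driver, "https://example.com", "h1")`.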
- Be careful when saving files if you are only using docker-compose: you’ll have to mount a volume or folder, otherwise your data will be lost when the container is removed.
- You can wrap the scraper in an API with Flask/FastAPI and return the results.
- I wasn’t able to disable the logging messages, which I believe could improve speed. If you find a way, I’ll be more than happy to credit you and add it here.
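For the volume caution, a minimal sketch: write results into a directory that is mounted from the host. The `save_results` helper, the `/data` default, and the `OUTPUT_DIR` variable are all assumptions; the path must match the container side of the volume you mount for the scraper service (for example `./output:/data`).

```python
import json
import os
from pathlib import Path
from typing import Any, Optional


def save_results(data: Any, filename: str, output_dir: Optional[str] = None) -> Path:
    """Write `data` as JSON into the (assumed) mounted output directory."""
    # Hypothetical default: /data inside the container, overridable via OUTPUT_DIR
    directory = Path(output_dir or os.environ.get("OUTPUT_DIR", "/data"))
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Anything written outside the mounted path still lives only in the container and disappears once the container is removed.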
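On the logging point, one thing worth trying, assuming the noisy messages come from the Python side of the scraper: Selenium’s client logs each WebDriver command at DEBUG level under the `selenium` logger hierarchy, and urllib3 (the HTTP client it uses) logs its connections, so raising those loggers’ levels hides them. This does not affect logs printed by the browserless container itself, and whether it improves speed is untested. The helper name is hypothetical.

```python
import logging


def quiet_scraper_logs(level: int = logging.WARNING) -> None:
    """Hide DEBUG/INFO chatter from Selenium's Python client and urllib3."""
    # Selenium's remote_connection module logs WebDriver commands at DEBUG
    # under "selenium.*"; urllib3 logs HTTP connections under "urllib3.*".
    # Child loggers left at NOTSET inherit the level set on these parents.
    for name in ("selenium", "urllib3"):
        logging.getLogger(name).setLevel(level)
```

Call it once at startup, after any `logging.basicConfig(...)` call, so the scraper’s own log lines stay visible while the client chatter is filtered.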
Conclusion
It took me a while to figure out how to use Selenium remotely, as most of the code I found was obsolete. If you have any suggestions or improvements, please add them in the comments.
Full code: LINK
Thank you.