To scrape a JavaScript-heavy web page, you need a real web browser to load and render the dynamic page. To automate that browser, you need something like Puppeteer or Selenium.
We opted for selenium-webdriver with Google Chrome running in headless mode.
It was useful to install the chrome dev package locally for developing and testing. However, running this way in production didn’t work: the Rails app server would grind to a halt after some hours, when it eventually ran out of memory.
The solution was to connect Selenium in the Rails app to a separate ‘remote’ server running Selenium Grid + Chrome/Firefox. Brilliantly, the docker-selenium project publishes ready-to-use Docker images for this to the Docker Hub registry. For simplicity we chose a stable version of Selenium Grid in Standalone mode with Chrome.
To deploy the Selenium Grid Docker image we used Kamal accessories, which are long-lived services that the app depends on. All that was needed was to specify the image, host and port in our Rails app’s Kamal deploy.yml file. Kamal does the rest (wow!):
## deploy.yml:
service: myrailsapp
...
servers:
  web:
    ...
accessories:
  selenium:
    image: selenium/standalone-chrome:4.34.0-20250717
    host: xxx.xxx.xxx.x
    port: 4444
    options:
      shm-size: 2g
...
Then deploy the accessory:
$ kamal accessory boot selenium
Now it is running in its own container on the host you specified, which can be your app server or a separate machine.
To connect from a Rails App, the URL is in the form
"http://{ service name }-{ accessory name }:4444/"
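Before wiring up the driver, it can be worth confirming that the Grid actually answers at that URL. The sketch below is one way to do it, assuming the Grid's /status endpoint, which returns JSON containing a value.ready flag. The helper names grid_ready? and grid_up? are hypothetical and not part of our app.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: true when a Grid /status response body reports ready.
def grid_ready?(status_body)
  data = JSON.parse(status_body)
  data.is_a?(Hash) && data.dig("value", "ready") == true
rescue JSON::ParserError, TypeError
  false
end

# Hypothetical helper: fetches /status from the Grid.
# Any connection or HTTP error is treated as "not ready".
def grid_up?(grid_url)
  response = Net::HTTP.get_response(URI.join(grid_url, "status"))
  response.is_a?(Net::HTTPSuccess) && grid_ready?(response.body)
rescue StandardError
  false
end
```

From a Rails console inside the app container, grid_up?("http://myrailsapp-selenium:4444/") should then return true once the accessory has finished booting. The container name is only expected to resolve from containers that share the accessory's Docker network.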
To connect Selenium in our Rails app to the remote Selenium Grid + Chrome server, this code gets the web driver:
## scraper_base.rb:
...
grid_url = ENV["REMOTE_SELENIUM_URL"] # eg http://myrailsapp-selenium:4444/
options = Selenium::WebDriver::Chrome::Options.new(args: [ "--headless", "--no-sandbox", "--disable-setuid-sandbox" ])
driver = Selenium::WebDriver.for :remote, url: grid_url, options: options
...
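Since the whole point of the move was memory, it is worth making sure every session is closed again: a driver that is never quit leaves its browser session open on the Grid. Below is a minimal sketch of one way to guarantee that. The helper name with_driver is hypothetical and not taken from our scraper_base.rb.

```ruby
# Hypothetical helper: builds a driver with the given callable, yields it,
# and always quits it afterwards, even if the block raises.
def with_driver(build_driver)
  driver = build_driver.call
  yield driver
ensure
  driver&.quit
end

# Example use with the remote driver from scraper_base.rb. It needs the
# selenium-webdriver gem and a reachable Grid, so it is left commented out:
#
#   build = -> { Selenium::WebDriver.for :remote, url: grid_url, options: options }
#   html = with_driver(build) do |driver|
#     driver.get("https://example.com/")
#     driver.page_source
#   end
```

Passing the driver in as a callable keeps the helper independent of Selenium itself, so it can be exercised in tests with a stand-in object that responds to quit.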
Note that the selenium/standalone-chrome images are only available for AMD64, not for Linux/ARM platforms.
That is all there is to it.