How we successfully used Kamal to leave AWS Beanstalk and deploy anywhere. We were new to containers, so this was part of the learning curve as well.
In Part 1 we simply wanted to replicate DHH’s example and use Kamal to deploy to a single server on Hetzner. It includes some errors we overcame.
In Part 2 we added the pieces in preparation for deploying the product.
The Gotchas are some errors we saw and how to fix them.
Very useful links to keep coming back to:
- https://kamal-deploy.org
- https://dev.to/adrienpoly/deploying-a-rails-app-with-mrsk-on-hetzner-a-beginners-guide-39kp
- https://nts.strzibny.name/deploying-rails-single-server-kamal
It is also worth reading the Kamal source code: https://github.com/basecamp/kamal
A useful short introduction video on how to set up Docker integration with VS Code and WSL 2:
We ran all of the commands from within WSL 2 on Windows. Versions used at the time:
- kamal 1.4.0
- Ruby 3.0.0
- Rails 7.1.3.2
PART 1 - Proof of concept
Follow the DHH demo and deploy a blog app to Hetzner
The DHH example video we followed is available here: https://kamal-deploy.org, with help from https://dev.to/adrienpoly/deploying-a-rails-app-with-mrsk-on-hetzner-a-beginners-guide-39kp
Objective: a simple proof of concept, deploying to a single server using sqlite3 (instead of DHH’s MySQL on a separate database server, with a load balancer and two app servers).
Download and install Docker Desktop, create a Docker account if you don't already have one, and get an API key (access token) with read-write-delete access.
Generate an SSH key (note: no passphrase):
$ ssh-keygen -t ed25519 -C "name-goes-here"
Set up one server on Hetzner
- Create an account on Hetzner: https://console.hetzner.cloud/
- Create a project and add the generated SSH public key
- At the time, a Hetzner CPX11 cost €4.35 per month, which is way cheaper than an AWS t3a.small at $14.89 per month
Create a new Rails blog application running on SQLite3.
$ rails new blog -d sqlite3 -c tailwind
$ cd blog
$ rails g scaffold post title:string content:text
$ rails db:migrate
- update config/routes.rb to make posts#index the root route: root "posts#index"
$ rails s
- check that the blog website works on localhost:3000
Install Kamal
$ gem install kamal
$ kamal version => 1.4.0
Initialise Kamal
$ kamal init
Prepare config/deploy.yml
- set it up with the IP address of the one server, and no separate database.
$ git add .
$ git commit -m "first"
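For reference, a minimal single-server config/deploy.yml for Kamal 1.x might look like the YAML embedded in the Ruby sketch below. The service name, image, registry user and IP address are placeholders, not our real values, and load_deploy_config and config_problems are our own illustrative helpers, not part of Kamal. Kamal renders deploy.yml through ERB before parsing it as YAML (which is why ERB tags work inside it), so the sketch loads it in the same order and reports missing top-level keys before you run kamal setup. The registry password and RAILS_MASTER_KEY are listed by name only; their values come from the .env file that kamal init creates.

```ruby
require "erb"
require "yaml"

# A minimal single-server config/deploy.yml for Kamal 1.x.
# Placeholder values (hypothetical): service name, image, registry user and
# the IP address, which is taken from the 192.0.2.0/24 documentation range.
DEPLOY_YML = <<~YAML
  service: blog
  image: myuser/blog
  servers:
    - 192.0.2.10
  registry:
    username: myuser
    password:
      - KAMAL_REGISTRY_PASSWORD
  env:
    secret:
      - RAILS_MASTER_KEY
YAML

REQUIRED_KEYS = %w[service image servers registry].freeze

# Loads the config the way Kamal does: render ERB first, then parse the YAML.
def load_deploy_config(source)
  YAML.safe_load(ERB.new(source).result)
end

# Returns a list of problems in the parsed config (empty when none are found).
def config_problems(config)
  missing = REQUIRED_KEYS.reject { |key| config.key?(key) }
  problems = missing.map { |key| "missing top-level key: #{key}" }
  servers = config["servers"]
  problems << "servers must list at least one host" if servers.nil? || servers.empty?
  problems
end
```

To check the real file, pass File.read("config/deploy.yml") to load_deploy_config and print config_problems for the result.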
Run Kamal Setup
- It is useful to send the logging output to a text file to look at later:
$ kamal setup > output.txt
We thought it would just work, but we got some unexpected errors!
Error 1) [linux/arm64 build 3/6] RUN bundle install && rm -rf ~/.bundle/ "/usr/local/bundle"/ruby/*/cache "/usr/local/bundle"/ruby/*/bundler/gems/*/.git && bundle exec bootsnap precompile --gemfile:
Your bundle only supports platforms ["x86_64-linux"] but your local platform is aarch64-linux. Add the current platform to the lockfile with bundle lock --add-platform aarch64-linux and try again.
So we ran:
$ bundle lock --add-platform aarch64-linux
$ bundle install
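Before rebuilding, it can help to confirm that Gemfile.lock lists every platform the image is built for. Assuming Kamal 1.x's default multi-architecture build (linux/amd64 and linux/arm64, which Bundler calls x86_64-linux and aarch64-linux), here is a small Ruby sketch that reads the PLATFORMS section of the lock file. The helper names are hypothetical, and the parsing only relies on the section's entries being indented and the section ending at the first blank line.

```ruby
# Returns the platform names listed in the PLATFORMS section of a Gemfile.lock.
def lockfile_platforms(lockfile_text)
  platforms = []
  in_section = false
  lockfile_text.each_line do |line|
    if line.chomp == "PLATFORMS"
      in_section = true
    elsif in_section
      # Entries are indented; a blank (or unindented) line ends the section.
      break unless line.start_with?(" ")
      platforms << line.strip
    end
  end
  platforms
end

# Platforms that still need `bundle lock --add-platform <platform>`.
def missing_platforms(lockfile_text, required)
  required - lockfile_platforms(lockfile_text)
end
```

For example, missing_platforms(File.read("Gemfile.lock"), %w[x86_64-linux aarch64-linux]) lists the platforms still to add before running kamal setup again.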
Error 2) /lib/aarch64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /usr/local/bundle/ruby/3.0.0/gems/nokogiri-1.16.3-aarch64-linux/lib/nokogiri/3.0/nokogiri.so)
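Our Hetzner server (a CPX11) is x86_64-linux; the error comes from the linux/arm64 part of the image build, because Kamal 1.x builds for both amd64 and arm64 by default (see the multiarch gotcha at the end). We nevertheless opted to update Bundler from 2.3 to the latest 2.5, along with updating the gems:
$ gem update --system
- and add the same command to the Dockerfile just before RUN bundle install (line 26 in our Dockerfile), i.e. add:
RUN gem update --system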
Combined with forcing Nokogiri to compile from source in the Gemfile (force_ruby_platform: true), this fixed the bundle install issue. The same was required for sqlite3:
gem "nokogiri", force_ruby_platform: true
gem "sqlite3", "~> 1.4", force_ruby_platform: true
Then commit and run setup again:
$ git commit
$ kamal setup > tmp/output.txt
The Kamal log output appears to stall for many minutes, but the graphs on Hetzner show that activity continues, and it does eventually complete. Open the server's IP address in the browser and add a blog post. It works!
PART 2 - Additions in preparation for deploying the product
With help from https://nts.strzibny.name/deploying-rails-single-server-kamal/
Step 1. Add SSL with Let’s Encrypt certificates issued by Traefik.
- Update the deploy.yml file with the Traefik section and, within the servers section, add the redis section.
- Add a blog.mydomain.org subdomain and point it to this server's IP address in DNS.
- Update the SSL settings and domain name.
- Add the letsencrypt challenge path creation to the docker-setup hook.
$ git commit
$ kamal app logs
$ kamal deploy > tmp/kamal_deploy.txt
=> ERROR (SSHKit::Command::Failed): Exception while executing on host XX.XXX.XX.XXX: docker exit status: 125 docker stdout: Nothing written docker stderr: docker: open .kamal/env/roles/blog-cron.env: no such file or directory.
$ kamal remove
-
$ kamal setup > tmp/kamal_setup.txt
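It works! The error was caused by the env file for the new cron role not being on the server yet. In Kamal 1.x, kamal setup pushes the env files (kamal env push does the same on its own), so kamal env push followed by kamal deploy may have been enough, without kamal remove (which also removes the accessories).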
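Kamal runs hooks such as docker-setup on the machine you deploy from, not on the servers, so a hook that prepares the servers has to reach them over SSH itself. The sketch below is a hypothetical .kamal/hooks/docker-setup written in Ruby (a hook only needs to be an executable file, so chmod +x it). It assumes that Kamal passes the target hosts in the KAMAL_HOSTS environment variable as a comma-separated list and that the holu user from Step 2 can use sudo without a password prompt; check both against your Kamal version and server setup. The commands are the ones from Step 2.

```ruby
#!/usr/bin/env ruby
# Hypothetical .kamal/hooks/docker-setup written in Ruby.
# Kamal runs hooks locally, so every change on a server has to go over SSH.
# Assumptions: KAMAL_HOSTS holds a comma-separated host list, and the
# "holu" user can run sudo without a password prompt.

SSH_USER = "holu"

# The server preparation commands from Step 2.
PREP_COMMANDS = [
  "sudo mkdir -p /letsencrypt && sudo touch /letsencrypt/acme.json && sudo chmod 600 /letsencrypt/acme.json",
  "sudo mkdir -p /storage && sudo chmod a+rwx /storage",
  # `docker network create` fails if the network exists, so only create it when missing.
  "sudo docker network inspect private >/dev/null 2>&1 || sudo docker network create -d bridge private"
].freeze

# Builds one ssh argv array per (host, command) pair, ready for Kernel#system.
def ssh_commands(hosts_env, user: SSH_USER, commands: PREP_COMMANDS)
  hosts = hosts_env.to_s.split(",").map(&:strip).reject(&:empty?)
  hosts.product(commands).map { |host, command| ["ssh", "#{user}@#{host}", command] }
end

# Runs every command; a failure aborts with a non-zero status, which stops kamal setup.
def run_hook(env = ENV)
  ssh_commands(env["KAMAL_HOSTS"]).each do |argv|
    system(*argv) || abort("docker-setup hook failed: #{argv.join(' ')}")
  end
end

run_hook if __FILE__ == $PROGRAM_NAME
```

As written it runs every command on every host, so the DB server also gets the (unused) acme.json file; filter the host list if that matters.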
Step 2. Improve security with a private Docker network and a non-root user
- Add a password to redis.
# config/deploy.yml
...
redis:
...
cmd: "redis-server --requirepass '<%= File.read('/path/to/redis/password').strip %>'"
...
# .env
...
REDIS_URL=redis://:mypassword@myapp-redis:6379/1
...
The Redis URL starts with “redis://:”. Note the second colon, right before the password: without it, the password is read as a user name. We got the Redis error “(error) NOAUTH Authentication required” because that colon was missing.
- Use a dedicated user instead of root on a fresh server. When you create a new Hetzner server, include a cloud-init config, basically the one from https://community.hetzner.com/tutorials/basic-cloud-config, with these lines added (cloud-init only recognises the file if its first line is exactly #cloud-config, with no space):
#cloud-config
...
runcmd:
  - groupadd docker
  - usermod -aG docker holu
...
Wait until the new Hetzner server's activity has finished (watch the graph).
- Rename .kamal/hooks/docker-setup.sample to .kamal/hooks/docker-setup and edit it to fail on purpose (e.g. exit 1), so that setup stops once Docker is in place. This lets you run the following commands by hand, because unfortunately they don't appear to take effect from the docker-setup hook. (Kamal runs hooks on the machine you deploy from, not on the servers, so a hook only changes a server if it connects to it, e.g. over SSH.)
$ kamal setup > tmp/kamal_setup.txt
- When the setup stops/fails after Docker is in place, SSH into the web server:
$ ssh holu@XXX.XXX.XXX
and run these commands and exit:
$ sudo mkdir -p /letsencrypt && sudo touch /letsencrypt/acme.json && sudo chmod 600 /letsencrypt/acme.json
$ sudo mkdir /storage -p && sudo chmod a+rwx /storage
$ sudo docker network create -d bridge private
Do the same on the DB server but omit the letsencrypt command.
- Edit .kamal/hooks/docker-setup so that it succeeds again, then run setup again and check the logs:
$ kamal setup > tmp/kamal_setup.txt
$ kamal app logs
$ kamal app logs -g e02da2d4-f831-430b-bea5-cf6be933752a
It works!
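Two details of the Redis password setup in Step 2 can be checked in plain Ruby. First, File.read returns the file's trailing newline, if it has one, and ERB pastes it straight into the deploy.yml text before the YAML is parsed, so the password Redis receives may not match the one in REDIS_URL; calling .strip on the result avoids that. Second, URI parsing shows why the colon before the password matters: without it, the password ends up as the user name. The helper names are hypothetical, and in the usage a throwaway file stands in for /path/to/redis/password.

```ruby
require "erb"
require "uri"
require "yaml"

# The redis `cmd` line from deploy.yml, with the password file path in a local
# variable `path`. The second version strips the file's trailing newline.
RAW_CMD_TEMPLATE      = %(cmd: "redis-server --requirepass '<%= File.read(path) %>'"\n)
STRIPPED_CMD_TEMPLATE = %(cmd: "redis-server --requirepass '<%= File.read(path).strip %>'"\n)

# First step of loading deploy.yml: ERB turns the template into YAML text.
# `path` is visible to the template through `binding`.
def render_erb(template, path)
  ERB.new(template).result(binding)
end

# Second step: parse the YAML text and return the redis command string.
def redis_cmd(template, path)
  YAML.safe_load(render_erb(template, path))["cmd"]
end

# Splits a redis:// URL into the user name and password a client would use.
def redis_credentials(url)
  uri = URI.parse(url)
  { user: uri.user, password: uri.password }
end
```

With the colon, redis_credentials gives an empty user name and the password; without it, the password shows up as the user name and the password is nil.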
Step 3. Add postgresql & firewall
- Add a Hetzner firewall for the DB server allowing DB connections and SSH
- Add a Hetzner firewall for the web server allowing HTTP/HTTPS connections and SSH
- Update deploy.yml to add postgres to the accessories section, and update the Dockerfile.
$ kamal setup > tmp/kamal_setup.txt
It works!
Summary of changes
- The cloud-config used to provision servers: add these two lines to the example found at https://community.hetzner.com/tutorials/basic-cloud-config so that the holu user can run docker commands.
#cloud-config
...
runcmd:
  - groupadd docker
  - usermod -aG docker holu
...
- Add the health route to the rails app.
# config/routes.rb
...
get "up" => "rails/health#show", as: :rails_health_check
...
- Gemfile - note: use a recent Bundler such as version 2.5
# Gemfile
...
ruby "3.0.0"
gem "nokogiri", force_ruby_platform: true
...
Pay careful attention to the config/environments/production.rb file! This was the source of quite a few errors when we applied the changes to an existing product.
Update .dockerignore to ensure the wrong files don't end up in the image or deployment.
The Dockerfile: note that if you are using a JavaScript minifier such as Terser, you may need to install Node.js to fix "ExecJS::RuntimeUnavailable: Could not find a JavaScript runtime". In that case, add this to your Dockerfile in the two places that install packages:
...
apt-get install -y nodejs && \
...
Dockerfile now looks like this:
# syntax = docker/dockerfile:1

# Make sure RUBY_VERSION matches the Ruby version in .ruby-version and Gemfile
ARG RUBY_VERSION=3.0.0
FROM registry.docker.com/library/ruby:$RUBY_VERSION-slim as base

# Rails app lives here
WORKDIR /rails

# Set production environment
ENV RAILS_ENV="production" \
    BUNDLE_DEPLOYMENT="1" \
    BUNDLE_PATH="/usr/local/bundle" \
    BUNDLE_WITHOUT="development"

# Throw-away build stage to reduce size of final image
FROM base as build

# Install packages needed to build gems
RUN apt-get update -qq && \
    apt-get install -y git && \
    apt-get install -y nodejs && \
    apt-get install --no-install-recommends -y build-essential libpq-dev

# Install application gems
COPY Gemfile Gemfile.lock ./
RUN gem install bundler:2.4
RUN gem update --system
RUN bundle install && \
    rm -rf ~/.bundle/ "${BUNDLE_PATH}"/ruby/*/cache "${BUNDLE_PATH}"/ruby/*/bundler/gems/*/.git && \
    bundle exec bootsnap precompile --gemfile

# Copy application code
COPY . .

# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/

# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN SECRET_KEY_BASE_DUMMY=1 ./bin/rails assets:precompile

# Final stage for app image
FROM base

# Install packages needed for deployment
RUN apt-get update -qq && \
    apt-get install -y git && \
    apt-get install -y nodejs && \
    apt-get install --no-install-recommends -y curl postgresql-client && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Copy built artifacts: gems, application
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY --from=build /rails /rails

# Run and own only the runtime files as a non-root user for security
RUN useradd rails --create-home --shell /bin/bash && \
    chown -R rails:rails db log storage tmp
USER rails:rails

# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]

# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
CMD ["./bin/rails", "server"]
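The health route added to config/routes.rb is also handy for checking a deploy by hand. As far as we know, Kamal 1.x health-checks new containers on the /up path by default, but check the healthcheck settings for your version. Below is a small Ruby polling sketch. wait_until_healthy is a hypothetical helper, the defaults of 7 attempts one second apart are arbitrary, and the fetch and sleeper arguments exist only so the logic can be tested without a server.

```ruby
require "net/http"
require "uri"

# Polls a health URL until it answers HTTP 200 or the attempts run out.
# `fetch` returns a status code (by default via a real Net::HTTP GET) and
# `sleeper` waits between attempts; both can be swapped out in tests.
def wait_until_healthy(url, attempts: 7, interval: 1,
                       fetch: ->(u) { Net::HTTP.get_response(URI(u)).code.to_i },
                       sleeper: ->(seconds) { sleep(seconds) })
  attempts.times do |i|
    status = begin
      fetch.call(url)
    rescue SystemCallError, SocketError, Net::OpenTimeout, Net::ReadTimeout
      nil # connection refused, DNS failure or timeout: not up yet
    end
    return true if status == 200

    sleeper.call(interval) if i < attempts - 1
  end
  false
end
```

For example, wait_until_healthy("https://blog.mydomain.org/up") returns true once the app answers with HTTP 200 (blog.mydomain.org being the placeholder domain from Part 2).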
Gotchas
Error: docker stderr: docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "./bin/rails": permission denied: unknown.
Fix => update the permissions by running $ bundle exec rake app:update:bin
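A missing executable bit on the files in bin/ is one common cause of this error, for example when the project has been edited from Windows. Assuming that is the cause, this Ruby sketch lists the bin/ files that are not executable and adds the bit, like chmod a+x. The helper names are hypothetical; commit the change and deploy again afterwards.

```ruby
require "fileutils"

# Lists the regular files in <app_root>/bin that lack the executable bit.
def non_executable_bins(app_root = ".")
  Dir.glob(File.join(app_root, "bin", "*")).select do |path|
    File.file?(path) && !File.executable?(path)
  end
end

# Adds the executable bit for everyone (like `chmod a+x`) and returns the fixed files.
def fix_bin_permissions(app_root = ".")
  files = non_executable_bins(app_root)
  FileUtils.chmod("a+x", files) unless files.empty?
  files
end
```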
Error: libc-bin qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Fix => side-step the issue by setting this in deploy.yml:
builder:
  multiarch: false
This only works if the EC2 server happens to have the same architecture as the dev PC, because the image is then built only for the local architecture.
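Before turning multiarch off, it is worth confirming that the two machines really match. Docker names the architectures amd64 and arm64, while uname -m reports x86_64 or aarch64 (arm64 on macOS), so comparing the raw strings can mislead. A small Ruby sketch of the comparison; the helper names and the mapping table are our own, and only the four common names are covered.

```ruby
# Maps `uname -m` / RbConfig host_cpu names to Docker's architecture names.
DOCKER_ARCH = {
  "x86_64" => "amd64",
  "amd64" => "amd64",
  "aarch64" => "arm64",
  "arm64" => "arm64"
}.freeze

def docker_arch(machine)
  name = machine.to_s.strip
  DOCKER_ARCH.fetch(name) { raise ArgumentError, "unknown architecture: #{name.inspect}" }
end

# With `multiarch: false` the image is built only for the local architecture,
# so that is only safe when the server reports the same architecture.
def single_arch_build_ok?(local_machine, server_machine)
  docker_arch(local_machine) == docker_arch(server_machine)
end
```

On the dev machine, RbConfig::CONFIG["host_cpu"] gives the local name; on the server, use the output of uname -m run over ssh.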
Error:
Log into image registry…
INFO [d5f61b78] Running docker login xxxxxxxxxxxx.dkr.ecr.eu-west-1.amazonaws.com -u [REDACTED] -p [REDACTED] as k@localhost
Releasing the deploy lock…
Finished all in 27.2 seconds
ERROR (SSHKit::Command::Failed): docker exit status: 256
docker stdout: Nothing written
docker stderr: WARNING! Using --password via the CLI is insecure. Use --password-stdin. <3>WSL (95213) ERROR: UtilAcceptVsock:250: accept4 failed 110 Error saving credentials: error storing credentials - err: exit status 1, out: ``
Fix => retry after a short wait
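If the docker login failure keeps coming back, the retry can be automated. This Ruby sketch retries a block with a growing wait between attempts (exponential backoff). with_retries is a hypothetical helper, the attempt count and delays are arbitrary choices, and it only papers over a transient WSL credential-store error rather than fixing it.

```ruby
# Calls the block until it returns a truthy value or the attempts run out,
# waiting base_delay, then 2 * base_delay, 4 * base_delay, ... between tries.
# `sleeper` does the waiting and can be replaced in tests.
def with_retries(attempts: 3, base_delay: 5, sleeper: ->(seconds) { sleep(seconds) })
  attempts.times do |i|
    return true if yield

    sleeper.call(base_delay * (2**i)) if i < attempts - 1
  end
  false
end
```

For example, with_retries { system("kamal", "deploy") } runs the deploy up to three times, waiting 5 and then 10 seconds between attempts.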
Error: Let’s Encrypt fails and the Traefik logs contain ‘msg="the router traefik@docker uses a non-existent resolver: myresolver"’
Fix => This can be caused by a volume that created a directory called acme.json instead of a plain file. Change deploy.yml from
command:
  - --certificatesresolvers.myresolver.acme.storage=/acme.json
to
command:
  - --certificatesresolvers.myresolver.acme.storage=/path/to/acme.json
Also beware of Let’s Encrypt’s rate limit of 5 tries in 7 days. Hitting it shows up in the Traefik logs.
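Traefik expects acme.json to be a regular file with mode 600, and when the host path of a Docker bind mount does not exist, Docker creates a directory there instead. This Ruby sketch checks the file from the development machine by running stat over SSH. It assumes GNU stat on the server, the /letsencrypt/acme.json path from Step 2 and the holu user, all of which you may need to change; the helper names are hypothetical.

```ruby
# Interprets the output of `stat -c '%F %a' <path>` (GNU coreutils), e.g.
# "regular empty file 600" or "directory 755". Returns a problem description,
# or nil if the path is a regular file with mode 600.
def acme_problem(stat_output)
  text = stat_output.to_s.strip
  return "acme.json is missing (stat printed nothing)" if text.empty?

  type, _, mode = text.rpartition(" ")
  return "acme.json is a #{type}, expected a regular file" unless type.start_with?("regular")
  return "acme.json has mode #{mode}, expected 600" unless mode == "600"

  nil
end

# Runs stat on the server over SSH (host and user are placeholders) and
# interprets the result. stderr is discarded, so a missing file gives "".
def check_remote_acme(host, user: "holu", path: "/letsencrypt/acme.json")
  output = IO.popen(["ssh", "#{user}@#{host}", "stat -c '%F %a' #{path}"], err: File::NULL, &:read)
  acme_problem(output)
end
```

check_remote_acme("XX.XXX.XX.XXX") returns nil when the file looks right, otherwise a description of what to fix before asking Let’s Encrypt again.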