rsync is a popular file synchronization utility that uses an efficient algorithm to minimize bandwidth consumption. One of rsync’s common roles is deploying a website build to a remote production server. Here’s how to combine rsync’s versatility with the automation provided by GitLab CI pipelines.

Pipeline Executors

GitLab CI supports several types of pipeline executors. These define the environment that your job will run in. The shell executor is the simplest to set up: it runs jobs directly on the host machine, so your pipelines can use any command available on the host without further configuration. As most popular Linux distributions ship with rsync installed, this approach is easy to get started with.

Unfortunately, the shell executor doesn’t provide strong isolation and can pollute your host’s environment over time. A better alternative is the docker executor, which spins up a new Docker container for each CI job. All jobs run in a clean environment that can’t impact the host.

The drawback here is that Docker base images don’t generally include rsync or ssh. Even official OS images like ubuntu:latest ship as minimal builds without these commands. This makes for a slightly more involved pipeline script to add the dependencies and rsync your files.

Here’s how to add rsync to your pipeline. Make sure that you have a Docker-based GitLab Runner available before you continue. We’ll also assume that you have a GitLab project that’s ready to use.

Getting Ready

You’ll need an SSH key pair available if you’ll be using rsync to connect to a remote SSH host. You can generate public and private keys by running ssh-keygen -t rsa. Copy the public key to the server that you’ll be connecting to.

Next, copy the contents of the generated private key file to your clipboard.
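One way to do this is to print the key in a terminal and copy the output. This is a minimal sketch that assumes the key was written to ssh-keygen's default location for RSA keys, ~/.ssh/id_rsa; the KEY_FILE variable is a hypothetical name used here so that a different path can be supplied.

```shell
# Assumption: the private key is at ssh-keygen's default RSA path.
# Set KEY_FILE beforehand if you saved the key somewhere else.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"

if [ -f "$KEY_FILE" ]; then
  # Print the key so it can be selected and copied, including the
  # BEGIN and END lines.
  cat "$KEY_FILE"
else
  echo "No private key found at $KEY_FILE" >&2
fi
```

If a clipboard tool is installed, its output can be piped there instead of being printed. Treat the private key as a secret and avoid leaving it in terminal scrollback on shared machines.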

Head to your GitLab project and click “Settings” at the bottom of the left navigation menu. Click the “CI/CD” item in the sub-menu. Scroll down to the “Variables” section on the resulting page.

Click the blue “Add variable” button. Give your new variable a name in the “Key” field. We’re using SSH_PRIVATE_KEY. Paste your private key into the “Value” field, including the leading -----BEGIN and trailing -----END lines.

Adding the key as a CI variable lets you reference it in your pipeline later on. It will be added to the SSH agent in the containers that your pipeline creates.

Adding Your Pipeline File

GitLab CI runs jobs based on the contents of a .gitlab-ci.yml file in your repository. GitLab will automatically find this file and run the pipeline it defines when you push changes to your branches.

Start with a .gitlab-ci.yml that contains a single job which uses rsync to synchronize the contents of the working directory to /var/www/html on the example.com server. It uses the alpine:latest Docker image as the build environment. At this stage, the pipeline will fail because rsync isn’t included in the Alpine image.
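A starting point along these lines could look like the sketch below. The job name, stage name, and rsync flags are assumptions chosen for illustration; the user, host, and path are the example values used throughout this article. The heredoc simply writes the file from a terminal and will overwrite any existing .gitlab-ci.yml in the current directory, so you may prefer to paste the YAML into your editor instead.

```shell
# Write a first-draft pipeline file into the current directory.
# The quoted 'EOF' stops the shell from expanding anything inside.
cat > .gitlab-ci.yml <<'EOF'
stages:
  - deploy

rsync:
  stage: deploy
  image: alpine:latest
  script:
    # -a preserves permissions and timestamps, -v is verbose, and
    # --delete removes remote files that no longer exist locally.
    - rsync -av --delete ./ user@example.com:/var/www/html
EOF
```

Be careful with --delete: it removes anything in the destination directory that isn't present in the source, so leave it out if the server holds files that aren't part of your build.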

Installing SSH and rsync

Alpine is a good base for the job because it’s a lightweight image with few dependencies. This reduces network use while GitLab pulls the image at the start of the job, accelerating your pipeline. To get rsync working, add SSH and rsync to the image, and then start the SSH agent and register the private key that you generated earlier.

OpenSSH and rsync are installed using Alpine’s apk package manager. The SSH authentication agent is then started, and your private key is added to it via ssh-add. GitLab automatically injects SSH_PRIVATE_KEY as an environment variable holding the value that you defined in your project’s settings. If you entered a different name in the “Key” field on the GitLab variables screen, make sure that you adjust your pipeline accordingly.
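Put together, the steps described above might be expressed as a before_script section like the one in this sketch. As before, the job and stage names and the rsync flags are illustrative assumptions, and the heredoc overwrites any existing .gitlab-ci.yml in the current directory.

```shell
# Write the pipeline file with a before_script that prepares SSH.
cat > .gitlab-ci.yml <<'EOF'
stages:
  - deploy

rsync:
  stage: deploy
  image: alpine:latest
  before_script:
    # Install the OpenSSH client and rsync with apk.
    - apk add --no-cache openssh-client rsync
    # Start the SSH agent and export its environment variables.
    - eval $(ssh-agent -s)
    # Load the private key from the CI variable into the agent.
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
  script:
    - rsync -av --delete ./ user@example.com:/var/www/html
EOF
```

Reading the key from standard input with ssh-add - means it is never written to disk inside the container.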

Managing Host Verification

SSH interactively prompts for confirmation the first time that you connect to a new remote host. This is incompatible with the CI environment, where you won’t be able to see or respond to these prompts.

Two options are available to address this: Disable strict host key checks, or register your server as a “known” host ahead of time.

For the first option, extend your pipeline’s before_script so that it writes an SSH client configuration that turns off strict host key checking.
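The commands below are a minimal sketch of that configuration, written as plain shell; in .gitlab-ci.yml each one would become its own before_script entry. Applying the setting to every host with Host * is an assumption made for brevity, and you could name only your server instead.

```shell
# Make sure the SSH configuration directory exists with safe permissions.
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Disable host key prompts for all hosts. This replaces any existing
# ~/.ssh/config, which is fine in a throwaway CI container.
printf 'Host *\n\tStrictHostKeyChecking no\n' > ~/.ssh/config
chmod 600 ~/.ssh/config
```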

While this works, it’s a potential security risk. You’d have no warning if an attacker gained control of your server’s domain or IP. Using host key checking lets you verify that the remote’s identity is what you expect it to be.

For the second option, first connect to the server from your own machine, outside of your pipeline, and accept its host key. Then inspect your ~/.ssh/known_hosts file and find the line containing the remote’s IP or hostname. If the entries in that file are hashed, running ssh-keygen -F followed by the hostname prints the matching entry. Copy this line and use the procedure from earlier to add a new GitLab CI variable. Name this variable SSH_HOST_KEY.

Next, update your before_script section so that the value of SSH_HOST_KEY is written to ~/.ssh/known_hosts inside the container.
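A sketch of the commands involved is shown here as plain shell; in .gitlab-ci.yml each one would become a before_script entry. It assumes SSH_HOST_KEY holds exactly one known_hosts line, as described above.

```shell
# Make sure the SSH configuration directory exists with safe permissions.
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Register the server's host key from the CI variable so that SSH
# recognizes the server without prompting.
printf '%s\n' "$SSH_HOST_KEY" >> ~/.ssh/known_hosts
chmod 644 ~/.ssh/known_hosts
```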

The pipeline can now connect to the server without triggering a confirmation prompt. Push your code to your GitLab repository and watch as your pipeline completes.

Further Improvements

This pipeline is a simple example of how to get started with SSH and rsync in a Dockerized environment. There are opportunities to further improve the system by wrapping the preparation steps into a dedicated build stage that constructs a Docker image that you can reuse between pipelines.

The .gitlab-ci.yml would also benefit from greater use of variables. Abstracting the remote server’s hostname (example.com), directory (/var/www/html), and user (user) into GitLab CI variables would help keep the file clean, prevent casual repository browsers from seeing environmental details, and let you change the configuration values without editing your pipeline file.
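As a rough illustration, the rsync destination could be assembled from variables like this. DEPLOY_USER, DEPLOY_HOST, and DEPLOY_PATH are hypothetical names that you would define on the GitLab variables screen; the fallbacks are the example values from this article. The sketch only prints the command so that it can be checked before anything is deployed.

```shell
# Hypothetical CI variable names, with this article's example values
# as fallbacks when they are not set.
DEPLOY_USER="${DEPLOY_USER:-user}"
DEPLOY_HOST="${DEPLOY_HOST:-example.com}"
DEPLOY_PATH="${DEPLOY_PATH:-/var/www/html}"

# Build the rsync destination from the three parts.
TARGET="${DEPLOY_USER}@${DEPLOY_HOST}:${DEPLOY_PATH}"

# Preview the command; remove the echo to run it for real.
echo rsync -av --delete ./ "$TARGET"
```

In the pipeline file itself, the script entry would reference the same variables in place of the hard-coded destination.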

Summary

Using rsync in GitLab CI pipelines with the Docker executor requires a little setup to form a build environment that has the dependencies you need. You have to inject an SSH private key and register the remote server as a known host yourself.

Although community Docker images are available that roll SSH and rsync atop popular base images, these ultimately give you less control over your build. You’re extending your pipeline’s supply chain with an image that you can’t necessarily trust. Starting with an OS base image and adding what you need helps you have confidence in your builds.