
SST + Nomad

Repo with the code is there

Things that require further figuring out:

  1. How to expose only the Traefik service to the outside world without relying on the VPS provider to, well, provide a private network
  2. How to properly install and configure initial Nomad setup without a lot of manual work
  3. CI/CD integration
  4. Vault integration and where to host it

$5 VPS

Get yourself a VPS with at least 1GB of RAM, preferably a bit more and a private network. I know it is possible to provision one with SST from a myriad of providers, but I wanted a more general setup that could be used with any VPS provider or even a bare metal server.

DNS

For DNS, I'll be using Cloudflare because it integrates neatly with Traefik, which will be our reverse proxy, load balancer, and TLS certificate provisioner. You can use any other DNS provider, but you'll have to adjust the Traefik configuration accordingly. You can read more here, here, and here.

Pretend that 10.11.12.13 is your server's public IP and example.com is your domain. Then create A records for your domain, pointing to your server's IP address:

Get a CF_ZONE_API_TOKEN for Traefik to use Cloudflare's API for DNS challenges to issue TLS certificates, specifying the zone example.com. You can do that here. Save the token.
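Before moving on, it can save some head-scratching to confirm the A records actually resolve to the server. Here is a minimal sketch, assuming Node 18+ is available on your machine and that the records are DNS-only (records proxied through Cloudflare resolve to Cloudflare's addresses instead). The hostnames are the ones used later in this post; hostsNotPointingAt and checkDns are names made up for this example:

```typescript
import { resolve4 } from "node:dns/promises";

// Hostnames used later in this post; swap in your own domain.
const HOSTS = ["example.com", "nomad.example.com", "database.example.com"];

// Pure helper: which hosts do NOT list the expected address among their A records?
function hostsNotPointingAt(
  expectedIp: string,
  resolved: Record<string, string[]>,
): string[] {
  return Object.entries(resolved)
    .filter(([, addrs]) => !addrs.includes(expectedIp))
    .map(([host]) => host);
}

// Resolves every host and returns the ones that miss the server IP.
async function checkDns(expectedIp: string, hosts: string[] = HOSTS): Promise<string[]> {
  const resolved: Record<string, string[]> = {};
  for (const host of hosts) {
    // A failed lookup is treated as "no addresses".
    resolved[host] = await resolve4(host).catch(() => []);
  }
  return hostsNotPointingAt(expectedIp, resolved);
}
```

Calling checkDns("10.11.12.13") should give back an empty array once every record has propagated.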

Installing things

Install Docker

Install Nomad, then follow the Linux post-installation steps to install the CNI plugins. Do not install consul-cni.

Configuring Nomad

I'm sorry for the lack of syntax highlighting for HCL -_-

SSH into your server, go to /etc/nomad.d, do vim nomad.hcl, paste this:

hcl
data_dir  = "/opt/nomad/data"
bind_addr = "0.0.0.0"
server {
  enabled          = true
  bootstrap_expect = 1
}
plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}
plugin "containerd-driver" {
  config {
    containerd_runtime = "io.containerd.runc.v2"
  }
}
client {
  enabled = true
  servers = ["127.0.0.1"]
  host_network "private" {
    cidr = "10.0.0.2/32"
  }
}
acl {
  enabled = true
}

Where 10.0.0.2/32 is your server's private IP address. You can get it by doing ip a and looking for the interface named something like enp7s0.

Here we are:

Enable and start Nomad: systemctl enable --now nomad

Bootstrap the ACL system with nomad acl bootstrap. You should get something like this:

shell
Accessor ID  = faacbd2a-1085-8552-5e14-1bc604d95ace
Secret ID    = 3f30403d-f5a3-00ff-b00f-bd256721b867
Name         = Bootstrap Token
Type         = management
Global       = true
Create Time  = 2024-10-16 13:08:38.082016962 +0000 UTC
Expiry Time  = <none>
Create Index = 14
Modify Index = 14
Policies     = n/a
Roles        = n/a

Do export NOMAD_TOKEN=<secret_id>

Then do nomad acl token create -name="frontend" -type="management", this will be used to authenticate with the Nomad UI

And then do nomad acl token create -name="sst" -type="management", this one for SST to be able to interact with Nomad remotely

Write them all down

Visit http://10.11.12.13:4646/ui, you should see Nomad's UI. Authenticate with the frontend token.
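If you'd rather check a token from a script than from the UI, Nomad's HTTP API has a /v1/acl/token/self endpoint that returns the token matching the X-Nomad-Token header. A minimal sketch, assuming Node 18+ for the global fetch; selfTokenRequest and whoAmI are names made up for this example:

```typescript
// Builds the request for Nomad's "read self token" endpoint.
function selfTokenRequest(
  nomadUrl: string,
  secretId: string,
): { url: string; headers: Record<string, string> } {
  return {
    url: new URL("/v1/acl/token/self", nomadUrl).toString(),
    headers: { "X-Nomad-Token": secretId },
  };
}

// Returns the token's name and type, or throws if Nomad rejects the secret.
async function whoAmI(
  nomadUrl: string,
  secretId: string,
): Promise<{ name: string; type: string }> {
  const { url, headers } = selfTokenRequest(nomadUrl, secretId);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`Nomad answered ${res.status} for ${url}`);
  const token = (await res.json()) as { Name: string; Type: string };
  return { name: token.Name, type: token.Type };
}
```

Something like whoAmI("http://10.11.12.13:4646", "<frontend-secret-id>") should come back with the name frontend and the type management if the token is good.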

Traefik Host Configuration

Create folders /opt/letsencrypt and /opt/traefik

In /opt/traefik/dynamic-config.yml put this:

yml
http:
  routers:
    nomad:
      rule: "Host(`nomad.example.com`)"
      entryPoints:
        - websecure
      service: nomad
      tls:
        certResolver: myresolver
  services:
    nomad:
      loadBalancer:
        servers:
          - url: "http://10.11.12.13:4646"

Don't forget to replace nomad.example.com with your domain.

First we will create just a Traefik container, which will be responsible for routing traffic to our services and issuing TLS certificates. Since we won't have TLS before we have Traefik, we will use plain HTTP for now and rotate the tokens later, once HTTPS is configured.

First SST Interaction

Actually no, first execute nomad var put nomad/jobs/traefik cf_dns_api_token=<cf_zone_api_token>

Now init SST somehow, add nomad provider via sst add nomad, change home to "local"

You should have something like this:

tsx
...
app(input) {
  return {
    name: "sst-nomad-thing",
    removal: input?.stage === "production" ? "retain" : "remove",
    home: "local",
    providers: { nomad: "2.3.3" }
  }
},
...

Create .env file, put this inside:

env
NOMAD_URL=http://10.11.12.13:4646
NOMAD_TOKEN=<nomad-sst-secret-id>

Create a folder named .nomad inside your project, inside it create a file named traefik.nomad with the following content:

hcl
variable "NOMAD_URL" {
  type = string
}
job "traefik" {
  group "traefik-group" {
    network {
      mode = "host"
      port "http" {
        static = 80
      }
      port "http_secure" {
        static = 443
      }
      port "database" {
        static = 5432
      }
    }
    service {
      name     = "traefik"
      provider = "nomad"
    }
    task "traefik-task" {
      driver = "docker"
      config {
        image   = "traefik"
        ports   = ["http", "http_secure", "database"]
        volumes = ["/opt/letsencrypt:/letsencrypt", "/opt/traefik:/traefik"]
        args = [
          "--api.dashboard=false",
          "--api.insecure=true",
          "--entrypoints.web.address=:${NOMAD_PORT_http}",
          "--entrypoints.web.http.redirections.entrypoint.to=websecure",
          "--entrypoints.web.http.redirections.entrypoint.scheme=https",
          "--entrypoints.websecure.address=:${NOMAD_PORT_http_secure}",
          "--entrypoints.websecure.http.tls=true",
          "--entrypoints.database.address=:${NOMAD_PORT_database}",
          "--providers.nomad=true",
          "--providers.nomad.endpoint.address=${NOMAD_URL}",
          "--providers.nomad.exposedByDefault=false",
          "--accesslog=true",
          "--log.level=DEBUG",
          "--certificatesresolvers.myresolver.acme.dnschallenge=true",
          "--certificatesresolvers.myresolver.acme.dnschallenge.provider=cloudflare",
          "--certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json",
          "--providers.file.filename=/traefik/dynamic-config.yml"
        ]
      }
      env {
        NOMAD_URL = var.NOMAD_URL
      }
      template {
        data        = <<EOF
{{- with nomadVar "nomad/jobs/traefik" -}}
CF_DNS_API_TOKEN = {{.cf_dns_api_token}}
{{- end -}}
EOF
        destination = "secrets/env"
        env         = true
      }
      identity {
        env         = true
        change_mode = "restart"
      }
    }
  }
}

Oof, that's a lot. Let's break it down:

Also we are passing a bunch of arguments to Traefik:

Add this function to sst.config.ts:

tsx
const getEnvVariables = () => {
  const nomadUrl = process.env.NOMAD_URL
  if (!nomadUrl) throw new Error("NOMAD_URL is not set")
  const nomadToken = process.env.NOMAD_TOKEN
  if (!nomadToken) throw new Error("NOMAD_TOKEN is not set")
  return { nomadUrl, nomadToken }
}

This is how I do it. You can do it however you want, as long as your code has access to NOMAD_URL and NOMAD_TOKEN.
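One "however you want" option: a tiny generic helper, so adding more variables later doesn't mean copy-pasting more if statements. This is only a sketch; requireEnv is a hypothetical name, and the env object is passed in explicitly so the helper doesn't depend on process.env directly:

```typescript
// Reads the named variables from `env` and throws on the first missing one.
function requireEnv<K extends string>(
  names: readonly K[],
  env: Record<string, string | undefined>,
): Record<K, string> {
  const out = {} as Record<K, string>;
  for (const name of names) {
    const value = env[name];
    if (!value) throw new Error(`${name} is not set`);
    out[name] = value;
  }
  return out;
}

// In sst.config.ts this would be called with process.env, for example:
// const { NOMAD_URL, NOMAD_TOKEN } = requireEnv(["NOMAD_URL", "NOMAD_TOKEN"], process.env)
```

The error messages match the ones thrown by the hand-written version, so nothing else has to change.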

Update the run function:

tsx
async run() {
  // readFileSync comes from node:fs; a dynamic import keeps it inside run()
  const { readFileSync } = await import("node:fs")
  const { nomadUrl, nomadToken } = getEnvVariables()
  const nomadProvider = new nomad.Provider("NomadProvider", {
    address: nomadUrl,
    skipVerify: true,
    secretId: nomadToken
  })
  const traefik = new nomad.Job(
    "Traefik",
    {
      jobspec: readFileSync(".nomad/traefik.nomad", "utf-8"),
      hcl2: { vars: { NOMAD_URL: nomadUrl } }
    },
    { provider: nomadProvider }
  )
}

Seems self-explanatory, we are creating a Nomad provider, then a Traefik job, then passing the NOMAD_URL variable to the job.

skipVerify is set to true because we don't have TLS configured yet.

Run env $(cat .env | xargs) sst deploy, then visit the Nomad UI at http://10.11.12.13:4646/ui, where you should see a traefik job running. Once it's healthy, check the Traefik logs for errors, then check /opt/letsencrypt/acme.json: it should be populated with a certificate for nomad.example.com.
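Eyeballing acme.json works, but it is a big blob of base64. Here is a rough sketch for listing which domains it holds certificates for. It assumes the usual Traefik layout, where the file is keyed by resolver name and each resolver has a Certificates array whose entries carry a domain.main field; AcmeStore and certifiedDomains are names invented for this example, so double-check against your own file:

```typescript
// Only the parts of acme.json this sketch reads (assumed layout, not exhaustive).
interface AcmeStore {
  [resolver: string]: {
    Certificates?: { domain: { main: string; sans?: string[] } }[] | null;
  };
}

// Returns every domain (main and SANs) that has a certificate under `resolver`.
function certifiedDomains(store: AcmeStore, resolver: string): string[] {
  const certs = store[resolver]?.Certificates ?? [];
  return certs.flatMap((c) => [c.domain.main, ...(c.domain.sans ?? [])]);
}

// On the server, with Node available, usage might look like:
// const store = JSON.parse(readFileSync("/opt/letsencrypt/acme.json", "utf-8"))
// console.log(certifiedDomains(store, "myresolver"))
```

An empty list usually means the DNS challenge has not completed yet, which is where the Traefik logs come in.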

If you can visit https://nomad.example.com/ui and see, well, the UI, then everything is working fine.

Change skipVerify to false in the nomadProvider.
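Flipping skipVerify by hand is easy to forget. A small sketch, assuming you'd rather derive it from NOMAD_URL so that verification turns on as soon as the URL is https; shouldSkipVerify is a hypothetical helper, not part of SST or the Nomad provider:

```typescript
// Skip certificate verification only while the Nomad URL is not https.
function shouldSkipVerify(nomadUrl: string): boolean {
  return new URL(nomadUrl).protocol !== "https:";
}

// In the provider config this would replace the hard-coded boolean:
// skipVerify: shouldSkipVerify(nomadUrl)
```

With this in place, switching NOMAD_URL to the https address later also switches verification on.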

Hardening (lmao) the Nomad

Now we need to rotate sst and frontend tokens

Do nomad acl token list, you should see something like this:

shell
Name             Type        Global  Accessor ID                           Expired
Bootstrap Token  management  true    f4ab3e26-ce1d-11e6-3d9a-238db337c10a  false
frontend         management  false   0d60c989-a5b4-874a-2940-43e7549a060c  false
sst              management  false   0b0f1d6b-85e4-d654-4635-7775dcbe43db  false

If you get access denied do export NOMAD_TOKEN=<bootstrap_token>

Delete tokens:

shell
root@sst-nomad-thing:~# nomad acl token delete 0d60c989-a5b4-874a-2940-43e7549a060c
Successfully deleted 0d60c989-a5b4-874a-2940-43e7549a060c policy!
root@sst-nomad-thing:~# nomad acl token delete 0b0f1d6b-85e4-d654-4635-7775dcbe43db
Successfully deleted 0b0f1d6b-85e4-d654-4635-7775dcbe43db policy!

Recreate them as we did before, update NOMAD_TOKEN in the .env file with the new token, also change NOMAD_URL to https://nomad.example.com while you're at it.

Now that we have HTTPS figured out, we can transport secrets to the server over an encrypted connection.

If you do env $(cat .env | xargs) sst refresh now, you'll get a 403 error even though we updated the token. That's because the token is cached, so we need to do env $(cat .env | xargs) sst deploy. It will error out too, but do env $(cat .env | xargs) sst deploy again, and everything should be fine.

Deploying Services

Echo Service

Create echo.nomad file in the .nomad folder in the project root, put this inside:

hcl
variable "POSTGRES_USER" {
  type = string
}
variable "POSTGRES_PASSWORD" {
  type = string
}
variable "POSTGRES_DATABASE" {
  type = string
}
variable "DOMAIN" {
  type = string
}
job "echo" {
  group "echo-group" {
    count = 3
    network {
      mode = "bridge"
      port "http" {
        to           = -1
        host_network = "private"
      }
    }
    service {
      name     = "echo"
      provider = "nomad"
      port     = "http"
      tags = [
        "http-echo",
        "traefik.enable=true",
        "traefik.http.routers.http-echo.rule=Host(`${var.DOMAIN}`)",
        "traefik.http.routers.http-echo.entrypoints=websecure",
        "traefik.http.routers.http-echo.tls.certresolver=myresolver",
        "traefik.http.services.http-echo.loadbalancer.server.port=${NOMAD_PORT_http}"
      ]
      check {
        name     = "HTTP Echo Health"
        type     = "tcp"
        interval = "10s"
        timeout  = "2s"
      }
    }
    task "echo-task" {
      driver = "docker"
      config {
        image = "hashicorp/http-echo"
        ports = ["http"]
        args  = ["-text=DATABASE_URL: ${DATABASE_URL}\n\nCURRENT_PORT: ${NOMAD_PORT_http}", "-listen=:${NOMAD_PORT_http}"]
      }
      template {
        data        = <<EOF
{{- range nomadService "postgres" }}
DATABASE_URL=postgres://${var.POSTGRES_USER}:${var.POSTGRES_PASSWORD}@{{ .Address }}:{{ .Port }}/${var.POSTGRES_DATABASE}
{{- end }}
EOF
        destination = "secrets/env"
        env         = true
      }
    }
  }
}

Mostly same as the Traefik job, but with some differences:

Traefik labels:
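The tags on the service are what Traefik's Nomad provider reads to build its routing. To make the pattern easier to see, here is the same set of tags generated by a small function; this is purely illustrative (HttpRoute and traefikHttpTags are names made up here, and the job file itself still uses the plain list above):

```typescript
interface HttpRoute {
  router: string;       // name used in the tag keys, e.g. "http-echo"
  host: string;         // value for the Host() rule
  entrypoint: string;   // Traefik entry point, e.g. "websecure"
  certResolver: string; // ACME resolver name, e.g. "myresolver"
  port: string;         // port Traefik forwards to, e.g. "${NOMAD_PORT_http}"
}

function traefikHttpTags(r: HttpRoute): string[] {
  return [
    // Needed because Traefik runs with exposedByDefault=false.
    "traefik.enable=true",
    // Route requests whose Host header matches the domain.
    `traefik.http.routers.${r.router}.rule=Host(\`${r.host}\`)`,
    // Only listen on the given entry point (443 for websecure).
    `traefik.http.routers.${r.router}.entrypoints=${r.entrypoint}`,
    // Ask the named resolver for a certificate for this router.
    `traefik.http.routers.${r.router}.tls.certresolver=${r.certResolver}`,
    // Tell Traefik which port of the allocation to send traffic to.
    `traefik.http.services.${r.router}.loadbalancer.server.port=${r.port}`,
  ];
}
```

Called with the router name http-echo, the websecure entry point, and myresolver, this yields the same five traefik.* tags as the echo job (the extra plain "http-echo" tag is just a label and is not interpreted by Traefik).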

Postgres Service

Create postgres.nomad file in the .nomad folder, put this inside:

hcl
variable "POSTGRES_PASSWORD" {
  type = string
}
variable "POSTGRES_USER" {
  type = string
}
variable "POSTGRES_DATABASE" {
  type = string
}
variable "DOMAIN" {
  type = string
}
job "postgres" {
  group "postgres-group" {
    network {
      mode = "bridge"
      port "database" {
        to           = -1
        host_network = "private"
      }
    }
    service {
      name     = "postgres"
      provider = "nomad"
      port     = "database"
      tags = [
        "database",
        "traefik.enable=true",
        "traefik.tcp.routers.db.rule=HostSNI(`database.${var.DOMAIN}`)",
        "traefik.tcp.routers.db.tls=true",
        "traefik.tcp.routers.db.entrypoints=database",
        "traefik.tcp.routers.db.tls.certresolver=myresolver",
        "traefik.tcp.services.db.loadbalancer.server.port=${NOMAD_PORT_database}"
      ]
    }
    task "postgres-task" {
      driver = "docker"
      config {
        image   = "docker.io/postgres"
        ports   = ["database"]
        volumes = ["/opt/nomad/data/postgres:/var/lib/postgresql/data"]
      }
      env {
        POSTGRES_PASSWORD = var.POSTGRES_PASSWORD
        POSTGRES_USER     = var.POSTGRES_USER
        POSTGRES_DB       = var.POSTGRES_DATABASE
        PGPORT            = "${NOMAD_PORT_database}"
      }
    }
  }
}

Same as the echo service, but over TCP.

Second SST Interaction

Add these to the .env file:

env
POSTGRES_PASSWORD=super-secret
POSTGRES_USER=oofer
POSTGRES_DB=boofer
DOMAIN=domain.com

Don't forget to replace domain.com with your domain.

Update getEnvVariables function

ts
const getEnvVariables = () => {
  const nomadUrl = process.env.NOMAD_URL
  if (!nomadUrl) throw new Error("NOMAD_URL is not set")
  const nomadToken = process.env.NOMAD_TOKEN
  if (!nomadToken) throw new Error("NOMAD_TOKEN is not set")
  const domain = process.env.DOMAIN
  if (!domain) throw new Error("DOMAIN is not set")
  const postgresPassword = process.env.POSTGRES_PASSWORD
  if (!postgresPassword) throw new Error("POSTGRES_PASSWORD is not set")
  const postgresUser = process.env.POSTGRES_USER
  if (!postgresUser) throw new Error("POSTGRES_USER is not set")
  const postgresDatabase = process.env.POSTGRES_DB
  if (!postgresDatabase) throw new Error("POSTGRES_DB is not set")
  return { nomadUrl, nomadToken, domain, postgresPassword, postgresUser, postgresDatabase }
}

And its call

ts
const {
  nomadUrl,
  nomadToken,
  domain,
  postgresPassword,
  postgresUser,
  postgresDatabase
} = getEnvVariables()

Add new jobs to the run function:

ts
const echo = new nomad.Job(
  "Echo",
  {
    jobspec: readFileSync(".nomad/echo.nomad", "utf-8"),
    hcl2: {
      vars: {
        POSTGRES_PASSWORD: postgresPassword,
        POSTGRES_USER: postgresUser,
        POSTGRES_DATABASE: postgresDatabase,
        DOMAIN: domain
      }
    }
  },
  { provider: nomadProvider }
)
const postgres = new nomad.Job(
  "Postgres",
  {
    jobspec: readFileSync(".nomad/postgres.nomad", "utf-8"),
    hcl2: {
      vars: {
        POSTGRES_PASSWORD: postgresPassword,
        POSTGRES_USER: postgresUser,
        POSTGRES_DATABASE: postgresDatabase,
        DOMAIN: domain
      }
    }
  },
  { provider: nomadProvider }
)

Do env $(cat .env | xargs) sst deploy

Visit the UI, where you should see the new jobs. Wait for them to become healthy, then visit https://example.com. You should see something like this:

shell
DATABASE_URL: postgres://oofer:super-secret@10.11.12.13:24847/boofer

CURRENT_PORT: 28721

Running openssl s_client -connect database.example.com:5432 should show us that TLS is working, indicating that we have an encrypted connection to the database from the outside.

Refresh the page a couple of times, you should see different ports in the CURRENT_PORT field.
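If mashing F5 gets old, the same check can be scripted. A minimal sketch, assuming Node 18+ for the global fetch and the response format shown above; extractPort and samplePorts are names made up for this example:

```typescript
// Pulls the number after "CURRENT_PORT:" out of the echo response body.
function extractPort(body: string): number | undefined {
  const match = /CURRENT_PORT:\s*(\d+)/.exec(body);
  return match ? Number(match[1]) : undefined;
}

// Requests the page several times and collects the distinct ports seen.
async function samplePorts(url: string, requests: number): Promise<Set<number>> {
  const seen = new Set<number>();
  for (let i = 0; i < requests; i++) {
    const res = await fetch(url);
    const port = extractPort(await res.text());
    if (port !== undefined) seen.add(port);
  }
  return seen;
}
```

Since the echo group runs with count = 3, something like samplePorts("https://example.com", 10) should end up with at most three distinct ports, one per instance.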

And I guess that's it. Now we have a working Nomad setup with Traefik, and we have a way to do rolling updates, run multiple instances of services, and connect them to each other.