Upgrade Exoscale clusters from component version v4 to v5

When migrating from component version v4 to v5, breaking changes in the component's Terraform modules require manual changes to the Terraform state of existing clusters.

This guide assumes that you’ve already upgraded component openshift4-terraform to a v5 version for the cluster you’re migrating.

Prerequisites

You need to be able to execute the following CLI tools locally: commodore, vault, curl, jq, yq, docker, terraform, exo, git, and ssh.
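
A quick way to check that the tools are available on your PATH (the list above is derived from the commands used in this guide):

  for tool in commodore vault curl jq yq docker terraform exo git ssh; do
    command -v "$tool" >/dev/null || echo "missing: $tool"
  done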

Setup environment

  1. Access to API

    # For example: https://api.syn.vshn.net
    # IMPORTANT: do NOT add a trailing `/`. Commands below will fail.
    export COMMODORE_API_URL=<lieutenant-api-endpoint>
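    # Optional guard (not part of the original guide): catch a trailing slash
    # early instead of letting the curl commands below fail
    [[ "${COMMODORE_API_URL}" != */ ]] || echo "WARNING: remove the trailing '/' from COMMODORE_API_URL"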
    
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
    
    # From https://git.vshn.net/-/profile/personal_access_tokens, "api" scope is sufficient
    export GITLAB_TOKEN=<gitlab-api-token>
    export GITLAB_USER=<gitlab-user-name>
  2. Connect with Vault

    export VAULT_ADDR=https://vault-prod.syn.vshn.net
    vault login -method=ldap username=<your.name>

Update cluster configuration

  1. Verify that your cluster uses component openshift4-terraform v5. If that’s not the case, update the component version to v5.x before compiling the cluster catalog.

  2. Update cluster configuration to use new Exoscale compute instance type identifiers.

    The new exoscale_compute_instance Terraform resource uses different identifiers to specify instance types. We’ve modified the Terraform module accordingly, so the new version uses different variable names for the instance types. If you haven’t customized instance types for your cluster, you can skip this step.

    Update your cluster configuration to replace the following parameters:

    • infra_size → infra_type

    • storage_size → storage_type

    • worker_size → worker_type

    • additional_worker_groups[*].size → additional_worker_groups[*].type

    • if you were using additional_affinity_group_ids to select a dedicated hypervisor: additional_affinity_group_ids → deploy_target_id

    The new resource uses identifiers of the form <family>.<size>. For example, Extra-large is now standard.extra-large. See exo compute instance-type list for the list of all types.
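
    For example, to look up the new identifier that corresponds to a previous "Extra-large" size (an illustrative invocation of the command mentioned above):

    exo compute instance-type list | grep -i extra-large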

  3. Compile the cluster catalog to create a local working directory

    commodore catalog compile "${CLUSTER_ID}"

Migrate Terraform state

  1. Fetch hieradata repo token from Vault

    export HIERADATA_REPO_TOKEN=$(vault kv get \
      -format=json "clusters/kv/lbaas/hieradata_repo_token" | jq -r '.data.data.token')
    You need the Hieradata repo token so that Terraform can access the LB hieradata Git repo.
  2. Get an Exoscale unrestricted API key for the cluster’s organization from the Exoscale Console.

    export EXOSCALE_API_KEY=EXO... (1)
    export EXOSCALE_API_SECRET=... (2)
    1 Replace with your API key
    2 Replace with your API secret
    You need the Exoscale credentials so that Terraform can refresh the state.
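
    To make sure the exported credentials work before running Terraform, you can issue any read-only exo command. This assumes the exo CLI picks up the key and secret from the EXOSCALE_API_KEY and EXOSCALE_API_SECRET environment variables:

    exo zone list >/dev/null && echo "Exoscale credentials OK"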
  3. Setup Git author information and create the Terraform environment file. The Git author info is required so that Terraform can commit the hieradata changes.

    export GIT_AUTHOR_NAME=$(git config --global user.name)
    export GIT_AUTHOR_EMAIL=$(git config --global user.email)
    
    cat <<EOF > ./terraform.env
    EXOSCALE_API_KEY
    EXOSCALE_API_SECRET
    HIERADATA_REPO_TOKEN
    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    EOF
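
    The file lists only variable names: Docker’s --env-file passes a variable without a value through from the calling shell’s environment, so each of these variables must be exported in the shell where you run the terraform alias below. A quick sanity check (a sketch, not part of the original guide):

    while read -r var; do
      [ -n "${!var}" ] || echo "unset: ${var}"
    done < ./terraform.env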
  4. Setup Terraform

    Prepare Terraform execution environment
    # Set terraform image and tag to be used
    tf_image=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.image" \
      dependencies/openshift4-terraform/class/defaults.yml)
    tf_tag=$(\
      yq eval ".parameters.openshift4_terraform.images.terraform.tag" \
      dependencies/openshift4-terraform/class/defaults.yml)
    
    # Generate the terraform alias
    base_dir=$(pwd)
    alias terraform='docker run -it --rm \
      -e REAL_UID=$(id -u) \
      --env-file ${base_dir}/terraform.env \
      -w /tf \
      -v $(pwd):/tf \
      --ulimit memlock=-1 \
      "${tf_image}:${tf_tag}" /tf/terraform.sh'
    
    export GITLAB_REPOSITORY_URL=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r '.gitRepo.url' | sed 's|ssh://||; s|/|:|')
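    # The sed above rewrites the SSH URL into SCP-like syntax so that it matches
    # GitLab's ssh_url_to_repo format, which the next command compares against.
    # Hypothetical example:
    #   ssh://git@git.vshn.net/syn/cluster-catalogs/c-example.git
    #   -> git@git.vshn.net:syn/cluster-catalogs/c-example.git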
    export GITLAB_REPOSITORY_NAME=${GITLAB_REPOSITORY_URL##*/}
    export GITLAB_CATALOG_PROJECT_ID=$(curl -sH "Authorization: Bearer ${GITLAB_TOKEN}" "https://git.vshn.net/api/v4/projects?simple=true&search=${GITLAB_REPOSITORY_NAME/.git}" | jq -r ".[] | select(.ssh_url_to_repo == \"${GITLAB_REPOSITORY_URL}\") | .id")
    export GITLAB_STATE_URL="https://git.vshn.net/api/v4/projects/${GITLAB_CATALOG_PROJECT_ID}/terraform/state/cluster"
    
    pushd catalog/manifests/openshift4-terraform/

    The migration script doesn’t use the terraform alias configured here. Please make sure that you’ve got a Terraform binary available locally (see also section Prerequisites).

    Initialize Terraform
    terraform init \
      "-backend-config=address=${GITLAB_STATE_URL}" \
      "-backend-config=lock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=unlock_address=${GITLAB_STATE_URL}/lock" \
      "-backend-config=username=${GITLAB_USER}" \
      "-backend-config=password=${GITLAB_TOKEN}" \
      "-backend-config=lock_method=POST" \
      "-backend-config=unlock_method=DELETE" \
      "-backend-config=retry_wait_min=5"
  5. Migrate state

    Run the migration script
    ./migrate-state-exoscale-v4-v5.py
    Verify state using plan
    terraform plan

    You can expect the following changes:

    • The managed Floaty Exoscale access key will be created

    • The LB hieradata will be updated to use the new Floaty access key

    • All compute instances will be updated to use their FQDN instead of their hostname in the name field

      Despite what the web console claims, this change doesn’t require the instances to be restarted.
    • The field private_network_ids is added to all compute instances

    • The admin SSH key resource is recreated
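
    If you want to inspect one of these planned changes in isolation, you can plan a single resource address (generic Terraform usage; substitute an address from your own state):

    terraform state list | grep compute_instance
    terraform plan -target='<address-from-the-list-above>'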

  6. Apply the changes.

    terraform apply
  7. Merge the hieradata MR.

  8. Run Puppet on the cluster’s LBs so that they use the new Floaty API key managed by Terraform

    for id in 0 1; do
      lb_fqdn=$(terraform state show "module.cluster.module.lb.exoscale_domain_record.lb[$id]" | grep hostname | cut -d'=' -f2 | tr -d ' "\r\n')
      echo "${lb_fqdn}"
      ssh "${lb_fqdn}" sudo puppetctl run
    done
  9. Fetch and then remove the old Floaty API key from Vault

    OLD_FLOATY_KEY=$(vault kv get -format=json \
      clusters/kv/${TENANT_ID}/${CLUSTER_ID}/floaty | \
      jq -r '.data.data.iam_key')
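    
    # Safety check (not part of the original guide): stop here if the key
    # couldn't be read, so the Vault secret isn't deleted without a copy
    echo "Old Floaty key: ${OLD_FLOATY_KEY:?could not read old Floaty key from Vault}"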
    
    vault kv delete clusters/kv/${TENANT_ID}/${CLUSTER_ID}/floaty
  10. Revoke the old Floaty access key

    Don’t revoke the old Floaty API key before you’ve verified that the new API key has been rolled out on the LBs by the Puppet run above. Otherwise, Floaty won’t be able to migrate the Elastic IPs between the two LBs until you roll out the new key.

    Print out the Terraform-managed key
    NEW_FLOATY_KEY=$(terraform state show "module.cluster.module.lb.exoscale_iam_access_key.floaty" |\
      grep id | cut -d'=' -f2 | tr -d ' "\r\n')
    echo "Terraform-managed key: ${NEW_FLOATY_KEY}"
    Revoke the old key
    exo iam access-key revoke "${OLD_FLOATY_KEY}"
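
    To verify, you can list the organization’s remaining access keys and check that only the Terraform-managed Floaty key is left (this assumes the matching list subcommand of the exo CLI):

    exo iam access-key list | grep -i floaty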