Upgrade from component version v1 to v2

When migrating from component version v1 to v2, the following breaking change requires manually setting up zone-aware replication on existing instances:

  • mimir-distributed enables zone-aware replication for ingesters by default. Upgrading without the manual migration will incur data loss.

Prerequisites

You need to be able to run the following CLI tools locally (all of them are used in the commands below):

  • commodore
  • curl
  • git
  • jq
  • kubectl
  • sed
  • yq

Setup environment

  1. Configure access to the Lieutenant API

    # For example: https://api.syn.vshn.net
    # IMPORTANT: do NOT add a trailing `/`, otherwise the commands below will fail.
    export COMMODORE_API_URL=<lieutenant-api-endpoint>
    
    export CLUSTER_ID=<lieutenant-cluster-id> # Looks like: c-<something>
    export TENANT_ID=$(curl -sH "Authorization: Bearer $(commodore fetch-token)" ${COMMODORE_API_URL}/clusters/${CLUSTER_ID} | jq -r .tenant)
    
    # Instance name under which you include the mimir component in your tenant repo
    export MIMIR_INSTANCE=<mimir-instance-name>
    export MIMIR_PARAM_KEY="$( echo $MIMIR_INSTANCE | sed -e 's/-/_/g' )"
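
    For example, if you include the component as mimir-main (a hypothetical instance name), MIMIR_PARAM_KEY becomes mimir_main; this is the key used in all the parameters.${MIMIR_PARAM_KEY} paths below.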

Upgrade the component without zone aware replication

  1. Compile the cluster catalog to create a local working directory

    commodore catalog compile "${CLUSTER_ID}"
  2. Update the component version to v2.x in your cluster configuration:

    cd inventory/classes/$TENANT_ID
    
    yq eval -i '.parameters.components.mimir.version = "v2.0.1"' ${CLUSTER_ID}.yml
    
    cd ../../..
  3. Add temporary migration configuration:

    cd inventory/classes/$TENANT_ID
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.alertmanager.zoneAwareReplication.enabled = false" ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.ingester.zoneAwareReplication.enabled = false" ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.store_gateway.zoneAwareReplication.enabled = false" ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.rollout_operator.enabled = false" ${CLUSTER_ID}.yml
    
    git commit -am"Update component mimir" && git push origin master
    
    cd ../../..

    This configuration disables zone-aware replication: in this first step, the new component version is deployed with the feature turned off.

  4. Compile and push the cluster catalog
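
    One way to do this (assuming a Commodore version that supports pushing directly from the compile command; the later "Compile and push the cluster catalog" steps use the same invocation):

    commodore catalog compile "${CLUSTER_ID}" --push -i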

  5. Wait for all relevant pods to start and become ready
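
    To check, watch the pods in the instance namespace (the kubectl commands in this guide assume that the namespace matches the instance name):

    kubectl --as=cluster-admin -n "$MIMIR_INSTANCE" get pods -w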

Migrate Alertmanager

  1. Update the configuration for the alertmanager

    cd inventory/classes/$TENANT_ID
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.alertmanager.zoneAwareReplication.enabled = true" ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.alertmanager.zoneAwareReplication.migration.enabled = true" ${CLUSTER_ID}.yml
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.rollout_operator.enabled = true" ${CLUSTER_ID}.yml
    
    git commit -am"Migrate $MIMIR_INSTANCE alertmanager - step 1" && git push origin master
    
    cd ../../..
  2. Compile and push the cluster catalog

  3. Wait for all alertmanager pods to restart and become ready
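
    For example (assuming the alertmanager pods carry the chart’s app.kubernetes.io/component=alertmanager label, analogous to the store-gateway and ingester labels used in later steps):

    kubectl --as=cluster-admin -n "$MIMIR_INSTANCE" get pods -l app.kubernetes.io/component=alertmanager -w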

  4. Once that’s done, enable the write path for the new zone-aware alertmanager:

    cd inventory/classes/$TENANT_ID
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.alertmanager.zoneAwareReplication.migration.writePath = true" ${CLUSTER_ID}.yml
    
    git commit -am"Migrate $MIMIR_INSTANCE alertmanager - step 2" && git push origin master
    
    cd ../../..
  5. Compile and push the cluster catalog

  6. Wait for the new zone-aware alertmanager pods to become ready

  7. Next, set the final configuration (meaning the default configuration) by removing the custom alertmanager config block:

    cd inventory/classes/$TENANT_ID
    
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY}.helm_values.alertmanager)" ${CLUSTER_ID}.yml
    
    git commit -am"Migrate $MIMIR_INSTANCE alertmanager - final step" && git push origin master
    
    cd ../../..
  8. Compile and push the cluster catalog

  9. Wait for the old zone-unaware alertmanager pods to be terminated

Migrate store-gateway

  1. Disable auto sync on the root and mimir apps in ArgoCD

    kubectl --as=cluster-admin -n syn patch apps root --type=json -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {}}]'
    kubectl --as=cluster-admin -n syn patch apps $MIMIR_INSTANCE --type=json -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {}}]'
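
    Optionally, verify that auto sync is disabled; the sync policy should print as an empty object:

    kubectl --as=cluster-admin -n syn get apps root -o jsonpath='{.spec.syncPolicy}'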
  2. Scale down the store-gateway pods

    kubectl scale sts -l app.kubernetes.io/component=store-gateway -n$MIMIR_INSTANCE --replicas=0 --as cluster-admin
  3. Wait for all store-gateway pods to terminate
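
    One way to wait for this, using the same label selector as the scale command above (note that kubectl wait errors out with "no matching resources found" if the pods are already gone, which is also fine):

    kubectl --as=cluster-admin -n "$MIMIR_INSTANCE" wait --for=delete pod -l app.kubernetes.io/component=store-gateway --timeout=10m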

  4. Next, set the final configuration (meaning the default configuration) by removing the custom store_gateway config block:

    cd inventory/classes/$TENANT_ID
    
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY}.helm_values.store_gateway)" ${CLUSTER_ID}.yml
    
    git commit -am"Migrate $MIMIR_INSTANCE store_gateway" && git push origin master
    
    cd ../../..
  5. Compile and push the cluster catalog

  6. Re-enable ArgoCD sync (re-enabling the root app is sufficient: its self-heal restores the sync policy on the mimir app as well)

    kubectl --as=cluster-admin -n syn patch apps root --type=json -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {"automated":{"prune":true,"selfHeal":true}}}]'
  7. Wait for the new store-gateways to become ready

Migrate ingesters

  1. Configure the ingesters to flush data on shutdown

    cd inventory/classes/$TENANT_ID
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.mimir.structuredConfig.blocks_storage.tsdb.flush_blocks_on_shutdown = true" ${CLUSTER_ID}.yml
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.mimir.structuredConfig.ingester.ring.unregister_on_shutdown = true" ${CLUSTER_ID}.yml
    
    git commit -am"Migrate $MIMIR_INSTANCE ingesters - step 1" && git push origin master
    
    cd ../../..
  2. Compile and push the cluster catalog

  3. Wait for all ingester pods to restart and become ready
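
    For example, using the same component label as the scale command in a later step:

    kubectl --as=cluster-admin -n "$MIMIR_INSTANCE" rollout status sts -l app.kubernetes.io/component=ingester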

  4. Disable auto sync on the root and mimir apps in ArgoCD

    kubectl --as=cluster-admin -n syn patch apps root --type=json -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {}}]'
    kubectl --as=cluster-admin -n syn patch apps $MIMIR_INSTANCE --type=json -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {}}]'
  5. Stop traffic to the ingesters by scaling down the nginx and gateway deployments

    kubectl scale deploy -l app.kubernetes.io/component=nginx -n$MIMIR_INSTANCE --replicas=0 --as cluster-admin
    kubectl scale deploy -l app.kubernetes.io/component=gateway -n$MIMIR_INSTANCE --replicas=0 --as cluster-admin
  6. Wait for all nginx and gateway pods to terminate
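
    One way to wait for this (same caveat as for the store-gateway pods above: kubectl wait errors out if the pods are already gone):

    kubectl --as=cluster-admin -n "$MIMIR_INSTANCE" wait --for=delete pod -l app.kubernetes.io/component=nginx --timeout=10m
    kubectl --as=cluster-admin -n "$MIMIR_INSTANCE" wait --for=delete pod -l app.kubernetes.io/component=gateway --timeout=10m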

  7. Next, scale down the old zone-unaware ingesters as well:

    kubectl scale sts -l app.kubernetes.io/component=ingester -n$MIMIR_INSTANCE --replicas=0 --as cluster-admin
  8. Wait for the ingesters to terminate

  9. Now, enable the zone-aware ingesters and remove the temporary flush configuration from step 1

    cd inventory/classes/$TENANT_ID
    
    yq eval -i ".parameters.${MIMIR_PARAM_KEY}.helm_values.ingester.zoneAwareReplication.enabled = true" ${CLUSTER_ID}.yml
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY}.helm_values.ingester.replicas)" ${CLUSTER_ID}.yml
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY}.helm_values.mimir)" ${CLUSTER_ID}.yml
    
    git commit -am"Migrate $MIMIR_INSTANCE ingesters - step 2" && git push origin master
    
    cd ../../..
  10. Compile and push the cluster catalog

  11. Re-enable ArgoCD sync

    kubectl --as=cluster-admin -n syn patch apps root --type=json -p '[{"op":"replace", "path":"/spec/syncPolicy", "value": {"automated":{"prune":true,"selfHeal":true}}}]'
  12. Wait for the new ingesters to become ready

  13. Next, set the final configuration (meaning the default configuration) by removing the custom configuration:

    cd inventory/classes/$TENANT_ID
    
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY}.helm_values.ingester)" ${CLUSTER_ID}.yml
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY}.helm_values.nginx)" ${CLUSTER_ID}.yml
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY}.helm_values.gateway)" ${CLUSTER_ID}.yml
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY}.helm_values.rollout_operator)" ${CLUSTER_ID}.yml
    
    git commit -am"Finalize $MIMIR_INSTANCE mimir migration" && git push origin master
    
    cd ../../..

    Alternatively, if your cluster configuration didn’t contain any parameters for the mimir component before this migration, you can remove the entire parameter block for your component instance instead:

    cd inventory/classes/$TENANT_ID
    
    yq eval -i "del(.parameters.${MIMIR_PARAM_KEY})" ${CLUSTER_ID}.yml
    
    git commit -am"Finalize $MIMIR_INSTANCE mimir migration" && git push origin master
    
    cd ../../..
  14. Compile and push the cluster catalog

  15. Wait for all relevant pods to become ready

Cleanup

  1. Once all your mimir instances are migrated, move the component version parameter to the appropriate place in your configuration hierarchy:
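
    A minimal sketch, assuming tenant-wide defaults live in a class file named common.yml (the file name is hypothetical; adjust it to your hierarchy):

    cd inventory/classes/$TENANT_ID
    
    yq eval -i "del(.parameters.components.mimir.version)" ${CLUSTER_ID}.yml
    yq eval -i '.parameters.components.mimir.version = "v2.0.1"' common.yml
    
    git commit -am"Move mimir component version to tenant defaults" && git push origin master
    
    cd ../../..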