Upgrading to Ceph-CSI v3.9.0
Starting from component version v6.0.0, the component deploys Ceph-CSI v3.9.0 by default.
Ceph-CSI has been updated to pass any mount options configured for CephFS volumes to the NodeStageVolume calls which create bind mounts for existing volumes.
We previously configured the mount option discard for CephFS, which isn't a supported option for bind mounts.
However, the option is unnecessary for CephFS anyway, so we remove it completely from the generated CephFS storage class in component version v6.0.0.

This how-to is intended for users who are upgrading an existing Rook-Ceph setup to component version v6.0.0 from a previous component version.
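If you want to confirm what the deployed CephFS storage class currently specifies, you can inspect its mount options with a check like the one below. This assumes the default storage class name cephfs-fspool-cluster used throughout this how-to; after the upgrade, the output shouldn't list discard anymore.

  # Show the mount options of the generated CephFS storage class
  # (StorageClass keeps mountOptions at the top level of the object).
  kubectl get storageclass cephfs-fspool-cluster -ojson | jq '.mountOptions'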
Prerequisites
- cluster-admin access to the cluster
- Access to Project Syn configuration for the cluster, including a method to compile the catalog
- kubectl
- jq
Steps
1. Check mount options for all CephFS volumes. If this command shows custom mount options for any volumes, you'll want to handle those volumes separately (see the sketch below).

     kubectl get pv -ojson | \
       jq -r '.items[] | select(.spec.storageClassName=="cephfs-fspool-cluster") | "\(.metadata.name) \(.spec.mountOptions)"'
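   One hypothetical way to handle such a volume separately is to strip only discard while keeping the other custom mount options. This is a sketch; the volume name is a placeholder you need to substitute yourself:

     # Sketch: remove only "discard" from a volume's mount options while
     # preserving any other options the volume carries.
     pv=<your-pv-name>   # placeholder: substitute a volume name from the command above
     opts=$(kubectl get pv "$pv" -ojson | \
       jq -c '[(.spec.mountOptions // [])[] | select(. != "discard")]')
     kubectl patch --as=cluster-admin pv "$pv" --type=json \
       -p "[{\"op\": \"replace\", \"path\": \"/spec/mountOptions\", \"value\": ${opts}}]"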
2. Remove mount option discard from all existing CephFS volumes.

     for pv in $(kubectl get pv -ojson | \
       jq -r '.items[] | select(.spec.storageClassName=="cephfs-fspool-cluster" and (.spec.mountOptions//[]) == ["discard"]) | .metadata.name'); do
       kubectl patch --as=cluster-admin pv $pv --type=json \
         -p '[{"op": "replace", "path": "/spec/mountOptions", "value": [] }]'
     done
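   Afterwards, you can verify that no CephFS volume still lists discard; the following check should produce no output:

     # List any remaining CephFS volumes that still carry the discard option.
     kubectl get pv -ojson | \
       jq -r '.items[] | select(.spec.storageClassName=="cephfs-fspool-cluster" and ((.spec.mountOptions // []) | index("discard"))) | .metadata.name'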
3. Upgrade the component to v6.0.0.
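   How exactly the version is pinned depends on how your Project Syn configuration hierarchy manages component versions. As a sketch only, assuming the component is named rook-ceph and versions are pinned through the usual Commodore parameters, the pin might look roughly like this; adjust it to wherever your hierarchy sets component versions, then compile and push the catalog:

     # Sketch: pin the component version in your Project Syn config hierarchy.
     # The exact file location and component name depend on your setup.
     parameters:
       components:
         rook-ceph:
           version: v6.0.0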
4. Check for any CephFS volumes which were provisioned between step 2 and the upgrade, and remove mount option discard from those volumes. The command below lists the affected volumes; to patch them, reuse the loop from step 2.

     kubectl get pv -ojson | \
       jq -r '.items[] | select(.spec.storageClassName=="cephfs-fspool-cluster" and (.spec.mountOptions//[]) == ["discard"]) | "\(.metadata.name) \(.spec.mountOptions)"'
5. Finally, make sure you replace the existing CSI driver holder pods (if they're present on your cluster) with updated pods, so that you don't get any spurious DaemonSetRolloutStuck alerts. This needs to be done for each node, after draining the node, to ensure no Ceph-CSI mounts are active on the node.

     node_selector="node-role.kubernetes.io/worker"   # (1)
     timeout=300s                                     # (2)
     for node in $(kubectl get node -o name -l $node_selector); do
       echo "Draining $node"
       if ! kubectl drain --ignore-daemonsets --delete-emptydir-data --timeout=$timeout $node
       then
         echo "Drain of $node failed... exiting"
         break
       fi
       echo "Deleting holder pods for $node"
       kubectl -n syn-rook-ceph-operator delete pods \
         --field-selector spec.nodeName=${node//node\/} -l app=csi-cephfsplugin-holder
       kubectl -n syn-rook-ceph-operator delete pods \
         --field-selector spec.nodeName=${node//node\/} -l app=csi-rbdplugin-holder
       echo "Uncordoning $node"
       kubectl uncordon $node
     done

   (1) Adjust the node selector to the set of nodes you want to drain.
   (2) Adjust if you expect node drains to be slower or faster than 5 minutes.
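   Once all nodes have been processed, you can list the holder pods to confirm they were recently recreated on every drained node (check the AGE and NODE columns). This assumes the namespace and labels used in the loop above:

     # List all CSI holder pods together with the nodes they run on.
     kubectl -n syn-rook-ceph-operator get pods -o wide \
       -l 'app in (csi-cephfsplugin-holder, csi-rbdplugin-holder)'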