Upgrading cert-manager not working
Context: I installed cert-manager and nginx-ingress a long time ago, about 2 years back. Recently I received an email informing me that the certificates were about to expire in 15 days. Close to the date, I kind of panicked and tried to renew them while upgrading cert-manager at the same time. 2 years is a long, long time.
See the cert-manager release milestones below and you will see why upgrading after 2 years is a nightmare.
Old releases
| Release | Release Date | EOL | Compatible Kubernetes versions | Compatible OpenShift versions |
| --- | --- | --- | --- | --- |
| 1.8 | Apr 05, 2022 | Oct 17, 2022 | 1.19 → 1.24* | 4.6 → 4.11 |
| 1.7 | Jan 26, 2022 | Jul 22, 2022 | 1.18 → 1.23* | 4.5 → 4.9 |
| 1.6 | Oct 26, 2021 | Apr 05, 2022 | 1.17 → 1.22 | 4.4 → 4.9 |
| 1.5 | Aug 11, 2021 | Jan 26, 2022 | 1.16 → 1.22 | 4.3 → 4.8 |
| 1.4 | Jun 15, 2021 | Oct 26, 2021 | 1.16 → 1.21 | 4.3 → 4.7 |
| 1.3 | Apr 08, 2021 | Aug 11, 2021 | 1.16 → 1.21 | 4.3 → 4.7 |
| 1.2 | Feb 10, 2021 | Jun 15, 2021 | 1.16 → 1.21 | 4.3 → 4.7 |
| 1.1 | Nov 24, 2020 | Apr 08, 2021 | 1.11 → 1.21 | 3.11 → 4.7 |
| 1.0 | Sep 02, 2020 | Feb 10, 2021 | 1.11 → 1.21 | 3.11 → 4.7 |
| 0.16 | Jul 23, 2020 | Nov 24, 2020 | 1.11 → 1.21 | 3.11 → 4.7 |
| 0.15 | May 06, 2020 | Sep 02, 2020 | 1.11 → 1.21 | 3.11 → 4.7 |
| 0.14 | Mar 11, 2020 | Jul 23, 2020 | 1.11 → 1.21 | 3.11 → 4.7 |
| 0.13 | Jan 21, 2020 | May 06, 2020 | 1.11 → 1.21 | 3.11 → 4.7 |
| 0.12 | Nov 27, 2019 | Mar 11, 2020 | 1.11 → 1.21 | 3.11 → 4.7 |
| 0.11 | Oct 10, 2019 | Jan 21, 2020 | 1.9 → 1.21 | 3.09 → 4.7 |
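To make the table easier to use, here is a minimal sketch for checking whether a cluster's Kubernetes version falls inside the range listed for a release. The helper name `k8s_in_range` is hypothetical, and it assumes versions are written as `major.minor` (for example `1.20`), the same way the table writes them.

```shell
# Hypothetical helper: succeeds if Kubernetes version $1 (e.g. "1.20")
# lies inside the inclusive range $2..$3 copied from the table.
# Assumes plain "major.minor" strings and compares only the minor part.
k8s_in_range() {
  local minor=${1#*.} low=${2#*.} high=${3#*.}
  [ "$minor" -ge "$low" ] && [ "$minor" -le "$high" ]
}

# Example: cert-manager 1.3 lists Kubernetes 1.16 → 1.21.
if k8s_in_range 1.20 1.16 1.21; then
  echo "1.20 is in range for cert-manager 1.3"
fi
```

The server version reported by `kubectl version` can be fed in as the first argument once it is trimmed down to `major.minor`.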
Upgrading mistakes:
I was scared that downgrading cert-manager would leave our endpoints unsecured and affect the running system, so I made a few mistakes along the way:
- I didn’t remove/uninstall cert-manager first.
- I used trial and error with kubectl across many different versions.
- I looked for errors in the wrong places.
- I ignored the errors.
The developer (nswanka) described an issue very similar to mine here.
I installed again cert manager v0.16.1 and restored from backup. Certificate restore was not working, so we thought of upgrading cert manager to version v1.3.1 without any conversion of existing certificates. (Me: ignoring errors)
That did not help in restoring the certificates, so we uninstalled and re-installed v0.16.1. That brought the disaster. (Me: what I did in panic mode)
Now re-installation was not working (maybe we did some extra steps as well, we don’t remember). (Me: what I did in panic mode, just praying it would work somehow, or searching the internet like crazy)
We tried several uninstalls and reinstalls of v0.16.1 and v1.3.1; kubectl commands for getting the ClusterIssuer were not working, and neither were the ones for listing certificates. The system was totally broken.
The reason was that the CRD certificaterequests.cert-manager.io was being shown as “Deletion in progress”, which could not complete because the cert manager hook container was gone. (Me: ignoring errors)
We installed cert manager v1.3.1 again and found that the cert manager hook container was up, but now it was giving an SSL certificate error while deleting the above CRDs.
So what worked for me?
- Delete every previous cert-manager installation and its namespace
# Proxy so we can reach the apiserver from the local computer
kubectl proxy &
# Dump the cert-manager namespace and empty its finalizers so nothing blocks its removal
kubectl get namespace 'cert-manager' -o json | jq '.spec = {"finalizers":[]}' > temp.json
# Send temp.json to the finalize endpoint to apply the change
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json http://127.0.0.1:8001/api/v1/namespaces/cert-manager/finalize
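Forcing the finalizers like this is a last resort, so it can help to confirm first that the namespace really is stuck. A minimal sketch, assuming the namespace has already been marked for deletion and is hanging in the `Terminating` phase; the function names are hypothetical.

```shell
# Prints the phase of namespace $1, e.g. "Active" or "Terminating".
namespace_phase() {
  kubectl get namespace "$1" -o jsonpath='{.status.phase}'
}

# Succeeds only when the namespace is in the Terminating phase,
# which is the situation the finalize call is meant for (assumption).
is_terminating() {
  [ "$(namespace_phase "$1")" = "Terminating" ]
}

# Usage:
#   is_terminating cert-manager && echo "cert-manager is stuck, finalizers can be cleared"
```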
- List the APIService objects and check their status
kubectl get apiservice
# Delete anything that is unhealthy or too old
# This is the list that I removed
kubectl delete apiservice v1alpha2.certmanager.k8s.io
kubectl delete apiservice v1.certmanager.k8s.io
kubectl delete apiservice v1alpha1.certmanager.k8s.io
# Check whether the webhook service still has endpoints
kubectl get endpoints -n cert-manager cert-manager-webhook
kubectl delete apiservice v1beta1.webhook.cert-manager.io
kubectl delete apiservice v1alpha2.cert-manager.io
kubectl delete apiservice v1alpha3.cert-manager.io
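Rather than reading the whole `kubectl get apiservice` listing by eye, the unhealthy cert-manager entries can be filtered out. A minimal sketch, assuming the default column order (NAME, SERVICE, AVAILABLE, AGE); the function name is hypothetical.

```shell
# Reads `kubectl get apiservice` output on stdin and prints the names of
# cert-manager related entries whose AVAILABLE column is not "True".
# Matches both the old certmanager.k8s.io and the newer cert-manager.io groups.
unhealthy_certmanager_apiservices() {
  awk 'NR > 1 && $3 != "True" && $1 ~ /cert-?manager/ { print $1 }'
}

# Usage:
#   kubectl get apiservice | unhealthy_certmanager_apiservices
```

It only prints names, so the list can be reviewed before anything is deleted.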
- Delete the custom resource definitions. Make sure you delete all the old CRDs, especially any xxxxx.cert-manager.io that is shown as “Deletion in progress” or that causes an error like this one during helm install:
🐬 $ helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.3.1 --set installCRDs=true
Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "certificates.cert-manager.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "cert-manager"
kubectl get crd --all-namespaces
kubectl delete crd certificates.cert-manager.io
If a CRD can’t be deleted, patch out its finalizers and then delete it again.
kubectl patch crd/certificates.cert-manager.io -p '{"metadata":{"finalizers":[]}}' --type=merge
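There is more than one cert-manager CRD, so it helps to list them all before deleting one by one. A minimal sketch that filters the output of `kubectl get crd -o name`; the function name is hypothetical, and the two group suffixes are the ones that appear in the commands above.

```shell
# Reads CRD names on stdin (one per line) and keeps only those that belong
# to cert-manager, under either the old or the current API group.
certmanager_crds() {
  grep -E '\.(cert-manager\.io|certmanager\.k8s\.io)$'
}

# Usage (review the list first; deleting a CRD also deletes every
# resource of that kind, so keep a backup):
#   kubectl get crd -o name | certmanager_crds
```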
- Install cert-manager again with Helm (strongly recommended)
export NAME_SPACE=cert-manager
kubectl create namespace $NAME_SPACE
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace $NAME_SPACE --create-namespace --version v1.3.1 --set installCRDs=true
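After the install returns, the pods may still be starting. A minimal sketch for waiting until they are ready, assuming the Helm release is named `cert-manager` as in the command above, so the three deployments are `cert-manager`, `cert-manager-cainjector` and `cert-manager-webhook`. The function name and the 120s timeout are my own choices.

```shell
# Waits for the cert-manager deployments in namespace $1 (default: cert-manager)
# to finish rolling out. Stops at the first deployment that fails or times out.
wait_for_cert_manager() {
  local ns=${1:-cert-manager} d
  for d in cert-manager cert-manager-cainjector cert-manager-webhook; do
    kubectl -n "$ns" rollout status "deployment/$d" --timeout=120s || return 1
  done
}

# Usage:
#   wait_for_cert_manager "$NAME_SPACE" && echo "cert-manager is up"
```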
*Note: I can’t use a newer version because of the old Kubernetes version / old cluster API.
After following the steps above, cert-manager was installed successfully and my certificates were renewed!
It took me a few days of worry 🙁 I followed a lot of recommended solutions, but none of them worked until I found the one above. I was happy to run a lot of commands to verify the result:
kubectl get certificates --all-namespaces
kubectl get clusterissuer
kubectl -n <cluster-namespace> describe certificate default-tls
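With many certificates, the ones that still need attention are easy to miss in the listing. A minimal sketch that picks them out, assuming the default columns of `kubectl get certificates --all-namespaces` (NAMESPACE, NAME, READY, SECRET, AGE); the function name is hypothetical.

```shell
# Reads `kubectl get certificates --all-namespaces` output on stdin and
# prints "namespace/name" for every certificate whose READY column is not "True".
not_ready_certificates() {
  awk 'NR > 1 && $3 != "True" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get certificates --all-namespaces | not_ready_certificates
```

Anything it prints is worth a closer look with `kubectl describe certificate`.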
Extra notes:
ly@workstation:/workspace/devops/<your-project> (master) [<cluster-context> | <cluster-namespace>]
🐬 $ kubectl get certificates --all-namespaces
Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=Certificate failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "cert-manager-webhook-ca")
From the message above, I searched for a solution to “x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "cert-manager-webhook-ca")”, whereas I would have found the solution sooner if I had searched for “conversion webhook for cert-manager.io/v1alpha2”.
I referred to these pages for help:
- https://poopcode.com/error-rendered-manifests-contain-a-resource-that-already-exists-unable-to-continue-with-install-helm-error-how-to-fix/
- https://github.com/cert-manager/cert-manager/issues/2757
- https://cert-manager.io/docs/installation/upgrading/upgrading-0.16-1.0/
- https://cert-manager.io/v1.2-docs/concepts/webhook/
- https://cert-manager.io/v1.1-docs/usage/kubectl-plugin/