Traefik upgrade

The overall process

Context

  • Upgrading Traefik in place from 1.7.26 (chart traefik-1.87.7) to 2.5.6 (chart 10.9.1) isn’t possible, because the Deployment’s spec.selector labels are immutable:
❯ helm upgrade -i traefik -n default -f values.yaml traefik/traefik --version 10.9.1 --force --debug
history.go:56: [debug] getting history for release traefik
upgrade.go:139: [debug] preparing upgrade for traefik
upgrade.go:147: [debug] performing update for traefik
upgrade.go:319: [debug] creating upgraded release for traefik
client.go:218: [debug] checking 5 resources for changes
client.go:493: [debug] Replaced "traefik" with kind ServiceAccount for kind ServiceAccount
client.go:493: [debug] Replaced "traefik" with kind ClusterRole for kind ClusterRole
client.go:493: [debug] Replaced "traefik" with kind ClusterRoleBinding for kind ClusterRoleBinding
client.go:250: [debug] error updating the resource "traefik":
	 failed to replace object: Deployment.apps "traefik" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"traefik", "app.kubernetes.io/name":"traefik"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
client.go:250: [debug] error updating the resource "traefik":
	 failed to replace object: Service "traefik" is invalid: spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset
upgrade.go:430: [debug] warning: Upgrade "traefik" failed: failed to replace object: Deployment.apps "traefik" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"traefik", "app.kubernetes.io/name":"traefik"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && failed to replace object: Service "traefik" is invalid: spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset
Error: UPGRADE FAILED: failed to replace object: Deployment.apps "traefik" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"traefik", "app.kubernetes.io/name":"traefik"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && failed to replace object: Service "traefik" is invalid: spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset
helm.go:88: [debug] failed to replace object: Deployment.apps "traefik" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"traefik", "app.kubernetes.io/name":"traefik"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && failed to replace object: Service "traefik" is invalid: spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset
helm.sh/helm/v3/pkg/kube.(*Client).Update
	helm.sh/helm/v3/pkg/kube/client.go:263
helm.sh/helm/v3/pkg/action.(*Upgrade).releasingUpgrade
	helm.sh/helm/v3/pkg/action/upgrade.go:375
runtime.goexit
	runtime/asm_arm64.s:1133
UPGRADE FAILED
main.newUpgradeCmd.func2
	helm.sh/helm/v3/cmd/helm/upgrade.go:202
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.2.1/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:87
runtime.main
	runtime/proc.go:255
runtime.goexit
	runtime/asm_arm64.s:1133
  • The upgrade fails because the Deployment’s label selector is immutable. My workaround was to deploy a second Traefik release (version 2.5.6) in parallel, running alongside the old one (1.7.26).
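
To compare against the error above, the selector currently set on the live Deployment can be printed. This is only a sketch: the helper name show_selector is hypothetical, and it assumes kubectl points at the right cluster and that the old Deployment is named traefik in the default namespace, as in the log.

```shell
# Hypothetical helper: print the (immutable) label selector of a Deployment.
# $1 = deployment name, $2 = namespace (defaults to "default").
show_selector() {
  kubectl get deployment "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.selector.matchLabels}'
}

# Usage (needs cluster access):
#   show_selector traefik default
```

The error message shows the selector the new chart tries to set; if the live one differs, the Deployment has to be recreated rather than updated.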

Running both side by side causes no issues, because the ELB that Traefik creates by default (if you’re on AWS) is not referenced in our DNS. To test the new deployment externally, I create a new CNAME record and point it to the EXTERNAL-IP hostname of the traefik2 service:

NAME                                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                      AGE
cerebro                                   ClusterIP      100.65.108.89    <none>                                                                   80/TCP                       539d
kubernetes                                ClusterIP      100.64.0.1       <none>                                                                   443/TCP                      540d
tokenservice                              ClusterIP      100.66.106.178   <none>                                                                   3000/TCP                     390d
traefik                                   LoadBalancer   100.67.75.183    a2c139e9a42564cbb897203cd2f9f16d-740162447.us-east-1.elb.amazonaws.com   443:30809/TCP,80:32518/TCP   11d
traefik-dashboard                         ClusterIP      100.67.228.71    <none>                                                                   80/TCP                       11d
traefik2                                  LoadBalancer   100.69.196.88    a60c47c8fa23e4f09b562afc93e80274-894996197.us-east-1.elb.amazonaws.com   80:31852/TCP,443:30701/TCP   5h57m
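
The CNAME target can also be read directly from the Service instead of copying it from the table. A minimal sketch, assuming kubectl access; lb_hostname is a hypothetical helper name, and the jsonpath only returns a hostname for load balancers that publish one (as AWS ELBs do).

```shell
# Hypothetical helper: print the load balancer hostname of a Service.
# $1 = service name, $2 = namespace (defaults to "default").
lb_hostname() {
  kubectl get svc "$1" -n "${2:-default}" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Usage (needs cluster access):
#   lb_hostname traefik2 default
```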

Installing Traefik 2 in parallel with the current deployment

  • Since the in-place upgrade doesn’t work, we install Traefik v2 as a parallel release:

❯ date; helm install traefik2 -f values.yaml traefik/traefik --version 10.9.1 --debug
Mon Mar  7 12:51:41 CET 2022
install.go:178: [debug] Original chart version: "10.9.1"
install.go:199: [debug] CHART PATH: /Users/dba7x/Library/Caches/helm/repository/traefik-10.9.1.tgz

client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD ingressroutes.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD ingressroutetcps.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD ingressrouteudps.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD middlewares.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD middlewaretcps.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD serverstransports.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD tlsoptions.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD tlsstores.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 1 resource(s)
install.go:151: [debug] CRD traefikservices.traefik.containo.us is already present. Skipping.
client.go:128: [debug] creating 5 resource(s)
client.go:299: [debug] Starting delete for "traefik2-dashboard" IngressRoute
client.go:128: [debug] creating 1 resource(s)
NAME: traefik2
LAST DEPLOYED: Mon Mar  7 12:51:49 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
USER-SUPPLIED VALUES:
accessLogs:
  enabled: true
  fields:
    defaultMode: keep
    headers:
      defaultMode: drop
      names:
        Referer: keep
        User-Agent: keep
  format: json
additionalArguments:
- --providers.kubernetesCRD.allowCrossNamespace=true
- --providers.kubernetesingress.ingressclass=traefik-staging
autoscaling:
  maxReplicas: 5
  metrics:
  - resource:
      name: cpu
      targetAverageUtilization: 70
    type: Resource
  minReplicas: 1
dashboard:
  domain: traefik.kodify.com
  enabled: true
debug:
  enabled: false
deployment:
  hostPort:
    dashboardEnabled: true
    httpEnabled: true
    httpsEnabled: true
deploymentStrategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 1
  type: RollingUpdate
externalTrafficPolicy: Local
forwardedHeaders:
  enabled: false
gzip:
  enabled: true
image:
  name: traefik
  tag: 2.5.6
metrics:
  prometheus:
    enabled: false
  serviceMonitor:
    enabled: false
proxyProtocol:
  enabled: true
  trustedIPs:
  - 10.0.0.0/8
rbac:
  enabled: true
replicas: 1
resources:
  limits:
    memory: 500Mi
  requests:
    cpu: 200m
    memory: 200Mi
sendAnonymousUsage: false
service:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
serviceType: LoadBalancer
ssl:
  enabled: true
  enforced: false
  insecureSkipVerify: false
  upstream: false
tolerations: []
tracing:
  enabled: false
  serviceName: traefik
traefikLogFormat: json
websecure:
  tls:
    enabled: true

Next, I reformatted the values.yaml file used by the Helm release so that it uses the appropriate values for the new chart and drops the ones that are no longer needed, as described in the official chart repository.
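
As a rough illustration of that reformatting, the sketch below writes a v2-style values file. Everything in it is an assumption: the file name values-v2.yaml is hypothetical, and the key names are recalled from the chart’s default values rather than taken from these notes, so check them against `helm show values traefik/traefik --version 10.9.1` before using any of it.

```shell
# Write a hypothetical v2-style values file (key names are assumptions;
# verify with: helm show values traefik/traefik --version 10.9.1).
cat > values-v2.yaml <<'EOF'
image:
  name: traefik
  tag: 2.5.6
logs:
  general:
    format: json          # was traefikLogFormat
  access:
    enabled: true         # was accessLogs.enabled
    format: json
additionalArguments:
  - --providers.kubernetesCRD.allowCrossNamespace=true
  - --providers.kubernetesingress.ingressclass=traefik-staging
  # Assumed replacement for the old proxyProtocol.trustedIPs value:
  - --entryPoints.web.proxyProtocol.trustedIPs=10.0.0.0/8
  - --entryPoints.websecure.proxyProtocol.trustedIPs=10.0.0.0/8
ports:
  websecure:
    tls:
      enabled: true
service:
  type: LoadBalancer      # was serviceType
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
  spec:
    externalTrafficPolicy: Local
resources:
  requests:
    cpu: 200m
    memory: 200Mi
  limits:
    memory: 500Mi
EOF

# Then, for example (needs cluster access):
#   helm upgrade traefik2 -f values-v2.yaml traefik/traefik --version 10.9.1
```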

Issue with HTTPS endpoints

I created a new CNAME record pointing to the new ELB, and when I tried to access it with curl I found that HTTPS didn’t work correctly.

curl -vvv https://int.fux.com
*   Trying 54.156.8.198:443...
* Connected to int.fux.com (54.156.8.198) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: OU=Domain Control Validated; CN=*.fux.com
*  start date: Jun  7 13:06:34 2021 GMT
*  expire date: Jul  9 13:06:34 2022 GMT
*  subjectAltName: host "int.fux.com" matched cert's "*.fux.com"
*  issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x13e811e00)
> GET / HTTP/2
> Host: int.fux.com
> user-agent: curl/7.77.0
> accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 404
< content-type: text/plain; charset=utf-8
< x-content-type-options: nosniff
< content-length: 19
< date: Mon, 07 Mar 2022 17:06:50 GMT
<
404 page not found
* Connection #0 to host int.fux.com left intact

So HTTP works but HTTPS doesn’t (although the certificate is presented correctly). What am I missing?

After reviewing the breaking changes between Traefik 1.x and 2.x, I found that, in my case, the problem is that my Ingress resource doesn’t have the annotation needed to enable TLS.

Checking the ingress (I am using the Kubernetes Ingress resource here, not Traefik’s IngressRoute), I can see that traefik.ingress.kubernetes.io/router.tls: "true" is missing:

kubectl get ingress -n fux fux-front -o yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/ssl-redirect: "false"
    ingress.kubernetes.io/whitelist-x-forwarded-for: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"ingress.kubernetes.io/ssl-redirect":"false","ingress.kubernetes.io/whitelist-x-forwarded-for":"true","kubernetes.io/ingress.class":"traefik-staging","kubernetes.io/tls-acme":"true","meta.helm.sh/release-name":"fux-front","meta.helm.sh/release-namespace":"","status":{"loadBalancer":{}}}
    kubernetes.io/ingress.class: traefik-staging
    kubernetes.io/tls-acme: "true"
    meta.helm.sh/release-name: fux-front
    meta.helm.sh/release-namespace: fux
  creationTimestamp: "2020-12-08T13:55:03Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: fux-front
  namespace: fux
  resourceVersion: "194662430"
  uid: 27406fb6-65f8-44ef-a396-e6bf656144f1
spec:
  rules:
  - host: int.fux.com
    http:
      paths:
      - backend:
          service:
            name: fux-front
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
  tls:

  • Solution

I added traefik.ingress.kubernetes.io/router.tls: "true" to my Ingress resource (either edit it in place or redeploy it), and it finally works:

curl -v https://int.fux.com
*   Trying 34.196.145.235:443...
* Connected to int.fux.com (34.196.145.235) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: OU=Domain Control Validated; CN=*.fux.com
*  start date: Jun  7 13:06:34 2021 GMT
*  expire date: Jul  9 13:06:34 2022 GMT
*  subjectAltName: host "int.fux.com" matched cert's "*.fux.com"
*  issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x120010600)
> GET / HTTP/2
> Host: int.fux.com
> user-agent: curl/7.77.0
> accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 200
< content-type: text/html; charset=utf-8
< date: Tue, 08 Mar 2022 15:41:46 GMT
< etag: W/"50590-miVJLAHT/A1iTR6izN3u0hAsgL4"
< server: nginx/1.11.13
< strict-transport-security: max-age=15552000; includeSubDomains
< vary: Accept-Encoding
< x-content-type-options: nosniff
< x-dns-prefetch-control: off
< x-download-options: noopen
< x-frame-options: SAMEORIGIN
< x-xss-protection: 1; mode=block
<
    <!DOCTYPE html>
    <html lang="en">
      <head>
        <title>
.
.
.
  <script>
  </script>
  <script async src='https://www.google-analytics.com/analytics.js'></script>

      </body>
    </html>
* Connection #0 to host int.fux.com left intact
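
For a quick test, the annotation can be set on the live object without editing YAML. A minimal sketch, assuming kubectl access; enable_router_tls is a hypothetical helper name. For an ingress that is deployed through Helm, it is probably better to add the annotation to the chart so that it is kept in source.

```shell
# Hypothetical helper: enable TLS on the Traefik v2 router of an Ingress.
# $1 = ingress name, $2 = namespace.
enable_router_tls() {
  kubectl annotate ingress "$1" -n "$2" \
    traefik.ingress.kubernetes.io/router.tls=true --overwrite
}

# Usage (needs cluster access):
#   enable_router_tls fux-front fux
```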

Then I upgraded to the latest chart version (after changing my values.yaml):

date;helm upgrade traefik2 -f values.yaml traefik/traefik --version 10.14.2

Side notes

  • For IP whitelisting, you need to use a Middleware instead of the annotation on the ingress. Create the Middleware, then reference it from your ingress.

![[Pasted image 20220405174110.png]]

Middleware created:

![[Pasted image 20220405174158.png]]
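
In text form, such a Middleware might look like the sketch below. The name ip-whitelist, the namespace and the CIDR are placeholders and not the values from the screenshots; middleware_manifest is a hypothetical helper that just prints the manifest.

```shell
# Hypothetical helper: print an ipWhiteList Middleware manifest (Traefik v2 CRD).
# $1 = middleware name, $2 = namespace, $3 = allowed CIDR.
middleware_manifest() {
  cat <<EOF
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: $1
  namespace: $2
spec:
  ipWhiteList:
    sourceRange:
      - $3
EOF
}

# Usage (needs cluster access):
#   middleware_manifest ip-whitelist fux 203.0.113.0/24 | kubectl apply -f -
# The ingress then references it as <namespace>-<name>@kubernetescrd:
#   traefik.ingress.kubernetes.io/router.middlewares: fux-ip-whitelist@kubernetescrd
```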

  • A good way to test that it works correctly is to point your /etc/hosts at the public IP of the LB you have created in Kubernetes.

![[Pasted image 20220405174348.png]]

Then in your /etc/hosts:

![[Pasted image 20220405174411.png]]
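
An alternative that avoids touching /etc/hosts is curl’s --resolve option, which pins a hostname to an IP for a single request. A minimal sketch; curl_via_lb is a hypothetical helper name and the IP in the usage line is a placeholder.

```shell
# Hypothetical helper: request https://<host>/ while forcing <host> to
# resolve to a given IP, without editing /etc/hosts.
# $1 = hostname, $2 = public IP of the load balancer.
curl_via_lb() {
  curl -sv --resolve "$1:443:$2" "https://$1/"
}

# Usage:
#   curl_via_lb int.fux.com 203.0.113.10
```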