Things shifted slightly in the Cloud Native world recently, when the Docker Hub turned on rate limiting. If you run a Kubernetes cluster, or make extensive use of Docker images, this is something you need to be aware of as it could cause outages. In particular, if you are suddenly finding a lot of Kubernetes pods failing with ErrImagePull
and event messages like:
Failed to pull image "ratelimitalways/test:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for ratelimitalways/test, repository does not exist or may require 'docker login': denied: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
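A quick way to confirm this across a cluster (a rough check, assuming you have kubectl access) is to search recent events for the rate-limit message:
$ kubectl get events --all-namespaces | grep -i "rate limit"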
Then you’ve probably been hit by the rate limiting. Essentially, in order to control costs, the Docker Hub now limits the rate at which image pulls can be made. At the time of writing, anonymous pulls are limited to 100 per six hours per source IP address, pulls by authenticated free accounts are limited to 200 per six hours, and paid accounts are not subject to the limit.
Large clusters and CI/CD platforms that use the Hub are particularly likely to hit these limits, as many nodes will often be pulling from the same IP address (or from what appears to the Hub to be the same address).
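If you want to see where you currently stand, the Hub reports your limit and remaining pulls as response headers on manifest requests. The following is a sketch of the check Docker documents against the ratelimitpreview/test repository; it assumes curl and jq are installed:
# Fetch an anonymous pull token for the test repository
$ TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
# The ratelimit-limit and ratelimit-remaining headers show your current allowance
$ curl -s --head -H "Authorization: Bearer $TOKEN" https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest | grep -i ratelimit
Adding --user <username>:<password> to the token request should show the limits for an authenticated account instead.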
The first thing you might want to do is find out which images from the Docker Hub you’re using. Remember that the Docker Hub controls the ‘default namespace’ for container images, so it’s not always obvious where images come from; an image referenced simply as nginx, for example, actually resolves to docker.io/library/nginx.
If you run the following on a Kubernetes cluster, it should identify all images from the Docker Hub that use the normal naming convention:
$ kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | grep -v "[^.]*[:.].*/" | sort | uniq
This won’t identify images that explicitly reference the Docker Hub, i.e. images like “docker.io/library/postgres:latest”. You can find these with the rather simpler expression:
$ kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | grep "^docker.io" | sort | uniq
So what’s the best way to solve this problem? It will depend on how quickly you need to get this sorted, but your options are:
Pay for Docker Hub licenses. It’s not expensive, but Docker pricing is per team member, which can be a little confusing when what you actually want to license is a cluster of 100 Kubernetes nodes. To make sure you’re in the clear here, opt for the team membership unless it’s a very small cluster.
To use the new credentials, you will need to add image pull secrets to your deployments. Note that image pull secrets can be added to the default service account, so you don’t have to manually update every deployment.
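As a rough sketch (the secret name dockerhub-creds is just an example), creating the credentials and attaching them to the default service account could look like this:
# Create a registry credential secret from your Docker Hub account
$ kubectl create secret docker-registry dockerhub-creds \
    --docker-server=https://index.docker.io/v1/ \
    --docker-username=<your-username> \
    --docker-password=<your-password-or-access-token>
# Attach it to the default service account so new pods in the namespace use it automatically
$ kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "dockerhub-creds"}]}'
Service accounts are namespaced, so this needs to be repeated in every namespace that pulls from the Hub.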
It’s worth pointing out that most of these options aren’t mutually exclusive; you can pay for the Docker Hub to get yourself out of a bind, then move to a longer-term solution that combines the final two options: running your own registry and proxy-caching third-party images.
In the long run, I would recommend that most clusters be set up with their own registry and only be allowed to run images from that registry. Any third-party images, such as Docker official images, can be proxy-cached.
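As a rough illustration of the proxy-cache part, the open-source Distribution registry can run as a pull-through cache of the Hub (the host name and port here are illustrative, and this is not a hardened setup):
# Run a registry that transparently caches images pulled from the Docker Hub
$ docker run -d --name hub-cache -p 5000:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2
# Then point each node's Docker daemon at it, e.g. in /etc/docker/daemon.json:
#   { "registry-mirrors": ["http://<cache-host>:5000"] }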
This will provide a fall-back in the case of remote outages: as well as having a local copy that can be used, the registry also provides a place where new images can be pushed, allowing updates to still take place when the remote registry can’t be reached. In a lot of cases it may be worth taking this further and ‘gating’ all third-party content to protect against bad upstream images. In this set-up, images are tested and verified before being added to the organisational registry.
To give an example of where this helps, imagine a bad image being pushed to the nginx:1.19 tag on Docker Hub (see this NodeJS Docker issue for a real-world example). If your set-up pulls this version into the cache, you’ll be stuck until a fix is pushed; but if you used gating, it should never have hit you in the first place, and you should also have a history of old images in case you need to roll back.
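To sketch what a gating step might look like in practice (the internal registry name is purely illustrative, and the scanner shown is just one example of a verification step):
# Pull the upstream image at a known version
$ docker pull nginx:1.19
# Run whatever checks your organisation requires, e.g. a vulnerability scan
$ trivy image nginx:1.19
# Only if the checks pass, re-tag the image and push it to the organisational registry
$ docker tag nginx:1.19 registry.example.com/approved/nginx:1.19
$ docker push registry.example.com/approved/nginx:1.19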
So what’s the takeaway from all this? We need to be more careful and thoughtful with our software supply chains. I think this is going to be a big topic in the future, and we can already see hints of where things are going in the Notary and Grafeas projects.