Is your Google Container Registry filling up, consuming storage and becoming expensive? How do you handle image retention as a service?
Amazon’s Elastic Container Registry has a feature called Lifecycle Policies to handle image retention. Google doesn’t have this feature. There has been a feature request in their tracker since Aug 2018, and there is no ETA for it so far…
There is a popular bash script from Ahmet and a Go service running on Cloud Run from Seth, but neither of them meets my requirements. What exactly do I need?
I want to scan my whole GCR and delete the digests that are:
In other words, I want these lifecycle policies applied to every image, deleting any digest that matches the rules.
Say I have a few images in GCR with certain prefixes:
eu.gcr.io/my-project/foo/bar/my-service:123
eu.gcr.io is the docker registry endpoint
my-project is ID of my GCP project
foo/bar is the prefix (“repo”)
my-service is an image name
123 is a tag
my-service:123 is an image with a tag, but wait, what is the digest?
A Docker image digest is an ID that combines the hashing algorithm used and the computed hash. A digest can look like this:
@sha256:296e2378f7a14695b2f53101a3bd443f656f823c46d13bf6406b91e9e9950ef0
A digest can carry several tags, or even zero tags (an untagged image).
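To see which digests exist for an image and which tags point at them, gcloud can list them; the image path below is just the example used in this post:

# List every digest of the example image together with its tags and timestamp
gcloud container images list-tags eu.gcr.io/my-project/foo/bar/my-service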
Let’s say I build an image my-service and push it to the Docker registry. When pushing, I tag it with :123. The newly produced digest gets two tags, :123 and :latest. The digest that was tagged :latest before I pushed this image loses its :latest tag.
If I remove a tag from an image in GCR, I simply remove that tag from the digest; I don’t delete the digest itself.
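Removing just a tag can also be done with gcloud; a small example with the same illustrative image path:

# Remove only the :123 tag; the digest (and any other tags on it) stays in the registry
gcloud container images untag -q eu.gcr.io/my-project/foo/bar/my-service:123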
What I can delete, in order to save some space, is the digest, like this:
gcloud container images delete -q --force-delete-tags eu.gcr.io/my-project/foo/bar/my-service@sha256:296e2378f7a14695b2f53101a3bd443f656f823c46d13bf6406b91e9e9950ef0
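Putting the pieces together, here is a minimal sketch (in the spirit of Ahmet’s script) that sweeps only the untagged digests of a single image; it just echoes the delete commands, so it is effectively a dry run:

IMAGE="eu.gcr.io/my-project/foo/bar/my-service"  # illustrative path from the example above

# Collect every digest of the image that has zero tags...
for digest in $(gcloud container images list-tags "$IMAGE" --filter='-tags:*' --format='get(digest)' --limit=unlimited); do
  # ...and print the delete command for each one (drop the echo to actually delete)
  echo gcloud container images delete -q --force-delete-tags "$IMAGE@$digest"
done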
Then, what do I need to do?
How do I check whether the digests match the rules:
You could use standard kubectl to fetch the images currently used in the cluster:
kubectl get rs,po --all-namespaces -o jsonpath={..image} | tr ' ' '\n'
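To compare that against the registry contents later, it helps to deduplicate the output, for example:

# Unique list of images referenced by replica sets and pods across all namespaces
kubectl get rs,po --all-namespaces -o jsonpath={..image} | tr ' ' '\n' | sort -u > images-in-use.txt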
gcr.io exposes the Docker Registry v2 API, so you can use a standard Docker client or just curl with a gcloud access token:
ACCESS_TOKEN=$(gcloud auth print-access-token)
curl --silent --show-error -u _token:"$ACCESS_TOKEN" -X GET "https://eu.gcr.io/v2/_catalog"
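From there you can walk the whole registry; a rough bash/jq sketch that reads the repository names from the catalog and fetches the tag list for each of them (large registries paginate the catalog response, which this sketch ignores):

# For every repository in the registry, print its name and the raw tags/list response
for repo in $(curl --silent --show-error -u _token:"$ACCESS_TOKEN" "https://eu.gcr.io/v2/_catalog" | jq -r '.repositories[]'); do
  echo "== $repo"
  curl --silent --show-error -u _token:"$ACCESS_TOKEN" "https://eu.gcr.io/v2/$repo/tags/list" | jq '.'
done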
I implemented all of this using bash/jq (yep, that wasn’t a smart idea) and published it to GitHub:
Right now I’m running this in a GitLab CI pipeline on a cron schedule (once a day) to evaluate its dry-run logs for production GCP projects.
I’m planning on rewriting this in Python (pykube-ng and docker-py) if Google does not come up with an ETA for this feature :(
Q1: What is a container?
A container is a logical “package” that contains everything an application needs in order to function, including the application itself, its dependencies, libraries, and configuration files.
Q2: How did containers evolve, and how are they different from virtual machines (VMs)?
Containers are an evolution from physical servers and then virtual machines. Unlike VMs, which virtualize an entire machine including the operating system, containers virtualize only the application layer and share the host OS kernel. Because they do not carry a full OS image, containers are much smaller, more portable, and more flexible than VMs.
Q3: What is the primary advantage of containers being independent of their environment?
The primary advantage is that a container with your application can be run seamlessly in any environment — whether it’s AWS, GCP, Azure, a private data center, or a developer’s laptop. This allows developers to focus on application development without having to worry about where or how their applications will run.
Q4: How do containers lead to more efficient resource utilization?
Containers are small compared to conventional virtual machines and require fewer resources like memory or CPU. As a result, you can use your physical server resources more efficiently by stacking multiple containers on one server in a smart way, making the most out of the available hardware.
Q5: In what ways do containers improve the agility and stability of development and operations?
Because they are lightweight, containers can be started, stopped, replicated (horizontally scaled), or patched very quickly. This allows development and operations teams to be more independent, spend less time on debugging, and achieve faster development cycles, a quicker time-to-market, and a more stable infrastructure that can respond immediately to frequent changes like new releases or traffic spikes.
Q6: Why are containers considered an ideal solution for a hybrid infrastructure?
Since containers can be run anywhere, they are perfectly suited for hybrid infrastructure setups where applications run partially in a private data center and partially on a public cloud. Currently, there is no better solution for designing a hybrid infrastructure than to “containerize” your applications.