Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The big problem is that the docker client makes it nearly impossible to audit a large deployment to make sure it’s not accidentally talking to docker hub.

This is by design, according to docker.

I’ve never encountered anyone at any of my employers that wanted to use docker hub for anything other than a one-time download of a base image like Ubuntu or Alpine.

I’ve also never seen a CD deployment that doesn’t repeatedly accidentally pull in a docker hub dependency, and then occasionally have outages because of it.

It’s also a massive security hole.

Fork it.



> This is by design, according to docker.

I have a vague memory of reading something to that effect on their bug tracker, but I always thought the reasoning was ok. IIRC it was something to the effect that the goal was to keep things simple for first time users. I think that's disservice to users, because you end up with many refusing to learn how things actually work, but I get the sentiment.

> I’ve also never seen a CD deployment that doesn’t repeatedly accidentally pull in a docker hub dependency, and then occasionally have outages because of it.

There's a point where developers need to take responsibility for some of those issues. The core systems don't prevent anyone from setting up durable build pipelines. Structure the build like this [1]. Set up a local container registry for any images that are required by the build and pull/push those images into a hosted repo. Use a pull through cache so you aren't pulling the same image over the internet 1000 times.

Basically, gate all registry access through something like Nexus. Don't set up the pull through cache as a mirror on local clients. Use a dedicated host name. I use 'xxcr.io' for my local Nexus and set up subdomains for different pull-through upstreams; 'hub.xxcr.io/ubuntu', 'ghcr.xxcr.io/group/project', etc..

Beyond having control over all the build infrastructure, it's also something that would have been considered good netiquette, at least 15-20 years ago. I'm always surprised to see people shocked that free services disappear when the stats quo seems to be to ignore efficiency as long as the cost of inefficiency is externalized to a free service somewhere.

1. https://phauer.com/2019/no-fat-jar-in-docker-image/


> I'm always surprised to see people shocked that free services disappear when the stats quo seems to be to ignore efficiency as long as the cost of inefficiency is externalized to a free service somewhere.

Same. The “I don’t pay for it, why do I care” attitude is abundant, and it drives me nuts. Don’t bite the hand that feeds you, and make sure, regularly, that you’re not doing that by mistake. Else, you might find the hand biting you back.


Block the DNS if you don’t want dockerhub images. Rewrite it to your artifactory.

This is really not complicated and your not entitled to unlimited anonymous usage of any service.


That will most likely fail, since the daemon tries to connect to the registry with SSL and your registry will not have the same SSL certificate as Docker Hub. I don't know if a proxy could solve this.


This is supported in the client/daemon. You configure your client to use a self-hosted registry mirror (e.g. docker.io/distribution or zot) with your own TLS cert (or insecure without if you must) as pull-through cache (that's your search key word). This way it works "automagically" with existing docker.io/ image references now being proxied and cached via your mirror.

You would put this as a separate registry and storage from your actual self-hosted registry of explicitly pushed example.com/ images.

It's an extremely common use-case and well-documented if you try to RTFM instead of just throwing your hands in the air before speculating and posting about how hard or impossible this supposedly is.

You could fall back to DNS rewrite and front with your own trusted CA but I don't think that particular approach is generally advisable given how straightforward a pull-through cache is to set up and operate.


This is ridiculous.

All the large objects in the OCI world are identified by their cryptographic hash. When you’re pulling things when building a Dockerfile or preparing to run a container, you are doing one of two things:

a) resolving a name (like ubuntu:latest or whatever)

b) downloading an object, possibly a quite large object, by hash

Part b may recurse in the sense that an object can reference other objects by hash.

In a sensible universe, we would describe the things we want to pull by name, pin hashes via a lock file, and download the objects. And the only part that requires any sort of authentication of the server is the resolution of a name that is not in the lockfile to the corresponding hash.

Of course, the tooling doesn’t work like this, there usually aren’t lockfiles, and there is no effort made AFAICT to allow pulling an object with a known hash without dealing with the almost entirely pointless authentication of the source server.


Right but then you notice the failing CI job and fix it to correctly pull from your artifact repository. It's definitely doable. We require using an internal repo at my work where we run things like vulnerability scanners.


> since the daemon tries to connect to the registry with SSL

If you rewrite DNS, you should of course also have a custom CA trusted by your container engine as well as appropriate certificates and host configurations for your registry.

You'll always need to take these steps if you want to go the rewrite-DNS path for isolation from external services because some proprietary tool forces you to use those services.


You don't have to run docker. Containerd is available.


It's trivial to audit a large deployment, you look at dns logs.


This is Infamous Dropbox Comment https://news.ycombinator.com/item?id=9224 energy


They didn't say it's easy to fix, just detect.


Is there no way to operate a caching proxy for docker hub?!


There are quite a few docker registries you can self-host. A lot of them also have a pull-through cache.

Artifactory and Nexus are the two I've used for work. Harbor is also popular.

I can't think of the name right now, but there are some cool projects doing a p2p/distributed type of cache on the nodes directly too.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: