At the core, this is not a problem with the server, or the CA, but with the clients. However, servers have to deal with broken clients, so it’s easy to point at the server and say it was broken, or to point at the server and say it’s fixed, but that’s not quite the case.
I discussed this some in https://twitter.com/sleevi_/status/1266647545675210753 , as clients need to be prepared to discover and explore alternative certificate paths. Almost every major CA relies on cross-certificates, some even with circular loops (e.g. DigiCert), and clients need to be capable of exploring those certificates and finding what they like. There’s not a single canonical “correct” certificate chain, because of course different clients trust different CAs.
Of course, using shorter lived certificates, and automating them, also helps prepare your servers, by removing the toil from configuring changes and making sure you pickup updates (to the certificate path) in a timely fashion.
Tools like Censys can be used to explore the certificate graph and visualize the nodes and edges. You’ll see plenty of sites rely on this, and that means clients need to not be lazy in how they verify certificates. Or, alternatively, that root stores should impose more rules on how CAs sign such cross-certificates, to reduce the risk posed to the ecosystem by these events.
Given you mention OpenSSL is currently terrible at verifying "real" certificates: why doesn't e.g. Google just throw a bit of money at them and fix their bugs when they're clearly so well-known? It seems like such an obvious thing to do for a company whose entire business is built on the web. Is there really too little benefit to justify the cost of the engineer(s) it would take even for big companies? Or are the projects somehow blocking help?
At the core, this is not a problem with the server, or the CA, but with the clients. However, servers have to deal with broken clients, so it’s easy to point at the server and say it was broken, or to point at the server and say it’s fixed, but that’s not quite the case.
I discussed this some in https://twitter.com/sleevi_/status/1266647545675210753 , as clients need to be prepared to discover and explore alternative certificate paths. Almost every major CA relies on cross-certificates, some even with circular loops (e.g. DigiCert), and clients need to be capable of exploring those certificates and finding what they like. There’s not a single canonical “correct” certificate chain, because of course different clients trust different CAs.
Regardless of your CA, you can still do things to reduce the risk. Using tools like mkbundle in CFSSL (with https://github.com/cloudflare/cfssl_trust ) or https://whatsmychaincert.com/ help configure a chain that will maximize interoperability, even with dumb and old clients.
Of course, using shorter lived certificates, and automating them, also helps prepare your servers, by removing the toil from configuring changes and making sure you pickup updates (to the certificate path) in a timely fashion.
Tools like Censys can be used to explore the certificate graph and visualize the nodes and edges. You’ll see plenty of sites rely on this, and that means clients need to not be lazy in how they verify certificates. Or, alternatively, that root stores should impose more rules on how CAs sign such cross-certificates, to reduce the risk posed to the ecosystem by these events.