So code that was untested (the code path that failed was never exercised), perha...

sebazzz · 2025-06-14T11:33:09 1749900789

Continuous Integration/Continuous Disaster

bananapub · 2025-06-14T11:33:06 1749900786

There absolutely is a test environment, it was absolutely reviewed and Google has absolutely spent Moon-landing money on testing and in particular static analysis.

fidotron · 2025-06-14T11:49:23 1749901763

Moon landing money on static analysis that failed to identify the existence of a completely untested code path? Or even to shake this out with random data generation?

This is a dumbfounding level of mistake for an organization such as Google.

Unroasted6154 · 2025-06-14T12:57:22 1749905842

What makes you think it was completely untested? The condition that triggered the null pointer exception was obviously not tested, but that doesn't mean the code had no tests, or even that it fell short of 100% unit test coverage according to the coverage tools.

In addition it looks like the code was not ready for production and the mistake was not gating it behind a feature flag. It didn't go through the normal release process.
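
To illustrate the coverage point, here is a minimal Python sketch. The function, the `region` field, and the test are all hypothetical and not taken from Google's code; the only claim is that a line-coverage tool can report 100% while the failing input is never exercised.

```python
from typing import Optional


def region_of(policy: dict) -> str:
    # Hypothetical policy record: the "region" field may be absent.
    region: Optional[str] = policy.get("region")
    # Both lines of this function run under the test below,
    # so a line-coverage tool would report 100% for it.
    return region.upper()  # raises AttributeError when region is None


def test_region_of():
    # The only test: a well-formed record. The missing-field case is never tried.
    assert region_of({"region": "us-east1"}) == "US-EAST1"
```

Calling `region_of({})` still blows up, even though every line was "covered".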

piva00 · 2025-06-14T13:20:20 1749907220

If Google spent Moon-landing levels of money on its quality/deployment infrastructure, I expect a much better coverage checker than "100% unit tested". They are famous for having a whole fuzzing infrastructure. Coverage analysers for more complex interplay of logic are something I use daily at non-Google levels of spending (even though it's still a big enough corporation), and they often remind me that I forgot to write a functional test to cover a potential orchestration issue.

I don't think "completely untested" is correct, but the fact that it was tested way below expectations for such a structural piece of code is a lesson they should learn from. It does look like an amateur-hour mistake.

Unroasted6154 · 2025-06-14T13:36:44 1749908204

The main issue to me seems to be that the code was not gated by a flag when it was not ready to be used, thus skipping a lot of the testing / release qualification.
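
As a rough sketch of what gating by a flag means here (Python, with a made-up flag name and made-up return values; this is not how the actual service is written):

```python
def check_quota(policy: dict, flags: dict) -> str:
    # Hypothetical flag name. While the flag is off, the new code path
    # is never reached, so it can ship dark and be enabled gradually.
    if not flags.get("new_quota_check", False):
        return "legacy"

    region = policy.get("region")
    if region is None:
        # Handle the malformed record explicitly instead of crashing on it.
        return "skipped"
    return region.upper()
```

With the flag defaulting to off, a bad record only hits the new path in whatever environments have explicitly turned it on.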

polotics · 2025-06-14T11:36:32 1749900992

ok so what gives then?

JackSlateur · 2025-06-14T15:11:24 1749913884

No amount of "whatever" can prevent bugs from reaching production.

miyuru · 2025-06-14T07:43:45 1749887025

I would not be surprised if the code was AI generated.

yen223 · 2025-06-14T12:12:57 1749903177

I like the faith you have that people weren't making null-pointer mistakes before LLMs.

miyuru · 2025-06-14T15:45:14 1749915914

It did happen before LLMs, but there are well-documented processes to catch them. Google literally wrote the book on SRE best practices.

Error handling is very basic. The only explanation for this kind of bad code getting pushed to prod is LLMs and high trust in LLM automation.

They won't admit this publicly anyway; there is too much money invested in LLMs.