Just to add a little nuance here, not because I'm trying to defend GitHub (they definitely need to up their reliability), but the 90% uptime figure represents every single service GitHub offers being online 90% of the time. You don't need every single service to be online in order to use GitHub. For example, I don't use Copilot myself, and it has seen 96.47% uptime, the worst of the tracked services.
That’s… one 9 of reliability. You could argue the title understates the problem.
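To make "one 9" concrete, here's the back-of-envelope downtime those percentages imply (a quick Python sketch):

```python
HOURS_PER_YEAR = 24 * 365  # 8760

def downtime_hours(uptime: float) -> float:
    """Hours of downtime per year implied by an uptime fraction."""
    return (1 - uptime) * HOURS_PER_YEAR

for uptime in (0.90, 0.9647, 0.999):
    # 90% uptime works out to 876 hours of downtime a year
    print(f"{uptime:.2%} uptime -> {downtime_hours(uptime):.0f} h downtime/year")
```

Three 9s (99.9%) is under 9 hours a year; 90% is over a month.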
> You don't need every single service to be online in order to use GitHub.
Well, that's how they want you to use it, so it's an epic failure in their intended use case. Another way to put this is "if you use more GitHub features, your overall reliability goes down significantly and unpredictably".
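To put rough numbers on that compounding effect: if each service fails independently (a simplifying assumption; the per-service uptimes below are made up, apart from Copilot's 96.47% mentioned upthread), a workflow that needs all of them is only up when every one of them is:

```python
# Hypothetical per-service uptimes; failures assumed independent.
services = {
    "git": 0.999,
    "pull_requests": 0.995,
    "actions": 0.99,
    "copilot": 0.9647,
}

combined = 1.0
for name, uptime in services.items():
    combined *= uptime  # all services must be up simultaneously

print(f"combined uptime: {combined:.4f}")  # ~0.9493, worse than any single service
```

Each feature you adopt multiplies in another factor below 1, so overall reliability can only go down.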
Look, I have never been obsessed with nines for most types of services. But the cloud service providers certainly were using it as major selling/bragging points until it got boring and old because of LLMs. Same with security. And GitHub is so upstream that downstream effects can propagate and cascade quite seriously.
On the other hand: it also doesn't include instances where GitHub is painfully slow but technically usable.
These days it is very common that something like opening the diff view of a trivial PR takes 15-30 seconds to load. Sure, it will eventually load after a long wait or an F5, but it is still negatively impacting my productivity.
There have been multiple outages in the past year that they didn't even fully report in a timely manner. I'm talking about the types of outages that bring down normal enterprise usage: webhook delivery for CI/CD, git operations for everyone, PRs for code review. And that's not even counting GitHub Actions or Copilot, which lots of people also rely on.
Not sure if I fully understand it, but this seems highly inefficient?
Instead of using embeddings, which are easy to make and cheap to compare, you use summarized sections of documents and process them with an LLM? LLMs are slower and more expensive to run.
If this is used as an important tool call for an AI agent that performs many other calls, then it's likely that the added cost and latency would be negligible compared to the benefit of significantly improved retrieval. As an analogy, for a small task you're often OK with just going over the first few search results, but to prepare for a large project, you might want to spend an afternoon researching.
In specific domains, accuracy matters more than speed. Document structure and reasoning bring better retrieval than semantic search, which retrieves "similar" but not necessarily "relevant" results.
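For contrast, the embedding approach being compared against is roughly this (a minimal sketch with toy 3-dimensional vectors; in practice the embeddings come from a model such as a sentence-transformer):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy pre-computed document embeddings (hypothetical values).
doc_vecs = {
    "doc_a": np.array([0.9, 0.1, 0.0]),
    "doc_b": np.array([0.1, 0.9, 0.2]),
}
query = np.array([0.8, 0.2, 0.1])

# Retrieval is just a nearest-neighbor lookup: cheap, but it returns
# whatever is "similar" in embedding space, relevant or not.
best = max(doc_vecs, key=lambda d: cosine_sim(query, doc_vecs[d]))
print(best)  # -> doc_a
```

That cheapness is exactly the trade-off: one dot product per document versus an LLM call per retrieval step.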
The idea this person is going for is an LLM that explores the codebase using the source graph the way a human might, by ctrl+clicking in IDEA/VS Code to go to a definition, searching for usages of a function, etc. It actually does work, and other systems use it as well, though they have the main agent perform the codebase walk rather than delegating to a "codebase walker" agent.
My concern would be that a function called setup() might mask some really important behavior, and likewise a "preface" chapter might get skipped by an LLM when you ask an especially deep question. Either way, the structure of your input data could yield bad summaries that cause the LLM to miss things.
I think it only needs to generate the tree once before retrieval, and it doesn’t require any external model at query time. The indexing may take some time upfront, but retrieval is then very fast and cost-free.
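As I understand it, retrieval then reduces to walking the pre-built summary tree, something like this (a hypothetical sketch using naive keyword matching; the real structure and matching depend on the tool):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str                 # generated once at index time (e.g. by an LLM)
    content: str = ""            # leaf text; empty for internal nodes
    children: list["Node"] = field(default_factory=list)

def retrieve(node: Node, query_terms: set[str]) -> list[str]:
    """Descend only into branches whose summary mentions a query term."""
    if not node.children:
        return [node.content]
    results = []
    for child in node.children:
        if query_terms & set(child.summary.lower().split()):
            results.extend(retrieve(child, query_terms))
    return results
```

All the expensive model calls happen once, at indexing; query time is just tree traversal.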
For better diarization quality than pyannote, check out Whisper-DiarizationX which combines Whisper with ECAPA-TDNN speaker embeddings and spectral clustering.
I've experienced this as well. I'm working on a project in which I wanted to search through transcripts of a video, which is often very long text. I figured that since models like the GPT 4.1 series have very large context windows, RAG was not needed, but I definitely noticed some strange issues, especially with the smaller models. Things like not answering the question that was asked but returning a generic summary of the content.
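Even with a big context window, simple chunking plus retrieval often behaves better on long transcripts than stuffing everything in. A minimal sketch (naive term-overlap scoring standing in for a real retriever):

```python
def chunk_transcript(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a long transcript into overlapping character chunks."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
    return chunks

def top_chunks(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Rank chunks by naive term overlap with the query; feed only these to the model."""
    terms = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(terms & set(c.lower().split())))
    return scored[:k]
```

Passing only the top-k chunks keeps the prompt focused on the question instead of inviting a generic summary.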
To be fair (or pedantic), in this post they didn't have root, so cat'ing /etc/passwd would not have been possible, whereas installing a Doom APK is trivial.