There are _lots_ of objects in a large git repository. E.g., I happen to have a fork of VLC lying around. VLC has 70k+ commits (in that fork), and each commit has about 10k files. The typical AI crawler wants, for every commit, to download every file (so 700M objects), every tarball (70k+ .tar.gz files), and the blame view of every file (another 700M objects, where blame has to look back over 35k commits on average). Plus assorted other pages.
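To make the scale concrete, here is a rough back-of-envelope sketch in Python. The commit, file, and blame counts come from the numbers above; the 10 KiB-per-rendered-page figure is my own assumption for illustration.

```python
# Rough back-of-envelope numbers for one VLC-sized repository.
commits = 70_000            # commits in the fork (from the text above)
files_per_commit = 10_000   # files visible at each commit (from the text above)

file_pages = commits * files_per_commit   # one page per (commit, file) pair
blame_pages = commits * files_per_commit  # one blame page per (commit, file) pair
tarballs = commits                        # one snapshot tarball per commit

total_pages = file_pages + blame_pages + tarballs
print(f"{total_pages:,} pages")           # 1,400,070,000

# ASSUMPTION: ~10 KiB per rendered page, and this ignores the tarballs,
# which are far larger than 10 KiB each.
cache_bytes = total_pages * 10 * 1024
print(f"~{cache_bytes / 2**40:.1f} TiB of cache")  # ~13.0 TiB
```

Even under that deliberately generous assumption, caching everything a crawler might request costs on the order of 13 TiB for this one repository, and blame pages are expensive to generate in the first place.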
Saying “just cache this” is not sustainable. And this is only one repository. The only reasonable way to deal with this is some sort of traffic mitigation; you cannot treat serving all of this traffic as the happy path.