> The point is that fair use exemptions isn't limited to "being a conscious human being enjoying human rights"

Sure. However, my point is that this is not fair use*, so other principles need to be applied. Whether legal systems in various countries will find that fair use applies here, I agree, remains to be seen.

* At least in cases where it’s an LLM operated at scale for profit (which I suppose would not hold for Meta’s models if they were truly open, but that’s not the case, since they require obtaining a license under some conditions).



>Sure. However, my point is that this is not fair use (at least in cases where it’s an LLM operated for profit), so other principles need to be applied.

This isn't a complete argument. Most of the AI companies' case relies on the claim that AI models are "transformative". That's a plausible claim, and as Perfect 10 v. Google and Authors Guild, Inc. v. Google, Inc. have shown, being a for-profit company is hardly a disqualification from fair use protection.


“Transformative” is always a grey area. If my service just returns the book you requested, but in upper case, then technically it has been transformed.

But sure, the “transformative” argument is the one that could apply (and I believe even Google used it to argue its case), if it can be shown that an LLM cannot reproduce a given work verbatim (which, incidentally, is something that you, a warm-blooded fleshy human with agency and the freedom to read books, cannot do, but LLMs have been shown to do).

That said, the relevant laws existed before LLMs, and many are outdated. If the goal is to balance reasonable uses while protecting the original output of authors that ultimately drives innovation and creativity, I am not sure the preexisting laws are continuing to fulfil their function, but that’s my opinion.


>But sure, the “transformative” argument is the one that could apply (and I believe even Google used it to argue its case), if it can be shown that an LLM cannot reproduce a given work verbatim.

You have to try pretty hard to get LLMs to reproduce a work verbatim, especially any lengthy passage that isn't famous (and thus re-quoted on the internet a bazillion times). Moreover, just because LLMs can reproduce a work verbatim if you try hard enough doesn't mean they're not transformative. Google Search snippets and Google Books search have been ruled "transformative" by the courts, but if you tried hard enough you could use them to extract an entire work.

>That said, the relevant laws existed before LLMs, and many are outdated. If the goal is to balance reasonable uses while protecting the original output of authors that ultimately drives innovation and creativity, I am not sure the preexisting laws are continuing to fulfil their function, but that’s my opinion.

AFAIK the era of mining the public internet or published works for AI training data is over, or at least coming to an end. Everything that could be mined has already been mined, and besides, the internet is getting increasingly polluted by AI output. Private training data is where it's at now, whether that means sourcing document troves from companies (e.g. emails, documentation, source code, etc.) or paying "AI annotators" to produce training data for you. If the argument is that human authors should get a cut of AI profits because their works were "stolen" to train the models, it is going to be an increasingly losing argument, because it doesn't have a leg to stand on for private training data.


> If the argument is that human authors should get a cut of AI profits because their works were "stolen" to train the models, it is going to be an increasingly losing argument, because it doesn't have a leg to stand on for private training data.

The argument can be made that LLMs could not be created without expropriating the original works of all the authors they were trained on, and that argument would in fact be true and have quite sturdy legs as far as I’m concerned.

It’s not some historical episode from forgotten times: it started less than half a decade ago, and I would be surprised if it isn’t still ongoing (your argument about synthetic training data is forward-looking).


>The argument can be made that LLMs could not be created without expropriating the original works of all the authors they were trained on, and that argument would in fact be true and have quite sturdy legs as far as I’m concerned.

That makes as much sense as "American industry was built on the backs of British inventors (back in the day, the US was the "China" when it came to IP), so Britain should get perpetual (?) royalties from the US economy".


So we’re back to the human vs. unthinking machine distinction: American inventors were human. We’re going in circles, and this article was hidden on HN anyway.



