It's premature to claim these "AI" models are equivalent to a human memorizing the lyrics to copyrighted songs. They're much closer to XOR'ing copyrighted works so that the encoded form isn't casually recognizable. I don't have sources to hand, but I recall reading during the 2000s about such schemes being regularly proposed and just as regularly struck down in court.
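To make the analogy concrete, here's a minimal sketch of the kind of XOR scheme being described (the function name and key are my own illustration, not from any specific proposal): the stored bytes look nothing like the original work, yet the original is trivially recoverable, which is why courts treated it as copying.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data against a repeating key. XOR is its own inverse,
    so applying the same key twice recovers the input exactly."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Stand-in for a copyrighted work; the key is arbitrary.
original = b"It was a bright cold day in April..."
key = b"\x5a\xc3\x17"

encoded = xor_bytes(original, key)   # not casually recognizable as the work
decoded = xor_bytes(encoded, key)    # but perfectly recoverable

assert encoded != original
assert decoded == original
```

The encoded blob shares no recognizable bytes with the original, but "prodding it the right way" (XOR'ing with the key) reproduces the copyrighted input bit for bit.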
From a high level both things consist of putting copyrighted bits in a digital blender, producing a mixture unlike the originals but which, if you prod it the right way, can be induced to reproduce some parts of the (copyrighted) input. I'm not sure calling it AI makes it (legally) different. What is the argument here - that previous schemes just didn't blend the bits enough?
It isn't premature at all. Stable Diffusion took 100GB of already-compressed images and turned them into a 2GB model. The XOR of those 100GB of images could never fit into 2GB.
All this argument proves is that the model doesn't contain lossless, full-size, byte-perfect copies of all the images. And that's assuming all 100GB of images are unique, with no common data. Copying 2% of those images, or maybe 8% of them at 50% width, would still be a problem. Just the difference between high- and low-quality JPEG compression can easily make a 10x difference in size. You start to see visible artifacts at those compression levels, but I wouldn't try to convince a movie studio that a badly compressed version of their movie doesn't count as copyright infringement.
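A back-of-envelope calculation using the figures quoted above (100GB of training images, a 2GB model; both numbers are taken from this thread, not verified against the actual training set) shows why the size argument only rules out total verbatim storage:

```python
# Figures as quoted in the thread, treated as assumptions.
training_bytes = 100 * 10**9   # ~100GB of already-compressed images
model_bytes = 2 * 10**9        # ~2GB model

# Overall reduction: 50x. Far too much for a lossless copy of everything...
ratio = training_bytes / model_bytes
print(ratio)  # 50.0

# ...but copying just 2% of the inputs verbatim would already consume
# the entire model's capacity, so the 50x ratio alone doesn't rule out
# substantial copying of a subset.
partial_copy_bytes = 0.02 * training_bytes
print(partial_copy_bytes == model_bytes)  # True
```

The point being: a 50x size reduction proves the whole corpus isn't stored losslessly, not that no infringing amount of any individual work is.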
But that is exactly how copyright law works. If you copied the first 10,000 bytes of a Stephen King book, compressed them with zip, and distributed them, there's a good chance you would lose if sued.
If, on the other hand, you copied the first, middle, and last bytes of a Stephen King book and shared those three bytes with someone, you would win a copyright lawsuit, because three bytes are not unique enough to be copyrightable.