Hacker News

If I spent every last second of my life in a public library, I couldn't view even a fraction of the information that OpenAI has ingested. The comparison is irrelevant. To make the comparison somehow valid, I'd have to back up my truck to a public library, steal the entire contents, then start selling copies out of my garage.


Look, even I'm not a fan of ClosedAI, but this is ridiculous. ClosedAI isn't giving out copies of anything. It is giving you a response it infers based on things it has "read" and/or "learned" by reading content. Does ClosedAI store a copy of the content it scrapes, or does it immediately start tokenizing it or whatever is involved in training? If they store it, that's a lot of data, and we should be able to prove that sites were scraped through the lawsuit discovery process. Are you then also suggesting that ClosedAI will sell you copies of that raw data if you prompt it correctly?

I'm in no way justifying anything about GPT/LLM training. I'm just calling out that these comparisons are extremely strained.


Let's say OpenAI developers use an illegal copy of Windows on their laptops to save on buying a license. Is it OK to run a business this way?

Also, I think it is one thing when someone uses copyrighted works for research and publishing a paper, and a different thing when someone uses copyrighted works to earn money.



