
If, hypothetically, libraries in the US - in particular the Library of Congress - were to scan and OCR every book, newspaper, and magazine they hold whose copyright has already expired, would that be enough? Is there an estimate of the size of such a dataset?
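For a rough sense of scale, here's a back-of-envelope sketch in Python. Every input is a guess, not a measured figure - the volume count, average book length, and tokens-per-word ratio are all assumptions:

    # Back-of-envelope size of a hypothetical public-domain corpus.
    # All inputs below are assumptions, not measured figures.
    PD_VOLUMES = 5_000_000      # guess: pre-1928 volumes actually scanned
    WORDS_PER_VOLUME = 80_000   # guess: average book length
    BYTES_PER_WORD = 6          # ~5 chars plus a space in plain UTF-8
    TOKENS_PER_WORD = 1.3       # typical subword-tokenizer ratio

    raw_bytes = PD_VOLUMES * WORDS_PER_VOLUME * BYTES_PER_WORD
    tokens = PD_VOLUMES * WORDS_PER_VOLUME * TOKENS_PER_WORD

    print(f"~{raw_bytes / 1e12:.1f} TB of plain text")   # ~2.4 TB
    print(f"~{tokens / 1e9:.0f}B tokens")                # ~520B tokens

Even with fairly generous inputs that lands around half a trillion tokens, well short of the 10T+ tokens recent large models are reportedly trained on, so raw size alone is probably the binding constraint.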


Much of that material is already available at https://archive.org. It might be good enough for some purposes, but limiting it to works published before 1928 (the public-domain cutoff in the United States) isn't going to be very helpful for, e.g., coding.

Maybe if you added GitHub projects with permissive licenses?
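If anyone wants to eyeball how much permissively licensed code is out there, GitHub's search API supports a license: qualifier. A minimal sketch (unauthenticated, so heavily rate-limited; the license list is just a sample):

    import requests

    # Count repositories per permissive license via GitHub's
    # repository-search endpoint. "license:" is real search syntax;
    # without an auth token, requests are tightly rate-limited.
    for lic in ["mit", "apache-2.0", "bsd-3-clause"]:
        r = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": f"license:{lic}", "per_page": 1},
            headers={"Accept": "application/vnd.github+json"},
        )
        r.raise_for_status()
        print(lic, r.json()["total_count"])

Of course, a repo's declared license says nothing about whether its contents are actually cleanly licensed, so counts like these are an upper bound.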



