
This was in progress; 830 GB had been downloaded before a SourceForge employee popped onto the IRC channel and said he was OK with the archiving, but that the robots.txt should be respected. That puts things at a practical standstill, so the downloading was paused. I'm not really sure what's happened in the week since.

Right now Xfire's videos, several URL shorteners' links, and Toshiba support material are being archived. If you have spare cycles and bandwidth and want to contribute, running an instance of the "ArchiveTeam Warrior" is pretty easy through Docker or a VM (see the sketch below): http://archiveteam.org/index.php?title=Warrior
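
For the Docker route, something like this should do it. A minimal sketch, assuming the archiveteam/warrior-dockerfile image the wiki points to; the container name "warrior" is just my choice:

    docker run --detach --name warrior \
        --publish 8001:8001 \
        --restart unless-stopped \
        archiveteam/warrior-dockerfile

Then open http://localhost:8001 in a browser to pick a project and set how many items to work on concurrently.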



Honestly I think ignoring robots.txt in this case is acceptable. Even if he adds code to respect robots.txt, once management at SourceForge gets wind of what he is doing, what is stopping them from putting up a robots.txt everywhere that blocks him?


Look at their current robots.txt; they're already prohibiting robots from crawling the actual source code: http://sourceforge.net/robots.txt
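
For anyone who hasn't clicked through, the blocking boils down to Disallow rules of roughly this shape. An illustrative excerpt only, not the literal file; the path is a hypothetical stand-in for their code-browser URLs:

    User-agent: *
    Disallow: /p/*/code/

A crawler that honors robots.txt will skip everything under a path like that, which is exactly the material worth archiving.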


SourceForge doesn't host the binaries itself. Universities and others offer mirrors (like HEAnet) for free!

So the mirrors should just cut off SourceForge's upload/write permission and transfer it over to archive.org or ArchiveTeam instead.



