helloworld42024's comments | Hacker News

This is spot on.


This is a brilliant idea. The product and system also look extremely well put together.


I absolutely hate programming during the day!

Working from home and working at night - for me this is the most quiet, energetic and productive time.


I wish I could have work hours better fitted to my sleep pattern. Everyone is losing, including the company I work for, since I'm staying in bed until the daily, then doing basically jack shit in the morning.


At this point, we need a service that "offers" an 8-bay NAS (with 12TB? 14TB? drives) filled with the whole ~80TB Anna's Archive. It's essentially all of human knowledge and, to be frank, it belongs to no one - rather... everyone.

People can store this at their house, keep it offline. Just to have these seeds of knowledge everywhere.

...I suppose LLMs trained on this data - their model weights and tokenization, essentially - are a much more efficient way of storing and condensing this 80TB archive?


When it comes to the evils (and goods) of copyright, it is hard to go wrong with Thomas Babington Macaulay's address to the House of Commons in 1841[1]:

"At present the holder of copyright has the public feeling on his side. Those who invade copyright are regarded as knaves who take the bread out of the mouths of deserving men. Everybody is well pleased to see them restrained by the law, and compelled to refund their ill-gotten gains. No tradesman of good repute will have anything to do with such disgraceful transactions. Pass this law: and that feeling is at an end. Men very different from the present race of piratical booksellers will soon infringe this intolerable monopoly. Great masses of capital will be constantly employed in the violation of the law. Every art will be employed to evade legal pursuit; and the whole nation will be in the plot. On which side indeed should the public sympathy be when the question is whether some book as popular as “Robinson Crusoe” or the “Pilgrim’s Progress” shall be in every cottage, or whether it shall be confined to the libraries of the rich for the advantage of the great-grandson of a bookseller who, a hundred years before, drove a hard bargain for the copyright with the author when in great distress? Remember too that, when once it ceases to be considered as wrong and discreditable to invade literary property, no person can say where the invasion will stop. The public seldom makes nice distinctions. The wholesome copyright which now exists will share in the disgrace and danger of the new copyright which you are about to create. And you will find that, in attempting to impose unreasonable restraints on the reprinting of the works of the dead, you have, to a great extent, annulled those restraints which now prevent men from pillaging and defrauding the living."

He was decrying the increase in term of copyright to life of the author + 50 years.

[1] https://www.thepublicdomain.org/2014/07/24/macaulay-on-copyr...


That is a powerful address, indeed. Good thing to know even in 1841 people saw what copyright would become today: an intolerable monopoly granted by the government, functionally infinite.

Seriously, this text is so great. I read the entire thing. It's nearly two hundred years old and contains everything one needs to know about copyright in 2024. Thank you for posting it.


You dropped a 0. Anna's Archive is currently 862.4 TB


That is true. However, it also has a staggering amount of duplicate data. I have _heard_ that if you search for most any particular book, you often get a dozen results of varying sizes and quality. Even for the same filetype. It's a hard problem to solve, but if we had something that could somehow pick the "best" copy of a particular title, for every title in the library, Anna could likely drop the zero herself.
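A toy sketch of what "pick the best copy" could even mean, in Python. Everything here is hypothetical - the field names, the format ranking, and using file size as a quality proxy are all stand-ins; a real deduplicator would need fuzzy title matching, edition handling, and actual content inspection:

```python
from collections import defaultdict

# Hypothetical preference order among filetypes; larger size is used as a
# crude tiebreaker for scan quality within the same title.
FORMAT_RANK = {"epub": 3, "pdf": 2, "djvu": 1, "mobi": 0}

def best_copy(entries):
    """entries: list of dicts with 'title', 'filetype', 'size_bytes'.
    Returns one chosen entry per (naively normalized) title."""
    by_title = defaultdict(list)
    for e in entries:
        # Naive normalization; real dedup needs far more than this.
        by_title[e["title"].strip().lower()].append(e)
    return {
        title: max(group, key=lambda e: (FORMAT_RANK.get(e["filetype"], -1),
                                         e["size_bytes"]))
        for title, group in by_title.items()
    }

picks = best_copy([
    {"title": "Robinson Crusoe", "filetype": "pdf", "size_bytes": 9_000_000},
    {"title": "robinson crusoe ", "filetype": "epub", "size_bytes": 600_000},
    {"title": "Pilgrim's Progress", "filetype": "pdf", "size_bytes": 4_000_000},
])
```

The hard part is exactly what the heuristic glosses over: deciding that two differently-named, differently-sized files really are the same book.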


As one of their blog posts explains, that's by design: they download all versions of every file. The reasoning was that some lower-quality video files will have subtitles or better audio than the high-quality video.

Some filtering may be possible to automate but lots of the tasks involved will have to be manual. Like merging video and audio from different sources or syncing subtitles from another file.


The above number already excludes duplicates.


Yes, too much for one person, but collectively it is possible to keep it alive.

If anyone wishes to help, you can generate a chunk in 1TB units and seed via BitTorrent here:

https://annas-archive.gs/torrents


Honestly, if I can't have the whole thing, I'm not going to bother mirroring a 1TB fragment that's worthless by itself to everybody except copyright attorneys.

As ndriscoll points out, the only feasible way to distribute an archive of this size is with physical hard drives. I sure wish they would find a reasonably-trustworthy way to offer that.


Most of the books are bloated PDFs. I'm slowly working on a project to reliably convert PDF to DjVu, which on average yields a highly readable document that's 33% of the original size on disk. The project is proving difficult: the tooling for DjVu is quite moldy now, and the output often needs manual review to ensure the file remains readable. pdf2djvu exists, but it's highly unreliable and thus can't be used in bulk. Other ebook formats are XML-based and tend to be similarly bloated due to the overhead of the markup. It's a hard problem, with so little in the way of good file format choices.
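For a bulk pipeline like this, at least the accept/reject decision can be automated. A minimal sketch: the conversion shells out to pdf2djvu, and a hypothetical gate (`keep_djvu` and its threshold are my invention, not part of any existing tool) keeps the result only when no pages were lost and the file actually shrank, using the 33% average mentioned above as the default cutoff:

```python
import subprocess

def convert(pdf_path, djvu_path):
    # pdf2djvu's real CLI shape: it writes djvu_path or exits non-zero.
    subprocess.run(["pdf2djvu", "-o", djvu_path, pdf_path], check=True)

def keep_djvu(pdf_bytes, djvu_bytes, pdf_pages, djvu_pages, max_ratio=0.33):
    """Hypothetical gate: accept only if every page survived and the file
    shrank below max_ratio of the original; anything else goes to the
    manual-review queue."""
    if djvu_pages != pdf_pages:
        return False  # pages were dropped, so something is unreadable
    return djvu_bytes <= max_ratio * pdf_bytes
```

Page-count equality is a weak proxy for "still readable", which is why a review queue stays in the loop rather than trusting the gate blindly.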


That sounds like a pretty terrible idea, TBH. All of the best tooling is for PDFs, as you note, and storage will only get cheaper.

Ultimately that content is going to need to be represented as raw UTF-8 text and encoded images, so I don't see much upside to migrating it from one intermediate lossy file format to another.


You are never going to have a physical copy of the archive. It's nearly a petabyte in size.


I know several datahoarders who have at least 1PB; archive.org also grows by at least that much every day.


I assumed that GP was an average person who doesn't have a storage array sitting at home. I'm not really sure why the IA is relevant here


1 PB of disk space would cost about $10K at this point in time. Not exactly unattainable. Looks like it would fit in a volume of space about the size of a standard refrigerator.

I'd be OK with both requirements.


It doesn't seem reasonable to me to suggest that an average person would spend $10,000+ (and the time to maintain it) on a pirate archive, hence my comment.

On the other hand, contributing a TB or two to a torrent swarm is much more feasible for most people.

In any case, if you're okay with that, you should do it. Please report back in 6 months with how it's going.


> In any case, if you're okay with that, you should do it. Please report back in 6 months with how it's going.

Point being, if I tried to torrent the whole thing, it probably would take 6 months, and would likely get me booted from my ISP and/or sued. I would much rather buy a set of hard drives with the contents already loaded. Or tapes, as userbinator suggests.

(And as for the hypothetical "average person" you keep citing, I don't see anyone meeting that description around here.)


> I would much rather buy a set of hard drives with the contents already loaded. Or tapes, as userbinator suggests.

And my point is that this is an absurd suggestion. I shouldn't have to explain why a shadow library shouldn't be selling (tens of) thousands of dollars worth of hard drives containing pirated content. Beyond that, and what I was getting at earlier, is that maintaining a 1PB storage array at home isn't exactly easy, or cheap.


> I shouldn't have to explain why a shadow library shouldn't be selling (tens of) thousands of dollars worth of hard drives containing pirated content.

Depends on what their goal is. I shouldn't have to explain why a "library" that's operating illegally in virtually every jurisdiction, with few or no complete mirrors, is vulnerable to being shut down by a small number of governmental or judicial entities.

If I were running the archive, not being a single point of interdiction would be high on my list of priorities. Especially when any number of people are indeed willing and able to keep 1 PB+ of content in circulation, samizdat-style. I would work to find these people, put them in touch with each other, and help them.

> Beyond that, and what I was getting at earlier, is that maintaining a 1PB storage array at home isn't exactly easy, or cheap.

Not everything that's worth doing is easy or cheap, or otherwise suited to "average people." Again, I don't know where you're coming from here. What's your interest in the subject, exactly?


> It doesn't seem reasonable to me to suggest that an average person would spend $10,000+

You're right, and I was not trying to suggest that. I was merely disagreeing with "You are never going to" because I know there are people who are reading this who can and maybe will.


1PB is well beyond the point at which a tape drive and a bunch of tapes will be cheaper than hard drives, and likely more reliable.


For archival, yes. Not if you want to access the thing with any frequency.


If you only care about non-fiction and science journals it is more like 250TB, I think? Still several thousand dollars in 22TB drives with ZFS, though.


22 TB drives are around $230 on ebay, so if you used 15 of them in raidz2, that'd be around $3500 (so maybe a little over $4k with the rest of the server), which is around the cost of a new mirrorless camera and a decent lens, so certainly within the realm of a hobbyist. You probably couldn't get away with downloading 250 TB in any reasonable timeframe with most US ISPs (or at least Comcast) though. That'd be over 2.5 months of 300 Mb/s non-stop. Even copying it from a friend using 2.5 Gbit/s Ethernet would take over a week.


Tape might be a better choice with that much data.


If the content is to be trustworthy then using LLMs to compress it makes no sense.


It's possible to do lossless compression with LLMs, basically using the LLM as a predictor and then storing differences when the LLM would have predicted incorrectly. The incredible Fabrice Bellard actually implemented this idea: https://bellard.org/ts_zip/
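The round-trip idea is easy to demonstrate with a deliberately dumb predictor standing in for the LLM. This is a cruder variant than ts_zip itself (which feeds the model's probabilities into an arithmetic coder), but it matches the description above: store only the positions where the predictor guessed wrong.

```python
def predict(prev):
    # Stand-in "model": guess that the last character repeats.
    return prev

def compress(text):
    # Record (position, actual char) only where the predictor was wrong.
    exceptions, prev = [], ""
    for i, ch in enumerate(text):
        if predict(prev) != ch:
            exceptions.append((i, ch))
        prev = ch
    return len(text), exceptions

def decompress(length, exceptions):
    # Replay the predictor, patching in the stored exceptions.
    fixes = dict(exceptions)
    out, prev = [], ""
    for i in range(length):
        ch = fixes.get(i, predict(prev))
        out.append(ch)
        prev = ch
    return "".join(out)

length, exceptions = compress("aaaabbbbccccdddd")
```

Here 16 characters shrink to 4 exceptions, and decompression is exact. The better the predictor, the shorter the exception list, which is why an LLM makes such a strong (if slow) compressor.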


Can we do this in physics?

Use a universal function approximator to approximate the universe, seek Erf(x)>threshold, interrogate universe for fresh data, retrain new universal approximator, ... loop previous ... , universe in a bottle.


You can do that sort of thing with a toy universe -- in fact Stephen Wolfram has a number of ongoing projects along broadly similar lines -- but you can't do it in our physical universe. Among other reasons, the universe is to all appearances infinite and simultaneously very complex, therefore it is incompressible and cannot be described by anything smaller than itself, nor can it be encapsulated in any encoding. You can make statistical statements about it -- with, e.g., Ramsey Theory -- but you can never capture its totality in a way that would enable its use in computation. For another thing, toy model universes tend to be straightforwardly deterministic, which is not clearly the case with our physical universe. (It is likely deterministic in ways that are not straightforward from our frame of reference.)


The problem with the latter is reliability - or rather, it's efficient but unreliable. I'd rather overdo my offline storage and figure out some way to script/code my way into searching it in a convenient way.


People didn't want to buy Apple's new headset because it was too expensive at 3500 dollars.

You think anyone would spend 3000 dollars on such a thing? I doubt it.


There are plenty of gadgets that people are willing to spend over 3k dollars on. So it just depends on what value you think you can get out of it.


I’d rather copy the whole thing down by hand, than rely on a bullshit generator for my access to knowledge.


So you're saying that because 1 million children die every year due to Vitamin A deficiency, you believe the answer is to genetically modify rice rather than actually feed the children a balanced diet of Vitamin A enriched foods (carrots, sweet potatoes, spinach, kale...?) or even supplements?

I suppose in theory we can put every nutrient known to man in a grain of rice one day.

But just because you could, does it mean you should? I suppose this opens up a much broader discussion of morals, ethics and potentially alogical topics of conversation.


Arguably yes. In this day and age of prosperity and technological development of our species, isn't it unethical and morally depraved to let children die of malnutrition when a simple change could help?

Iodine enrichment of salt achieved much the same in so many countries.


> rather than actually feed the children a balanced diet of Vitamin A enriched foods (carrots, sweet potatoes, spinach, kale...?) or even supplements?

As if no one has ever thought of this and it's just that simple.


Starvation is an acute problem for those living with it. They need to not die right now. So yes, absolutely we should do it. The converse is saying to them "it's better for you to have a balanced diet, which is harder to do, so you're just going to have to starve." Just solve the problem they're living with in the cheapest, easiest way possible, and then improve the overall picture when the situation isn't so dire.


Fortifying foods with nutrients is not a new concept (bread, cereals, etc), and I don't have any studies on hand but my assumption is that since it's been done for decades, it must have noticeable benefits to public health.


Vitamin-A enriched foods are indeed a solution that can be implemented by swapping a single component of an ecosystem rather than the complete overhaul that you're proposing.


Sure, I mean - let's just simplify the whole system. 1 Cup of genetically infused full daily-meal Rice®™℠ for every single person on the planet. One cup sized serving portion of nutrient material. Yum.


"Yum" is not the point though, it's avoiding malnutrition, starvation and death. Maslow's hierarchy of needs and all that, survive first, then later when they're not dead they can worry about producing culinary delights.


You have heard of Soylent and Huel, right?


No. Don't redesign the whole system! Why do people have that reflex?


It's easier to meet the people where they are. If they eat rice, feed them better rice.


"Sorry we are having trucking difficulties and supply problems at the moment. Please take your daily serving of 1-cup all nutrient dense supplied food called Rice®™℠. Remember be grateful to your {INSERT ENTITY HERE}"

...And remember kids "It's easier to meet the people where they are!"


Meeting people where they are dates to the 1940s, when we started putting iodine, iron and other additives into food, with a significant impact in reducing diseases like goiter and anemia all over the world.

The Lucky Iron Fish is a good story in this area.


They already know how to grow, transport, store, and cook rice. Just give them the seeds.

I think you're the one proposing to supply "Vitamin A enriched foods (carrots, sweet potatoes, spinach, kale...?)", which they may not know how to grow, transport, store, or cook.


Consider Mars. Starting a colony would require a minimum amount of material to be sent. The more densely packed those seeds are with nutrients, the better.


>So you're saying because 1 Million children die every year is due to Vitamin A you believe the answer is genetically modify rice rather than actually feed the children a balanced diet of Vitamin A enriched foods (carrots, sweet potatoes, spinach, kale...?) or even supplements?

Yes.


Obviously the answer is yes


>Vitamin A enriched foods (carrots, sweet potatoes, spinach, kale...?)

And oily fish, liver, cheese, and butter.


If you could make such a rice it would be awesome.


You're overcomplicating everything. Websocket endpoints. Simple.


...and that oversimplifies everything. Yes, websockets to the browser are likely to be a piece of the answer. But a server needs to maintain the session for the websocket and send messages. And that server needs to subscribe to data updates in the database... while also respecting auth of the current user.

It is certainly a problem with an achievable solution, but it is not simple.
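The auth-aware fan-out is the part that makes it non-trivial. A toy sketch of that server-side plumbing (everything here - the class name, the owner-only auth rule, lists standing in for websocket sessions - is a hypothetical illustration, not any particular library's API):

```python
from collections import defaultdict

class ChangeRelay:
    """Receives row-change events and delivers each one only to sessions
    whose user is authorized to see that row."""

    def __init__(self):
        self.sessions = defaultdict(list)  # user_id -> delivered messages

    def subscribe(self, user_id):
        # Creates an empty inbox; a real server would hold a websocket here.
        self.sessions[user_id]

    def publish(self, row):
        # Toy auth rule: only the row's owner may see the change. A real
        # system would evaluate row-level security per subscriber.
        for user_id, inbox in self.sessions.items():
            if row["owner_id"] == user_id:
                inbox.append(row)

relay = ChangeRelay()
relay.subscribe("alice")
relay.subscribe("bob")
relay.publish({"table": "notes", "owner_id": "alice", "text": "hi"})
```

Even this toy shows why "just websockets" undersells it: the relay has to track per-user sessions and re-check authorization on every event, not just at connect time.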


In his initial message, the original poster provided a link to Supabase, which integrates PostgreSQL with PostgREST out of the box. This setup automatically offers websocket server functionality:

1. The websocket server, database data subscriptions, and all session management are consolidated in this layer.

2. Updates to data are efficiently managed using a built-in PostgreSQL function.

3. User authentication is managed via React, employing either JWT or OAuth.

As I mentioned earlier, it's quite simple.


Agree.

