Hacker News | nmcela's comments

Cool! Reading through the page, there seem to be people behind this who know what they're doing.


I am 100.0% sure this is what someone has said for 256GB drives, 256MB drives, 256KB drives, and the same person will be saying it for 256YB drives.


I clearly remember getting an 80gb drive, circa 2001-2002 I think, and talking with my friends about how impossible it would be to ever fill.


The problem is that the size of media has been growing exponentially.

Whenever I wonder how my phone is running out of space, it's always images / videos. Even when you look at an app that's like 400 MB, it's not 400 MB of code; it's like 350 MB of images and 50 MB of code.


I’d argue media storage usage is starting to level off somewhat because we’re approaching the limits of human perception. For movie content, people with average to good eyesight can’t tell the difference between 4K and 8K.

Environmental regulations also bite: an 8K TV that’s “green” is going to have to use very aggressive auto-dimming. Storage capacity growth, to me, looks like it’s outstripping media size growth pretty handily.

Now this isn’t to say I can’t think of a few ways to use a few yottabytes of data but I don’t think there is a real need for this for media consumption. You might see media sizes increase anyways because why not store your movies as 16K RAW files if you have the storage, but such things will become increasingly frivolous.


I would agree with you, but as technology improves we move the goalposts.

iPhones, for example, capture a small burst of images at the same time which can be replayed as a short animation (or loop) called “Live Photos”.

I am certain of what the future will hold for us: video which allows us to pan left and right.

These both require more space.


Interestingly enough, I've been messing around with ffmpeg recently and the newest high-end codecs (VVC / H.266) drop HD video size by 30% or more; it's pretty crazy.

It'll be very interesting to see where AVIF and similar next-generation image formats go in the near future; hopefully we'll get some relief from the exponential growth.
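
For example, assuming an ffmpeg build with libaom-av1 enabled (filenames made up), converting a photo to AVIF is roughly:

    ffmpeg -i photo.jpg -c:v libaom-av1 -crf 30 -b:v 0 photo.avif

Treat that as a sketch; the crf value is just a starting point for the quality/size tradeoff.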


I laughed today when it was announced Baldur's Gate 3 is 120GB


Even with all these advancements, 120GB for a game is still and always will be a lot!


Having many very high resolution textures adds up. I’m sure we’ll see it rise, especially in the age of generated textures and materials.


>I’m sure we’ll see it rise, especially in the age of generated textures and materials.

If generative AIs get good enough then I suppose at some point the data transmitted for games and media could be significantly less than now -- you'd 'just' need to transmit the data required for the prompt to generate something within some bounded tolerance.

Imagine a game shipping no textures, generating what was needed on the fly and in real time to fulfill some shipped set of priorities/prompts/flavors.

We're not there yet, but it seems like on-the-fly squashing of concepts into 'AI language' is going to be a trend in lossy compression at some point.


There are actually a lot of procedural games out there, I think No Man's Sky uses some of those techniques, but they definitely have been around since the 80s. The thing now is that the fidelity can be much higher, for sure.


Even 50MB of code is insane.


I remember being a kid at Babbages at the mall in the 90s and some guy told my friend and me that he had just built a system with 8 gigs of storage, and my friend and I talked about it endlessly as the coolest thing ever.


If it helps, circa 1993 I bought an Apple PowerBook (for what was at the time an awful amount of money) running System 7, which came with a 40, 80 or 120 MB disk:

https://en.wikipedia.org/wiki/Powerbook_160

I chose the 80 MB version as the 40 was too little, and the 120 was way too much for non-professional use (impossible to ever fill).


While I agree, it's been hard filling up the 2TB drive in my laptop.

My home server has a couple dozen terabytes (on spinning metal) and, with current fill rate, it's predicted it'll need an increase in space only after two of the drives reach retirement according to SMART. It hosts multiple development VMs and stores backups for all computers in the house.

Another aspect is that the total write lifetime is a multiple of the drive capacity. You can treat a 256TB drive as a very durable 16TB drive, able to last 16 times more writes than the 16TB one.


>While I agree, it's been hard filling up the 2TB drive in my laptop.

Then you're definitely not torrenting as much "definitely legit" content as I am. Once you sail the dark seas it piles up quick. Or maybe I have ADHD.


Don't even have to set sail; this landlubber likes to shoot videos with a smartphone, and these days, recording a few minutes of a family event, or even your plane taking off, in decent quality will easily give you a multi-gigabyte video file. And that's for normal videos; $deity help you if you enable HDR.

And yes, this is the universal answer to "how much storage is enough" - use cases will grow to consume generally-available computing resources. Today it's 4k UHD + HDR; tomorrow it'll be 8k UHD + HDR, few years later it will be 120 FPS lightfield recording with separate high-resolution radar depth map channel. And as long as progress in display tech keeps pace, the benefits will be apparent, and adoption will be swift.


I'll be curious to see the file sizes for Apple's version of 3D video capture in their Vision goggles. After one, two or three generations, I'm sure the first gen files will look small and lacking.


Of course. It won't encode touch and smell.


I've actually found my videos are not increasing as rapidly as I would expect. I've been reencoding in x265 and the file size difference is shocking. Right now I'm not ditching the existing original files but I may do that at some point, or just offload to a cloud service like Glacier


I’m right up next to a limit on live (easily-accessible, always visible in photo apps) cloud storage, with years of family photos and video taking about 95% of that.

I definitely don’t want to delete any of it, so I have been just hoping for bigger storage to be offered soon, but…

I hadn’t considered that re-encoding could be an option. I take standalone snapshots of everything every few months so if re-encoding would make a significant difference I might have to try this.

Do you have any tips on tools, parameters etc. that work well for you, please?


I use a shell script with ffmpeg. I encourage you to check out what works best for you but honestly the quality is pretty stellar with just a really simple one like

    mkdir -p reencoded

    ffmpeg -i input_filename.mp4 -c:v libx265 -crf 26 -preset fast -c:a aac -b:a 128k reencoded/output_filename.mp4
That's a fast single-pass constant-quality encode - a two-pass encode would be better quality for the size, but I find this very acceptable. It knocks down what would be a ~2gb file to between 800mb and 1200mb with very reasonable quality, sometimes even more dramatically - I've seen a 5gb file become a 400mb file (!!). You can experiment with the -crf 26 parameter to get the quality/size tradeoff you like. I run that over every video in the directory as a cron job, basically.
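
The loop itself is nothing fancy, roughly this (sketch from memory, adjust paths and the crf to taste):

    mkdir -p reencoded
    for f in *.mp4; do
        ffmpeg -n -i "$f" -c:v libx265 -crf 26 -preset fast -c:a aac -b:a 128k "reencoded/$f"
    done

The -n just makes ffmpeg skip files whose output already exists from a previous run.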


I think, for me, it satisfies some kind of hoarding instinct. I have a hard time keeping 'random junk' laying around my apartment, but I have absolutely no problem keeping a copy of a DVD I ripped 15 years ago that I will probably never watch again, and would probably be upset if it disappeared for some reason.


Or just download a few modern games.

No torrents here, absolutely none.


Blu-rays can take up 25gb each, so just a decent collection of those could easily consume most of one of these drives. If you want to do basic model tuning in Stable Diffusion, each model variation can take 7gb. This level of storage would mean you could almost set up a versioning system for those. And finally, any work with uncompressed data, which can just be easier in general, could benefit from it.


256TB is 10,000 25GB BD movies.

Even with brand new 25TB 3.5" drives, it's 10 of them, each holding 1,000 movies, for a total of 20,000 hours of entertainment or, roughly, 2 years of uninterrupted watching.

That's a lot.


Oh look at Mr. “I pay legitimate streaming services for all my tv shows and movies” over there. (=

I have a 12 TB NAS that is 99% full at the moment. Should I delete movies I may want to watch later, knowing full well they aren’t easily available on the streaming services I pay for? Ha.

It fills fast!


The people streaming their data are just using someone else's SSD... at the end of the day all this data we generate and consume sits somewhere


I'm also deduplicating that data by not storing it locally.


They have 16TB NAS drives for $300 now so if you have decent disposable income just upgrading the drives one by one is probably a decent strategy.


Sounds like you need a bigger NAS. I throw a fresh 10TB drive in mine every 3-4 months.


Start. Smart to cycle drives out after 3 years too.


I'm thrifty, I buy drives that businesses already cycled out after 3-5 years.


You laugh in the face of danger; a real risk taker! (=

I love all the Backblaze drive update posts about the lifespans of storage media.


It's interesting to think that, as flash densities surpass hard disks, it'll become cheaper to store data on flash than on spinning metal once you factor in rack space and power consumption.

Won't take that long.


Usenet is my backup. I've tried to make Backblaze my backup a couple times but the ETA on completing the first pass is always right about never.


For the kind of usage a streaming device has, an SSD is overkill. For that, spinning metal is probably a better choice. OTOH, 256TB of spinning metal take up space and is quite noisy.


Once you start saving media or playing with AI models, space goes quickly.


That's what the server is for.


And there are many reasons for one to prefer having their workstation be their server.


In fairness, it has a screen and keyboard connected, but no mouse. Adding a mouse would be trivial.

Anyway, 20TB takes a 3.5" bay, something my laptop lacks.


>256YB drives

Ah yes, Yagnibytes.


Y'otta look that one up.


"640K ought to be enough for anybody"


Was this the same person who said that 640KB ought to be enough for anybody?


They are probably still right. How much of the computing resources we all now have access to do we actually need?


Many novels are more than 640KB of ASCII text.


They are, but ought they?


Makes you wonder how people read long novels before they had enough RAM!


I vividly remember seeing a 5TB drive at Fry's Electronics sometime around 2010-2013 and thinking to myself "Who in god's name would ever need that much space"

I now have 24 terabytes in my NAS


But practically, don’t you reach a threshold where storing that much data on one drive makes it a bottleneck and a safety risk until the speed of the surrounding systems catches up?


As a gamedev, that sentence nearly gave me a heart attack. :)

I get where you are coming from, but the amount of work you can do in 5ms is mindblowing.


I'm not sure what to think about the fact that this whole thing is playing out exactly like in the movie "Don't Look Up". It's eerie.

I'm between fourth and fifth stage, trying to think how I could best prepare for the uncertain future.


Indeed, each one brings worse news than the previous.


I was seriously shocked by all the abuse when I had to install a new winblows laptop for a relative. How is any of it legal? How the hell did we let it happen? Mindblowing.


> I was seriously shocked by all the abuse when I had to install a new winblows laptop for a relative. How is any of it legal?

What should be made illegal?


I don't know man, maybe a massive antitrust case is enough.


I hate everything about this. Everything Meta touches turns to dystopian shit.


Yeah, no thank you. There's something wrong with the culture if the answer to this worsening ad dystopia is "just deal with it".


I agree. Especially the point about staying an honest tool and not joining the growth-monopoly-privacy-theft game at all resonated heavily with me.

Also kudos to Kev Quirk. Meta in its entirety is incredibly immoral in every way.


This is huge, and unfortunately not surprising at all in the age of massive, ever-growing, out-of-control tech monopolies that do whatever the fuck they want. Whatever the TOS says now, they can and will just reword it when they need to. There's no trust.

Every service and utility gets enshittified sooner or later; it's a given at the moment. I deleted all my private repos; GitHub and all other MS services should be avoided in the future.

The only solution is to self-host. Gitea is good.


> The only solution is to self-host. Gitea is good.

The Gitea project hosts its code on GitHub: https://github.com/go-gitea/gitea. You must admit that is a bit ironic.

> age of massive ever-growing out of control tech monopolies that do whatever the fuck they want

GitHub is not the only option for source code hosting. There are alternatives like GitLab, Bitbucket, and numerous smaller ones.


If you'd prefer the community-driven fork of Gitea (which still upstreams to the Gitea project), you should check out https://forgejo.org

The fork was established at the time that Gitea got entrepreneurial and founded Gitea Ltd. with plans for an enterprise version. https://codeberg.org used to run on Gitea, but switched to Forgejo, and the Forgejo project is hosted on Codeberg at https://codeberg.org/forgejo


That is a real tongue-twister of a name.


Pronounced: for-jay-oh (it is a derivation of the Esperanto name for "forge")


Or maybe like .. "forge ho"


Another fork? gogs -> gitea -> forgejo lol


OSS development model functioning as intended.


It's possible with OSS development, but spreading contributors and patches over three projects instead of one project with a functioning community is hardly the ideal OSS development model.


Yet, when you lose value alignment with the project, the best thing to do is to abandon ship as soon as possible. Insisting on total collaboration is bad for every party.


If they continue to push upstream per the license, then it works. There are a lot of Linux desktop apps that work this way; it's hardly a broken model.


While not an option for everyone, if you have a server with ssh access you can do:

    git init --bare /path/to/repo.git
on the server. Then locally you git clone that repo with an ssh url.
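
For example, the local side is just the usual clone with an ssh URL (host and path hypothetical):

    git clone ssh://you@yourserver.example.com/path/to/repo.git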

It does not have any visual MR or enterprisey features, but it works.


That's OK but too uncomfortable when managing a number of git repositories on an ssh server. I'm using gitolite [1] for that.

The features are basic and managed by editing text files and git-pushing them to a control repository: create repositories, add users and their keys, grant read-only or read-write access. There is no GUI, but once you have a copy of the repo on your machine you can use one of the several git GUIs available for any OS.
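
For illustration (repo and user names made up), an access rule in gitolite's conf file looks roughly like:

    repo myproject
        RW+     =   alice
        R       =   bob

You commit that to the gitolite-admin repo, push, and the permissions take effect.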

[1] https://gitolite.com/gitolite/


What aspects of bare git repos over ssh are uncomfortable or subpar?


With gitolite I don't have to manually set up every single repo and configure access with maybe one user name per project. That would be too much. And how about read-only vs. read-write?


A large portion of people don’t want to memorise all the commands related to merging, branching, etc., so catering to the lowest common denominator is important.


How does the choice of hosting change this?

Bare repo on a server is exposed to people exactly like Github: a remote URL you put in once and forget about it.

How people use their local git repository is their business, command-line, Sourcetree, GitKraken, what have you, but any of those work with any remotes.

(Sure, git by itself does not provide the other features from the hosting services like issue tracking and pull requests, but not every workflow requires those to be linked directly to the SCM)


I don’t care for issue tracking, but I do like the usability of diffs and merging in web UI apps - my primary job is to look at code, not write it (meaning I’d fail basic git merge questions), but I’ve also found out the hard way that just because I know my way around a shell doesn’t mean I can force my views on the people my company hires, and I own the company.


I do genuinely appreciate that, but that's the point: the graphical clients that do visual diffing and merging like the ones I listed all work with a bare repo as the remote.

Heck I think even the Github Desktop application also works with non-Github repositories, and they would be the only ones that would have any interest in locking people in.

Unless you mean specifically the UX of having a URL you can copy to a specific line of a specific commit in a repository, which indeed is not possible without a standard URI scheme (which does not exist) or a web client.


I get your point. At the same time I find it funny how Linus was checking patches via email, deciding what gets merged for the Linux kernel. Now every service needs all the replicated enterprisey features.

It is not a personal criticism of you. I find it interesting that git gave us all this efficiency and the enterprise removes it by adding complexity back, because employees supposedly cannot be bothered to learn their tools (or cannot be mandated to), or plainly prefer a nicer UI. Not a crime, but I can see how big corporations become inefficient with this type of thinking, when applied to hundreds of tools and processes.


I use gitolite as well, it's great. Currently working on integrating it into a CI/CD pipeline, which admittedly proves to be a slight challenge, but I'm sure I'll get there eventually.


I was sold on the features but never got the hang of how to work with it.


We've been doing that for years in our org. This works perfectly. We do push open source repos to GitHub though.


I, as a Linux user, built a similar system myself by getting an FTP account, mounting it locally with curlftpfs and then using git on the mounted filesystem.
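
Roughly like this, if memory serves (host, credentials and paths made up for illustration):

    curlftpfs ftp.example.com /mnt/ftprepos -o user=myuser:mypassword
    git init --bare /mnt/ftprepos/myproject.git
    git remote add ftpbackup /mnt/ftprepos/myproject.git
    git push ftpbackup master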


This exactly. Git itself is all you need. You can connect clients/IDEs like VS Code to such a repo easily.


> You must admit that is a bit ironic

It's a sad situation that if you desire exposure and community building you must maintain a fork on Github, but that's how it is for smaller projects. I am in a similar situation, with some of my projects with main repos hosted on sourcehut, but most of external engagement comes from clones on github. It is what it is, and we do what we must. :)


I would agree for any other kind of project, except for a GitHub alternative.

How does it look from the potential users' perspective when the product they market is not the product they choose to use for themselves?


It looks like they are a pragmatic project that prefers having contributors to being ideologically pure. It's not like there isn't an official repository hosted on Gitea: https://gitea.com/gitea


GitLab entered a strategic partnership with Google, likely for the very same reason - feeding Google AI models with enough code.


Could you link to some of the announcements or articles? I only ask because I was totally unaware and would like to learn more.



> GitLab is working with Google Cloud because of its strong commitment to privacy and enterprise readiness, and its leadership in AI.

Google's commitment to privacy? Google's leadership in AI?

Oh how I love marketing; you can say just about anything.



> You must admit that is a bit ironic.

Looks like they're working on migrating to a Gitea instance: https://github.com/go-gitea/gitea/issues/1029 .


Wow how pathetic that github is refusing to export their data:

https://github.com/go-gitea/gitea/issues/1029#issuecomment-1...


Failing != Refusing


Failing = Refusing + hiding behind corporate bureaucracy


> You must admit that is a bit ironic.

The people are on github, so it is really enticing.

Maybe the reddit and twitter drama creates a viable enough community for federated logins to become useful.


> The people are on github, so it is really enticing.

For any other project, sure. But when building an alternative to GitHub.. there is value in dogfooding.


or just git init --bare


> You must admit that is a bit ironic.

Every time someone parrots this, I have to wonder if they did more than 5 minutes of reading - it's one of the top issues on the issue tracker and they've outright stated they will move once Gitea is at a spot where they are not losing functionality and history.


I did not parrot anything. This is the first time I have heard of Gitea; I googled it and the first thing I noticed was that it was hosted on GitHub. It was an original thought.

I did not care enough to open their issue tracker. I still don't. It is ironic, not a bit, a lot. That statement was a bit sarcastic.

I hope that puts an end to your wondering.


>Whatever the TOS says now, they can and will just reword it when they need to. There's no trust.

This is what is crazy to me. You can agree to terms, build infrastructure around terms you agreed to, then those terms can completely change. Don't like it? Click disagree and we'll close your account, no problem!

And, thanks to politics around social media censorship, we have way too many people willing to say, "Don't like the terms, don't use the platform!" to the point of normalization. Sad.


I am agreeing and adding another solution.

The other solution is political. There's a reason that governments regulate and define economic rules of the road. This is a good example of where governments need to step in. The link between generative AI and the data it is trained on needs to be carefully thought through and properly handled especially given the capitalist nature of our economy.


[flagged]


Parroting ideologue-dogmatic bullshit platitudes is not conducive to a good discussion.


If you're going to wade into a 200 year old flame war at least have something interesting to say.


The emergence of machine intelligence* and its control by Capital was not foreseen by Karl Marx, and the intervening period between the heat death of the Capitalist system and the Workers' Utopia has been indefinitely extended.

* pure transformation of energy into labor


There's an awful lot of very smart people who have studied economics for the majority of their lives who disagree with this. There are also alternatives to capitalism that don't entirely involve govt control.


Absolute BULLSHIT!

Greed is what screws up the market. Ask Alan Greenspan re 2008 Banking Crisis.


There is always SourceHut (https://sourcehut.org) if you want.


Do you have experience with self-hosting Gitea? I am on the fence about going with Gitea because of the recent fork of the project (Forgejo). Seems that many contributors are now contributing mainly to Forgejo.


The reason for the fork was that Gitea was going for-profit and the folks that forked to Forgejo felt they went about that transition in a way that eroded trust. Here's their explanation: https://blog.codeberg.org/codeberg-launches-forgejo.html


Gitea is itself a fork of gogs (Go Git Server)

It is functioning like Open Source should: there was a disagreement in how the project was run, so it got forked.

This used to be more commonplace when projects were run by people, not companies. I wish the practice would come back; we need more forks in Free Software.


It feels bad to "waste" the work that could have otherwise gone into highly-paid billable hours, or at least charity work on other repos that get more use.


I self host Gitea. Very reliable. Painless setup. I wish it had some sort of CI like GitHub Actions or Bitbucket Pipelines, but otherwise I'm totally happy with it.


> I wish it had some sort of CI like github actions or bitbucket pipeline

I use Gitea with Drone CI and it works pretty well: https://www.drone.io/

Some might also prefer the Woodpecker CI fork due to the license: https://woodpecker-ci.org/

I setup Drone as a part of my migration away from GitLab Omnibus and have no complaints so far: https://blog.kronis.dev/articles/goodbye-gitlab-hello-gitea-...

Here's the Drone example in particular: https://blog.kronis.dev/tutorials/moving-from-gitlab-ci-to-d...


It's been added recently. Not sure how they compare.


GitHub Actions support works in Gitea as of version 1.19


Just self host the community edition of gitlab. It's miles better than gitea. It's got ci pipelines, it's got a pretty robust issue tracker, it's got wiki pages, it'll integrate with ldap/ad for authentication, it's got a package repository for self hosting libraries, it's got releases, it's got a service desk to make email -> ticket pipelines, etc.
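
If you go the container route, the minimal version is roughly this (sketch only; you'd want to add volumes for config/data and set a hostname):

    docker run -d -p 80:80 -p 443:443 -p 2222:22 gitlab/gitlab-ce:latest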


GitLab CE is far too heavy and requires a minimum of 4GB to run. It contains lots of components, including PostgreSQL and Redis, and startup takes a long time. With Gitea I can run it with just 1GB, or on a Raspberry Pi. It includes wiki, package repositories and releases as well. LDAP, service desk - these are enterprise features that I don't need.


> It's miles better than gitea.

GitLab is a crazy setup full of services, with elaborate interdependence, absurd hardware requirements, iffy performance, and all the lack of confidence in security that comes from this (and it only ever runs right if you use their docker images and don't touch anything).

But yeah, it got everything.


Gitea has all these features as well, except maybe the last.


I've got Gitea running on a $5 Vultr instance and it's great.

Upgrades have been painless. Doesn't tax the server.

Was using Gitea when that fork happened and didn't see a reason to migrate. Looked very much like poor communication on the part of Gitea causing a misunderstanding.


I self host Gitea both on my home NAS and a DO droplet. I set up repo syncing between the instances and it works flawlessly. I've moved most of my projects off GitHub/GitLab and overall I'm very happy with it.


I self-host gitea as a github backup just in case. It's pretty easy and well documented (it's a single executable and you can use sqlite for the database).
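
Getting it running is roughly this (assuming you've already downloaded the right binary for your platform):

    chmod +x gitea
    ./gitea web

The web UI comes up on port 3000 by default, and SQLite is one of the options in the install wizard.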


Cyberpunk 2077 here we come!


> The only solution is to self-host. Gitea is good.

I don’t understand your thinking and gitea’s marketing. They say in the same breath that it’s “self-hosting” and that they do “Git hosting… similar to GitHub, BitBucket, and GitLab”. — https://docs.gitea.com/


You install Gitea on a server that you own. You use that instance of Gitea to host your git repositories. That is self-hosted git hosting.

Gitea is an open source alternative to GitHub, that you run yourself.


It's a "run your own GitHub" application, akin to GitHub Enterprise Server or GitLab CE/EE, except that unlike GitHub Enterprise Server and GitLab EE, it's open source.


As far as I am aware, they do not offer a hosting service. I believe that statement was meant to convey that the Gitea software, once installed, is a git host similar to the others. I think they were trying to differentiate between a typical remote git repo and all the web components that come with Gitea. They do offer paid support, but that's still for self hosting.

