The problem is that the size of media has been growing exponentially.
Whenever I wonder how my phone is running out of space, it's always images and videos. Even when you look at an app that is like 400 MB, it's not 400 MB of code; it's like 350 MB of images and 50 MB of code.
I’d argue media storage usage is starting to level off somewhat because we’re approaching the limits of human perception. For movie content people with average to good eyesight can’t tell the difference between 4K and 8K.
Environmental regulations also bite: an 8K TV that's "green" is going to have to use very aggressive auto-dimming. Storage capacity growth, to me, looks like it's outstripping media size growth pretty handily.
Now this isn't to say I can't think of a few ways to use a few yottabytes of data, but I don't think there is a real need for it in media consumption. You might see media sizes increase anyway, because why not store your movies as 16K RAW files if you have the storage, but such things will become increasingly frivolous.
I would agree with you, but as technology improves we move the goalposts.
iPhones, for example, capture a small collection of images at the same time which can be replayed as a small animation (or loop) called "Live Photos".
I am certain the future holds video that allows us to pan left and right.
Interestingly enough I've been messing around with ffmpeg recently and the newest high end codecs (VVC / h266) drop HD video size by 30% or more, it's pretty crazy.
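A minimal sketch of the kind of invocation, assuming an ffmpeg build compiled with libvvenc support (the encoder only landed in recent releases, so option names may vary by version and filenames are placeholders):

    # encode to a raw H.266/VVC bitstream; lower -qp = higher quality, bigger file
    ffmpeg -i input.mp4 -an -c:v libvvenc -preset medium -qp 32 output.266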
It'll be very interesting to see where AVIF and similar next-generation image formats go in the near future; hopefully we'll get some relief from the exponential growth.
>I’m sure we’ll see it rise, especially in the age of generated textures and materials.
if generative AIs get good enough then I suppose at some point the data transmitted for games and media could be significantly less than now -- you'd 'just' need to transmit the data required for the prompt to generate something within some bounded tolerance.
Imagine a game shipping no textures, generating what was needed on the fly and in real time to fulfill some shipped set of priorities/prompts/flavors.
we're not there yet but it seems like on-the-fly squashing of concepts into 'AI language' is going to be a trend in lossy compression at some point.
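A toy sketch of the deterministic end of that idea - the shipped 'asset' is just a seed string, and every client regenerates an identical texture at load time (everything below is invented for illustration):

    import random

    def value_noise_texture(seed, size=64, cell=8):
        # Same seed -> same pseudo-random grid -> same texture on every machine.
        rng = random.Random(seed)
        n = size // cell + 2
        grid = [[rng.random() for _ in range(n)] for _ in range(n)]
        lerp = lambda a, b, t: a + (b - a) * t
        tex = []
        for y in range(size):
            row = []
            for x in range(size):
                gx, gy = x / cell, y / cell
                x0, y0 = int(gx), int(gy)
                tx, ty = gx - x0, gy - y0
                # bilinear interpolation between the four surrounding grid values
                top = lerp(grid[y0][x0], grid[y0][x0 + 1], tx)
                bot = lerp(grid[y0 + 1][x0], grid[y0 + 1][x0 + 1], tx)
                row.append(int(255 * lerp(top, bot, ty)))  # grayscale pixel
            tex.append(row)
        return tex

    # a handful of shipped bytes stand in for a whole stored image
    texture = value_noise_texture("rusty_metal_wall_v1")

Scale the same trick up from a seeded RNG to a seeded generative model and you get the prompt-as-compression scheme described above.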
There are actually a lot of procedural games out there - I think No Man's Sky uses some of those techniques - but they have definitely been around since the 80s. The thing now is that the fidelity can be much higher, for sure.
I remember being a kid at Babbages at the mall in the 90s, and some guy told my friend and me that he had just built a system with 8 gigs of storage, and my friend and I talked about it endlessly as the coolest thing ever.
While I agree, it's been hard filling up the 2TB drive in my laptop.
My home server has a couple dozen terabytes (on spinning metal) and, at the current fill rate, it's predicted to need more space only after two of the drives reach retirement according to SMART. It hosts multiple development VMs and stores backups for all computers in the house.
Another aspect is that the total write lifetime is a multiple of the drive capacity. You can treat a 256TB drive as a very durable 16TB drive, able to last 16 times more writes than the 16TB one.
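Back-of-the-envelope, assuming flash rated for roughly 1,000 program/erase cycles: a 256TB drive can absorb about 256TB x 1,000 = 256PB of total writes, versus about 16PB for a 16TB drive built from the same flash. Keep only 16TB of live data on the big drive and the controller can still wear-level across all 256TB of cells, so the same workload wears it out roughly 16 times slower.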
Don't even have to set sail; this landlubber likes to shoot videos with a smartphone, and these days, recording a few minutes of a family event, or even your plane taking off, in decent quality will easily give you a multi-gigabyte video file. And that's for normal videos; $deity help you if you enable HDR.
And yes, this is the universal answer to "how much storage is enough" - use cases will grow to consume generally-available computing resources. Today it's 4K UHD + HDR; tomorrow it'll be 8K UHD + HDR, and a few years later it will be 120 FPS lightfield recording with a separate high-resolution radar depth map channel. And as long as progress in display tech keeps pace, the benefits will be apparent, and adoption will be swift.
I'll be curious to see the file sizes for Apple's version of 3D video capture in their Vision goggles. After one, two or three generations, I'm sure the first gen files will look small and lacking.
I've actually found my videos are not increasing as rapidly as I would expect. I've been reencoding in x265 and the file size difference is shocking. Right now I'm not ditching the existing original files, but I may do that at some point, or just offload them to a cloud service like Glacier.
I’m right up next to a limit on live (easily-accessible, always visible in photo apps) cloud storage, with years of family photos and video taking about 95% of that.
I definitely don’t want to delete any of it, so I have been just hoping for bigger storage to be offered soon, but…
I hadn’t considered that re-encoding could be an option. I take standalone snapshots of everything every few months so if re-encoding would make a significant difference I might have to try this.
Do you have any tips on tools, parameters etc. that work well for you, please?
I use a shell script with ffmpeg. I encourage you to check out what works best for you but honestly the quality is pretty stellar with just a really simple one like
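(a minimal sketch along those lines - filenames are placeholders and flags approximate):

    # single-pass constant-quality x265 encode; audio copied through untouched
    ffmpeg -i input.mp4 -c:v libx265 -crf 26 -c:a copy output.mp4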
That's a fast single-pass constant quality encode - a two-pass encode would be better quality for the size but I find that very acceptable. It knocks down what would be a ~2gb file all the way to between 800mb - 1200mb with very reasonable quality, sometimes even more - I've seen a 5gb file become a 400mb file (!!). You can experiment with the -crf 26 parameter to get the quality/size tradeoff you like. I run that over every video in the directory as a cron job basically.
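For comparison, the two-pass variant looks something like this (bitrate and filenames are placeholders; two passes let the encoder budget bits with knowledge of the whole file, at the cost of encoding twice):

    # pass 1 gathers statistics, pass 2 does the real encode at the target bitrate
    ffmpeg -y -i input.mp4 -c:v libx265 -b:v 1500k -x265-params pass=1 -an -f null /dev/null
    ffmpeg -i input.mp4 -c:v libx265 -b:v 1500k -x265-params pass=2 -c:a copy output.mp4

And the cron job itself can be as dumb as a shell loop over the directory (again, paths invented):

    for f in *.mp4; do ffmpeg -n -i "$f" -c:v libx265 -crf 26 -c:a copy "encoded/$f"; done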
I think, for me, it satisfies some kind of hoarding instinct. I have a hard time keeping 'random junk' laying around my apartment, but I have absolutely no problem keeping a copy of a DVD I ripped 15 years ago that I will probably never watch again, and would probably be upset if it disappeared for some reason.
Blu-rays can take up 25gb each, so just a decent collection of those could easily consume most of one of these drives. If you want to do basic model tuning in stable diffusion, each model variation can take 7gb. This level of storage would mean you could almost setup a versioning system for those. And finally, any work with uncompressed data, which can just be easier in general, could benefit from it.
Even with brand new 25TB 3.5" drives, it's 10 of them, each holding 1,000 movies, for a total of 20,000 hours of entertainment or, roughly, 2 years of uninterrupted watching.
Oh look at Mr. “I pay legitimate streaming services for all my tv shows and movies” over there. (=
I have a 12 TB NAS that is 99% full at the moment. Should I delete movies I may want to watch later, knowing full well they aren’t easily available on the streaming services I pay for? Ha.
It's interesting to think that, as flash densities surpass hard disks, it'll become cheaper to store data on flash than on spinning metal once you factor in rack space and power consumption.
For the kind of usage a streaming device has, an SSD is overkill. For that, spinning metal is probably a better choice. OTOH, 256TB of spinning metal takes up space and is quite noisy.
I vividly remember seeing a 5TB drive at Fry's Electronics sometime around 2010-2013 and thinking to myself "Who in God's name would ever need that much space?"
But practically, don't you reach a threshold where storing that much data on one drive makes it a bottleneck and a safety risk until the speed of the surrounding systems catches up?
I was seriously shocked by all the abuse when I had to set up a new winblows laptop for a relative. How is any of it legal? How the hell did we let it happen? Mindblowing.
This is huge, and unfortunately not surprising at all in the age of massive, ever-growing, out-of-control tech monopolies that do whatever the fuck they want. Whatever the TOS says now, they can and will just reword it when they need to. There's no trust.
Every service and utility gets enshittified sooner or later; it's a given at the moment. I deleted all my private repos; github and all other MS services should be avoided in the future.
If you'd like a community-driven fork of Gitea (which still upstreams to the Gitea project), you should check out https://forgejo.org
The fork was established at the time Gitea got entrepreneurial and founded Gitea Ltd. with plans for an enterprise version. https://codeberg.org used to run on Gitea but switched to Forgejo, and the Forgejo project is hosted on Codeberg at https://codeberg.org/forgejo
It's possible with OSS development, but spreading contributors and patches over three projects instead of one functioning community is hardly the ideal OSS development model.
Yet, when you lose value alignment with the project, the best thing to do is to abandon ship as soon as possible. Insisting on total collaboration is bad for every party.
That's OK but too uncomfortable when managing a number of git repositories on an ssh server. I'm using gitolite [1] for that.
The features are basic and managed by editing text files and git-pushing them to a control repository: create repositories, add users and their keys, readonly or readwrite. There is no GUI but once you have a copy of the repo on your machine you can use one of the several git GUIs available for any OS.
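A minimal sketch of what that looks like in practice (repo and user names invented for illustration):

    # conf/gitolite.conf in the gitolite-admin control repository
    repo projectA
        RW+  =  alice        # read/write, force-push allowed
        R    =  bob          # read-only

    # public keys live alongside it, e.g. keydir/alice.pub and keydir/bob.pub

Commit and push, and gitolite creates the repo and applies the permissions.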
With gitolite I don't have to manually set up every single repo and configure access with maybe one user name per project; that would be too much. And how would you handle read-only vs. read-write?
A large portion of people don't want to memorise all the commands related to merging, branching etc., so catering to the lowest common denominator is important.
Bare repo on a server is exposed to people exactly like Github: a remote URL you put in once and forget about it.
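The whole setup is a couple of commands (hostnames and paths are placeholders):

    # on the server
    git init --bare /srv/git/myproject.git

    # on each client, once
    git remote add origin ssh://me@server.example.com/srv/git/myproject.git
    git push -u origin main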
How people use their local git repository is their business, command-line, Sourcetree, GitKraken, what have you, but any of those work with any remotes.
(Sure, git by itself does not provide the other features from the hosting services like issue tracking and pull requests, but not every workflow requires those to be linked directly to the SCM)
I don’t care for issue tracking, but I do like the usability of diffs and merging in web ui apps - my primary job is to look at code not write it (meaning I’d fail basic git merge questions) but I’ve also found out the hard way that just because I know my way around a shell doesn’t mean I can force my views on the people my company hires, and I own the company.
I do genuinely appreciate that, but that's the point: the graphical clients that do visual diffing and merging like the ones I listed all work with a bare repo as the remote.
Heck, I think even the Github Desktop application works with non-Github repositories, and they would be the only ones with any interest in locking people in.
Unless you mean specifically the UX of having a URL you can copy to a specific line of a specific commit in a repository, which indeed is not possible without a standard URI scheme (which does not exist) or a web client.
I get your point. At the same time I find it funny how Linus was checking patches via email, deciding what got merged into the linux kernel. Now, every service needs all the replicated enterprisey features.
It is not a personal criticism of you. I find it interesting that git gave us all this efficiency and the enterprise removes it by adding complexity back, because employees supposedly cannot be bothered to learn their tools (or cannot be mandated to), or plainly prefer a nicer UI. Not a crime, but I can see how big corporations become inefficient with this type of thinking, when applied to hundreds of tools and processes.
I use gitolite as well, it's great. Currently working on integrating it into a CI/CD pipeline, which admittedly proves to be a slight challenge, but I'm sure I'll get there eventually.
I, as a Linux user, built a similar system myself by getting an FTP account, mounting it locally with curlftpfs and then using git on the mounted filesystem.
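Roughly (credentials and paths invented; curlftpfs exposes the FTP account as a local directory, and git neither knows nor cares):

    curlftpfs -o user=me:secret ftp.example.com /mnt/ftp
    git init --bare /mnt/ftp/project.git
    git clone /mnt/ftp/project.git ~/project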
It's a sad situation that if you want exposure and community building you must maintain a fork on Github, but that's how it is for smaller projects. I am in a similar situation: the main repos for some of my projects are hosted on sourcehut, but most of the external engagement comes from clones on github. It is what it is, and we do what we must. :)
It looks like they are a pragmatic project that prefers having contributors over being ideologically pure. It's not like there isn't an official repository hosted on gitea: https://gitea.com/gitea
Every time someone parrots this, I have to wonder if they did more than 5 minutes of reading - it's one of the top issues on the issue tracker and they've outright stated they will move once Gitea is at a spot where they are not losing functionality and history.
I did not parrot anything.
This is the first time I have heard of Gitea; I googled it, and the first thing I noticed was that it was hosted on GitHub. It was an original thought.
I did not care enough to open their issue tracker. I still don't. It is ironic - not a bit, a lot. That statement was a bit sarcastic.
>Whatever the TOS says now, they can and will just reword it when they need to. There's no trust.
This is what is crazy to me. You can agree to terms, build infrastructure around terms you agreed to, then those terms can completely change. Don't like it? Click disagree and we'll close your account, no problem!
And, thanks to politics around social media censorship, we have way too many people willing to say, "Don't like the terms, don't use the platform!" to the point of normalization. Sad.
The other solution is political. There's a reason that governments regulate and define economic rules of the road. This is a good example of where governments need to step in. The link between generative AI and the data it is trained on needs to be carefully thought through and properly handled especially given the capitalist nature of our economy.
Emergence of machine intelligence* and its control by Capital was not foreseen by Karl Marx, and the intervening period between the heat death of the Capitalist system and the Workers' Utopia has been indefinitely extended.
There's an awful lot of very smart people who have studied economics for the majority of their lives who disagree with this. There are also alternatives to capitalism that don't entirely involve govt control.
Do you have experience with self-hosting Gitea? I am on the fence about going with Gitea because of the recent fork of the project (Forgejo). It seems that many contributors are now contributing mainly to Forgejo.
The reason for the fork was that Gitea was going for-profit and the folks that forked to Forgejo felt they went about that transition in a way that eroded trust. Here's their explanation: https://blog.codeberg.org/codeberg-launches-forgejo.html
It is functioning like Open Source should: there was a disagreement over how the project was run, so it got forked.
This used to be more commonplace when projects were run by people, not companies. I wish the practice would come back; we need more forks in Free Software.
It feels bad to "waste" the work that could have otherwise gone into highly-paid billable hours, or at least charity work on other repos that get more use.
I self host Gitea. Very reliable. Painless setup. I wish it had some sort of CI like github actions or bitbucket pipelines, but otherwise I'm totally happy with it.
Just self host the community edition of gitlab. It's miles better than gitea. It's got ci pipelines, it's got a pretty robust issue tracker, it's got wiki pages, it'll integrate with ldap/ad for authentication, it's got a package repository for self hosting libraries, it's got releases, it's got a service desk to make email -> ticket pipelines, etc.
GitLab CE is far too heavy and requires a minimum of 4GB to run. It contains lots of components, including PostgreSQL and Redis, and startup takes a long time. With Gitea I can run it with just 1GB, or on a Raspberry Pi. It includes a wiki, package repositories and releases as well. LDAP, service desk - these are enterprise features that I don't need.
Gitlab is a crazy setup full of services, with elaborate interdependence, absurd hardware requirements, iffy performance, and all the lack of confidence in security that comes from this (and it only ever runs if you use their docker images and don't touch anything).
I've got Gitea running on a $5 Vultr instance and it's great.
Upgrades have been painless. Doesn't tax the server.
I was using Gitea when that fork happened and didn't see a reason to migrate. It looked very much like poor communication on Gitea's behalf causing a misunderstanding.
I self host Gitea both on my home NAS and on a DO droplet. I set up repo sync between the instances and it works flawlessly. I've moved most of my projects off Github/Gitlab and overall I'm very happy with it.
I self-host gitea as a github backup just in case. It's pretty easy and well documented (it's a single executable and you can use sqlite for the database).
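For reference, "easy" really is about this much (binary name is a placeholder; grab the release build for your platform first):

    # make the single binary executable, then run the built-in installer
    chmod +x gitea-linux-amd64
    ./gitea-linux-amd64 web    # first-run web wizard; pick SQLite3 as the database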
> The only solution is to self-host. Gitea is good.
I don’t understand your thinking and gitea’s marketing. They say in the same breath that it’s “self-hosting” and that they do “Git hosting… similar to GitHub, BitBucket, and GitLab”. — https://docs.gitea.com/
It's a "run your own github" application. akin to Github Enterprise Server or Gitlab CE/EE, except unlike Github Enterprise Server and Gitlab EE, it's open source.
As far as I am aware, they do not offer a hosting service. I believe that statement was meant to convey that the Gitea software, once installed, is a git host similar to the others. I think they were trying to differentiate between a typical remote git repo and all the web components that come with Gitea. They do offer paid support, but that's still for self-hosting.