The time has come to replace file systems (didgets.substack.com)
177 points by pabs3 on Feb 24, 2022 | 295 comments


Reading this article left me wondering just where the 200 million tags this guy needs are supposed to come from. Manual curation?! Automatically derived from file extensions? File headers? What is the cost of opening a file, parsing its filetype, comparing against a reference, writing it to a database, etc.? How is that cheaper than current indexers (which all seem to work fine, btw)?

I rarely waste effort trying to remember filenames in the first place, much less needing some expensive tag curation to locate files. I simply use a bit of discipline organizing the directory structure(s). If I do ever need to actually search for something, it will be constrained to a narrow subset of directories and ignore the other 199.9 million files or whatever.

Moreover, I just don't have the problem of searching for filename fragments to begin with. Nor do I see a reasonable way to use a whole host of powerful unix techniques with a whackadoodle tiny tags filesystem. Or the need to produce a list of 20 million images in 2 seconds. What use would that be anyway? I'm not going to read a list like that - I'm going to operate on it.

Please correct me if I'm wrong, but the versatility of `find` is far more powerful if you actually need to handle/sort through that many files, and something like `fzf` probably curtails all these complaints in the first place.
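For concreteness, a minimal sketch of that kind of constrained search with `find` (the directory tree here is invented purely for the example):

```shell
# Build a tiny throwaway tree so the example is self-contained.
mkdir -p demo/projects/alpha demo/archive
touch demo/projects/alpha/report-2021.pdf demo/archive/notes.txt

# Search only the subtree that could plausibly hold the file,
# ignoring everything else on disk:
find demo/projects -type f -name '*report*'
```

Piping the same candidate list into `fzf` gives interactive fuzzy narrowing on top of that.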


If I had a penny for every time someone on HN responds with something like this - "just become more disciplined and you don't need X" - I'd be a millionaire. Doesn't matter what it is, type systems, memory safety, a better UI for Git… there's always someone ready to chime in with how their workflow means these problems don't happen, or, even better, asking the question why would anyone need this?

Yes, why would anyone need better search or a faster, easier to organise file system? I can't think why.


A better search and tagging can be valuable tools. But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.

Being able to think about how to order your files is a fundamental skill in this day and age and doing this on a big scale does indeed require discipline.

IMO it is just a false hope to think tags would help with the root cause of a lack of care about the data.


Being able to think about how to order your files is a fundamental skill in this day and age and doing this on a big scale does indeed require discipline.

I'm not sure that's true, because no one does that on mobile devices. Some people have even suggested that young people who've grown up with mobile phones struggle with filesystems because they have no experience of file management despite having plenty of experience of computing.

Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

The same is true to a lesser extent with online office suites. You don't need to know the name of a file in Google Docs - you refer to things by their titles.

Moving from file names to tags, or any metadata really, would be possible. Whether it'd be better is a matter of opinion.


> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

I think the way Android does it is completely the wrong way around, as it makes everything centered around apps, not documents. That makes you a slave to the app, which in turn gets used to force you into using cloud services. It goes so far that you don't even have control over your files anymore: if you delete an app, the files created by that app get deleted with it, and you don't even get a warning.

I rarely use Android, but every interaction with it has been god awful. And from what I hear, new versions of Android will start making tools like SSHelper impossible, so you can't even work around the madness anymore.


How is an iPhone (or other mobile OS) better?


Android has had file explorer capability for over a decade? iOS hasn’t even had it for 5 years?


There are two main places where Android apps store files: an application-private slice of the main filesystem, and the shared /sdcard (which, as its name implies, was originally a removable SD card, but nowadays is just another slice of the main filesystem). What the parent is complaining about is the former (and a per-application directory on the latter), which is removed whenever the application is uninstalled (or the user tells Android to clear the application state). Unless you have root on the phone (or are looking at the per-application directory below /sdcard), or the application explicitly exports it, it's not even visible to any file explorer.


> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

That is an inherent weakness of mobile OSes and prevents them from competing with traditional computers in workplaces. It works as long as you never want to do anything complicated with your computer, but most people working in offices need the organization offered by a file system, not to mention how crippling this would be to use as a development environment.


We have systems, but they don’t need to be “files”.

On development environments: we manage all our code under version control, our IDEs index all the sources, and most of us navigate in pretty unusual ways. I guess we would be fine without any direct access to the fs as long as we had a layer that gave back streams of the files we look for and a way to commit changes to git. For a lot of devs I think everything is already abstracted by their IDE.

A lot of people work the same way: navigating exclusively through links sent to them, their office app's "recent documents", and the "open" dialogue pointed at the folder where they store all their docs, possibly without completely understanding where it is exactly (except if it's straight in their "Desktop" folder). I think a ton of adults get by with that in their everyday job.


I’m still largely pro-file-systems, but your comment made me think. Here’s some loose thoughts, just to get into the problem space.

I navigate my codebase at work primarily using the name of the entity I’m aiming to inspect or work on next (e.g. “Popup” or “uiEventStream”)

- usually using fuzzy search. This matches by file name, but feasibly could operate by entity symbolic name to the same effect

- increasingly using VSCode's "find references", which already operates by entity name (at least that's how the UI appears)

However… I also use the file tree, because important and meaningful application structure is encoded in the tree. The tree (and its node names) gives me sections of the app, collections of entity types, and hints at how they're connected. This is invaluable. It helps new colleagues learn the application structure and it helps old hands get to what they want faster. It forms a "silent" background context against which all entity-based decisions get made.

The structure could be encoded as tags, with all the files dumped in a single directory. I have yet to see a tagging interface work as well for tag hierarchies as a directory tree works.

Tag hierarchies are a specialised use, or extension, of generalised tags. Tagging systems typically emphasise (through UI and explanatory notes) the unstructured approach. Structure and unstructure are basically opposed, so making a single UI work for both seems problematic. Educating users, most of whom won't use the word "taxonomy" in daily life, in how to use a tool supporting a model with an almost inherent self-contradiction seems like a mammoth task.


To add to that, there is IMO a renaissance of "fuzzy" navigation, with macOS's quick search (the Sherlock/Quicksilver clone), and tools like Obsidian spreading the model of moving to apps and files by a set of partial keywords.

I agree with you that a tree structure is important, the same way people still look at trees to navigate pages on most sites, going through categories, sub-categories etc. More and more the tree is just dissociated from the actual representation, the same way URLs don't exactly match the site structure on so many sites.

I’d imagine the tags would have some hierarchical relations if we were to use tags exclusively.


Many programming languages have a tagging system for that: namespaces, packages, etc.

Often they have a close mapping to the filesystem, but with IDE support that isn't strictly needed. (In reality, however, it currently is, as the filesystem is the language-agnostic common interface between version control system, IDE, etc.)
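As a trivial sketch of that mapping (the module name here is invented), the usual convention just turns namespace separators into path separators:

```shell
# Map a dotted namespace like 'app.ui.popup' onto the source path
# most languages would look for, e.g. app/ui/popup.py (or .java, etc.)
module="app.ui.popup"
path="$(echo "$module" | tr '.' '/').py"
echo "$path"
```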


I guess we could have "Emacs files" and "gcc files" and you could transfer one to the other using a Downloads folder or something? Pretty convenient.


>Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

... and completely cripple the user's creative powers, making them a passive consumer. You cannot get serious work done on mobile, and that is as true now as it was ten years ago.


> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

It's ok when you have a file type exclusive to an app AND the app provides export functionality. But it breaks as soon as you need to share files between apps.

Arguably, this is more about implementations than the principle, but ask any musician about the iOS music apps and they'll tell you they're great... except the file management.


> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

File browsers got very popular very quickly on Android and there is a bundled one on there now. So mobile has shown that, despite the designers' attempt to deprecate files, it didn't work out.


I strongly doubt that regular users ever bothered to use Android's file system. If there are even files involved, most people will just use the app that received or created that file to interact with it. Especially since apps sometimes put files in completely random places that even I, as a technical person, have trouble finding.


I imagine the number of users who have opened the file browser app is a low single digit percentage of Android users. Maybe even less than 1%.


So 3 million people in the US alone


1% of American Android users, not 1% of Americans. Far less than 3MM.


330m Americans, almost all have at least one smartphone, about half are Androids; that's still millions of people, and hundreds of millions worldwide.


I am the 1 percent. Yay.


> I'm not sure that's true, because no one does that on mobile devices.

Even my mother sorts her pictures into gallery folders. Granted – a lot of sorting on mobile happens automatically (per app).

But "consumer devices don't need an accessible file system" is not a good argument to extrapolate that to machines people use productively. Don't get me wrong, I do think we can improve filesystems in terms of usability – I just don't think having some of it in your head will go away any time soon (and if it does, it will not be an improvement).

My point is that in a productive environment the filesystem becomes part of your brain, just like a carpenter's workshop becomes part of their brain. This is not a bug, it is a feature. You don't need to think about where things are, because you arranged your environment in a way that suits the tasks you are doing 99% of the time. Now if someone came in and arranged the tools for you, moved them around automatically by their own logic, chances are that it doesn't fit your current task, your personal preferences, etc.

Moving from a world where you blindly know where something is to one where you have to guesstimate what another entity "thought" would be an appropriate place for the thing you are looking for is not progress. If you were to make an automatic system that can read thoughts and put the file precisely in the place people are expecting it to be, that would be an improvement, but everything else not so much.

The key difference for me here is the one between productive work and consumption: if you are in a space where you are consuming (e.g. food at a buffet) it is totally acceptable to not have it your way. Who cares if it takes you 5 seconds more to find the balsamico for your salad? Tasks that you don't do productively, like looking at pictures on your smartphone: who cares if it takes you a minute more to find a thing? But if you are a professional photographer looking for that one picture you took in a specific session 4 years ago, not a lot will beat a well built folder structure.


> no one does that on mobile devices.

Because you don't really have valuable data on a mobile phone. It's mainly just photos, and they are all in one folder ordered by date. So adding tags to that is a feasible strategy.

Everything else on the phone most people don't consider permanent data, so it's not worth organizing it. Your contacts are in the cloud, so are your chat logs... And app configuration data can always be recreated with some effort.


Let's remove street addresses altogether, because that requires hard memorization, right? Instead let's all put any house of a city all along a contiguous space and refer to them via a description of their appearance.


I don't disagree with your premise that mobile abstracts away the file system. However many people do put rather a lot of effort into organizing files on mobile. I clean and sort my downloads folder just like on my computer, but more importantly the majority of people I know use folders to handle the now thousands of pictures we generate on mobile.


> Some people have even suggested that young people who've grown up with mobile phones struggle with filesystems because they have no experience of file management despite having plenty of experience of computing.

Currently teaching introductory programming at college level, can confirm.


> But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.

I personally don't see folders (or traditional file organization) and tags as competitive technologies. I think they're complementing each other very well. I generally put my stuff under well defined folders, but tag the notes (or the files if I have the capability).

95% of the time, I can just go to the folder and get what I need, but sometimes I need to search something which I don't remember whether I have it or think I misplaced. In that case file indexing and search really comes in handy.

I apply these methods in at least three places very enthusiastically: Pagico, Evernote and TiddlyWiki. All three have hierarchical organization models (it's fixed in Evernote, not mandated in Pagico, and Tiddly is just free-floating by nature), but they're meticulously tagged. I rarely use search in any of these. However this doesn't mean tags save me serious time or effort. I think both ways of organization are very useful, at the end of the day.

As a pet peeve, I really don't like these strongly worded titles and posts. You don't have to kill something that works well in order to enhance it for some or every use case.


I’m on the side of people who gave up on organization entirely, and have a stream of scanned documents with only the scan date as the title for 99% of them.

The docs are OCRed and I retrieve them mostly by search, with tags for a few critical docs and by approximate dates for the rest if search by content fails completely.

This is viable, and there’s no way I’ll go back to manually setting up tags and names on all the docs we scan, just in case they’re needed some day. It’s like asking everyone to do inbox zero with their mail: why would you put your time in the hands of uncontrolled external forces feeding you more info day after day?


> But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.

The difference is that it's more likely the user will notice other tags that apply to each thing that would have been moved to 'junk' than if they had to coarsely categorize all the junk in advance.


I’m all for optimising usability, but there comes a point where one over-optimises. While I love the app-oriented design of Android and iOS (i.e. you share data between apps rather than apps sharing a file system), they are effectively toy OSes for toy devices. Sure, some people are hugely productive on them these days, but they’re the exception rather than the norm. Whereas I depend on a file system to organise my data. In fact, the file system is directly interpreted as namespacing in a number of different programming languages.

I get that most people are either too lazy or too technologically inept to own a computer, but this race to the bottom to support everyone who doesn’t give a crap has to end somewhere. You see it with the Windows UI in Win11 removing popular options so the designers can streamline the UI. You see it with this article too. Some people are always going to struggle simply because they are forced to use a computer day in and day out rather than wanting to use it. But designing a unified system that panders to them while servicing everyone just makes the experience shit for those of us who genuinely know how to use a computer and depend on these features.

To use a car analogy (because for some reason people love comparing cars to computers…) I have no issue with track cars being sold without air con, a radio, etc because they’re a toy not a tool. So you optimise for that single purpose: racing on the track. But I sure as hell want the kitchen sink thrown into my family car.

I sometimes wonder if the problem isn’t computers but rather our assumption that everyone should be able to use a computer without training. If your job depends on using a file system correctly then you should be trained on that in exactly the same way that you’re taught how to use any of the specialist applications. In fact pre-computers, companies did exactly that with training their staff in how the filing system works!


By analogy: if you were to store all your paper copies of bank statements, bills, mortgage papers, etc... Would you just dump them in one big pile in the middle of your living room, or would you sort them into vaguely themed folders to impose some organisation? Hierarchical filesystems are valuable to those that want to organise data in ways that tags can only emulate.


Can I shout "Plumber invoice 2021" at that pile and the right document will come flying out? Or just "invoices 2021"? A pile that could do this would probably be fine for me.


The difference is that if a file isn't sorted in a file system, it just sits on your Desktop or in your Downloads folder and is still easily visible. If you use tags and your file doesn't get tagged properly, it just gets lost in the pile and is impossible to retrieve.

I find tags work very well for discovering other people's content on the web, but really don't help much with organizing your own data.

One operation that seems especially problematic with tags is copying. If I want to modify something on a file system and keep a backup, I just copy the directory tree before I do my modifications. If I copy something with tags, I end up with two things with the same tags, which is not very useful for keeping them separate.

Also how do you deal with removable media (USB, DVD) in a purely tagged system? What if the tags on the media conflict with your own tags? Once you allow filtering by device, you are just reinventing the file system again.


The solution should be a combination of folders and tags. E.g. imagine a folder that just contains all your photos without any substructure. It would be easy to just select e.g. all "2018 photos", "birthday photos" or "photos of your parents" in there without needing specific subfolders for those things (especially since those subfolders would conflict with each other).

But I agree that I wouldn't want those pictures to be mixed with e.g. other random screenshots or drawings that I made, even if they could be separated by tags somehow. So folders as a hard separation would still make sense.


> "The difference is that if a file isn't sorted in a file system, it just sits on your Desktop or Download folder and is still easily visible. If you use tags and your file doesn't get tagged properly it just is lost in the pile and impossible to retrieve."

Wouldn't a shortcut to a view of un-tagged files sorted by recent basically serve the same role as an unsorted Downloads/whatever folder?


Anecdotally, my short answer to this is 'no'. At least a Downloads folder has a loosely defined purpose. I have often gone back to try and find something I downloaded that's no longer available (or too big to download again), and even that is tricky at best.

Now, imagine how many files are being shifted around on a regular basis. Temp files, cached downloads, automatic installs of software updates, all sorts of crap your IT department may remotely put on your work laptop, etc. Sorting by date isn't all that useful. At least with folders, we have an 'enforced' blast radius if random junk shows up; my temp files stay in my temp folders, install files stay where they should, system files stay with my OS, etc.

And don't get me started on how things can end up in odd places on mobile OSes.


The lost file might not be recent and it might not even be untagged, it might just be tagged incorrectly (e.g. `vacation-2019` vs `vaccination2019`).

The nice thing with a hierarchical directory structure is that every file has a place, even if a file is misplaced or misnamed, there is a good chance it will be near where it needs to be.

With tagging you don't really have that, it's just a pile and you have to hope that you can remember a query to make the file show up again.

The biggest problem however is that I don't see how you can actually work within a tagged system. How do you extract a `.zip`? How do you copy a file? How do you deal with removable media (DVD, USB)? Finding a file and handing it over to an app is not the only way we deal with files.

Your local file system is a work environment, where you are the one creating and modifying files. Tagging seems to work best when it comes to exploring other people's content, but that kind of exploration is not something I do on my local machine with my own files, since I already know where I put them.


As long as you know exactly what's in the pile I guess? A good folder/directory structure tells you pretty quickly what you've got available and makes it easy to browse and explore even if you've got no idea what you might find going in.


A directory structure like Invoices > 2021 > Plumber is easy to browse and makes it easy for a machine to retrieve things as well.


It gets annoying though when you have just single invoices or documents that don't quite fit in your structure. I eventually started throwing all received documents in one folder and used the name as tags basically. E.g. 20210214_plumber_invoice.pdf or 20181204_someshop_invoice_playstation.pdf... (And yes I use the filename for tagging because I don't trust e.g. Windows tagging system to be there forever or be copied properly onto other operating systems.)
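With a naming scheme like that, plain globbing recovers most tag-style queries; a small sketch (the filenames are invented to match the scheme above):

```shell
# Create a few example documents named YYYYMMDD_vendor_kind.pdf
mkdir -p docs
touch docs/20210214_plumber_invoice.pdf \
      docs/20181204_someshop_invoice_playstation.pdf \
      docs/20210301_dentist_receipt.pdf

# "All invoices", regardless of year or vendor:
ls docs/*_invoice*.pdf

# "Everything from 2021":
ls docs/2021*
```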


Please can I have all the Plumber correspondence from Joe the Plumber, regardless of year or document type?


Do you mean Joe the Plumber from London, UK or Joe the Plumber from London, Ohio?

We all know how this ends up. It ends up being like Google, where the search engine uses word embeddings and the like, removes words from your search queries, or replaces November with December because they are both months, so you can substitute one for the other, right?


Don't care, just give me both. Since I've never been to London Ohio I don't think it will be a problem.


Modern shells have double-star globs for traversing arbitrary numbers of layers (zsh supports `**` out of the box; in bash you need `shopt -s globstar` first), so I would do something like:

  ls **/*Plumber*


Thanks, you just got me all the docs from Samantha the Plumber and Amit the Plumber too!


And tags don't help with that, when you just tagged everything with "Plumber".


They could help if the tags were 'Occupation = Plumber' and 'Name = Joe'. Now a search for all files where both tags are present will get you Joe the Plumber's invoices. If you want everything from any plumber or from any person named Joe, then just leave off one of the tags from your search. It is very much like when querying rows in a relational database, just adjust your WHERE clause.
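To make the AND-of-tags idea concrete, here is a toy sketch that models a tag index as one line per file (the index format and filenames are invented for illustration):

```shell
# One line per file; tags are key=value pairs after the filename.
cat > tagindex.txt <<'EOF'
joe-invoice-2021.pdf Occupation=Plumber Name=Joe
amit-invoice-2020.pdf Occupation=Plumber Name=Amit
joe-resume.pdf Occupation=Baker Name=Joe
EOF

# "WHERE Occupation = 'Plumber' AND Name = 'Joe'":
grep 'Occupation=Plumber' tagindex.txt | grep 'Name=Joe'
```

Dropping one of the greps widens the search to any plumber, or to anything from any Joe, exactly like loosening a WHERE clause.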


You just summarised the whole argument: you can go very far with a well structured relational database.


I agree. Unfortunately, right now the 'well structured relational database' is completely separate from the file system. Didgets was designed to combine the two into a single coherent system so that you can't update one without the other. By 'combining' I don't mean I did what WinFS tried to do and just took a filesystem and a database and stuck them together somehow. I built a completely new system from the ground up that incorporates traditional filesystem features (block allocation, stream management, metadata control, folder hierarchies, etc.) with solid relational database features (schema, tags).


If your memory is anything like mine, chances are you momentarily don't remember the word "plumber".

Hierarchical structures, while inflexible and sometimes prone to mis-categorization, provide navigational cues that tags don't provide. It's almost like with GUIs vs CLI - if you know what's already possible and want to express yourself precisely, you want a CLI (tags with lots of Boolean operators to precisely include/exclude). And conversely, if you don't know what's already possible, but could figure it out if you have the options laid out in front of you, then GUI (a hierarchy with all choices already laid out) will be more relevant.


brute force grep -R for the win.


I keep passports, degree certificates, deeds, health insurance docs.. the things I would grab if I was running out the door, in a single box file.

Everything else is basically unsorted, maybe vaguely sorted by date of putting on top of the pile, by placing things together after searching for them once, or by 'I think I know where I saw it'.

I have tried putting everything in themed folders, it's a waste of time. The time spent searching for something is much less than the time spent organizing everything in advance. The modal piece of paper will be thrown away after a few years without ever having been needed.


Is that not effectively a first-level hierarchy with no further subdivisions? The "important stuff" category and "everything else" category are already a useful taxonomy, even if very minimalist.


One of the biggest problems with folder hierarchies is that files can often be classified in several different ways. To take your paper statements analogy: do you organize by year, by institution, or by category? What if you have a 2002 bank statement? Do you put it in the '2002' pile or the 'bank statements' pile? Existing file hierarchies allow you to store the digital document in the '2002' folder and then create a hard or soft link in the 'bank statements' folder, but that can be a hassle. Tags allow you to attach them to documents, photos, videos, etc. without worrying about how you might organize them. Luckily, Didgets lets you organize your files using either a hierarchical folder structure or just tags. It is your choice.


To me, it seems best to exclude as many possibilities as I can during each step of the (naturally recursive) search. Filtering by "is bank account statement" excludes a lot more files than "was incorporated into my files in 2002", since most people only have a few bank accounts but a lot of photos, videos and other things that they create or download in a given year.

I think the best system is actually a mix of hierarchy and tags. Top-level, very broad "semantic zones" (aka is this .PDF a bank statement, a cake recipe, a textbook, or some temporary file from the browser cache) would lend themselves to being represented as a shallow hierarchy, and items within a specific semantic zone could be then freely tagged or further subdivided into a hierarchy, whichever approach makes sense for that particular semantic zone.


You assume that there are fewer bank statements than 2002 files in your argument. What if you loaded in a million bank statements in 2005 but only created a few thousand files in 2002? With Didgets, I can tell how many objects have each tag attached, so I can order the search to eliminate candidates based on how likely the set I am searching for is to have each tag.


How many bank accounts does an average person have? Even for the most extreme cases, we're looking at the low hundreds of statements annually, at the maximum. If you're not an average person but instead a business or an archivist, then you need a custom system anyway.

I'm really not trying to criticise or diminish the value of your system. All I'm saying is that even without an additional tag (or hybrid tag+hierarchy) overlay, a hierarchical system can be quite useful as long as it's well thought-out by the user.


Having every file pollute a global namespace seems to require more discipline than the current hierarchical system where you can easily copy a directory tree without having to worry about breaking something else.

That is the main problem with these so-called "solutions": they usually take more effort and discipline than the problem they originally set out to solve. The right solution is just to learn the original system properly rather than trying to invent an even worse way to work around it.


Indeed; on one occasion I had to help my wife troubleshoot something on a shared folder in her workplace's Google Drive...

I was shocked by my wife's colleagues' extensive use of special characters because they wanted their files to appear first.

The proposed solution won't be any better if the average user doesn't know how to name things properly or how to search for them.


Yeah, you can see the downside in the demo video. When he shows off the search for pictures, there's a random mixture of actual photos and things like toolbar icons and whatnot. Sure, you could fix this by tagging everything and doing a more complex search, but that sounds like a lot of work and discipline, more than eg the guy doing the demo was willing to put into it.


Actually no. I wanted the demo video to be short (4 minutes) so I didn't do a lot of complex searches. I have other videos, but to show everything takes a 20 minute video and I didn't think that was a good length for an introduction.


The article proposes to replace the current file system approach (which works just fine for me, by the way, thank you very much) with something different to solve a problem that I (just like the post you reply to) have no interest in.

Better search? Sure! Improved speed of storage and retrieval? Great! But either show that it is not degrading current functionality or be ready for pushback from people suspicious that their current setups will break. My 2c.


The solution being proposed here seems to also involve a lot of discipline (files aren't going to tag themselves or at least not usefully)


don't you think a tag system would require even more discipline?

what do you think happens if you make a mistake with your tags and/or there are typos in the filename? With a directory structure, you can navigate to the location and see the list of items to quickly identify what you were looking for. It is far more forgiving when it comes to poor organisation or mistakes. With a pure tag system, a file with the wrong name/tags is pretty much forever lost.


Not necessarily. Missing or misspelled tags can be discovered just like a row in a database with a missing or misspelled column value can be. For example, if you want all your photos to have a 'Year' tag attached for when the photo was taken, just query for all photos WHERE 'Year = NULL'. The same goes for values like names. If you see that you have 10,000 files with 'Name = Karl' attached but only one with 'Name = Kral' attached, then that is an easy fix.
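Didgets' own query syntax isn't shown here, but the idea maps directly onto SQL. A minimal sketch using Python's sqlite3, with an invented `photos` table standing in for the tag store (each defined tag behaving like a column):

```python
import sqlite3

# Hypothetical photo-tag table; in a tag store like the one described,
# each defined tag behaves like a column in a relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, name TEXT, year INTEGER)")
db.executemany("INSERT INTO photos (name, year) VALUES (?, ?)", [
    ("Karl", 2020), ("Karl", 2021), ("Kral", 2021), ("Karl", None),
])

# Photos missing the 'Year' tag entirely:
missing = db.execute("SELECT id FROM photos WHERE year IS NULL").fetchall()

# Rare values are likely typos: 'Kral' stands out next to 'Karl'.
counts = db.execute(
    "SELECT name, COUNT(*) FROM photos GROUP BY name ORDER BY COUNT(*)"
).fetchall()
```

Here `missing` flags the one untagged photo and `counts` surfaces 'Kral' as a one-off next to three 'Karl's, which is exactly the cleanup workflow described.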


With only a few hundred files, it's easy to look at the list and spot the outliers. In a real-world scenario, how would you know that a few dozen files are missing when you search for "karl" and the files tagged with "Kral" don't show up? On a small file collection that only you have access to, you might remember them and notice that they aren't part of the results, but that doesn't work for large libraries or when multiple people are collaborating.

With a directory structure at least you can look into the folder of the project, see what's inside and open the files to find the one you were looking for. If you were looking for a specific Word file and only a dozen of them are present in the folder, you can always just open all of them manually to check what's inside, regardless of how poorly they were named/managed. Good luck trying to find the Word file with bad tags when searching for "*.docx" returns thousands of results.


Cleaning data for tags is about the same as cleaning data in a relational database table. Here is a demo video of how Didgets does that: https://www.youtube.com/watch?v=kqkNeU1LYEQ Just think of each defined tag as one of the columns in the table.


If you were trying to find a physical copy of an important tax-related letter, would you prefer to search for it in a folder dedicated to tax documents from that year, or in a room filled with every single piece of paper that you have ever received by mail in your life?

A pure tag system only works for small libraries: it requires far more discipline to properly tag every single file, it does not scale, and it does not work well when you collaborate with other people. It works well in situations where you can automate the tagging (eg. a collection of pirated movies) but is pure garbage for the normal files that you typically use.

It's a lot easier to tell people to place pictures of Karl in the "karl" folder than it is to make sure that every single picture gets properly tagged with the word "karl". I can imagine hundreds of different scenarios where it gets tagged slightly wrong. Typos won't be easy to fix because they will simply not show up in the search when you type it. How many files with "K arl", "Carl" or " karrl" are there? No one will know.


There seems to be a lot of confusion here about Didget's tagging system. It is not meant to replace the file hierarchy, but to supplement it. With Didgets you can still organize all your files in a plain old folder hierarchy without tagging everything. Tags just provide a secondary way to search for things. So you can still stick all your photos of Karl in a folder named 'Karl' if you like.


Because such a system would entail its own drawbacks, such as a larger CPU load or a more fragile disk organization, whilst most people wouldn't really need it.


This is obviously hyperbole and you're well aware you haven't read 100 million Hacker News comments, let alone that many with exactly the same basic message.

But I was curious what this might be equivalent to in terms of time investment. A quick style guide check recommends 15-20 words per sentence for English language written communication. Assuming the low end of that, and minimal single-sentence comments, that is still equivalent to reading the entire 14-book Wheel of Time series, which tends to take most people several years, 369 times.


I guess the point is, the proposed system wouldn’t actually be easier to organize. The metadata that would make searching so easy is what’s missing. But a new data structure doesn’t solve for the missing metadata. And without that extra metadata, searching would not actually be improved.


> Yes, why would anyone need better search or a faster, easier to organise file system? I can't think why.

Sounds good, but unfortunately the article is not proposing any of that.

So, counter question: of course you need that, but how would the proposal of the article actually do anything about it?


Surely that's the tagging. I use tags extensively on my Mac because it's so useful to me but it's clearly an afterthought for Apple, and I struggle to use it at times.

Making tags a first class citizen would improve things immensely. The search index being a first class citizen again, would also improve things - why should I find Spotlight indexes loitering in the dark corners of my filesystem as dot files? I know there's a file index kept somewhere full of inodes and suchlike, why isn't search index data kept with it?

I also don't know why I have to rely on file system watchers that seem to be external to the file system, and thus eventually suck vast amounts of CPU, when a hook into the main index would suffice. I don't write file systems so I can't tell why this is the case, or whether it even is the case, but it appears that way every time I need to kill a file watcher.

Most of the suggestions in the article seemed good to me (immutable files, smaller meta data pages etc), I'm sure there are others around, but I'm also not sure why there's a need among some to protect the status quo by relying on good behaviour, of all things.


With Didgets, tags are an integral part of the system. They don't get lost or forgotten when you copy a file from one place to another. Searches are a native part of the system as well, so you aren't relying on a separate indexing service with its own database somewhere else. BTW, managing file data using folders and tags is just one of the features of the system. I found that the columnar stores I used for tagging could easily be used to form relational tables as well. I can load in a 100 million row, 40 column table and run queries against it much faster than the same data loaded into Postgres, MySQL or SQL Server.


> I use tags extensively on my Mac...

In other words, you have developed a disciplined habit of tagging your files. If I had a penny...


Where's the "don't need X" part? Have I dismissed hierarchical file systems out of hand? Where did I suggest greater discipline should be the approach.

I also use automated tools to help me with the tagging but I think that it's not a magic bullet - did I claim it was?

No, and I didn't do any of the other things I asked for evidence of either.

So, if I had a penny for every time someone misquoted me I'd have a penny more right now.


i try to organize my stuff, but sometimes i forget where in the organization i put something. then a brute-force search helps. if i keep good directory and filenames, then locate will do the trick. once i found one item, any related other things are usually nearby.


Plus, this is how we organise [0] stuff in real life.

Folders/boxes/envelopes in boxes. Boxes in cupboards. Cupboards in rooms.

It's easier to get to hierarchical filesytems from this. Things are found by their group, or their proximity to a more used item.

Filesystems, search, most-recently-accessed lists, an index; they are close to real life things.

In my fantasy world people would e.g. stick to .jpg, .JPG or .jpeg or .JPEG (pick one, damnit) but otherwise I quite like the tools we have.

[0] or try to


folders are more convenient because they are part of the file system. there is no ls by tag or even a gui filemanager that shows files by tag. that's one reason why tags need to be part of the filesystem, because if they are not, then most filemanagers would not support them.

and technically, file extensions are kind of like tags. and it's really ugly that they are in the filename string. that messes up a lot of things. it would be better if they were proper tags independent of the name. so you can rename a file without changing the tags, similar to the problem with EXIF.

or more importantly, you could reference a file without that reference depending on the tags of the file. your jpg/jpeg example is also a problem caused by this situation. it would go away with proper tags


macOS does store tags in the filesystem (which you can access using xattr at the command line) but I have no earthly idea how you find files by tag or really do anything with them.

The master tag list seems to be Finder-specific preference data though.


This is an example I use to find photos I've stripped the exif data from and tagged:

    mdfind -onlyin . 'kMDItemUserTags=exif-stripped'


ohhh thanks, this is something I needed to know.


linux has xattr too. so technically, our filesystems already support tags.

that means it is now up to the other tools to catch up and make use of them.

here is a discussion about tags and extended attributes in gnome. https://blog.chipx86.com/2005/12/07/tagging-and-the-gnome-de...

it is from 2005, so not really current, but the arguments are interesting.

in short: filesystem attributes are systemwide (but you and i may want to have different tags on the same shared file) and the user needs to have permission on the files, so you can't tag files that you can read but can't change.

i believe these issues are solvable, esp. the latter would work if we have permissions to add tags but not the content of a file. (like you can rename a file even if you don't have write permission to the file)


xattrs can certainly be used to store tagging info. There are a couple of major problems with them though. 1) xattrs are not supported by all file systems and are not enabled by default on some. If you copy a file with xattrs from one file system to another that either doesn't support them or didn't enable their use, your xattrs are thrown away in the copy. 2) Searching for files based on xattrs in a large folder tree (e.g. several million files across thousands of folders) is exceptionally slow by nature.


right, but the alternative is no support for tags at all, so xattr gets us halfway there, and filesystems that don't have it need to keep up.

searching can be sped up by building an index. apps that want to use tags will need to do that, just like they build an index of files already. because searching filenames is also slow.

a version of locatedb that supports xattr would help for example, see https://en.wikipedia.org/wiki/Desktop_search
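For illustration, a toy version of such an xattr-aware index in Python. The helper and schema are invented here; `os.listxattr`/`os.getxattr` are Linux-specific, and the try/except tolerates filesystems where xattrs aren't supported:

```python
import os, sqlite3, tempfile

def xattrs(path):
    """Read a file's extended attributes, tolerating filesystems
    that don't support them (hypothetical helper, Linux-only API)."""
    try:
        return {name: os.getxattr(path, name) for name in os.listxattr(path)}
    except OSError:
        return {}

def build_index(root):
    """Walk a tree once and record names and xattrs, locate(1)-style,
    so later searches hit the index instead of the whole tree."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE idx (path TEXT, name TEXT, attr TEXT, value BLOB)")
    for dirpath, _, files in os.walk(root):
        for f in files:
            p = os.path.join(dirpath, f)
            attrs = xattrs(p) or {None: None}   # keep one row even if untagged
            db.executemany("INSERT INTO idx VALUES (?, ?, ?, ?)",
                           [(p, f, a, v) for a, v in attrs.items()])
    return db

with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "notes.txt"), "w").close()
    db = build_index(root)
    hits = [r[0] for r in
            db.execute("SELECT path FROM idx WHERE name = 'notes.txt'")]
```

A real locatedb-style tool would persist the database and refresh it periodically or via inotify, but the one-pass walk plus indexed lookup is the whole trick.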


> Plus, this is how we organise [0] stuff in real life.

The way I organize people I know and places and all sorts of other entities I cannot physically place into boxes and folders is a lot more like the tag approach, though.


Oh really? I've got a few boxes of guys stored away and, like Mitt Romney, binders full of women.

(*ba-doom-tish* for the-in-retrospect much-maligned Mitt Romney on this day)


Google photos' style AI driven curation maybe?

I like the idea of having a queryable filesystem, but I wouldn't want that as a complete replacement of the directory structure.


Google photos is pretty amazing. I enter a search for "car" and immediately can see photos of several of the cars I've owned over the years.

One day I needed to remember when I had travelled to a certain city, searched on my Google photos and it instantly showed the photos I took in the city, including the exact dates.

Yes, I know letting Google know all about my life like that through photos may not be the greatest idea... but wow, does the photo search work nicely?!


The google photos image search is amazing. The other day I was trying to remember how long it had been since I smashed my toe doing yard work so I tried searching “toe nail” and it pulled up exactly the picture I was looking for.


Unless it's local, hell no. Sounds like a privacy nightmare.


It sounds somewhat like “gmail for files” which is … problematic because email search works well enough because it’s relatively rarely done.

I suspect a system like this would work, but the tags would eventually be used by many as a way to badly implement a hierarchy.


> It sounds somewhat like “gmail for files” which is … problematic because email search works well enough because it’s relatively rarely done.

Gmail does not work for me.

As someone in IT I get some number of automated messages (e.g., cron). With Gmail all I can do is tag them and have a "folder" / view of just those tagged messages. But they also pollute my Archive 'folder' as well.

But I do not want them there, because they are not a priority generally, and they pollute search results.

I want an actual separate folder to file these messages in that is out of the way so as not to pollute the rest of the namespace.


You could tag them with a special label (e.g. "ignored"), and then append "-label:ignored" to your searches.


Kinda like gmail's nested labels. Hierarchies win again.


> eventually be used by many

And that’s the issue. The status quo may be the best choice for the lowest common denominator. But some power users could get much more out of something with a different approach. You can’t force a one-size-fits-all ontology onto the masses.

People need to wake up and realize that not all software technologies need to be popular to be successful or useful. It seems people around here assume this without even thinking about it first.


My response to these types of proposals is “just imagine that folders are tags and each level of hierarchy is a tag, symlink for multiple tags.”

It’s funny because the author just proposed a different, I think worse due to novelty and minimal benefit, organizing hierarchy.

I think Apple has a decent approach where their Spotlight indexes very well (I just hit command+space and type the first letter or two instead of navigating Finder), and they support tagging files.


When importing files into Didgets, the program automatically gathers information from the source file system and attaches specific tags to each file. For example, the file name is attached as a 'name' tag. Each folder name in its path is attached as a 'folder' tag. The file extension is attached as an 'extension' tag. In addition, a SHA1 hash is created from the data stream and attached as a tag. You can also import files by dropping them onto a 'drop zone' on the create tab in the GUI. Any tags attached to that drop zone are automatically attached to any file dropped on it. So dropping 100 photos on the 'My Wedding' drop zone might attach the tags 'Event = Wedding' and 'Year = 2022' to every photo. A search for files that have a tag 'Folder = Microsoft' would find every file that had 'Microsoft' as a folder anywhere in its path.
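That import step isn't Didgets source, but the derivation of automatic tags from a path is easy to sketch. Tag names and the `derive_tags` helper below are guesses at the scheme described, not the actual implementation:

```python
import hashlib
from pathlib import PurePosixPath

def derive_tags(path: str, data: bytes, drop_zone_tags=None):
    """Derive automatic tags from a file's path and contents,
    roughly as the import process above is described."""
    p = PurePosixPath(path)
    tags = [("name", p.name), ("extension", p.suffix.lstrip("."))]
    # Every folder in the path becomes its own 'folder' tag, so a
    # later query for folder=Wedding matches at any depth.
    tags += [("folder", part) for part in p.parts[:-1] if part != "/"]
    tags.append(("sha1", hashlib.sha1(data).hexdigest()))
    # Tags attached to the drop zone are inherited by the file.
    tags += list(drop_zone_tags or [])
    return tags

tags = derive_tags("/photos/Wedding/img_001.jpg", b"...jpeg bytes...",
                   drop_zone_tags=[("Event", "Wedding"), ("Year", "2022")])
```

The interesting property is the per-component 'folder' tags: they are what lets a flat tag query recover hierarchy information without walking a tree.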


> it will be constrained to a narrow subset of directories and ignore the other 199.9 million files or whatever.

I think this is a vastly underrated point. I am usually not interested in searching the majority of files on my filesystem. I can't remember the last time I needed to search through system files for normal computer use reasons.

I also think the author completely skips over how to handle related files. If my application needs to load a library, how does it find the file to use? If it's by name, how are name clashes handled? I suppose it could be by tag, with built-in tags, but then you won't be able to change the tags without having to change configs or the binary itself.


The core problem with keeping files organized is that, unless you are dealing with a stream of effectively pre-tagged files, the right tagging/categorizing/grouping only emerges after a sufficient number of files have arrived. Organizing is therefore retroactive, not proactive.


What this boils down to is that he thinks a flat namespace (tags) offers advantages over hierarchical namespaces (tree). They really don't. Once your tag space grows you will start to struggle with naming, and path-like structures (nested namespaces) start to creep back in. And you are right where you started: paths.

The treatment of immutability is too superficial to make any sense of so I don't know what the author is imagining. Ted Nelson has evolved some ideas on this for decades that might be worth knowing about. Some of which have kind of come to pass (if you squint and look at how non-destructive editing tools for video and audio work, for instance). However, very little of Ted's thinking has ever been burdened by usable implementation.

The concept of having multiple references to the same file already exists. So what he proposes can be realized with existing file systems just by introducing a different naming scheme and making extensive use of sym-/hard-linking.

Yes, a lot of file systems will have terrible lookup and traversal performance, but that problem exists in an orthogonal universe and can be solved. Is, indeed solved, in some fileystems if the marketing blurb doesn't lie.

If you think about how you would realize this using existing filesystems, by organizing them differently, the concept isn't as sexy anymore. Because it doesn't really involve a lot of new stuff and you start to see the inconvenience of having to cope with both novelty and problems you didn't have before.

The problems someone like me wants solved in filesystems are entirely different, and aren't so much about filesystems as about how you make the functionality useful to applications.

For instance, there are filesystems that offer snapshot semantics, including COW snapshots. This would be useful whenever applications need to be able to roll back changes, switch between states, do backups while live, etc. Yet I know of no language that has snapshots as part of its standard OS interface. So people generally don't write applications that take full advantage of what the underlying system offers.


Path based file systems take advantage of natural semantics we use for navigation. There is a wonderful overlap between how you navigate the real world, and how you navigate a hierarchical file system.

I have never (never ever ever) seen a tag based system actually work once you have large amounts of files and tags - Tags are manual, often duplicated with slight name changes or variations, hard to discover, and literally worse than a folder hierarchy for discoverability in almost every way.

Tags can be nice to have - but only if I also have a path. Otherwise they are utterly inferior.


Tags, being one of the most basic implementations of boolean retrieval, tend to suffer from feast-or-famine a lot, at least in my experience. Once you introduce hierarchical tagging, people will just use them like folders, with each item having 1.0 tags on average.


The Didget system was designed to allow both a hierarchical folder tree as well as tags attached to individual files and folders. The tags do not replace the hierarchy unless you want them to. If you have never ever seen a system that actually works, then maybe you should put Didgets to the test. I created 20 million files in it and attached an average of 100 tags to each one. Each tag had a value randomly picked among 1000 choices. Queries to find all files with a certain tag (e.g. Tag_134 = Value_875) each completed in less than a second.


> I created 20 million files in it and attached an average of 100 tags to each one. Each tag had a value randomly picked among 1000 choices.

That is really impressive, but when the parent commented that they have never seen a tag-based system work with a large number of files and tags, I don't think they were making a statement of technical capability but of human fallibility.

My experience has largely been identical in both personal usage and in enterprise settings. Every time I've used a system that used human-defined tags as the primary organizing mechanism it has always ended in an unusable mess and in every case it is eventually replaced by some kind of hierarchy which usually ends up being a slightly more usable mess.

Perhaps combining them will yield the best of both worlds and perhaps with enough organizational discipline one can make a tag-based organizational system work. And I'm all for better search. But at the end of the day I am skeptical that giving normal people even more flexibility with how they organize their files will make their lives easier.


Most tagging systems that I have seen are free form. Anyone can just tag something with tags like 'James', '2002', or 'Bank Statement'. This makes it difficult to distinguish between them and easily find things like misspellings. All tags must be of the same data type (string). A generic term like 'Tank' might refer to a water storage device, a military vehicle, or someone's nickname.

With Didgets, I decided to go with a contextual approach to tagging. Just like columns in a relational table, a tag must be defined before you can use it and all like tags are managed together. A tag can have a data type so 'Year' can be an Integer, for example. The system comes with a set of pre-defined tags, but users can easily add whatever tags they might need. That way a tag has the form 'Author = James' or 'Device = Camera'. I went further and decided each tag definition would have two levels. '.person.FirstName = James' might be a tag on a picture of someone named James. This makes it easier to search for tags by group (e.g. find all documents that have '.person.*' tags attached). By managing the tag values together, the UI can quickly show a list of values that have been used previously (and order them by use count). When attaching names to photos, it can show you a list of the most used names and let you pick one or ignore the list and add a new one.
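Didgets' internals aren't public, but the contextual scheme described (define a typed tag before use, manage all values of a tag together, offer previously used values ordered by use count) can be sketched in a few lines. All names here are invented for illustration:

```python
from collections import Counter

class TagStore:
    """Toy sketch of 'contextual' tagging: tags must be defined with
    a data type before use, and values are tracked together so a UI
    can offer a pick list ordered by use count."""

    def __init__(self):
        self.defs = {}     # tag name -> expected Python type
        self.values = {}   # tag name -> Counter of values seen so far

    def define(self, name, dtype):
        self.defs[name] = dtype
        self.values[name] = Counter()

    def attach(self, file_tags, name, value):
        if name not in self.defs:
            raise KeyError(f"tag {name!r} is not defined")
        if not isinstance(value, self.defs[name]):
            raise TypeError(f"{name!r} expects {self.defs[name].__name__}")
        file_tags[name] = value
        self.values[name][value] += 1

    def suggestions(self, name):
        # Most-used values first, for the pick list described above.
        return [v for v, _ in self.values[name].most_common()]

store = TagStore()
store.define(".person.FirstName", str)   # two-level, namespaced tag
store.define("Year", int)                # typed tag: must be an integer

photo = {}
store.attach(photo, ".person.FirstName", "James")
store.attach(photo, "Year", 2022)
```

The type check is what makes 'Year = 2022' and 'Year = "2O22"' distinguishable at attach time rather than at cleanup time, which is the main advantage over free-form tags.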

This system is far from perfect. Users can still misspell tags or categorize them incorrectly. But this can happen with folder names in existing file systems as well.

Just to be clear, tags in Didgets do not have to be the primary organizing mechanism. It has 'Set Didgets' that contain the IDs of all members and can be arranged in a hierarchy just like folders. When importing files, the UI creates these sets (unless the user specifically turns it off) and preserves the hierarchy of the source file system.


Those are improvements over free-form tags and I especially like the namespaces for attribute/field names, that's clever. The enterprise software my company makes has decent support for defining and applying structured tags like business terms and attribute values to the data objects our customers manage, and our users do make use of those when filtering and searching.

But (in my experience) many people still seem to gravitate towards storing and navigating objects hierarchically. I can think of a few possible reasons:

First, some people intuitively think of a data element as having a location in an information space. That is, they seem to intuitively remember "where" something is by piggy-backing on spatial memory in a way that tags don't seem to trigger.

Second, navigating a hierarchy involves a sequence of constrained choices, like a wizard. Having a sequence of decisions can be especially helpful for novices. It also generally takes a predictable number of steps to locate an item which can be preferable to something that is faster on average but has slow edge cases.

Third, at each level of the hierarchy you can often display all of the options meaning we can rely on recognition over recall[1].

(You could constrain yourself to hierarchical tags and use hierarchy-like positional language such as "object is in baz" at which point I'd consider it a hierarchy.)

Of course relying on tags has plenty of upsides -- typically faster, better mental model for overlapping sets -- and large-scale data storage systems need both. But at the end of the day they don't seem to be a replacement for hierarchical systems for most people.

[1] https://www.nngroup.com/articles/recognition-and-recall/


>I have never (never ever ever) seen a tag based system actually work once you have large amounts of files and tags

If you've ever done online shopping, you probably have. For example, try going on Amazon or Newegg and searching for a GPU. You're shown a sidebar where you can easily filter results by certain tags such as: brand, price range, memory size, core count, in stock, energy star certified, free shipping, etc.


I don't think these systems work nearly as well as you do.

Simple example right now:

Go on amazon, and search for "intel CPU" - I see the following:

1-16 of 942 results for "intel cpu"

Now go back and search for "cpu", then filter by brand "intel" - I see the following:

1-24 of 835 results for "cpu"

It turns out the tags are exactly what I said they would be - a hodgepodge of things not correctly applied. For example - searching "intel cpu" actually returns items that include intel CPUs (such as motherboard + cpu bundles) that are missing in just the tagged search. But it's still absolutely a valid result if I was interested in buying a cpu.

---

as mostly an aside - I don't really trust Amazon or Newegg to be neutral in their results either, a tagged view is convenient to them as a seller where they can control results.


... and on the top of the list, the first thing you click as a part of the filter, is a hierarchical locator:

Home Components -> Video Cards & Video Devices -> Desktop Graphics -> Cards Search Results: "GPU"

The tags system works for specific areas. For example, tags in photo management apps are great. But they don't really work across separate domains, so what you want is top-level hierarchy, and, where needed, tags for the subtrees. That's how existing tag systems work.


I think one of the problems is that there are many datasets where objects belong to multiple hierarchies, and different hierarchies are more efficient for different tasks. For example, I work with medical imaging data. Typically that gets organized around the DICOM object models, which even define multiple structures. Typically that's around the patient/encounter model and slight variations of it, and the data is stored in a database called a PACS, but working with a PACS is extremely difficult because DICOM is optimized for clinical use cases. There are other ways of organizing the data that are more efficient for other tasks, for example for quality improvement or assurance, or process monitoring. In fact, different users of the data are likely to want views based on different hierarchies. Some software expects certain data layouts, etc. There are some efforts to standardize file hierarchies and naming for certain tasks, but perhaps you're not doing that task. You can do things like symbolic links, but trees of symbolic links end up being super fragile in my experience, and they're not particularly well supported on some operating systems.


I don't think the filesystem is the right place to solve these types of problems. I also think if you try to make the kind of filesystems that solve these problems you'll invariably end up with entirely new problems you really don't want (complexity, performance issues, unclear semantics etc).

You usually want more flexibility and control over how your data is projected to storage. (Say for instance you run out of storage and the scheme doesn't have any way to split the data across multiple filesystems). And you really want integrity constraints that stop you from pointing into thin air and help you clean up. Occasionally you also need to have the concept of identity (how do you refer to a given entity directly) without it being part of a projection that may have stopped existing - like a tag being deleted).


I'm not so sure about that. I don't think anyone really wants to think about that at all. I'm probably spoiled by zfs and OneDrive, but I think you just have pools of space and let the filesystem take care of itself. Plug in more space and the system runs a "rebalancing" or whatever. Let blobs move into online storage and be fetched as needed. If I want a certain set of data, I just tell the system to "prepare" it for use.


That supposes that you have a decent filesystem like ZFS and know how to configure it. I run ZFS and I can't even remember how to add a disk and increase the size of a storage pool without referring to the manual page or do a web search. And I only know where to look because I know I'm running ZFS. (If I'm lucky it is in the filesystem command history - I know people who have used ZFS for years who didn't even know ZFS had a command history).

And what if I unplug the disk and pop it into a different machine? Or if you decide to move to Windows?

I do a fair bit of photography. If it taught me one thing it was that if you design software for managing lots and lots of files that craps out if something spans a filesystem border, you'll have a truly miserable time. Photo editing software used to be like that.


I actually use Windows (and OneDrive) quite a lot nowadays. I have very little problem thinking of disk space as a sort of OneDrive cache at this point and basically you have different devices that are faster vs slower. And when that's the case detaching a disk just means removing a copy of the data.

But datasets in general on ZFS are great because you can just mount and unmount them at will. They're there on the disk, but if you don't need them they're not mounted. This is great if you don't want to accidentally modify some subset of data you're not working with. One of my favorite features of ZFS is the encrypted datasets, where you can have chunks of data stored with different encryption on the same disk, and all the integrity and migration etc. works without needing to decrypt anything. Which is also great because it means ransomware can't touch it. I do think it's useful to have some sort of "chunking" of the data like that which mostly maps to ownership. This data belongs to this client/project, that data belongs to that client/project, etc. Often those divisions have different use restrictions, so I've found it great to isolate things that way. And again ZFS is great at this because you can just dump out the full encrypted datasets for handoff/archival or whatever.

But anyway I would point out that filesystem boundaries come from whatever the filesystem implements and exposes to the application. The only real issue with filesystem boundaries is that renames that move files between devices aren't possible. And that's an artifact of that filesystem's design needing to synchronize the hierarchy at a hardware level. If you could call up the same path and it comes from wherever it happens to be then it's not a problem. Like in OneDrive when you open a file that's only online and it needs to suck it down first.


I completely agree on the constraints you mention, but I don’t actually see how these can be solved without the knowledge of the file system.

Let’s say you want to create a tagging system on top of a traditional file system. Say you create a folder for each tag and have it contain a symlink to each entry with that tag. Now any single move or delete operation will render your tag library incorrect, and there is no cheap way to correct it on each change.
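The fragility is easy to demonstrate: a symlink records a path, not an identity, so a plain rename of the target silently dangles every "tag" link that pointed at it. A minimal reproduction:

```python
import os, tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "report.docx")
    moved = os.path.join(d, "report-2022.docx")
    link = os.path.join(d, "tag-taxes-link")   # hypothetical tag-folder entry

    open(target, "w").close()
    os.symlink(target, link)            # "tag" the file via a symlink
    assert os.path.exists(link)         # resolves while the target stays put

    os.rename(target, moved)            # a plain rename by any program...
    dangling = not os.path.exists(link) # ...and the tag link is now dead
```

Nothing notifies the tag layer that the rename happened, which is why symlink-based tag schemes need either filesystem cooperation or constant rescanning.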


I think this is just a general lack of software for rare cases like clinical data?

I mean, it sounds like what you have described is solved for photos. I use the digikam photo manager, and it automatically discovers all the photos on multiple volumes, and supports showing files by date, location, tags, path -- whatever you like. And it is not very fragile at all -- it identifies photos by metadata-excluded hash, so moving the photos around does not break the links.

And back in Winamp days, it had an MP3 database which had basically the same properties.

I have no doubt that users could use more and better organization, but it seems like UX problem, not a filesystem one.


The problem is that not all the data is the clinical data. Images are fairly well understood for clinical use; it's all the non-imaging things that need to be associated with the images, and many of the things people want to do with the data are non-clinical. For example, now there are things like BIDS[1], which are basically a rigid schema applied to a filesystem. But that only exists because so many people got frustrated with everyone doing their own thing and having to spend time restructuring data for different site workflows. Even that only works for neuro; what about cardiac or liver or spectroscopy, or software that doesn't use BIDS, etc.? And even with BIDS, mostly it's just copy-rename into the layouts the software you're running actually expects. There's also a huge push for "Vendor Neutral Archives" to augment PACS, which will allow things to be sort of structured and managed, but that always seems to mean uploading and downloading from websites, so people still keep copying everything out because everything is so opaque compared to a filesystem.

> it identifies photos by metadata-excluded hash, so moving the photos around does not break the links

That only solves the problem in one direction. If you run across an image after it's been moved, you know which image it is and can index into the database based on that. But if you want to find an image starting from the metadata after it's been moved, then you're stuck trawling everywhere. The metadata-hash idea exists in medical imaging (DICOM) as various GUIDs (in a different format), so you can track things and updates using that key. But if you have to visit 30 TB of files just to find one that's been renamed or updated, it's basically impossible.

[1] http://fmri.ucsd.edu/pdf/BIDS_Presentation_14NOV2018.pdf


Couldn't that still be stored in a traditional folder hierarchy, just all in one folder, with the tag-to-filename mapping kept in a SQLite database?
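A rough sketch of what that could look like: blobs in one flat folder, tags in SQLite. The schema and file names here are hypothetical, just to show the shape of the idea:

```python
import sqlite3

# One flat folder of files; all organization lives in the database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tags  (file_id INTEGER REFERENCES files(id), tag TEXT);
""")

def add_file(name, *tags):
    fid = db.execute("INSERT INTO files(name) VALUES (?)", (name,)).lastrowid
    db.executemany("INSERT INTO tags VALUES (?, ?)", [(fid, t) for t in tags])

def find(tag):
    rows = db.execute(
        "SELECT name FROM files JOIN tags ON files.id = tags.file_id "
        "WHERE tag = ? ORDER BY name", (tag,))
    return [name for (name,) in rows]

add_file("a1b2c3.jpg", "vacation", "2022", "beach")
add_file("d4e5f6.jpg", "vacation", "2021")
print(find("vacation"))  # ['a1b2c3.jpg', 'd4e5f6.jpg']
```

The catch, as the replies below note, is that every application then has to go through this database instead of the filesystem.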


I think the point is that if you're just using the filesystem as blob storage and interacting with a layer database, then you've actually moved beyond the filesystem. Just replace filenames with inodes and that's what you've done.


But that's exactly what many applications have been doing for a long time, they're using S3-like services as CAS or quasi-CAS and generate views for different workflows on the fly from a separate database.


Why shouldn't that functionality be more user-friendly and accessible to users?


It may be worth taking a step back and asking who the users are.

I know that like most folks on HN, I am not your average computer user, so my view is probably a bit different. With that out of the way, I can say that I firmly prefer folder hierarchies for organizing my files over tagging. I switched to MacOS when I started my new job, and the fact that tagging is such an integral feature out of the box over plain folder navigation irks me. To me, it's easier to save a file to the correct spot in the hierarchy and find it there later than to search by tags.

Perhaps my view isn't consistent with a lot of computer users, but I will say that most people understand how hierarchical filing works better than tagging as one is more spatial than the other. After all, there is a reason that the memory palace technique for recall relies on imagining a physical space.


A hierarchy is actually a tag, so really what you're saying is that you firmly believe in having only a single tag for each file, with a specific ad-hoc schema for those tags. I think once you've used a filesystem that behaves in more complex ways you start to realize that restriction isn't necessary. For example, with a snapshotting filesystem you add dates and times to the path in order to access older versions of files.

I don't really use MacOS, but I have noticed the tagging in my limited experience with it, which seemed reminiscent of what the old MacOS used to do where you could set icon colors. So whatever they're doing for tags isn't really the whole story. MacOS is a good example, though, because that is a filesystem that maintains separate metadata in addition to the file contents (they used to be called resource forks, but I'm not up on modern MacOS). Many Linux filesystems can also support that sort of thing. But those just flavor the files; they don't add different ways to index into the filesystem.

A hierarchical index such as nested folders is probably a minimally viable solution and is very useful. But that doesn't mean it isn't limiting, or that there aren't larger solutions that achieve more useful results. Once you start thinking about snapshots, transparent de-duplication, and other don't-repeat-yourself ideas, things wind up becoming far less clear.


I'm not sure any of this really addresses the question - which is how do people really use files.

IME I have a number of live projects which can contain various numbers of source files, images, web links, PDFs and other documents, text files, and so on.

Then there are a number of files I access regularly which may not be associated with a project (like favourite music).

Then there's a mountain of data which is just there in case I ever need it. It includes backups of old projects, documents, music and art I keep because I think it's interesting but haven't read yet, web links that are filed and then (sadly...) forgotten, and so on.

I don't know how typical this is, and it doesn't matter. Because neither a tag based nor a tree based system address the real issue - which is designing a custom file workflow that collects related references of all kinds, doesn't confuse working data with long-term storage, allows off-site backups, allows collaboration, supports versioning on demand, and also makes it easy to find things.

I suppose all of that means some kind of process API which does a lot more than file.open() and file.close().

It could be built on tags, it could be built on trees, it could be built on some combination. Or on something else entirely.

The implementation matters a lot less than a set of available features which streamline common tasks in some fairly standardised and effective way.


> The concept of having multiple references to the same file already exists.

It does and it's really bad IMO. The author's suggestion of unique identifiers though would introduce all sorts of new problems, primarily it would make the transparency problems of existing systems worse.

Most applications rely on the location of a file, relative or otherwise to load data (e.g. configuration). That reliance is exploited by software engineers to implement configuration swaps, event processing, and many other features. Referencing files based on UIDs, or a series of tags that aren't guaranteed to be unique or not known to be off limits to regular users, would introduce all manner of complications.

I could also see it being terribly easy to introduce bugs loading files using filtered tags. Would applications need to have relative tags to mitigate these problems? Having unique paths works both as a filter for the user and an encapsulation for a system that allows you to localize your concern. Without that encapsulation by default, you will be spending a lot more time and concern dealing with files and tags.


There is a lot of stuff that should NEVER appear in a mixed view. (Google is full of examples of that.)

Tag clouds and other metadata can still be very useful. The challenge is creating useful tags/metadata automatically. For example, a timestamp for every modification, or a label for every application that created, modified, or loaded the file. Perhaps even the applications you were using when the file was created/modified, and the names of the files loaded into those applications. Train some AI to show you files you probably want given your current activity.


> And you are right where you started: paths.

A big difference is that one can naturally have multiple tags, and an entity could share tags with other entities.

Sure you can use hardlinking when it comes to files, but it's tedious and you can't have multiple files hardlinked to the same path.


Directories of links.

And yes, even with a layer on top of the FS to provide an abstraction (as an API, for instance) so you can build shells and applications, a tag + search based system would quite possibly be tedious to use.

I also don't think 64 bit ints provide a good way to definitively name things. Most people can't make sense of a list of 10 ints, but they will be able to remember at least where to look if you give them full paths.


> Directories of links.

Well you'd still have to guarantee uniqueness of the filename within the directory. For example, I have several files like DSC00005.JPG which are not identical, because the camera reset the counter every now and then.

> I also don't think 64 bit ints provide a good way to definitively name things. Most people can't make sense of a list of 10 ints, but they will be able to remember at least where to look if you give them full paths.

I agree that 64-bit ints are not a stellar solution. If anything it should be something like a UUID, so it can be unique across filesystems, and something users shouldn't normally have to deal with.


The easiest way to do this is to use the inode for naming the link.


what's wrong with tags on top of a tree? Do tags even need to be part of the filesystem?


Maybe not part of the filesystem, but part of the OS API, so that every filebrowser can support it.


I think it could be solved outside the OS, but the challenge is that you would need to define some common APIs and get them into the standard libraries of programming languages. You would need a service API so that you can plug in the tagging service backend of your choice (or something that comes with the OS).

There is nothing wrong with having a userland tag management service. In fact, you'd probably want it in userspace if possible.

Implementing a proof of concept for this would be easy, if it weren't for the fact that getting from dirent to inode is fast while getting from inode to dirent(s) is very much not (since the file may have been renamed in the meantime).
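To make that asymmetry concrete, here's a sketch (paths are hypothetical): path-to-inode is a single stat() call, while inode-to-path has no syscall at all and means walking the whole tree:

```python
import os
import tempfile

root = tempfile.mkdtemp()
path = os.path.join(root, "notes.txt")
open(path, "w").close()

# path -> inode: one stat() call.
ino = os.stat(path).st_ino

# inode -> path(s): no direct lookup exists; scan everything.
def paths_for_inode(top, ino):
    hits = []
    for dirpath, _, filenames in os.walk(top):
        for name in filenames:
            p = os.path.join(dirpath, name)
            if os.stat(p).st_ino == ino:
                hits.append(p)
    return hits

# The inode survives a rename, but finding it again costs a full scan.
os.rename(path, os.path.join(root, "renamed.txt"))
print(paths_for_inode(root, ino))  # the new path, found the hard way
```

A tagging service keyed on inodes would pay that full-scan cost every time it needs to show the user an actual path.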


Part of common GUI toolkit, you mean?

The filebrowsers are not part of the OS.


I was just pointing out that tags bring you something filenames don't.

Sure, you could put tags on top of a filesystem like we do now. It's slow and requires per-application support.


> but it's tedious and you can't have multiple files hardlinked to the same path.

Little trick I learned to help sort images: Make a copy of the file in as many locations as you like, then run something like borg backup. One file, hardlinked in as many directories as you want.
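The same effect can be had directly with hardlinks, no backup tool required; this sketch just uses os.link (directory and file names are made up):

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "beach"))
os.makedirs(os.path.join(root, "2022"))

original = os.path.join(root, "beach", "IMG_0001.jpg")
with open(original, "wb") as f:
    f.write(b"\xff\xd8fake jpeg bytes")

# Hard-link the same file into a second "category" directory.
alias = os.path.join(root, "2022", "IMG_0001.jpg")
os.link(original, alias)

# One inode, two directory entries, no duplicated storage.
print(os.stat(original).st_ino == os.stat(alias).st_ino)  # True
print(os.stat(original).st_nlink)                          # 2
```

The borg approach in the comment above gets you the same end state from plain copies, by deduplicating after the fact; either way, hardlinks only work within one filesystem.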


The problem of hierarchical file systems and data location is a really old problem that has had many implementations (I even tried building one many years ago).

Somewhat related:

Tagsistant https://news.ycombinator.com/item?id=14537650

TMSU https://news.ycombinator.com/item?id=11660492

BeOS File System https://news.ycombinator.com/item?id=17468920

TagSpaces https://news.ycombinator.com/item?id=12679597

git-annex https://news.ycombinator.com/item?id=29942796

Names should mean what, not where https://dl.acm.org/doi/10.1145/506378.506399

Unfortunately it's not easy to get a real solution, and many people don't think there is a problem at all (based on some comments in this thread).

Nowadays I use git-annex; though it has its quirks, it seems a step in the right direction.


The pattern that has worked out really well for me, is to just organize my data into specialized collections, and not worry too much about the underlying filesystem.

I can't believe how much trouble I went through trying to find the filesystem that could do everything.

I mostly use git-annex, and various tagging systems, or just git repos. Now my data is much more portable and flexible. None of these tools are perfect, but I'm using tools that are mostly good at the job.

Whatever problem you are trying to solve, you probably don't need to solve it for your entire filesystem.


Also related: https://news.ycombinator.com/item?id=29141800 (discussion of differences between hierarchical and tag based file systems)


Also related (learned this from HN a couple of weeks ago):

SuperTag https://amoffat.github.io/supertag/


Systems that try to get rid of the "files & folders" abstraction of a file system tend to have much worse usability, in my opinion. I have an iPad Pro, and the lack of a true file system abstraction is so painful. Every app has its own way to store and retrieve data, there's almost zero interoperability, and it's super painful to copy, paste and move stuff around (I know it has gotten better but it's still so much worse than on any desktop OS).

I'm all for enriching the concept of a file system with additional meta-data (in fact many files do that) but I don't think that needs to happen in the file system itself. For example, software like Picasa leveraged meta-data contained in files to provide a new way of interacting with large number of photos. The author basically proposes to put such functionality directly into the file system, but I'm really not sure if that's a good idea. Right now it's easy to move files between different systems, e.g. from Mac to Windows or Linux. If file systems become meta-data management databases that will become much more difficult.


> Systems that try to get rid of the "files & folders" abstraction of a file system tend to have much worse usability,

IMHO it's because those systems just simplify, but don't move very deep in the space they opened up. If you don't offer power, then it's irrelevant which system you offer, they will all suck fast.

> Every app has its own way to store and retrieve data, there's almost zero interoperability and it's super painful to copy, paste and move stuff around (I know it has gotten better but it's still so much worse than on any desktop OS).

Which is kind of a surprise, I would think Apple would be interested to unify that space and offer a good user experience.

> Right now it's easy to move files between different systems, e.g. from Mac to Windows or Linux. If file systems become meta-data management databases that will become much more difficult.

Theoretically, it could be solved by using a meta-file container: something like a tar container holding a file for the metadata plus the actual content. We have this with specialized container formats in media and office file types. Making a universal format which would work equally well for any kind of file could solve this interoperability problem. This would even open up ways to improve files without changing them directly, like adding subtitles or notes to a file by just adding them to the container, not the file itself.


App sandboxing on iOS is supposed to be a security feature, I think? But it makes it impossible for apps like Obsidian to work with apps like Dropbox, which benefits Apple; they force people to use iCloud.


macOS has so much of the prep work for this kind of thing, but Apple has completely dropped the ball on the UI.

Spotlight parses and indexes all the existing metadata in your files (music ID3 tags, photo EXIF tags, etc; run `mdls` on a file in a terminal to see everything it has extracted), and all of that could power some pretty compelling UIs. But all Apple has done is build one very handy universal search UI, one very poorly designed specific search UI, and stored searches (which are also useful, but limited in practicality by how bad the UI to create them is).


I have 394,175 photos and videos that I have personally taken since 1997. They are organized by a simple hierarchical system in 5,356 folders.

D:\masterarchive\source\YYYY\YYYYMMDD\photo file name

If I want to find a person, in a photo, I've used Google Picasa (when it was an offline product) and lately digiKam to do face matching, and tagging them with IPTC metadata tags in the photo files. Thus they survive moves across filesystems, etc.

I'm up for seeing alternatives, but there's a very high bar to clear here. People have been using directories and file storage since the middle ages.


You, as a data point (and I, and I imagine many people on HN), are strong evidence of the rule "sufficiently motivated and clever end users will always find a way to do what they want, regardless of the interface".

But that isn't really saying anything about if the interface sucks, or could be improved, just that you're motivated and clever enough to find a good and scalable solution for what you want to do given the limitations of the interface.


I do the same. This is the only way to organize things, by date.

I do the same with documents. I don't even want to think about categories, ontologies are always wrong. But today is 2022-02-24, no two ways about it. It's automatic, there's no need to think or decide anything, so it's not a big deal, you just do it. You can't make a mistake.

The thought of needing to properly tag every document I file is enough to make it a task I want to postpone. So it won't get done. That's a worse filesystem right there, because it doesn't exist.


I agree that sorting by date/time is the 90%-correct default view of the data. Most file managers get this wrong.

It should be especially easy to have a full sub-tree view sorted by date, but it typically isn't.


None of the files in the 1997 tree have a file date anywhere near that old, the drive they are on wasn't created until decades later.

For me, there is only one durable, universally supported tag for those files, which is the folder structure. Due to the way cameras number photos, for any given photo file name, there are likely 5-10 other different photos with the same name.

You might be tempted to then call for a standard tag that would be supported, but what about files relating to things of unknown dates? Fossils, antiques, draft x of the Declaration of Independence, or of things planned in the future, with dates still in flux?

Having one canonical path and filename for a given collection of bits is a really effective tool, that I doubt will be surpassed any time soon.

However, the next best thing, in my humble opinion, is to use a cryptographic hash of the file in question, as Git does internally. You could map a filesystem interface onto a Git-based data store, as long as you don't expect high-speed writes to perform well (because a new checksum requires hashing the entire file, even if only 1 bit changed).
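For reference, Git's content addressing is just SHA-1 over a short header plus the raw bytes, so the whole scheme fits in a few lines:

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Content-address a file the way Git does: sha1 over a
    'blob <size>\\0' header followed by the raw bytes."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

# Matches `git hash-object` for the same content:
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

This is also why the write-performance caveat above holds: the header includes the total length and the digest covers every byte, so there is no way to update the ID incrementally for a small in-place edit.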


But look at the path components: they're not organizing things just by date. The files are first organized by purpose (long-term storage), then by origin, and only then by date.

So they've already made the decision to commit the files to long-term storage, and to keep the original photos separate from subsequent edits, and to keep them separate from other image sources (e.g. downloads). That "tagging" required very little effort because they could just navigate to the existing tag in the filesystem, and put the new files there.


> This is the only way to organize things, by date.

I am terrible at dates - if I had to find photos by date I'd never find them. For anything older than a month that isn't on a known anniversary like a birthday, 90% of the time I find the photos using the map view in iCloud Photo Library. If I was limited to a filesystem view, my photo library would be far less useful.


What you've done here (and it's no bad thing given filesystems!) is define dates as an ad hoc index over a primary key which is the pair (date, filename).

What I'd like, personally, is a way to expose any sortable EXIF data as a 'filesystem', for example `~/Photos/Longitude/122-123/Latitude/36-37/`.

Most of the commentary on a tag system seems predicated on the idea that we can't derive a large volume of tags automatically from the circumstances and provenance of the data. For instance, instead of a Downloads folder (per se, it would be a view) we could have a "downloaded" tag, which could have "downloaded-by" = "Chrome" and "downloaded-from" = "https://example.com/a-url/".

That's a lot more useful to me than a Downloads folder, especially if those tags endure when I add further metadata of the "canonical folder" variety, also known as "moving" the file.


I actually started coding a way to automatically extract EXIF data from all imported jpeg files and automatically attach them as tags to the file. That way you could search for photos taken with a specific camera, or only photos where the flash was used, or location data (if your camera had GPS), etc.. I just haven't had the bandwidth to get back to that feature and finish it.


If you don't mind some drive-by advice: I suspect you have some kind of good idea here, and I'm having trouble really seeing what it is.

I can also read from your site and comments the frustration you're experiencing in getting your product shipped, and also in explaining the benefits of it. The Internet sucks, it's a hostile place, and unfortunately leaking the bad feelings this invokes in you is off-putting to your audience.

I've had an unpublished blog post sitting around called "file systems suck" so I'm about as sympathetic an audience as you'll find. Good luck with your implementation; I'll be keeping an eye on your project, and I hope to understand it better in some later iteration of the docs.


People have been using tables of contents and indices for books and other printed materials for a long time, but those are redundant in the era of CTRL-F. At best, they're useful only as an adjunct to searching digital content.

Likewise, imposing archaic methods of organisation on modern storage is sub-optimal. The argument is to move toward something that makes more sense given the capabilities of the medium.


What tags and searching do not provide is context. One affordance that putting files into a folder provides is reminding oneself that when I look at file A, I should probably remember about file B too. Later I may even forget about the existence of B, but when I go searching for A I'm going to see B as well.

Search and tagging are not contradictory to cataloguing. They're complementary.

It's also not true that old media didn't have any search facilities. Old technical books would each have an index of keywords at the end. That's search, just analog and requiring a bit more work from the publisher. This index didn't make the table of contents redundant.


Listen here, a hot take incoming.

There are two absolute genius inventions in computers so good and timeless that the sliced bread pales in comparison like a stupid troll comment on HN.

1. The keyboard

2. The hierarchical filesystem

Everything and anything else in input devices and data storage builds on these and the best solutions ever always are going to augment these, never replace them.

A good tag system will build on top of a filesystem and coexist with it, and offer value like stupidly fast search. Anything else will be lucky to survive a weekend of dubious fame on twitter, or up to a few months if you actively market it.


I've never seen a way to organize files that feels like it prioritizes files based on how much they mean to the user.

By that I mean pictures I take, papers I write, things I really wouldn't want to lose, vs 10,000 random system files.

For downloaded files sometimes the history of when it was downloaded, and from where, is almost as important as the contents.

Backups from old computers, and old phones start to pile up, and the chaos of trying to find that picture you took 3 phones ago, or the notes you took, or the recording you made, or the pdf you downloaded, or that code you wrote, or that map you made, is a real pain.

Digital clutter is one of my biggest problems.

I really need a good way to deduplicate and organize ALL my digital stuff. Tags might play a role, but I don't think they quite solve the problem.


It feels like the author just doesn't like tree-based file structures? The software shown in the screencast reminds me of iTunes, which I dismiss because it doesn't provide a logical *tree-like structure*. And it is not a filesystem replacement either; it is a database, which adds a lot of complexity and hides the actual data. Furthermore, this assumes someone is maintaining the metadata (remember the MP3 taggers?) rather than the files. Metadata itself is useful, but the creator of the file should add it, not the user. Regarding file manipulation, the proven answer is file permissions, though I think cgroups are the flexible, modern approach.

Because I'm seeing "Windows Explorer" in background:

Windows Explorer has degraded in recent years; it is even hard to open your "home directory", and the UI is confusing. Look at the one from NT 4.0, which came much closer to fulfilling the task.

And Apple:

I think they nowadays regret the whole iTunes thing? Instead they are pushing hard on apps which contain the data. Now you always have to look inside a single app and use its facilities to retrieve a file. Android failed here too. But using iOS is hard.


>it is even hard to open your "home directory"

Windows clearly doesn't want people to GET to their home directory, for some reason. That seems goofy. If people don't understand that $user contains the rest of those folders (Documents, Downloads, Pictures, etc) they'll never be able to navigate on their own. That's bad.

In a sane tool that features an address bar, clicking any given directory would show, in the address bar, the path to that location. WinExp only rarely does this. If you click on, say, Desktop, it shows you This PC > Desktop, implying a relationship that is incorrect. Getting to your home folder without typing requires you to start with C: and drill down, which is objectively insane.

Even MORE bananas is that if you start at C: and drill down to Desktop, you DO get the correct path in the address bar. But if you then make a WinExp shortcut of that location, it goes back to the other behavior. WTF.


> Windows clearly doesn't want people to GET to their home directory, for some reason. That seems goofy. If people don't understand that $user contains the rest of those folders (Documents, Downloads, Pictures, etc) they'll never be able to navigate on their own. That's bad.

Yes. Windows nowadays prevents users from understanding a straightforward thing: file systems. I mean, it was always a bit clumsy with A:, C: and [D-Z]:, and the weird desktop metaphor did harm as well.

Now I'm looking at the often-criticized GNOME and the actually venerable Nautilus. They got it! Everything lives below /, and in addition devices are directly usable (actually still somewhere below /run). The location bar reflects the current position. The desktop was removed because it never fit the computer or the file system.

Some actions of Google within Chrome are also questionable. "There is no address entry field, because we don't want you to understand how the web is structured." What? File systems are a simple thing: hierarchical. And guess what, the web is similar. Compare that to "right click", "double click" and their new friends "long press", "hard press", "swipe from somewhere", and "guess what the voice assistant can interpret".


The idea of building “indexing” into the file system means either the file system directly understands all file types, ignores those it doesn’t understand (thus requiring an out of fs indexer), or requires the file system itself to be able to dynamically load logic to handle different file types. By the time you get to the last one all you’ve done is build spotlight(or the ms equivalent) into your file system, so now you’ve got all the cost of the indexer only now it’s in the process reading and writing the raw bits, and of course doesn’t index contents of any other filesystem (so you’re still going to be running an indexer).

I also don’t understand how a filesystem is going to store this data in such a meaningfully different way that it uses less space and/or is faster to index.


I don't think file systems will be replaced anytime soon because of psychology. The human mind remembers things best by attaching them to a real or virtual location. That's how all the memory experts do it, they construct a virtual house in their mind. Virtual rooms, shelves, boxes, and folders are no different. So if anything, I'd give the Filesystem different folder icons based on their depth to reinforce this similarity with the real world.

Also, the article seems to use strawman arguments. Nobody needs to remember the exact image file extensions. You just click on the "search for images group" in windows and it'll search all image file extensions for you.

In effect, tags are already there. It's just that they are automatically generated.


I like the idea, but I think it will not change the world.

Filesystems have already been reduced to storage mechanisms for systems not people.

People just don’t organize files anymore. And that’s a good thing.

Most employees in relatively fresh organizations keep their files in OneDrive and Dropbox: 10-15 folders with random names, and a good search function that returns recent files on top. The older files just lie there, not bothering anyone, because nobody is looking.

Files from other departments are found via links in Mail and Slack search - not as attachments to Email.

People launch Powerpoint (online) and use the recent files menu instead of browsing from the ”C: drive”

To rethink storage ignoring that people don’t store files anymore is futile. It’s nice for organized geeks (like me), but in general file organization is a thing of the past.


People launch Powerpoint (online) and use the recent files menu instead of browsing from the ”C: drive”

Yes, I do that too. But that's because I have to, not because I want to. Onedrive, Sharepoint (and I guess Dropbox too) are impossible to navigate otherwise, so yes, even people that understand hierarchies are forced to use an application's LRU list to find old documents.

That's not a sustainable situation. I foresee huge storage bills for organisations because they won't be able to afford to curate their growing terabytes of disorganized file storage.


The tag-based location of user documents is nice, but why do people want to put it into filesystem layer? This seems like a bad fit.

- Tags in filesystem index too much. For example, if there is a program directory which happened to contain a .jpeg file, it should not be shown to user. Neither should user see files from browser's cache folder.

- Tags in filesystem index too little. Filesystems are device-specific, and a lot of times, you want to index across all devices in system. And maybe some files have no associated device at all, because they were transparently offloaded to cloud?

I think a much better fix would be to have an index database as a separate file, and filesystem providing a general support for it. Author says that the separate indexers might become out of sync or are slow -- but this is not inherent property of indexers, but rather the limitations of the filesystem design. So let's make filesystems more index-friendly:

- Make it fast & easy to detect individual file changes: every file has auto-updateable change time that user cannot mess with (linux already does this). Even nicer would be an extra timestamp which updates when content changes (not metadata) -- together with inode, this can detect renames easily and quickly.

- Make it fast & easy to detect past filesystem changes: There is a way to quickly find all changes made to the disk since some past moment: Merkle hash of directory + all contents is ideal (like ZFS maintains internally), or failing that, NTFS-style change journals can work too.

- Make it fast & easy to detect present filesystem changes: have powerful notification API that can detect all changes on disk. Perhaps also include first few kilobytes written to file for performance (so that file scanners do not have to open every just-written file)?

- Make it possible to "claim" subdirectory: something like a common attribute that advices common file browsers to avoid modifying the content. This way a software can use automatically generated names, and not worry about users just copying random files into arbitrary locations of structured hierarchy. (This should be bypassable by user with appropriate warnings -- this is UX mechanism, not security one)

- Perhaps a standard on how to store tags? All modern filesystems have attribute support, but AFAIK there is no clear consensus on how exactly it'd store the tags.

This way, one could have general tagging system, and winamp music database, and photo management app all looking at the same data and working together.
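A toy version of the rename-detection point above can be approximated today with (inode, mtime) snapshots; this sketch assumes stable inodes and nanosecond mtimes (Unix-only), and all paths are made up:

```python
import os
import tempfile

def snapshot(top):
    """Map inode -> (path, mtime_ns) for every regular file under top."""
    snap = {}
    for dirpath, _, filenames in os.walk(top):
        for name in filenames:
            p = os.path.join(dirpath, name)
            st = os.stat(p)
            snap[st.st_ino] = (p, st.st_mtime_ns)
    return snap

def diff(old, new):
    """Classify changes between two snapshots without reading contents."""
    changes = []
    for ino, (path, mtime) in new.items():
        if ino not in old:
            changes.append(("added", path))
        elif old[ino][0] != path:
            # Same inode, different name: a rename, not a delete+create.
            changes.append(("renamed", old[ino][0], path))
        elif old[ino][1] != mtime:
            changes.append(("modified", path))
    for ino, (path, _) in old.items():
        if ino not in new:
            changes.append(("deleted", path))
    return changes

root = tempfile.mkdtemp()
a = os.path.join(root, "a.txt")
open(a, "w").close()
before = snapshot(root)
os.rename(a, os.path.join(root, "b.txt"))
after = snapshot(root)
print(diff(before, after))  # one ('renamed', ...) entry
```

The snapshot step is still a full walk, which is exactly what a Merkle tree or an NTFS-style change journal would let the indexer skip.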


I think most of the friction of filesystems stems from wanting application layer features in a construct that has always been decidedly "systems" and just holds data, while applications themselves have gone the route of appending more and more features into files. Since there's no middle layer, files have increasingly become the Armstrongian "gorilla holding the jungle and a banana", often duplicating state and metadata to get the job done. And it terrifies me when development tools, as they so often do these days, spray around files, because it usually results in broken dependencies somewhere down the line.

Another approach that could get at addressing this is to define frontend protocols to filesystems that do targeted, application-y things. This is done in informal vernacular often enough through things like naming conventions, but what we could really aim for is a specification that's a "form-filler" for each category, one that consumes various document and data types and produces the desired kinds of metadata.

The difference between that and doing it as an indexer is that it could be seen in a bidirectional intermediation sense: if the protocol understands all the relevant formats well enough to parse them, it doesn't have to also hold a file, it could simply use internal structures and generate the file representation on demand if needed. But to do it properly these structures would have to have similar security and integrity guarantees to our current filesystems. And exposing a frontend like this does add surface area, with the silver lining of "if it's pushed down the stack, then fewer application coders will have to roll their own terrible version of this functionality".


Don't recent Androids do your "frontend protocol" idea? At least a file chooser on my phone, in addition to a regular file browser, also has an entry for "google drive", which seems to have no corresponding physical location.


The tag-based location of user documents is nice, but why do people want to put it into the filesystem layer?

I don't know if the filesystem layer is the best place, but I don't want to lose the tags when copying or moving files.

So somehow this metadata needs to be associated with the file, but it also should not be in the binary stream of the file. EXIF in images and other similar metadata systems are nice, but any change there invalidates checksums or other attempts to identify changes in the actual file content. (I want to easily be able to see if two images are identical even if they have different metadata, which I can now only do with specialized tools.)


Another thing you'll want for database-centric file stores, that should be table stakes for every desktop OS, is Amiga style datatypes. That is, allow applications to register readers and writers for their file formats. That will help the database parse files for important metadata.


I keep wanting to write a basic implementation of datatypes combined with a fuse filesystem to allow access to metadata and transcoding from unaware applications, then realise I don't have time and hope someone beats me to it. Please, someone, beat me to it...


This is one of these ideas that always float around. Files should be located by tags, not folders. Or file systems should be relational databases or file systems shouldn’t exist at all, etc.

But the fact is people are used to files and folders. Tools are built upon files and folders so changing everything is extremely difficult.

Plus all the tools that have tried to do things differently proved to be a pain:

1. Gmail tags: does anyone use tags any differently from folders/files? Having multiple tags on an email means it'll show up everywhere

2. iPhones didn't have files, but it was so inconvenient that they were added back

3. Microsoft's relational file system was never released (I think)


For GMail I often used "non-folder tags". For example, I would tag emails based on the To address, and they were clearly marked in my inbox. Or I would tag certain types of emails so that I could review them later. For example, SMS would be tagged, but I would read them in my inbox.

I just really wish GMail archiving was a tag. For example I get my video subscriptions into a tag called "Videos" but when I am done I remove the tag and that info was lost. It would be nice if Archiving was just adding an "Archived" tag and it was excluded from tag views by default. That way archiving doesn't forget all the tags. The only workaround I am aware of is making two tags for everything like Videos and Videos-Archive. Apply both in filters then just remove one once you are "done" with them.

Folders have the same problem. Of course trash systems work around this by explicitly recording the original location.


Tagging is work. Fiddly work that's surprisingly costly in effort if it's not trivially automatable stuff like time and date, location, application, and so on.

Naming is hard work, but tagging means creating and choosing shared names all the time, with the pressure that the combination needs to be reasonably unique, otherwise you won't find stuff.

Tagging is also fiddly if you don't have a really good bulk action UI. You can think of the user-controlled paths in a hierarchy as tags, and moving files is the action of untagging and tagging. By moving 100 files from one directory nested three/levels/deep to another, you are removing 300 "tags" and adding 300 different "tags". And you can rename the "tags". A single click and drag, 600 actions, and you can see the before and after trivially, and undo trivially too (at least in Windows).
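
The arithmetic above can be made concrete by modeling each directory component of a path as a tag (a modeling device only, not how any real tagging system works):

```python
def path_tags(path):
    """Model each directory component of a slash-delimited path as a tag."""
    return set(path.split("/")[:-1])

def move_cost(n_files, src_dir, dst_dir):
    """Tag operations implied by moving n_files from src_dir to dst_dir:
    (tags removed, tags added), counting only components that differ."""
    src = path_tags(src_dir + "/x")   # "/x" stands for any file name
    dst = path_tags(dst_dir + "/x")
    return n_files * len(src - dst), n_files * len(dst - src)
```

For example, `move_cost(100, "one/two/three", "a/b/c")` gives `(300, 300)`, matching the 600 total actions in the comment above; a good bulk-action UI hides exactly this cost.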

Tagging is more useful for ad-hoc "favourite" lists, and the occasional cross-reference (but it's work to hunt down the elements in the xref).


If you're running Windows then install Everything.


Boggles the mind how Windows search basically just doesn’t work at this point compared to what you have on other operating systems.

Feels like I can’t even search for a certain file type in a folder.

Really frustrating that Apple, the only company to truly master OS search doesn’t seem that interested in making the type of OS that has files anymore.


>> Boggles the mind how Windows search basically just doesn’t work at this point

Now that you've tried Windows Search on your desktop, please allow us to help (or coerce) you to use MS Bing to search the entire world-wide-web. </sarc>


Yep, Everything is a game changer. I don't worry much about where a file is any more, I just worry about giving it a good filename. Then I will always be able to find it, wherever it is.

Be sure to set a hotkey for it. I use Ctrl+Shift+Spacebar since it didn't seem to conflict with anything else.

Of course before you can use Everything, you have to find Everything. Here's where:

https://www.voidtools.com/


Only for the record, there is also Swiftsearch:

https://sourceforge.net/projects/swiftsearch/

that uses the NTFS $MFT directly and that (if needed) is fully portable, see:

http://reboot.pro/index.php?app=downloads&showfile=609


Everything is a lovely tool, but I'm continually amazed that it should have to exist at all. Why is the Windows built-in search so atrocious? You literally cannot use it as part of a getting-anything-done-at-all workflow. And it keeps getting worse with every update? Why would I want to mix Bing search results with stuff from my filesystem?


"But he has nothing on at all," said a little child at last. "Good heavens! listen to the voice of an innocent child," said the father, and one whispered to the other what the child had said. "But he has nothing on at all," cried at last the whole people.

I agree completely with your sentiment here and it truly boggles the mind.

Thank God for Search Everything.


This has completely changed the way I use files. I rarely ever open the explorer to navigate to a folder, but instead open everything to search for a file and then instantly jump to the file location. Naming files well becomes much more important than where they are located.


Spotlight on mac. I use recoll on my Linux machine and let it update every few days. Fast local file system indexing is amazing and abstracts away a lot of file system pain. You often don't even need to name files particularly well to find what you want if you index file contents as well.


> Naming files well becomes much more important than where they are located.

Aren't you just exchanging the location of relevant metadata from the path to the filename?


For me,

  > ls -rec . | ? name -match <substring_in_filename>
becomes muscle memory as a pwsh daily driver-type-person (=> PowerShell 7).


For simple matches against one pattern, you may prefer...

    > ls -rec . -filter *substring*
... as filtering can be offloaded to the "provider" when Get-ChildItem (ls) knows about it.

Even if the filesystem provider doesn't handle patterns any differently from Where-Object (?), you can save the cost of hydrating FileInfo objects only to query and discard most of them.

For multiple patterns or anything that you'd need a regular expression for, Where-Object is superior!


Thanks, I'm not usually doing performance required stuff but when I write a function I always want to know the best way to feed the pipeline, I appreciate it. #snoverville


aka

  find . -iname '*<substring_in_filename>*'
(=> V5 Unix from 1974, although the case-insensitive -iname was never standardized)


Have it pinned to my taskbar. There's also Windows PowerToys that has a quick launcher.


And if you are running Linux then fsearch is worth giving a try: https://cboxdoerfer.github.io/fsearch/


It's time to wrestle control over files away from users. /S


What's wrong with adding metadata to each file and indexing that? I thought this was essentially a solved problem.

Also, the OP's solution merely sounds like a slightly altered filesystem. I thought he was going to propose something akin to WinFS, Microsoft's ploy to merge an SQL database with a filesystem, but it turned out to be a dud.


This looks pretty neat (though I will not easily give up files). The author seems pretty frustrated in another post that few people are interested. I am willing to look over the design and play with it, but I struggled to even find the website, and on said website there is no way to download the software. I only found a sample data archive.

https://didgets.substack.com/p/what-is-wrong-with-you-people...


In the comments here, people have named dozens of similar systems. I am sure that in the previous discussions (which must have prompted that frustrated post) the same happened. The author must have read them... and then went on to write:

"I have invented an entirely new way to store and manage all kinds of data"

There are no references to other systems, no comparisons. Did he just ignore all the previous work? Having a "Didgets vs. X" table and a section on why it would work this time would do great things for this project's credibility.


I apologize for the missing download file (DidgetsBeta.zip) on the website www.Didgets.com. It seems the latest upload failed, but I fixed it.


Data organization is constrained by the worst system, because of interoperability. How do you move these tags across the internet through systems that don’t understand them?

For example, the Mac had “file types” and “creators” as separate metadata since the beginning. Because type wasn’t encoded in the filename, mistakes weren’t made that accidentally changed the type and you didn’t have multiple files of the same name differing only by extension. The file always opened in its creator but power users could easily change the creator. To make a successful round trip to another system, the file would need to be given the right extension and then another program would need to reassign the file type and creator on reentry. If you didn’t do it right, people would complain that they couldn’t open the document.

In addition, experience shows that organization must happen automatically or people will just let it do whatever. At this point, most users probably have all their documents in one folder and all their downloads in another. If they weren’t indexed automatically, they’d just give up and say they don’t have the documents anymore.

Come up with an intelligent way to organize automatically and it will be a real revolution. I'd like to be able to find that photo I saw a few weeks ago when I need it. I want all the documents that are similar to the one I found that isn't the exact version I wanted. I want all the photos taken in Brazil, as well as unlabeled photos that might be Brazil. I want the EPS version I have of this jpg logo.


I'm a huge fan of tags (as I promulgated the idea in the first place in the early 2000s) but they have a bunch of problems.

They're better understood as a memory extension system rather than a sole filing system. The idea being that it improves recall of objects if you add some attributes when saving as you are likely to use some of the same attributes when recalling.

But the vast majority of objects on a filesystem are mechanically generated and never touched by the human using it (assuming a sole user.)

The model gets much more complicated when many users are interacting with the same system.

As noted elsewhere, the flat namespace gets cluttered very quickly. I do think that there is use for a hierarchical separator since many times objects are fully inside some other concept. And when looking at massive userbases creating tags for memory, there is a distinct ordering of generic to specific when creating tags in order.

Also, filesystems allow a bunch of workflows that aren't completely obvious under tagging. For example, a business might copy their template folder and rename it for a new customer, and inside it has a bunch of documents with the same names. I think this is a bit like having a bunch of objects (in the programming sense), thus creating things that all have the same method names (except now they are files).


I am old enough to remember the talk of Microsoft Cairo (successor to Chicago aka Windows 95) and the Object File System which would do a lot of this.

If Microsoft at the heyday of its monopoly power could not pull off something like this, I don’t see it becoming widespread now.


It would’ve been really interesting if WinFS actually was released. It had a very interesting type system. Probably would’ve been more suited for servers than desktops/clients.


I wouldn't be surprised if Microsoft could have pulled it off if they had designed the system from the ground up. Instead they tried to do it on the cheap by taking two existing systems (NTFS and SQL Server) and combining them in ways they were never designed to handle. The product was either too slow or too fragile (or both).


This article seems to conflate two separate issues:

1) File systems could be modernised.

2) Files and folders as a metaphor break down when you have enough files, vs, say, search.

(1) is almost certainly true. (2) I don't think I agree with. Even if people use something like Confluence, things still get disorganised: if people (such as me) make documents and folders in a way that's a mess, then they'll be a mess no matter the metadata.


Related: can anyone please tell me why Google Drive search is so bad?

It’d be bad if it was built by some random tech company. But it’s built by Google who I believe have some reasonable capability in search…

It’s just absolutely useless. Like: I’m a half assed php developer and I could probably knock something up that returned more useful results from a drive search.

What gives?


The proverb for this is, "The cobbler's children go unshod." An organisation that has specialised expertise usually deploys that expertise where it has the largest impact, because it is limited. Hence, the cobbler spends all his time making shoes for paying customers, and he has no time to make shoes for his own children.

So, the Google Drive search implementation is probably closer to what you supposed than what you'd expect if they threw the full weight of their search expertise at it.


I'm sorry, but this sounds like my 13-year-old daughter rambling about why her desktop has become complex and why it isn't just like the simple iPad. However much we may not like it, there will always be a file system, for things to have a location.

Yes, we can encapsulate it and put a different layer on it -- tags, containers, types, relationships. However, understanding how it (the file system) works would be an immense help.

I have been teaching my daughter how files are located, added, and sized, and why these details matter.


You say that like the file system is the ground truth to how we use a disk. It is nothing but another abstraction we've invented to manage blocks on some persistent medium. The vast majority of people wouldn't notice if you replaced their FAT partition with a key-value store, where the keys were paths delimited by slashes (with some UX to traverse paths).
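
A toy sketch of that replacement, assuming nothing more than a dictionary whose keys are slash-delimited paths, with readdir emulated as a prefix scan:

```python
class KVFS:
    """Toy file store: keys are slash-delimited paths, values are bytes."""

    def __init__(self):
        self.kv = {}

    def write(self, path, data):
        self.kv[path] = data

    def read(self, path):
        return self.kv[path]

    def listdir(self, prefix):
        """Emulate readdir: the immediate children under prefix."""
        prefix = prefix.rstrip("/") + "/"
        seen = set()
        for key in self.kv:
            if key.startswith(prefix):
                seen.add(key[len(prefix):].split("/")[0])
        return sorted(seen)
```

"Directories" here are pure UX: they exist only as shared key prefixes, which is roughly what users perceive anyway.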


Are you confusing semantics (open, readdir, cwd, ".." etc..) and implementation (directory entries, ".." pointer, etc..)?

The blogpost talks about semantic change -- "open" would mean something else entirely.

You seem to be talking about implementation, and yes, users and programs don't care. In fact, many network filesystems like DAVFS and SMB treat the filesystem as a key-value store, where a single call can get any file by its full path. It would be interesting to see an on-disk filesystem based on a key/value store, but sadly the blog post's author does not talk about that at all.


The point I was trying to make was that the directory hierarchy isn't fundamental to how people reason about their data. That you could introduce a little UX over a KV store and those that wanted to mimic the old abstractions could continue to do so. But we could also change the abstractions (like the blog post says) to something like tags, and it wouldn't be any lower or higher level, compared to what we have now.


All the "tag" systems I have seen rely on some underlying store: in database-based methods, there is the main table data (a K-V store) and indices (another K-V store); in symlink-based tags, we use the underlying filesystem as the data store.

So the fundamental filesystem abstraction is a K-V store. For efficiency reasons, based on common access patterns, this is often (but not always) implemented as a special kind of tree.

The tags system builds on that and thus is higher level: even in Didgets, the main primitive is "open by id", an example of a K-V store operation. The whole tags thing is a way to query that store.

The hierarchical filesystem also builds on that, but this time the key is an (almost) arbitrary string, and the search operator does a /-delimited prefix match.

There is no real abstraction change when we move from hierarchical files to tags. There is still the same old underlying K-V store; we have just added an alternative to readdir() (and maybe also forced every file to live in the root directory).
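
A sketch of that layering: a primary id-to-data K-V store ("open by id") with an inverted tag index kept beside it as the readdir() alternative (purely illustrative; this is not how Didgets is actually implemented):

```python
class TagStore:
    """Primary K-V store (id -> data) plus an inverted tag index."""

    def __init__(self):
        self.objects = {}  # id -> data: the "open by id" primitive
        self.index = {}    # tag -> set of ids carrying that tag

    def put(self, oid, data, tags):
        self.objects[oid] = data
        for tag in tags:
            self.index.setdefault(tag, set()).add(oid)

    def get(self, oid):
        return self.objects[oid]

    def query(self, *tags):
        """Ids carrying ALL of the given tags (set intersection)."""
        sets = [self.index.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()
```

The query side is just index maintenance over the same underlying store, which is the point: tags change the lookup operator, not the abstraction.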


Files seem like a pretty natural abstraction: a blob of data, an array of bytes that forms a self-contained unit. There's something that seems natural about that. You can attach all kinds of meta-data to files, but then you run into the issue of which types of tags people care about or are supported by different OSes, etc. The idea of hierarchically separating files, and giving them names, and also types in the form of the extension, seems like a pretty good system to me.

In terms of tagging files, I feel like, with good machine learning, that can be done automatically. No human effort required. You could just search all your files with a vague natural language description of what you're looking for. Kind of like tagging webpages... Does a good search engine even care about your tags?


I think you're talking about file systems as an implementation detail, whereas the author seems to be talking about file systems as a user experience.

For example, iOS devices have a file system, but for 99%+ of iPhone/iPad users it effectively doesn't. Presumably, the author's "Didgets" object database is persisted via a file system.

Also, a user-facing file system isn't needed for things to have a location, and things can have a location without needing a user-facing file system.


iOS long resisted exposing the traditional file system but eventually caved in (with the Files app), as for non-trivial use cases it is simply the better way.


Well, on the one hand, I tend to agree, but on the other hand, we should explore these concepts more deeply. The Von Neumann computing architecture hasn't changed in like 80 years and is still pretty darn dominant. I am fine with people exploring this space without dismissing them out of hand. Sure, file systems don't seem like they are going anywhere in their present form, but this is how we advance computing. Paradigm shifts can and will happen, eventually.


To me, what makes hierarchical file systems compelling, and what a replacement needs to be able to achieve, is that it provides logical filtering. E.g. I rarely want to search all image files on my machine. I want to search screenshots or personal photos or desktop backgrounds. For some categories, like my source code, the hierarchy is deeper.

You can emulate that with tags, certainly, and for some things that's a better fit (a photo can be both a personal photo and a desktop background). But what matters is that the full set of operations, including "this is no longer an X, but it's a Y" aka move/rename, and "find Xs with these attributes" must be just as fast and easy to do.


Is Didgets a front-end for a SQLite system?

Because that would seem a great way to accomplish much of this functionality, with your choice of UX wrapper.


29 years ago, I worked for a little software company called PC DOCS that wrapped WordPerfect with the ability to have long file names, keywords, and full-text search. It was like sheer magic in the days of 8.3 DOS filenames. After I'd been there a while, I figured that's what the future would look like one day: no more hierarchical file systems or file names.

Current-day me would be embarrassed to tell 1993 me that things haven't really gotten much better. Sure, we expanded the 8 in 8.3, and the 3 grew to 4 or so, but I still spend an inordinate amount of time staring at a spinny widget whilst waiting for the OS to find a document.


The article and everyone in this discussion is talking about photo tagging and such.

What I want to know is how configuration management, source code organization, build systems, packaging systems, and other experts tools will work without a hierarchical filesystem.

If you're designing a camera application, sure, maybe you don't need a hierarchical filesystem if you have other means of organization and discovery. But who's going to be able to code that camera app in the first place? And when "it" is installed, what exactly happens?


These systems were designed for a hierarchical filesystem so it is natural that that is what they support. I'm sure that if tag-based filesystems were the default they would have an equally suitable design that worked well there. This could be as simple as specifying the "project" as a set of tags. For example project=camera and maybe compile a submodule with project=camera,module=ui. Then the files inside could be processed by the build tool.

Of course, for any tag-based filesystem to take over it will probably need a compatibility mode. I can imagine something as simple as a path=myproj/src/main.rs.


My point wasn't that it would be impossible to create a new kind of solution. It's enough that it would be a massive undertaking for unclear benefit, if any.

If we get rid of hierarchical filesystems, what replaces git, for instance? Perforce has some ability to project different filesystem hierarchies, but even if that were enough (it probably isn't), porting all those tools and projects to Perforce would be an incredible amount of work without a clear upside for users of version control systems.


This is the piece I've been struggling to get at. I understand why filesystems exist and how they help with interacting with the physical structures that underlie computers, but I don't understand what benefit a database alternative to directories can offer (and if you tell me a directory is just a specialized database please be prepared to explain using very clear language).

Maybe there is one, but even when I talk to experienced devs with engineering (not just software) backgrounds, I haven't gotten a good explanation. Perhaps because there isn't one?

But I've been curious about this for weeks.


'Replacing file systems with something better' does not mean you have to throw out the hierarchical folder structure that people are used to. Didgets does use folders to organize files if that is what the users want (when importing files, this is the default).


I've been seeing this shit for decades.

The problem is as soon as you end up with a tag/search system with any complexity, you end up recreating paths/folders/whatever again. As soon as you have a multiuser system, you're forced into it.

No one has ever justified to me why I need to stop referring to /etc/whatever/config or /users/username/whatever

We'd either need to scrap everything we use today, or emulate it anyways to use things like cp or mv.

The answer is both. Tags are useful, but not a replacement.


I never have issues finding files. I am not particularly disciplined and don't keep things very tidy, but I have folders for everything. Documents have their folder, software projects have their folders, photos have their folders, and so on. If I need to search for something I know where to look.

I wouldn't necessarily dislike another system for keeping files organized and finding them, besides a hierarchical one, but I have yet to meet one that is better than a file system.


I totally agree with this article - I've made similar points myself.

I would also add: how terrible is it if you have a file that applies to 2 or more areas that you distinguish by folders? E.g. say it's a photo: you might want to have a 'family occasions' and a '2019' folder, i.e. one photo might apply to 2 areas that you want to distinguish. Well, that level of organisation is effectively punished: you can either duplicate the photo in 2 places, or lose the reference in one. If you improve the photo using, say, Photoshop in one place, you had better remember that you have a copy and remember to copy it over there too!

For me file systems were designed wrongly. For personal files there is the data and the meta-data. But the only meta-data we can add is in the file attributes.

Data should go into a 'bucket' and be stored by the OS somewhere, never duplicated. Meta-data relating to the file should describe features of the data, where to get it (perhaps in multiple places). It should be possible to apply multiple labels to it too, in my example 'family occasions' and '2019'.


for this you have links, i.e. ln https://en.wikipedia.org/wiki/Ln_(Unix)


I know about links, and there are one or two other solutions I've looked at too.

But that's not really what I'm getting at.

Links are pretty cumbersome, get broken, etc. They sort of work in avoiding the duplication issue I mentioned. They don't work in the sense that what the (non-techy) user wants to do with data is an afterthought; i.e. the order of priority is computer first, then user.

What I'm talking about is that the meta-data describing personal files should be where the user adds their value to - they should be able to describe what the data means to them and that should only need to be done once. The data itself should be handled by the OS; where it is etc is not really a big concern for the user themselves.


This doesn't seem great. Soft links are not resilient to renames or removals of the "canonical tag". Hard links are not resilient to improving the file in photoshop.


I'm always excited to see and learn about new and different concepts, but I struggle with two main points:

1. It is not the implementation of a thing that I worry about, but rather how it will be misused.

We have a generally good understanding of how end users misuse the current file system hierarchy. Before adopting, or even advocating for, any alternative, it would be hugely beneficial to sit down and consider the ways in which such a system could be mishandled, abused, and used maliciously by end users or bad actors.

2. Do the gains outweigh the growing pains?

In the event that the potential gains of change seem particularly appealing - and the concerns of point 1 have already been thought out and addressed - are the gains of that change significant enough to follow through on actually making it? There are many cases where something has been vaguely improved on in some way or another across many industries, but it is rare that those improvements have justified the time and investment of the participants of that industry as a whole.

So: if the primary gain of Didgets is to make searching faster, does that gain in search speed outweigh the task of changing a very fundamental aspect of how computers operate today?

My personal opinion is "No, but..."

There is always room for improvement. I think that if such a system finds more ways to entice people to adopt it, to find more depth and measurable, tangible benefits of adopting it, it will have a much stronger case to make. Speed is great, but we need more than that.

---

From my perspective, I have very little issue finding files I need. I take pretty good care in keeping my files/folders well organized. I find it relaxing to organize things in a logical way. I even like organizing file structures at work, and for people in my personal life. I see this as a "non-problem" - it's a thing I actively take satisfaction from fixing.

In the event the author of the post reads this, I give you this challenge: give me a better alternative. Give me a system that can produce the same sort of satisfaction that I get from highly organized file system layouts. Justify why I should no longer need to do that. I'm not opposed to change entirely; I just don't see a very clear gain from it, in its current form.


I thought this was going to be about replacing the inherently race-ridden POSIX file system interface with something that has a well thought out, defensible design and API.

In the old days, that would mean I need to do one myself. But nowadays, you assume somebody else must have done that already, so you go looking for it. If it doesn't exist, it was probably too hard, so you give up.

Or maybe you find some, but all are unsatisfactory in some way. Then you might synthesize what you like from them all, and add some private improvements, and publish that.

Or maybe you find one you really like, and you download and try it. Then you write an article, something like "The Time Has Come to Replace File Systems" or "File Systems are Racist" on your blog. Somebody posts that to HN, and it generates passionate interest. Shortly after, it appears as a module for Linux, or maybe for some popular embedded kernel used to implement USB keys. And, it's off to the races. Or, off to no more races. Or something.


Maybe what needs to be replaced is the meme that people can't deal with hierarchical filesystems.

That may have been true 20 years ago when the average person didn't use a computer too much. Is that still true today? Will it be true in 10 years?

Is designing a filesystem the product people's equivalent of the developer's game engine?


If one has forgotten where they stored a file, then no database will help them either. A file system is in the first place also a database, and whether hierarchical or relational, databases are only as useful as the data they contain and the tools we throw at them.

The problems this aims to solve are at the user layer, but the majority of files are used by the system itself. The system does not need such solutions; it works well with the established ones. Or is there any research into potential benefits for distributions using a relational file system instead of hierarchical ones?

Another point is that this would probably bring up new issues. Is there any system with real experience on this that can say what will change, what will be better, what will be worse?

But it's also true that we should improve and innovate here at the user level. Having every app separately indexing files and handling them is a poor situation. Working toward a generalized solution would be beneficial. Maybe a universal relational database layer as a foundation for all apps would make sense? Something that can have tight integration with the apps and sits on top of the normal file system? Thus, it would be the job of the Desktop Environment to offer and integrate this.

KDE did try this at some point but seems to have failed for many reasons. Maybe they should restart the effort, but more pragmatically and with stricter enforcement, probably in a fork as a testing ground. I remember one problem they had was multiple systems being worked on in parallel, each limited to certain apps. There should be only one system, and it must be integrated into every app the same way to succeed. KDE, with its centralized and mature libs, would be well positioned to enforce this. GNOME might be able to pull off something similar, but I don't know what their state is today. Of course this would only be a first step, but if some major DE solved it for themselves, others could join, and they could start building an independent solution. And maybe, from there, they could work toward a specialized file system that improves things even more, if really necessary.


File systems are pretty intrinsic to how modern computers work; replacing them would be a monumental task, not just for users but for the architecture of computers.

Of course you don't necessarily need to replace it; you can just use new tools on top of it, as iOS and to some extent Android do. Both of those use file systems even if you never see them. But in any of those cases you're still using a file system, with all the apparent problems that come with it; you just have an abstraction layer on top to make it easier (or at least different) to navigate.

For me, file systems are about as tried and true as you can get. They're not perfect, but they do the job well enough that the cost of replacing them is too great.


> File systems are pretty intrinsic to how modern computers work; replacing them would be a monumental task, not just for users but for the architecture of computers.

Is this true for spinning disks only or does this apply to SSDs as well? I understand that APFS tried to shoehorn SSD-like operations into the FS layer.


I’d love this. Been wanting something better forever.

A bit related: I want to click on a project name and it should set up my environment ideally for that project. Filter the files, include/exclude the apps I see, filter bookmarks. Just show me what I need for this one project, not all the crud across my whole life. Where anything is linked to multiple projects, let me include/exclude at will. This would all be especially useful when I haven’t touched a project in a year or two and can’t remember exactly where everything is.

Two monitors? Project view on each, drag between them.


I'd be perfectly happy if all file systems were flattened into a sql interface. The relational model is the only thing flexible enough to give all users what they want.

Imagine being able to define custom triggers that update cross reference tables when certain file system operations take place. Or, building custom views of storage system for various use cases.

Having the user shell tightly integrated would be nice too. I'd like to be able to write a query like:

  SELECT foobar2000_enqueue(t.id)
  FROM vTechnoBunkerTracks t
  WHERE t.PublishedYear > 2021
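The trigger idea is easy to prototype on top of SQLite (all table and column names below are made up for illustration): an AFTER INSERT trigger keeps a per-extension cross-reference table in sync, and the year filter from the query above becomes an ordinary WHERE clause.

```python
import sqlite3

# In-memory sketch: a hypothetical "files" table plus a cross-reference
# table that a trigger keeps in sync on every insert.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, ext TEXT, year INTEGER);
CREATE TABLE ext_counts (ext TEXT PRIMARY KEY, n INTEGER NOT NULL DEFAULT 0);

CREATE TRIGGER files_ai AFTER INSERT ON files BEGIN
  INSERT OR IGNORE INTO ext_counts (ext, n) VALUES (NEW.ext, 0);
  UPDATE ext_counts SET n = n + 1 WHERE ext = NEW.ext;
END;
""")

db.executemany("INSERT INTO files (name, ext, year) VALUES (?, ?, ?)",
               [("mix01.flac", "flac", 2022), ("mix02.flac", "flac", 2021),
                ("cover.png", "png", 2022)])

# The query from the comment, minus the player integration.
rows = db.execute("SELECT name FROM files WHERE year > 2021").fetchall()
counts = dict(db.execute("SELECT ext, n FROM ext_counts"))
print(rows, counts)
```

The cross-reference table never has to be rebuilt; it is updated transactionally with the "file system operation" itself, which is exactly what external indexers can't guarantee.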


So what you want is the circa-1970's-era Pick OS which had a database (albeit not relational) for storage rather than a conventional filesystem.

All that's old is new again...


Literally BeOS and now Haiku.


Sheesh, organization and hierarchy are not the problems with file systems. It's the APIs and the network awareness. Can someone name a network filesystem that doesn't suck horribly in some way? Samba? NFS? CephFS? GlusterFS? They all have some glaring problem that holds them back.

We need storage that works everywhere and that doesn't suck. DropBox, et al, tries to make it happen but they're held back by mediating everything through the filesystem layer. Traditional filesystem concepts aren't going to make that happen.


it'd be cool to see an information theoretical explanation of the complexity of hierarchical -vs- non-hierarchical file systems, and why human beings have such an initially easy time with hierarchies.

there's the obvious inheritance factor: the parent of a child in a hierarchy can provide a _heuristic_ about properties of the child. you sort of get this with a tagging system, but the inheritance can really only go one level (unless you make a hierarchy of tags, which brings you back to tree structures)


Folders impose a tree shape on file organization, which then has to be awkwardly worked around with symlinks, indexes, and whatever the heck those "library" things on Windows used to be.

That said folders are useful as a concept, but maybe flip things so that the folder is a tag on the file, and I can "move" or "copy" a file by changing its tags. And still present a visual folder structure by default in the file browser.

The biggest loss with this approach is namespacing... Not sure what the best option there is.
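A minimal sketch of that folder-as-tag model, assuming a SQLite backing store and hypothetical table names: membership in a "folder" is just a row in a many-to-many table, so a file can carry several folder-tags at once and a "move" is a tag update.

```python
import sqlite3

# Files and tags are rows; membership is many-to-many, so one file can
# appear under several "folders" without symlinks or copies.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE file_tags (file_id INTEGER, tag_id INTEGER,
                        PRIMARY KEY (file_id, tag_id));
""")

def tag(file_id, tag_name):
    # Create the tag on first use, then attach it to the file.
    db.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag_name,))
    db.execute("INSERT OR IGNORE INTO file_tags "
               "SELECT ?, id FROM tags WHERE name = ?", (file_id, tag_name))

db.execute("INSERT INTO files VALUES (1, 'report.pdf')")
tag(1, "work/2022")   # the "folder" the browser shows by default
tag(1, "taxes")       # a second, overlapping category

# Listing a "folder" is just a join:
names = [r[0] for r in db.execute(
    "SELECT f.name FROM files f "
    "JOIN file_tags ft ON ft.file_id = f.id "
    "JOIN tags t ON t.id = ft.tag_id WHERE t.name = 'taxes'")]
print(names)
```

The namespacing loss the comment mentions is visible here too: two distinct `report.pdf` files under different tags need disambiguation by ID, not by path.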


Files give you ownership of the data. Tags make you slave to an intermediate layer between the data and you.

Let me know when you have replaced your filing cabinet with just tags.


If filesystems were replaced with a new paradigm today it would be a centralized SaaS where every operation requires a round trip to the cloud and ads show up alongside your data unless you pay a monthly subscription. Every single thing you do on your computer would be streamed to advertising analytics farms. Without an Internet connection it would be impossible to access your data.

... being sarcastic ... I think ... maybe ...


Didn’t the resource fork for Macs enable some of this stuff decades ago? At least the file still being locatable even if it was moved or renamed.


heh, the tag names in that demo video follow a hierarchical structure - .didget.Name, .didget.SHA1, .fileSystem.Ancestor, .fileSystem.Folder, ....

So now not only do you have a hierarchical (but optional, which is worse imo) structure for files, you also have a hierarchy for tag names (which, unlike with folders, you can't really rename if they conflict because they're singular keys).


Keep it simple, stupid. Hierarchical FS's work. Just don't be lazy; tidy up after yourself.

Most innovation in tech is not needed. Would you redesign a spoon or a fork after centuries? No. It works.

Something like BeOS/Haiku's file system would make more sense, but it is still useless for sharing files with the rest of the FS world. Just look at what happened with resource forks on Classic Mac.


Disappointed the article never mentioned WinFS, which suggests to me that not much research went into it.


The article was never meant to fully describe the Didget system or all the issues that went into its design. One of the reasons I started working on it years ago was because Microsoft cancelled its WinFS product and I wanted to see it in action. I decided I had to try and build something on my own. I can assure you that I put a lot of thought and research into it.


> The unique identifier for a Didget is a 64 bit number. It remains consistent throughout the life of that Didget.

How are safe-saves handled in Didgets? That's usually a case where a unique file-ID gets changed.

(Is "safe-save" the canonical name for this? Or atomic-save maybe? Where you save to a new filename, then move that file into place.)


In Windows, when replacing a file with another using ReplaceFile the replacement inherits the original file's NTFS OBJECTID.

I would expect that a similar operation would be supported.


The hierarchical folders-and-files metaphor/implementation is second on my list of software paradigms that need to die. And I say that as an "if it ain't broke, don't fix it" guy who does not jump on bandwagons and want to replace everything every other season with the new new thing.


Why? Hierarchies seem like a natural fit for the human mind. We can know only the relevant parts, which might change according to what we're working on and confidently exclude entire large complex branches like the contents of C:\Windows or some program's temporary working files or whatever from our awareness.

They also combine in an obvious way. If you mount another filesystem somewhere, it's contained in that somewhere, not mixed in with the existing one in some unexpected complex way.

They're not a natural fit for real world data, but there's a tradeoff between serving humans and serving nature.


> Hierarchies seem like a natural fit for the human mind

I'd argue they only seem that way because we've been raised and taught to use them but haven't been taught sufficiently to think about other ways to organize knowledge.

> They're not a natural fit for real world data

In "A City is not a Tree"[1], Christopher Alexander discusses the semilattice in relation to city planning and human societies. Clay Shirky has previously noted[2] that categories and ontologies are too brittle to serve human thinking.

As an example of an ongoing attempt to create a hierarchical ontology that isn't helpful, look at any list of music genres. Also note that social networks are not organized hierarchically – humans have the ability to handle interconnected structures just fine.

This also relates to why many people don't understand or can't quite relate to true distributed peer-to-peer computing, or systems with no central controller, or fail to grasp emergent behavior. But nature is fine with completely decentralized systems: see, for example, foraging ants, or Emergence: The Connected Lives of Ants, Brains, Cities, and Software by Steven Johnson.

1. https://blogs.ischool.berkeley.edu/i103su12/files/2011/07/19... 2. https://oc.ac.ge/file.php/16/_1_Shirky_2005_Ontology_is_Over...


If hierarchies were adequate we wouldn’t have tags or symbolic links/shortcuts.


Small nitpick:

>The size of this record in popular file systems can range from 256 bytes (Ext4) to 1024 bytes (NTFS).

On 512-byte and 512e disks the NTFS record is two sectors; on 4K-native disks the NTFS record becomes 4K, i.e. a single sector.


Good catch. When I ran 'fsutil fsinfo ntfsinfo c:' it reported the bytes per FRS as 1024, but then I realized my c: drive was an SSD which must have 512 byte block emulation on. The hard drive shows 4096 bytes per FRS. I forgot about this when writing the article. That makes the difference between my record size and NTFS's even greater (1/64 vs 1/16).


As a side note, for extremely small files the actual file content is written directly into the $MFT entry; the size limit is around 700 bytes for 1024-byte $MFT records and around 3750 bytes for 4096-byte $MFT records.

Some reference, JFYI:

https://www.forensicfocus.com/forums/general/mft-resident-da...


This is what we have information systems and databases for, on top of filesystems.


The article breezes past the obvious (and widely implemented) idea of indexing on top of existing file systems.

I think it would be a lot more useful to improve on that than pursue yet another quixotic file system idea.


The time has come to replace <enter whatever you dislike just now>


This post is full of interesting new ideas. I'd love to see these implemented in a mainstream Linux distro (or Windows/Mac), but the backward compatibility considerations are a huge impediment. Literally all the software in the past 4 decades (at least) has used directory and file based file systems.

One way to encourage usage and let developers find advantages of the new system could be to offer this as an alternative file system that can be used in parallel with existing ones. That would also uncover bugs/problems in the new system which could be fixed.

PS: I do think the name "didget" could be improved. It is not as natural to write or speak as "file".


Is it really full of new ideas? This looks like a fairly generic tagging system, which seems to pop up on HN every few months.

If you are interested in tagging systems, I recommend looking at Microsoft WinFS (part of Longhorn). It took the idea of tags very deep: not only did it have user-specified tags, it would parse many existing file formats and automatically generate tags based on contents. For example, if it saw a Word doc and detected it was of "resume" type, it would extract "Name, Educational Qualification, Experience" values from it.

Sadly, there is no good single overview I know of. You can start from wikipedia (https://en.wikipedia.org/wiki/WinFS) and follow the links.


I use the name Didget (short for Data Widget) because it encompasses much more than file data. A 'File Didget' is an object where its data stream has unstructured data and is unmanaged by the system. But there are other Didgets (Schema Didgets, Set Didgets, Tag Didgets, etc.) where their data stream contents are completely managed.

Like NTFS, which uses files for its internal data structures, and like the many databases that use relational tables to manage themselves, I chose to enclose all my internal structures (e.g. tags, file tables, allocation bitmaps, etc.) within other Didgets.


I am not sure OP knows what a filesystem means in the first place.


What is the difference between a tag and a filesystem directory?

I guess a file can have more than one "tag", but multiple hard links have been a thing in file systems since forever.


> but multiple hard links have been a thing in file systems since forever.

The UI around creating/managing/deleting them is worse than with a properly supported tagging system, though. Plus they're prone to breakage by any program that attempts to do atomic saves (i.e. renames the old file first).

While I certainly wouldn't want to give up hierarchical file systems as the default, there are a few things where I'd really wish for some OS/file-system-level support for tagging, too.
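A quick Python demonstration of that breakage (the file names are arbitrary): the second hard link keeps the old inode, so after a program does an atomic save the two names silently diverge.

```python
import os
import tempfile

# Two hard links to one inode act like two "tags" on the same file --
# until a program does an atomic save, which rebinds only one name.
d = tempfile.mkdtemp()
a, b = os.path.join(d, "a.txt"), os.path.join(d, "b.txt")
with open(a, "w") as f:
    f.write("old")
os.link(a, b)  # b is now a second name for a's inode
same_before = os.path.samefile(a, b)

# Atomic save of "a.txt": write a temp file, rename it over a.
tmp = os.path.join(d, "a.txt.tmp")
with open(tmp, "w") as f:
    f.write("new")
os.replace(tmp, a)

# a points at a fresh inode; b still holds the old contents.
a_text, b_text = open(a).read(), open(b).read()
print(same_before, a_text, b_text)
```

Nothing warns the user that the "tag" represented by `b` has quietly fallen out of date, which is exactly the UI gap the comment describes.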


What’s the state-of-the-art of command line file searching? I have a macOS machine and mdfind is awesome but something similar for Linux is not yet in my toolbox.


fzf is nice, but I'm not even sure it is what you're searching for (pardon the pun).


I badly want to try it out and see how it fares in practice, together with some integration with open() calls for interoperability. But where to get it?



Still waiting for tabs in Windows Explorer...


You can use http://qttabbar.wikidot.com for native tabs and loads of other useful features. Even works on Windows 11.


Yes, I've been using this solution for a few years now; it works great and has a ton of options.


I'm fine with the filesystem, but I would like each file to have a unique identifier instead of a path.


His numbered list of issues makes perfect sense. And apparently he has solved all of those problems?

This feels like one of those projects that corrects some structural flaws in the status quo. Therefore it should be commended and adopted.

However, since the judgement and behavioral patterns of most humans are very similar to those of ruminants, the likely response will be indifference or hostility.


Not sure if you are joking or not, but no, his numbered list of issues does not make perfect sense.

Item 1 is irrelevant: one very rarely cares about the fixed-size metadata record for a file, because one doesn't normally cache metadata for every file on a filesystem in memory. Items 1 and 3 contradict each other: if the file record has an arbitrary number of tags, it cannot be fixed-size. Item 4 is solved by macOS (via access-by-inode) and Windows (via the file-tracking service). I cannot quite understand what he means by item 5, but maybe he is talking about Windows not having a trusted "ctime" field? If so, this is just Windows' problem; all Unixes solved it a long time ago.

You may want to check the technical merits before recommending the project for adoption.


Every attempt to replace it has just been proof of how great the file system is IMHO.


Just no


This person conflates file and file system in ways that reveal him to be a rank amateur. This is a textbook example of Chesterton's fence.


His CV says he worked on NFS for Novell for 7 years and then PowerQuest (who made PartitionMagic, DriveCopy, DriveImage, and ServerMagic) for 6 years. He's clearly not a rank amateur.


While true, it's not impossible for engineers to make mistakes, even experienced senior engineers.

I think the mistake here is perhaps the idea that what is proposed are actually problems that need solving. The idea of indexes being managed by the OS is sort of handwaved away in the article because: "they have to store their indexing information in a separate database. It is easy for the database to become out of synchronization with the file system. Also, to speed up the indexing process, users often only index a portion of the file system so using the index might not turn up the file(s) you were looking for."

A second mistake is thinking it's possible to solve the "what is this file" problem:

    The metadata record does not have a file classification system. To determine what is in a file, the file name or the data stream must be examined.
This will always be true. You can make a best effort based on some indicators, but it will always be prone to problems. Whatever system you create that you believe will address this problem will be prone to failures, almost by necessity, because the "type" of a file is largely a philosophical thing that everyone pretends is technical. Even if you solve 90% of cases, you're still going to have enough problem cases that it won't be reliable. It may work for simpler cases like text files, videos, zip archives, and images. How do you classify a docx though? Is it a zip file? A "document" file? Is something JavaScript just because it ends in .js?
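A tiny sniffer illustrates the docx case: by magic bytes it really is just a ZIP archive. The signatures below are the real ones, but the function is only a toy classifier, not anyone's production detector.

```python
import io
import zipfile

def sniff(data: bytes) -> str:
    """Classify a blob by its magic bytes alone."""
    if data[:4] == b"PK\x03\x04":          # ZIP local-file-header signature
        return "zip"
    if data[:8] == b"\x89PNG\r\n\x1a\n":   # PNG signature
        return "png"
    return "unknown"

# Build a minimal ZIP in memory, standing in for a .docx: Word documents
# are ZIP containers holding XML parts like word/document.xml.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<w:document/>")
docx_bytes = buf.getvalue()

print(sniff(docx_bytes))  # the sniffer can only say "zip"
```

To call it a "document" you must open the archive and inspect its contents, which is precisely the convention-over-structure problem the comment describes.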

Is adding this level of overhead to writes useful? You can see in the video his test framework sets up "hundreds of millions" in seconds and yet copying actual live data from his Program Files directory takes 1m10s for 24,431 files. This suggests to me that at the minimum the "hundreds of millions" of objects are not representative of real live data, since that takes approximately 2 seconds to create.

I'd also suggest that if this is truly innovative and these problematic things from the presentation are actually just artificial and that it is doing things under the hood, it should be using a true OS filesystem browser to do this same work. To convince me it could replace standard filesystems, demo me an implementation of the OS running on it, or at the very least using it as a storage drive.


What you've written is very different from the knee-jerk "rank amateur" dismissal, which added no value to HN, that I was replying to.


I'm a little confused too; Didgets appears to just be a database engine + UI.

And this application of it acts as a literal indexing service.

It's a *long* way from :

> ...time has come to completely replace file systems with something better!


What in particular?


Oh boy, web3 really is back!



man locate


The pessimism here is ridiculous. I thought the demonstration video was pretty cool, seems like the didgets concept is more sophisticated than meets the eye. Open your mind a little, people.


Sometimes a bad idea comes back from the 1970-90s and people here are on average young enough they start falling for it again.

But the last serious shot at this, Longhorn's WinFS, was recent enough and a big enough disaster. And it's only been a few years since iPad owners started wanting to use it as a more serious computing device, with an awkward stretch until it got "file management" (it always had an FS). Both are too recent. Try again in another 10 years.


Replacing something that works with an unknown just because "the time has come" is like saying let's replace transportation with teleportation, except that it's 2022 and we don't even know if that's possible.


First off, this is clearly just a promotion for "Didgets"... which seems like a cool idea, but the problem is the article pitches a solution by framing normal-computer-user file organization as a problem to be solved (it really isn't), instead of identifying an actual use case. OP, you are losing people this way. Normal users do not have "200 million records" they are trying to search by file type or name. Get a better pitch than this for whatever you are trying to solve.

> If you have ever forgotten where you stored a file in a file system

Nope, never happened to me.

> File systems let you store any file in any folder, regardless of whether the folder path is appropriate for the file

YES, exactly: that is a feature, not a problem. If it is not appropriate for the file, you have only yourself to blame for putting it there in the first place.

> Even if you remember some parts of the file name

> searching by file extension may or may not turn up the file you were looking for

Who searches for files by name? Names mean nothing. You find the file by navigating down the relevant hierarchy based on what you are looking for. By file type? Are you kidding me?

A hierarchical file system is ideal as the basis of file organization. Tags and search can be added on top of that (and admittedly have room for improvement) if you really struggle to handle organizing your files by yourself (or some unstated use case the OP has in mind).

I'm struggling to figure out how the OP operates... at some level they seem like a computer-illiterate person who saves everything to the desktop and then struggles to find things. But I'm sure that is not actually the case... so do they just not attempt to organize? I'd be interested to better understand what type of files they have so many of that they have to resort to constantly searching by file name or type and need a faster way to do it.


I'm sorry if the article came off as such. I meant to start a discussion about the shortcomings of filesystems and get people to look at possible alternatives. I think what I built (so far) has some great features, but it is entirely possible that the eventual replacement for file systems will be something much different. I spent years working with filesystems and databases and came up with a long list of shortcomings I saw in both. I don't pretend to know all the answers, but I also don't like people dismissing ideas without taking a serious look at them.


I don't really have any of those problems.

And I can rip `find -name` through a 4TB SSD pretty quickly. Usually only a subset is required, because I remember roughly where it is.

For my NAS I spend a bit of time on keeping files organized.

I don't generally want full-text search either, it tends to be cluttered with crap and I don't use full-text indexers (invariably they seem to decide to reindex whenever I least want them to).



