The time has come to replace file systems (didgets.substack.com)
177 points by pabs3 on Feb 24, 2022 | 295 comments


Reading this article left me wondering just where the 200 million tags this guy needs are supposed to come from. Manual curation?! Automatically derived from file extensions? File headers? What is the cost of opening a file, parsing its filetype, comparing against a reference, writing it to a database, etc.? How is that cheaper than current indexers (which all seem to work fine, btw)?

I rarely waste effort trying to remember filenames in the first place, much less needing some expensive tag curation to locate files. I simply use a bit of discipline organizing the directory structure(s). If I do ever need to actually search for something, it will be constrained to a narrow subset of directories and ignore the other 199.9 million files or whatever.

Moreover, I just don't have the problem of searching for filename fragments to begin with. Nor do I see a reasonable way to use a whole host of powerful unix techniques with a whackadoodle tiny tags filesystem. Or the need to produce a list of 20 million images in 2 seconds. What use would that be anyway? I'm not going to read a list like that - I'm going to operate on it.

Please correct me if I'm wrong, but the versatility of `find` is far more powerful if you actually need to handle/sort through that many files, and something like `fzf` probably curtails all these complaints in the first place.
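For concreteness, a minimal sketch of that kind of constrained search with `find` (the directory tree here is invented purely for the example):

```shell
# Build a tiny throwaway tree so the example is self-contained.
mkdir -p demo/projects/alpha demo/archive
touch demo/projects/alpha/report-2021.pdf demo/archive/notes.txt

# Search only the subtree that could plausibly hold the file,
# ignoring everything else on disk:
find demo/projects -type f -name '*report*'
```

Piping the same candidate list into `fzf` gives interactive fuzzy narrowing on top of that.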


If I had a penny for every time someone on HN responds with something like this - "just become more disciplined and you don't need X" - I'd be a millionaire. Doesn't matter what it is, type systems, memory safety, a better UI for Git… there's always someone ready to chime in with how their workflow means these problems don't happen, or, even better, asking the question why would anyone need this?

Yes, why would anyone need better search or a faster, easier to organise file system? I can't think why.


A better search and tagging can be valuable tools. But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.

Being able to think about how to order your files is a fundamental skill in this day and age and doing this on a big scale does indeed require discipline.

IMO it is just a false hope to think tags would help with the root cause of a lack of care about the data.


Being able to think about how to order your files is a fundamental skill in this day and age and doing this on a big scale does indeed require discipline.

I'm not sure that's true, because no one does that on mobile devices. Some people have even suggested that young people who've grown up with mobile phones struggle with filesystems because they have no experience of file management despite having plenty of experience of computing.

Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

The same is true to a lesser extent with online office suites. You don't need to know the name of a file in Google Docs - you refer to things by their titles.

Moving from file names to tags, or any metadata really, would be possible. Whether it'd be better is a matter of opinion.


> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

I think the way Android does it is completely the wrong way around, as it makes everything centered around apps, not documents. That makes you a slave to the app, which in turn gets used to force you into using cloud services. It goes so far that you don't even have control over your files anymore: if you delete an app, the files created by that app get deleted with it, and you don't even get a warning.

I rarely use Android, but every interaction with it has been god awful. And from what I hear, new versions of Android will start making tools like SSHelper impossible, so you can't even work around the madness anymore.


How is an iPhone (or other mobile OS) better?


Android has had file explorer capability for over a decade? iOS hasn’t even had it for 5 years?


There are two main places where Android apps store files: an application-private slice of the main filesystem, and the shared /sdcard (which, as its name implies, was originally a removable SD card, but nowadays is just another slice of the main filesystem). What the parent is complaining about is the former (and a per-application directory on the latter), which is removed whenever the application is uninstalled (or the user tells Android to clear the application state). Unless you have root on the phone (or are looking at the per-application directory below /sdcard), or the application explicitly exports it, it's not even visible to any file explorer.


> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

That is an inherent weakness of mobile OSes and prevents them from competing with traditional computers in workplaces. It works as long as you never want to do anything complicated with your computer, but most people working in offices need the organization offered by a file system, not to mention how crippling this would be to use as a development environment.


We have systems, but they don’t need to be “files”.

On development environments: we manage all our code under version control, our IDEs index all the sources, and most of us navigate in pretty unusual ways. I guess we would be fine without any direct access to the fs as long as we had a layer that gave back streams of the files we look for and a way to commit changes to git. For a lot of devs I think everything is already abstracted by their IDE.

A lot of people work the same way: navigating exclusively through links sent to them, their office app's "recent documents", and the "open" dialogue pointed at the folder where they store all their docs, possibly without completely understanding where it is exactly (except if it's straight in their "Desktop" folder). I think a ton of adults get by with that in their everyday job.


I’m still largely pro-file-systems, but your comment made me think. Here’s some loose thoughts, just to get into the problem space.

I navigate my codebase at work primarily using the name of the entity I’m aiming to inspect or work on next (e.g. “Popup” or “uiEventStream”)

- usually using fuzzy search. This matches by file name, but feasibly could operate by entity symbolic name to the same effect

- increasingly using VSCode's "find references", which already operates by entity name (at least that's how the UI appears)

However… I also use the file tree, because important and meaningful application structure is encoded in the tree. The tree (and its node names) gives me sections of the app, collections of entity types, and hints at how they're connected. This is invaluable. It helps new colleagues learn the application structure and it helps old hands get to what they want faster. It forms a "silent" background context against which all entity-based decisions get made.

The structure could be encoded as tags, with all the files dumped in a single directory. I have yet to see a tagging interface work as well for tag hierarchies as a directory tree works.

Tag hierarchies are a specialised use, or extension, of generalised tags. Tagging systems typically emphasise (through UI and explanatory notes) the unstructured approach. Structure and unstructure are basically opposed, so making a single UI work for both seems problematic. Educating users, most of whom won't use the word "taxonomy" in daily life, in how to use a tool supporting a model with an almost inherent self-contradiction seems like a mammoth task.


To add to that, there is IMO a renaissance of "fuzzy" navigation, with macOS's quick search (the Sherlock/Quicksilver clone), and tools like Obsidian spreading the model of moving to apps and files by a set of partial keywords.

I agree with you that a tree structure is important, the same way people still look at trees to navigate pages on most sites, going through categories, sub-categories etc. More and more the tree is just dissociated from the actual representation, the same way URLs don't exactly match the site structure on so many sites.

I’d imagine the tags would have some hierarchical relations if we were to use tags exclusively.


Many programming languages have a tagging system for that: namespaces, packages, etc.

Often they have a close mapping to the filesystem, but with IDE support that isn't strictly needed. (In reality, however, it currently is, as the filesystem is the language-agnostic common interface between version control system, IDE, etc.)
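As a trivial sketch of that mapping (the module name here is invented), the usual convention just turns namespace separators into path separators:

```shell
# Map a dotted namespace like 'app.ui.popup' onto the source path
# most languages would look for, e.g. app/ui/popup.py (or .java, etc.)
module="app.ui.popup"
path="$(echo "$module" | tr '.' '/').py"
echo "$path"
```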


I guess we could have "Emacs files" and "gcc files" and you could transfer one to the other using a Downloads folder or something? Pretty convenient.


>Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

... and completely cripple the user's creative powers, making them a passive consumer. You cannot get serious work done on mobile, and that is as true now as it was ten years ago.


> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

It's ok when you have a file type exclusive to an app AND the app provides export functionality. But it breaks as soon as you need to share files between apps.

Arguably, this is more about implementations than the principle, but ask any musician about the iOS music apps and they'll tell you they're great... except the file management.


> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.

File browsers got very popular very quickly on Android and there is a bundled one on there now. So mobile has shown that, despite the designers' attempt to deprecate files, it didn't work out.


I strongly doubt that regular users ever bothered to use Android's file system. If there are even files involved, most people will just use the app that received or created that file to interact with it. Especially since apps sometimes put files in completely random places that even I, as a technical person, have trouble finding.


I imagine the number of users who have opened the file browser app is a low single digit percentage of Android users. Maybe even less than 1%.


So 3 million people in the US alone


1% of American Android users, not 1% of Americans. Far less than 3MM.


330m Americans, almost all have at least one smartphone, about half are Androids; that's still millions of people, and hundreds of millions worldwide.


I am the 1 percent. Yay.


> I'm not sure that's true, because no one does that on mobile devices.

Even my mother sorts her pictures into gallery folders. Granted – a lot of sorting on mobile happens automatically (per app).

But "consumer devices don't need an accessible file system" is not a good argument to extrapolate that to machines people use productively. Don't get me wrong, I do think we can improve filesystems in terms of usability – I just don't think having some of it in your head will go away any time soon (and if it does, it will not be an improvement).

My point is that in a productive environment the filesystem becomes part of your brain, just like a carpenter's workshop becomes part of their brain. This is not a bug, it is a feature. You don't need to think about where things are, because you arranged your environment in a way that suits the tasks you are doing 99% of the time. Now if someone came in and arranged the tools for you, moved them around automatically by their own logic, chances are that it doesn't fit your current task, your personal preferences, etc.

Moving from a world where you blindly know where something is to one where you have to guesstimate what another entity "thought" would be an appropriate place for the thing you are looking for is not progress. If you were to make an automatic system that can read thoughts and put the file precisely in the place people are expecting it to be, that would be an improvement, but everything else not so much.

The key difference for me here is the one between productive work and consumption: if you are in a space where you are consuming (e.g. food at a buffet) it is totally acceptable to not have it your way. Who cares if it takes you 5 seconds more to find the balsamico for your salad? Tasks that you don't do productively, like looking at pictures on your smartphone: who cares if it takes you a minute more to find a thing? But if you are a professional photographer looking for that one picture you took in a specific session 4 years ago, not a lot will beat a well built folder structure.


> no one does that on mobile devices.

Because you don't really have valuable data on a mobile phone. It's mainly just photos, and they are all in one folder ordered by date. So adding tags to that is a feasible strategy.

Everything else on the phone most people don't consider permanent data, so it's not worth organizing it. Your contacts are in the cloud, so are your chat logs... And app configuration data can always be recreated with some effort.


Let's remove street addresses altogether, because that requires hard memorization, right? Instead let's all put any house of a city all along a contiguous space and refer to them via a description of their appearance.


I don't disagree with your premise that mobile abstracts away the file system. However many people do put rather a lot of effort into organizing files on mobile. I clean and sort my downloads folder just like on my computer, but more importantly the majority of people I know use folders to handle the now thousands of pictures we generate on mobile.


> Some people have even suggested that young people who've grown up with mobile phones struggle with filesystems because they have no experience of file management despite having plenty of experience of computing.

Currently teaching introductory programming at college level, can confirm.


> But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.

I personally don't see folders (or traditional file organization) and tags as competitive technologies. I think they're complementing each other very well. I generally put my stuff under well defined folders, but tag the notes (or the files if I have the capability).

95% of the time, I can just go to the folder and get what I need, but sometimes I need to search something which I don't remember whether I have it or think I misplaced. In that case file indexing and search really comes in handy.

I apply these methods in at least three places very enthusiastically: Pagico, Evernote and TiddlyWiki. All three have hierarchical organization models (it's fixed in Evernote, not mandated in Pagico, and Tiddly is just free-floating by nature), but they're meticulously tagged. I rarely use search in any of these. However this doesn't mean tags save me serious time or effort. I think both ways of organization are very useful, at the end of the day.

As a pet peeve, I really don't like these strongly worded titles and posts. You don't have to kill something that works well in order to enhance it for some or every use case.


I’m on the side of people who gave up on organization entirely, and have a stream of scanned documents with only the scan date as the title for 99% of them.

The docs are OCRed and I retrieve them mostly by search, with tags for a few critical docs and by approximate dates for the rest if search by content fails completely.

This is viable, and there’s no way I’ll go back to manually setting up tags and names on all the docs we scan, just in case they’re needed some day. It’s like asking everyone to do inbox zero with their mail: why would you put your time in the hands of uncontrolled external forces feeding you more info day after day?


> But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.

The difference is that it's more likely the user will notice other tags that apply to each thing that would have been moved to 'junk' than if they had to coarsely categorize all the junk in advance.


I’m all for optimising usability, but there comes a point where one over-optimises. While I love the app-oriented design of Android and iOS (i.e. you share data between apps rather than apps sharing a file system), they are effectively toy OSes for toy devices. Sure, some people are hugely productive on them these days, but they’re the exception rather than the norm. Whereas I depend on a file system to organise my data. In fact, the file system is directly interpreted as namespacing in a number of different programming languages.

I get that most people are either too lazy or too technologically inept to own a computer, but this race to the bottom to support everyone who doesn’t give a crap has to end somewhere. You see it with the Windows UI in Win11 removing popular options so the designers can streamline the UI. You see it with this article too. Some people are always going to struggle simply because they are forced to use a computer day in and day out rather than wanting to use it. But designing a unified system that panders to them while servicing everyone just makes the experience shit for those of us who genuinely know how to use a computer and depend on these features.

To use a car analogy (because for some reason people love comparing cars to computers…) I have no issue with track cars being sold without air con, a radio, etc because they’re a toy not a tool. So you optimise for that single purpose: racing on the track. But I sure as hell want the kitchen sink thrown into my family car.

I sometimes wonder if the problem isn’t computers but rather our assumption that everyone should be able to use a computer without training. If your job depends on using a file system correctly then you should be trained on that in exactly the same way that you’re taught how to use any of the specialist applications. In fact pre-computers, companies did exactly that with training their staff in how the filing system works!


By analogy: if you were to store all your paper copies of bank statements, bills, mortgage papers, etc... Would you just dump them in one big pile in the middle of your living room, or would you sort them into vaguely themed folders to impose some organisation? Hierarchical filesystems are valuable to those that want to organise data in ways that tags can only emulate.


Can I shout "Plumber invoice 2021" at that pile and the right document will come flying out? Or just "invoices 2021"? A pile that could do this would probably be fine for me.


The difference is that if a file isn't sorted in a file system, it just sits on your Desktop or in your Downloads folder and is still easily visible. If you use tags and your file doesn't get tagged properly, it just gets lost in the pile and is impossible to retrieve.

I find tags work very well for discovering other people's content on the web, but really don't help much with organizing your own data.

One operation that seems especially problematic with tags is copying. If I want to modify something on a file system and keep a backup, I just copy the directory tree before I do my modifications. If I copy something with tags, I end up with two things with the same tags, which is not very useful for keeping them separate.

Also how do you deal with removable media (USB, DVD) in a purely tagged system? What if the tags on the media conflict with your own tags? Once you allow filtering by device, you are just reinventing the file system again.


The solution should be a combination of folders and tags. E.g. imagine a folder that just contains all your photos without any substructure. It would be easy to just select e.g. all "2018 photos", "birthday photos" or "photos of your parents" in there without needing specific subfolders for those things (especially since those subfolders would conflict with each other).

But I agree that I wouldn't want those pictures to be mixed with e.g. other random screenshots or drawings that I made, even if they could be separated by tags somehow. So folders as a hard separation would still make sense.


> "The difference is that if a file isn't sorted in a file system, it just sits on your Desktop or Download folder and is still easily visible. If you use tags and your file doesn't get tagged properly it just is lost in the pile and impossible to retrieve."

Wouldn't a shortcut to a view of un-tagged files sorted by recent basically serve the same role as an unsorted Downloads/whatever folder?


Anecdotally, my short answer to this is 'no'. At least a Downloads folder has a loosely defined purpose. I have often gone back to try and find something I downloaded that's no longer available (or too big to download again), and even that is tricky at best.

Now, imagine how many files are being shifted around on a regular basis. Temp files, cached downloads, automatic installs of software updates, all sorts of crap your IT department may remotely put on your work laptop, etc. Sorting by date isn't all that useful. At least with folders, we have an 'enforced' blast radius if random junk shows up; my temp files stay in my temp folders, install files stay where they should, system files stay with my OS, etc.

And don't get me started on how things can end up in odd places on mobile OSes.


The lost file might not be recent and it might not even be untagged, it might just be tagged incorrectly (e.g. `vacation-2019` vs `vaccination2019`).

The nice thing with a hierarchical directory structure is that every file has a place, even if a file is misplaced or misnamed, there is a good chance it will be near where it needs to be.

With tagging you don't really have that, it's just a pile and you have to hope that you can remember a query to make the file show up again.

The biggest problem however is that I don't see how you can actually work within a tagged system. How do you extract a `.zip`? How do you copy a file? How do you deal with removable media (DVD, USB)? Finding a file and handing it over to an app is not the only way we deal with files.

Your local file system is a work environment, where you are the one creating and modifying files. Tagging seems to work best when it comes to exploring other people's content, but that kind of exploration is not something I do on my local machine with my own files, since I already know where I put them.


As long as you know exactly what's in the pile I guess? A good folder/directory structure tells you pretty quickly what you've got available and makes it easy to browse and explore even if you've got no idea what you might find going in.


A directory structure like Invoices > 2021 > Plumber is easy to browse and makes it easy for a machine to retrieve things as well.


It gets annoying though when you have just single invoices or documents that don't quite fit in your structure. I eventually started throwing all received documents in one folder and used the name as tags basically. E.g. 20210214_plumber_invoice.pdf or 20181204_someshop_invoice_playstation.pdf... (And yes I use the filename for tagging because I don't trust e.g. Windows tagging system to be there forever or be copied properly onto other operating systems.)
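With a naming scheme like that, plain globbing recovers most tag-style queries; a small sketch (the filenames are invented to match the scheme above):

```shell
# Create a few example documents named YYYYMMDD_vendor_kind.pdf
mkdir -p docs
touch docs/20210214_plumber_invoice.pdf \
      docs/20181204_someshop_invoice_playstation.pdf \
      docs/20210301_dentist_receipt.pdf

# "All invoices", regardless of year or vendor:
ls docs/*_invoice*.pdf

# "Everything from 2021":
ls docs/2021*
```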


Please can I have all the Plumber correspondence from Joe the Plumber, regardless of year or document type?


Do you mean Joe the Plumber from London, UK or Joe the Plumber from London, Ohio?

We all know how this ends up. It ends up being like Google, where the search engine uses word embeddings and the like, removes words from your search queries, or replaces November with December because they are both months, so you can substitute one for the other, right?


Don't care, just give me both. Since I've never been to London Ohio I don't think it will be a problem.


Modern shells have double-star globs for traversing arbitrary numbers of layers (zsh supports `**` out of the box; in bash you need `shopt -s globstar` first), so I would do something like:

  ls **/*Plumber*


Thanks, you just got me all the docs from Samantha the Plumber and Amit the Plumber too!


And tags don't help with that, when you just tagged everything with "Plumber".


They could help if the tags were 'Occupation = Plumber' and 'Name = Joe'. Now a search for all files where both tags are present will get you Joe the Plumber's invoices. If you want everything from any plumber or from any person named Joe, then just leave off one of the tags from your search. It is very much like when querying rows in a relational database, just adjust your WHERE clause.
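To make the AND-of-tags idea concrete, here is a toy sketch that models a tag index as one line per file (the index format and filenames are invented for illustration):

```shell
# One line per file; tags are key=value pairs after the filename.
cat > tagindex.txt <<'EOF'
joe-invoice-2021.pdf Occupation=Plumber Name=Joe
amit-invoice-2020.pdf Occupation=Plumber Name=Amit
joe-resume.pdf Occupation=Baker Name=Joe
EOF

# "WHERE Occupation = 'Plumber' AND Name = 'Joe'":
grep 'Occupation=Plumber' tagindex.txt | grep 'Name=Joe'
```

Dropping one of the greps widens the search to any plumber, or to anything from any Joe, exactly like loosening a WHERE clause.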


You just summarised the whole argument: you can go very far with a well structured relational database.


I agree. Unfortunately, right now the 'well structured relational database' is completely separate from the file system. Didgets was designed to combine the two into a single coherent system so that you can't update one without the other. By 'combining' I don't mean I did what WinFS tried to do and just took a filesystem and a database and stuck them together somehow. I built a completely new system from the ground up that incorporates traditional filesystem features (block allocation, stream management, metadata control, folder hierarchies, etc.) with solid relational database features (schema, tags).


If your memory is anything like mine, chances are you momentarily don't remember the word "plumber".

Hierarchical structures, while inflexible and sometimes prone to mis-categorization, provide navigational cues that tags don't provide. It's almost like with GUIs vs CLI - if you know what's already possible and want to express yourself precisely, you want a CLI (tags with lots of Boolean operators to precisely include/exclude). And conversely, if you don't know what's already possible, but could figure it out if you have the options laid out in front of you, then GUI (a hierarchy with all choices already laid out) will be more relevant.


brute force grep -R for the win.


I keep passports, degree certificates, deeds, health insurance docs.. the things I would grab if I was running out the door, in a single box file.

Everything else is basically unsorted, maybe vaguely sorted by date of putting on top of the pile, by placing things together after searching for them once, or by 'I think I know where I saw it'.

I have tried putting everything in themed folders, it's a waste of time. The time spent searching for something is much less than the time spent organizing everything in advance. The modal piece of paper will be thrown away after a few years without ever having been needed.


Is that not effectively a first-level hierarchy with no further subdivisions? The "important stuff" category and "everything else" category are already a useful taxonomy, even if very minimalist.


One of the biggest problems with folder hierarchies is that files can often be classified in several different ways. To take your paper statements analogy: do you organize by year, by institution, or by category? What if you have a 2002 bank statement? Do you put it in the '2002' pile or the 'bank statements' pile? Existing file hierarchies allow you to store the digital document in the '2002' folder and then create a hard or soft link in the 'bank statements' folder, but that can be a hassle. Tags allow you to attach them to documents, photos, videos, etc. without worrying about how you might organize them. Luckily, Didgets lets you organize your files using either a hierarchical folder structure or just tags. It is your choice.


To me, it seems best to exclude as many possibilities as I can during each step of the (naturally recursive) search. Filtering by "is bank account statement" excludes a lot more files than "was incorporated into my files in 2002", since most people only have a few bank accounts but a lot of photos, videos and other things that they create or download in a given year.

I think the best system is actually a mix of hierarchy and tags. Top-level, very broad "semantic zones" (aka is this .PDF a bank statement, a cake recipe, a textbook, or some temporary file from the browser cache) would lend themselves to being represented as a shallow hierarchy, and items within a specific semantic zone could be then freely tagged or further subdivided into a hierarchy, whichever approach makes sense for that particular semantic zone.


You assume that there are fewer bank statements than 2002 files in your argument. What if you loaded in a million bank statements in 2005 but only created a few thousand files in 2002? With Didgets, I can tell how many objects have each tag attached, so I can order the search to eliminate candidates based on how likely the set I am searching for is to have each tag.


How many bank accounts does an average person have? Even for the most extreme cases, we're looking at the low hundreds of statements annually, at the maximum. If you're not an average person but instead a business or an archivist, then you need a custom system anyway.

I'm really not trying to criticise or diminish the value of your system. All I'm saying is that even without an additional tag (or hybrid tag+hierarchy) overlay, a hierarchical system can be quite useful as long as it's well thought-out by the user.


Having every file pollute a global namespace seems to require more discipline than the current hierarchical system where you can easily copy a directory tree without having to worry about breaking something else.

That is the main problem with these so-called "solutions": they usually take more effort and discipline than the problem they originally set out to solve. The right solution is just to learn the original system properly rather than trying to invent an even worse way to work around it.


Indeed; on one occasion I had to help my wife troubleshoot something on a shared folder in her workplace's Google Drive...

I was shocked by my wife's colleagues' extensive use of special characters because they wanted their files to appear first.

The proposed solution won't be any better if the average user doesn't know how to name things properly or how to search for them.


Yeah, you can see the downside in the demo video. When he shows off the search for pictures, there's a random mixture of actual photos and things like toolbar icons and whatnot. Sure, you could fix this by tagging everything and doing a more complex search, but that sounds like a lot of work and discipline, more than eg the guy doing the demo was willing to put into it.


Actually no. I wanted the demo video to be short (4 minutes) so I didn't do a lot of complex searches. I have other videos, but to show everything takes a 20 minute video and I didn't think that was a good length for an introduction.


The article proposes to replace the current file system approach (which works just fine for me, by the way, thank you very much) with something different to solve a problem that I (just like the post you reply to) have no interest in.

Better search? Sure! Improved speed of storage and retrieval? Great! But either show that it is not degrading current functionality or be ready for pushback from people suspicious that their current setups will break. My 2c.


The solution being proposed here seems to also involve a lot of discipline (files aren't going to tag themselves or at least not usefully)


don't you think a tag system would require even more discipline?

what do you think happens if you make a mistake with your tags and/or there are typos in the filename? With a directory structure, you can navigate to the location and see the list of items to quickly identify what you were looking for. It is far more forgiving when it comes to poor organisation or mistakes. With a pure tag system, a file with the wrong name/tags is pretty much forever lost.


Not necessarily. Missing or misspelled tags can be discovered just like a row in a database with a missing or misspelled column value can be. For example, if you want all your photos to have a 'Year' tag attached for when the photo was taken, just query for all photos WHERE 'Year = NULL'. The same goes for values like names. If you see that you have 10,000 files with 'Name = Karl' attached but only one with 'Name = Kral' attached, then that is an easy fix.
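Didgets' own query syntax isn't shown here, but the idea maps directly onto SQL. A minimal sketch using Python's sqlite3, with an invented `photos` table standing in for the tag store (each defined tag behaving like a column):

```python
import sqlite3

# Hypothetical photo-tag table; in a tag store like the one described,
# each defined tag behaves like a column in a relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, name TEXT, year INTEGER)")
db.executemany("INSERT INTO photos (name, year) VALUES (?, ?)", [
    ("Karl", 2020), ("Karl", 2021), ("Kral", 2021), ("Karl", None),
])

# Photos missing the 'Year' tag entirely:
missing = db.execute("SELECT id FROM photos WHERE year IS NULL").fetchall()

# Rare values are likely typos: 'Kral' stands out next to 'Karl'.
counts = db.execute(
    "SELECT name, COUNT(*) FROM photos GROUP BY name ORDER BY COUNT(*)"
).fetchall()
```

Here `missing` flags the one untagged photo and `counts` surfaces 'Kral' as a one-off next to three 'Karl's, which is exactly the cleanup workflow described.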


With only a few hundred files, it's easy to look at the list and spot the outliers. In a real-world scenario, how would you know that a few dozen files are missing when you search for "karl" and the files tagged with "Kral" don't show up? On a small file collection that only you have access to, you might remember them and notice that they aren't part of the results, but that doesn't work for large libraries or when multiple people are collaborating.

With a directory structure at least you can look into the folder of the project, see what's inside and open the files to find the one you were looking for. If you were looking for a specific Word file and only a dozen of them are present in the folder, you can always just open all of them manually to check what's inside, regardless of how poorly they were named/managed. Good luck trying to find the Word file with bad tags when searching for "*.docx" returns thousands of results.


Cleaning data for tags is about the same as cleaning data in a relational database table. Here is a demo video of how Didgets does that: https://www.youtube.com/watch?v=kqkNeU1LYEQ Just think of each defined tag as one of the columns in the table.


If you were trying to find a physical copy of an important tax-related letter, would you prefer to search for it in a folder dedicated to tax documents from that year, or in a room filled with every single piece of paper that you have ever received by mail in your life?

A pure tag system only works for small libraries: it requires far more discipline to properly tag every single file, it does not scale, and it does not work well when you collaborate with other people. It works well in situations where you can automate the tagging (eg. a collection of pirated movies) but is pure garbage for the normal files that you typically use.

It's a lot easier to tell people to place pictures of Karl in the "karl" folder than it is to make sure that every single picture gets properly tagged with the word "karl". I can imagine hundreds of different scenarios where it gets tagged slightly wrong. Typos won't be easy to fix because they will simply not show up in the search when you type it. How many files with "K arl", "Carl" or " karrl" are there? No one will know.


There seems to be a lot of confusion here about Didget's tagging system. It is not meant to replace the file hierarchy, but to supplement it. With Didgets you can still organize all your files in a plain old folder hierarchy without tagging everything. Tags just provide a secondary way to search for things. So you can still stick all your photos of Karl in a folder named 'Karl' if you like.


Because such a system would entail its own drawbacks, such as a larger CPU load or a more fragile disk organization, whilst most people wouldn't really need it.


This is obviously hyperbole and you're well aware you haven't read 100 million Hacker News comments, let alone that many with exactly the same basic message.

But I was curious what this might be equivalent to in terms of time investment. A quick style guide check recommends 15-20 words per sentence for English language written communication. Assuming the low end of that, and minimal single-sentence comments, that is still equivalent to reading the entire 14-book Wheel of Time series, which tends to take most people several years, 369 times.


I guess the point is, the proposed system wouldn’t actually be easier to organize. The metadata that would make searching so easy is what’s missing. But a new data structure doesn’t solve for the missing metadata. And without that extra metadata, searching would not actually be improved.


> Yes, why would anyone need better search or a faster, easier to organise file system? I can't think why.

Sounds good, but unfortunately the article is not proposing any of that.

So, counter question: of course you need that, but how would the proposal of the article actually do anything about it?


Surely that's the tagging. I use tags extensively on my Mac because it's so useful to me but it's clearly an afterthought for Apple, and I struggle to use it at times.

Making tags a first class citizen would improve things immensely. The search index being a first class citizen again, would also improve things - why should I find Spotlight indexes loitering in the dark corners of my filesystem as dot files? I know there's a file index kept somewhere full of inodes and suchlike, why isn't search index data kept with it?

I also don't know why I have to rely on file system watchers that seem to be external to the file system, and thus eventually suck vast amounts of CPU, when a hook into the main index would suffice. I don't write file systems so I can't tell why this is the case, or whether it even is the case, but it appears that way every time I need to kill a file watcher.

Most of the suggestions in the article seemed good to me (immutable files, smaller meta data pages etc), I'm sure there are others around, but I'm also not sure why there's a need among some to protect the status quo by relying on good behaviour, of all things.


With Didgets, tags are an integral part of the system. They don't get lost or forgotten when you copy a file from one place to another. Searches are a native part of the system as well, so you aren't relying on a separate indexing service with its own database somewhere else. BTW, managing file data using folders and tags is just one of the features of the system. I found that the columnar stores I used for tagging could easily be used to form relational tables as well. I can load in a 100 million row, 40 column table and run queries against it much faster than the same data loaded into Postgres, MySQL or SQL Server.


> I use tags extensively on my Mac...

In other words, you have developed a disciplined habit of tagging your files. If I had a penny...


Where's the "don't need X" part? Have I dismissed hierarchical file systems out of hand? Where did I suggest greater discipline should be the approach.

I also use automated tools to help me with the tagging but I think that it's not a magic bullet - did I claim it was?

No, and I didn't do any of the other things I asked for evidence of either.

So, if I had a penny for every time someone misquoted me I'd have a penny more right now.


i try to organize my stuff, but sometimes i forget where in the organization i put something. then a brute-force search helps. if i keep good directory and filenames, then locate will do the trick. once i found one item, any related other things are usually nearby.


Plus, this is how we organise [0] stuff in real life.

Folders/boxes/envelopes in boxes. Boxes in cupboards. Cupboards in rooms.

It's easier to get to hierarchical filesytems from this. Things are found by their group, or their proximity to a more used item.

Filesystems, search, most-recently-accessed lists, an index; they are close to real life things.

In my fantasy world people would e.g. stick to .jpg, .JPG or .jpeg or .JPEG (pick one, damnit) but otherwise I quite like the tools we have.

[0] or try to


folders are more convenient because they are part of the file system. there is no ls by tag or even a gui filemanager that shows files by tag. that's one reason why tags need to be part of the filesystem, because if they are not, then most filemanagers would not support them.

and technically, file extensions are kind of like tags. and it's really ugly that they are in the filename string. that messes up a lot of things. it would be better if they were proper tags independent of the name. so you can rename a file without changing the tags, similar to the problem with EXIF.

or more importantly, you could reference a file without that reference depending on the tags of the file. your jpg/jpeg example is also a problem caused by this situation. it would go away with proper tags


macOS does store tags in the filesystem (which you can access using xattr at the command line) but I have no earthly idea how you find files by tag or really do anything with them.

The master tag list seems to be Finder-specific preference data though.


This is an example I use to find photos I've stripped the exif data from and tagged:

    mdfind -onlyin . 'kMDItemUserTags=exif-stripped'


ohhh thanks, this is something I needed to know.


linux has xattr too. so technically, our filesystems already support tags.

that means it is now up to the other tools to catch up and make use of them.

here is a discussion about tags and extended attributes in gnome. https://blog.chipx86.com/2005/12/07/tagging-and-the-gnome-de...

it is from 2005, so not really current, but the arguments are interesting.

in short: filesystem attributes are systemwide (but you and i may want to have different tags on the same shared file) and the user needs to have permission on the files, so you can't tag files that you can read but can't change.

i believe these issues are solvable, esp. the latter would work if we have permissions to add tags but not the content of a file. (like you can rename a file even if you don't have write permission to the file)


xattrs can certainly be used to store tagging info. There are a couple of major problems with them though. 1) xattrs are not supported by all file systems and are not enabled by default on some. If you copy a file with xattrs from one file system to another that either doesn't support them or didn't enable their use, your xattrs are thrown away in the copy. 2) Searching for files based on xattrs in a large folder tree (e.g. several million files across thousands of folders) is exceptionally slow by nature.


right, but the alternative is no support for tags at all, so xattr gets us halfway there, and filesystems that don't have it need to keep up.

searching can be sped up by building an index. apps that want to use tags will need to do that, just like they build an index of files already. because searching filenames is also slow.

a version of locatedb that supports xattr would help for example, see https://en.wikipedia.org/wiki/Desktop_search
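For illustration, a toy version of such an xattr-aware index in Python. The helper and schema are invented here; `os.listxattr`/`os.getxattr` are Linux-specific, and the try/except tolerates filesystems where xattrs aren't supported:

```python
import os, sqlite3, tempfile

def xattrs(path):
    """Read a file's extended attributes, tolerating filesystems
    that don't support them (hypothetical helper, Linux-only API)."""
    try:
        return {name: os.getxattr(path, name) for name in os.listxattr(path)}
    except OSError:
        return {}

def build_index(root):
    """Walk a tree once and record names and xattrs, locate(1)-style,
    so later searches hit the index instead of the whole tree."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE idx (path TEXT, name TEXT, attr TEXT, value BLOB)")
    for dirpath, _, files in os.walk(root):
        for f in files:
            p = os.path.join(dirpath, f)
            attrs = xattrs(p) or {None: None}   # keep one row even if untagged
            db.executemany("INSERT INTO idx VALUES (?, ?, ?, ?)",
                           [(p, f, a, v) for a, v in attrs.items()])
    return db

with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "notes.txt"), "w").close()
    db = build_index(root)
    hits = [r[0] for r in
            db.execute("SELECT path FROM idx WHERE name = 'notes.txt'")]
```

A real locatedb-style tool would persist the database and refresh it periodically or via inotify, but the one-pass walk plus indexed lookup is the whole trick.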


> Plus, this is how we organise [0] stuff in real life.

The way I organize people I know and places and all sorts of other entities I cannot physically place into boxes and folders is a lot more like the tag approach, though.


Oh really? I've got a few boxes of guys stored away and, like Mitt Romney, binders full of women.

(*ba-doom-tish* for the-in-retrospect much-maligned Mitt Romney on this day)


Google photos' style AI driven curation maybe?

I like the idea of having a queryable filesystem, but I wouldn't want that as a complete replacement of the directory structure.


Google photos is pretty amazing. I enter a search for "car" and immediately can see photos of several of the cars I've owned over the years.

One day I needed to remember when I had travelled to a certain city, searched on my Google photos and it instantly showed the photos I took in the city, including the exact dates.

Yes, I know letting Google know all about my life like that through photos may not be the greatest idea... but wow, does the photo search work nicely?!


The google photos image search is amazing. The other day I was trying to remember how long it had been since I smashed my toe doing yard work so I tried searching “toe nail” and it pulled up exactly the picture I was looking for.


Unless it's local, hell no. Sounds like a privacy nightmare.


It sounds somewhat like “gmail for files” which is … problematic because email search works well enough because it’s relatively rarely done.

I suspect a system like this would work, but the tags would eventually be used by many as a way to badly implement a hierarchy.


> It sounds somewhat like “gmail for files” which is … problematic because email search works well enough because it’s relatively rarely done.

Gmail does not work for me.

As someone in IT I get some number of automated messages (e.g., cron). With Gmail all I can do is tag them and have a "folder" / view of just those tagged messages. But they also pollute my Archive 'folder' as well.

But I do not want them there, because they are not a priority generally, and they pollute search results.

I want an actual separate folder to file these messages in that is out of the way so as not to pollute the rest of the namespace.


You could tag them with a special label (e.g. "ignored"), and then append "-label:ignored" to your searches.


Kinda like gmail's nested labels. Hierarchies win again.


> eventually be used by many

And that’s the issue. The status quo may be the best choice for the lowest common denominator. But some power users could get much more out of something with a different approach. You can’t force a one-size-fits-all ontology onto the masses.

People need to wake up and realize that not all software technologies need to be popular to be successful or useful. It seems people around here assume this without even thinking about it first.


My response to these types of proposals is “just imagine that folders are tags and each level of hierarchy is a tag, symlink for multiple tags.”

It’s funny because the author just proposed a different, I think worse due to novelty and minimal benefit, organizing hierarchy.

I think Apple has a decent approach where their Spotlight indexes very well (I just hit command+space and type the first letter or two instead of navigating Finder), and they support tagging files.


When importing files into Didgets, the program automatically gathers information from the source file system and attaches specific tags to each file. For example, the file name is attached as a 'name' tag. Each folder name in its path is attached as a 'folder' tag. The file extension is attached as an 'extension' tag. In addition, a SHA1 hash is created from the data stream and attached as a tag. You can also import files by dropping them onto a 'drop zone' on the create tab in the GUI. Any tags attached to that drop zone are automatically attached to any file dropped on it. So dropping 100 photos on the 'My Wedding' drop zone might attach the tags 'Event = Wedding' and 'Year = 2022' to every photo. A search for files that have a tag 'Folder = Microsoft' would find every file that had 'Microsoft' as a folder anywhere in its path.
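That import step isn't Didgets source, but the derivation of automatic tags from a path is easy to sketch. Tag names and the `derive_tags` helper below are guesses at the scheme described, not the actual implementation:

```python
import hashlib
from pathlib import PurePosixPath

def derive_tags(path: str, data: bytes, drop_zone_tags=None):
    """Derive automatic tags from a file's path and contents,
    roughly as the import process above is described."""
    p = PurePosixPath(path)
    tags = [("name", p.name), ("extension", p.suffix.lstrip("."))]
    # Every folder in the path becomes its own 'folder' tag, so a
    # later query for folder=Wedding matches at any depth.
    tags += [("folder", part) for part in p.parts[:-1] if part != "/"]
    tags.append(("sha1", hashlib.sha1(data).hexdigest()))
    # Tags attached to the drop zone are inherited by the file.
    tags += list(drop_zone_tags or [])
    return tags

tags = derive_tags("/photos/Wedding/img_001.jpg", b"...jpeg bytes...",
                   drop_zone_tags=[("Event", "Wedding"), ("Year", "2022")])
```

The interesting property is the per-component 'folder' tags: they are what lets a flat tag query recover hierarchy information without walking a tree.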


> it will be constrained to a narrow subset of directories and ignore the other 199.9 million files or whatever.

I think this is a vastly underrated point. I am usually not interested in searching the majority of files on my filesystem. I can't remember the last time I needed to search through system files for normal computer use reasons.

I also think the author completely skips over how to handle related files. If my application needs to load a library, how does it find the file to use? If it's by name, how are name clashes handled? I suppose it could be by tag, with built-in tags, but then you won't be able to change the tags without having to change configs or the binary itself.


The core problem with keeping files organized is that, unless you are dealing with a stream of effectively pre-tagged files, the right tagging/categorizing/grouping only emerges after a sufficient number of files have arrived. Organizing is therefore retroactive, not proactive.


What this boils down to is that he thinks a flat namespace (tags) offers advantages over hierarchical namespaces (tree). They really don't. Once your tag space grows you will start to struggle with naming, and path-like structures (nested namespaces) start to creep back in. And you are right where you started: paths.

The treatment of immutability is too superficial to make any sense of so I don't know what the author is imagining. Ted Nelson has evolved some ideas on this for decades that might be worth knowing about. Some of which have kind of come to pass (if you squint and look at how non-destructive editing tools for video and audio work, for instance). However, very little of Ted's thinking has ever been burdened by usable implementation.

The concept of having multiple references to the same file already exists. So what he proposes can be realized with existing file systems just by introducing a different naming scheme and making extensive use of sym-/hard-linking.

Yes, a lot of file systems will have terrible lookup and traversal performance, but that problem exists in an orthogonal universe and can be solved. Is, indeed solved, in some fileystems if the marketing blurb doesn't lie.

If you think about how you would realize this using existing filesystems, by organizing them differently, the concept isn't as sexy anymore. Because it doesn't really involve a lot of new stuff and you start to see the inconvenience of having to cope with both novelty and problems you didn't have before.

The problems someone like me wants solved in filesystems are entirely different, and aren't so much about filesystems as about how you make the functionality useful to applications.

For instance, there are filesystems that offer snapshot semantics, including COW snapshots. This would be useful whenever applications need to be able to roll back changes, switch between states, do backups while live, etc. Yet I know of no language that has snapshots as part of its standard OS interface. So people generally don't write applications that take full advantage of what the underlying system offers.


Path based file systems take advantage of natural semantics we use for navigation. There is a wonderful overlap between how you navigate the real world, and how you navigate a hierarchical file system.

I have never (never ever ever) seen a tag based system actually work once you have large amounts of files and tags - Tags are manual, often duplicated with slight name changes or variations, hard to discover, and literally worse than a folder hierarchy for discoverability in almost every way.

Tags can be nice to have - but only if I also have a path. Otherwise they are utterly inferior.


Tags, being one of the most basic implementations of boolean retrieval, tend to suffer from feast-or-famine a lot, at least in my experience. Once you introduce hierarchical tagging, people will just use them like folders, with each item having 1.0 tags on average.


The Didget system was designed to allow both a hierarchical folder tree as well as tags attached to individual files and folders. The tags do not replace the hierarchy unless you want them to. If you have never ever seen a system that actually works, then maybe you should put Didgets to the test. I created 20 million files in it and attached an average of 100 tags to each one. Each tag had a value randomly picked among 1000 choices. Queries to find all files with a certain tag (e.g. Tag_134 = Value_875) each completed in less than a second.


> I created 20 million files in it and attached an average of 100 tags to each one. Each tag had a value randomly picked among 1000 choices.

That is really impressive, but when the parent commented that they have never seen a tag-based system work with a large number of files and tags, I don't think they were making a statement of technical capability but of human fallibility.

My experience has largely been identical in both personal usage and in enterprise settings. Every time I've used a system that used human-defined tags as the primary organizing mechanism it has always ended in an unusable mess and in every case it is eventually replaced by some kind of hierarchy which usually ends up being a slightly more usable mess.

Perhaps combining them will yield the best of both worlds and perhaps with enough organizational discipline one can make a tag-based organizational system work. And I'm all for better search. But at the end of the day I am skeptical that giving normal people even more flexibility with how they organize their files will make their lives easier.


Most tagging systems that I have seen are free form. Anyone can just tag something with tags like 'James', '2002', or 'Bank Statement'. This makes it difficult to distinguish between them and easily find things like misspellings. All tags must be of the same data type (string). A generic term like 'Tank' might refer to a water storage device, a military vehicle, or someone's nickname.

With Didgets, I decided to go with a contextual approach to tagging. Just like columns in a relational table, a tag must be defined before you can use it and all like tags are managed together. A tag can have a data type so 'Year' can be an Integer, for example. The system comes with a set of pre-defined tags, but users can easily add whatever tags they might need. That way a tag has the form 'Author = James' or 'Device = Camera'. I went further and decided each tag definition would have two levels. '.person.FirstName = James' might be a tag on a picture of someone named James. This makes it easier to search for tags by group (e.g. find all documents that have '.person.*' tags attached). By managing the tag values together, the UI can quickly show a list of values that have been used previously (and order them by use count). When attaching names to photos, it can show you a list of the most used names and let you pick one or ignore the list and add a new one.
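Didgets' internals aren't public, but the contextual scheme described (define a typed tag before use, manage all values of a tag together, offer previously used values ordered by use count) can be sketched in a few lines. All names here are invented for illustration:

```python
from collections import Counter

class TagStore:
    """Toy sketch of 'contextual' tagging: tags must be defined with
    a data type before use, and values are tracked together so a UI
    can offer a pick list ordered by use count."""

    def __init__(self):
        self.defs = {}     # tag name -> expected Python type
        self.values = {}   # tag name -> Counter of values seen so far

    def define(self, name, dtype):
        self.defs[name] = dtype
        self.values[name] = Counter()

    def attach(self, file_tags, name, value):
        if name not in self.defs:
            raise KeyError(f"tag {name!r} is not defined")
        if not isinstance(value, self.defs[name]):
            raise TypeError(f"{name!r} expects {self.defs[name].__name__}")
        file_tags[name] = value
        self.values[name][value] += 1

    def suggestions(self, name):
        # Most-used values first, for the pick list described above.
        return [v for v, _ in self.values[name].most_common()]

store = TagStore()
store.define(".person.FirstName", str)   # two-level, namespaced tag
store.define("Year", int)                # typed tag: must be an integer

photo = {}
store.attach(photo, ".person.FirstName", "James")
store.attach(photo, "Year", 2022)
```

The type check is what makes 'Year = 2022' and 'Year = "2O22"' distinguishable at attach time rather than at cleanup time, which is the main advantage over free-form tags.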

This system is far from perfect. Users can still misspell tags or categorize them incorrectly. But this can happen with folder names in existing file systems as well.

Just to be clear, tags in Didgets do not have to be the primary organizing mechanism. It has 'Set Didgets' that contain the IDs of all members and can be arranged in a hierarchy just like folders. When importing files, the UI creates these sets (unless the user specifically turns it off) and preserves the hierarchy of the source file system.


Those are improvements over free-form tags and I especially like the namespaces for attribute/field names, that's clever. The enterprise software my company makes has decent support for defining and applying structured tags like business terms and attribute values to the data objects our customers manage, and our users do make use of those when filtering and searching.

But (in my experience) many people still seem to gravitate towards storing and navigating objects hierarchically. I can think of a few possible reasons:

First, some people intuitively think of a data element as having a location in an information space. That is, they seem to intuitively remember "where" something is by piggy-backing on spatial memory in a way that tags don't seem to trigger.

Second, navigating a hierarchy involves a sequence of constrained choices, like a wizard. Having a sequence of decisions can be especially helpful for novices. It also generally takes a predictable number of steps to locate an item which can be preferable to something that is faster on average but has slow edge cases.

Third, at each level of the hierarchy you can often display all of the options meaning we can rely on recognition over recall[1].

(You could constrain yourself to hierarchical tags and use hierarchy-like positional language such as "object is in baz" at which point I'd consider it a hierarchy.)

Of course relying on tags has plenty of upsides -- typically faster, better mental model for overlapping sets -- and large-scale data storage systems need both. But at the end of the day they don't seem to be a replacement for hierarchical systems for most people.

[1] https://www.nngroup.com/articles/recognition-and-recall/


>I have never (never ever ever) seen a tag based system actually work once you have large amounts of files and tags

If you've ever done online shopping, you probably have. For example, try going on Amazon or Newegg and searching for a GPU. You're shown a sidebar where you can easily filter results by certain tags such as: brand, price range, memory size, core count, in stock, energy star certified, free shipping, etc.


I don't think these systems work nearly as well as you do.

Simple example right now:

Go on amazon, and search for "intel CPU" - I see the following:

1-16 of 942 results for "intel cpu"

Now go back and search for "cpu", then filter by brand "intel" - I see the following:

1-24 of 835 results for "cpu"

It turns out the tags are exactly what I said they would be - a hodgepodge of things not correctly applied. For example - searching "intel cpu" actually returns items that include intel CPUs (such as motherboard + cpu bundles) that are missing in just the tagged search. But it's still absolutely a valid result if I was interested in buying a cpu.

---

as mostly an aside - I don't really trust Amazon or Newegg to be neutral in their results either, a tagged view is convenient to them as a seller where they can control results.


... and on the top of the list, the first thing you click as a part of the filter, is a hierarchical locator:

Home Components -> Video Cards & Video Devices -> Desktop Graphics -> Cards Search Results: "GPU"

The tags system works for specific areas. For example, tags in photo management apps are great. But they don't really work across separate domains, so what you want is top-level hierarchy, and, where needed, tags for the subtrees. That's how existing tag systems work.


I think one of the problems is that there are many datasets where objects belong to multiple hierarchies, and different hierarchies are more efficient for different tasks. For example, I work with medical imaging data. Typically that gets organized around the DICOM object models, which even define multiple structures. Typically that's around the patient/encounter model and slight variations of it, and the data is stored in a database called a PACS, but working with a PACS is extremely difficult because DICOM is optimized for clinical use cases. There are other ways of organizing the data that are more efficient for other tasks, for example for quality improvement or assurance, or process monitoring. In fact, different users of the data are likely to want views based on different hierarchies. Some software expects certain data layouts, etc. There are some efforts to standardize file hierarchies and naming for certain tasks, but perhaps you're not doing that task. You can do things like symbolic links, but trees of symbolic links end up being super fragile in my experience, and they're not particularly well supported on some operating systems.


I don't think the filesystem is the right place to solve these types of problems. I also think if you try to make the kind of filesystems that solve these problems you'll invariably end up with entirely new problems you really don't want (complexity, performance issues, unclear semantics etc).

You usually want more flexibility and control over how your data is projected to storage. (Say for instance you run out of storage and the scheme doesn't have any way to split the data across multiple filesystems). And you really want integrity constraints that stop you from pointing into thin air and help you clean up. Occasionally you also need to have the concept of identity (how do you refer to a given entity directly) without it being part of a projection that may have stopped existing - like a tag being deleted).


I'm not so sure about that. I don't think anyone really wants to think about that at all. I'm probably spoiled by zfs and OneDrive, but I think you just have pools of space and let the filesystem take care of itself. Plug in more space and the system runs a "rebalancing" or whatever. Let blobs move into online storage and be fetched as needed. If I want a certain set of data, I just tell the system to "prepare" it for use.


That supposes that you have a decent filesystem like ZFS and know how to configure it. I run ZFS and I can't even remember how to add a disk and increase the size of a storage pool without referring to the manual page or do a web search. And I only know where to look because I know I'm running ZFS. (If I'm lucky it is in the filesystem command history - I know people who have used ZFS for years who didn't even know ZFS had a command history).

And what if I unplug the disk and pop it into a different machine? Or if you decide to move to Windows?

I do a fair bit of photography. If it taught me one thing it was that if you design software for managing lots and lots of files that craps out if something spans a filesystem border, you'll have a truly miserable time. Photo editing software used to be like that.


I actually use Windows (and OneDrive) quite a lot nowadays. I have very little problem thinking of disk space as a sort of OneDrive cache at this point and basically you have different devices that are faster vs slower. And when that's the case detaching a disk just means removing a copy of the data.

But datasets in general on ZFS are great because you can just mount and unmount them at will. They're there on the disk, but if you don't need them they're not mounted. This is great if you don't want to accidentally modify some subset of data you're not working with. One of my favorite features of ZFS is the encrypted datasets, where you can have chunks of data stored with different encryption on the same disk, and all the integrity and migration etc. works without needing to decrypt anything. Which is also great because it means ransomware can't touch it. I do think it's useful to have some sort of "chunking" of the data like that which mostly maps to ownership. This data belongs to this client/project, that data belongs to that client/project, etc. Often those divisions have different use restrictions, so I've found it great to isolate things that way. And again ZFS is great at this because you can just dump out the full encrypted datasets for handoff/archival or whatever.

But anyway I would point out that filesystem boundaries come from whatever the filesystem implements and exposes to the application. The only real issue with filesystem boundaries is that renames that move files between devices aren't possible. And that's an artifact of that filesystem's design needing to synchronize the hierarchy at a hardware level. If you could call up the same path and it comes from wherever it happens to be then it's not a problem. Like in OneDrive when you open a file that's only online and it needs to suck it down first.


I completely agree on the constraints you mention, but I don’t actually see how these can be solved without the knowledge of the file system.

Let’s say you want to create a tagging system on top of a traditional file system. Say you create a folder for each tag and have it contain a symlink to each entry with that tag. Now any single move or delete operation will render your tag library incorrect, and there is no cheap way to correct it on each change.
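The fragility is easy to demonstrate: a symlink records a path, not an identity, so a plain rename of the target silently dangles every "tag" link that pointed at it. A minimal reproduction:

```python
import os, tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "report.docx")
    moved = os.path.join(d, "report-2022.docx")
    link = os.path.join(d, "tag-taxes-link")   # hypothetical tag-folder entry

    open(target, "w").close()
    os.symlink(target, link)            # "tag" the file via a symlink
    assert os.path.exists(link)         # resolves while the target stays put

    os.rename(target, moved)            # a plain rename by any program...
    dangling = not os.path.exists(link) # ...and the tag link is now dead
```

Nothing notifies the tag layer that the rename happened, which is why symlink-based tag schemes need either filesystem cooperation or constant rescanning.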


I think this is just a general lack of software for rare cases like clinical data?

I mean, it sounds like what you have described is solved for photos. I use the digikam photo manager, and it automatically discovers all the photos on multiple volumes, and supports showing files by date, location, tags, path -- whatever you like. And it is not very fragile at all -- it identifies photos by metadata-excluded hash, so moving the photos around does not break the links.

And back in Winamp days, it had an MP3 database which had basically the same properties.

I have no doubt that users could use more and better organization, but it seems like UX problem, not a filesystem one.


The problem is that not all the data is the clinical data. Images are fairly well understood for clinical use; it's all the non-imaging things that need to be associated with the images, and many of the things people want to do with the data are non-clinical. For example, now there are things like BIDS[1], which are basically a rigid schema applied to a filesystem. But that only exists because so many people got frustrated with everyone doing their own thing and having to spend time restructuring data for different site workflows. Even that only works for neuro; what about cardiac or liver or spectroscopy, or software that doesn't use BIDS, etc.? And even with BIDS, mostly it's just copy-rename into the layouts the software you're running actually expects. There's also a huge push for "Vendor Neutral Archives" to augment PACS, which will allow things to be sort of structured and managed, but that always seems to mean uploading and downloading from websites, so people still keep copying everything out because everything is so opaque compared to a filesystem.

> it identifies photos by metadata-excluded hash, so moving the photos around does not break the links

That only solves the problem in one direction. If you run across an image after it's been moved, you know which image it is and can index into the database based on that. But if you want to find an image starting from the metadata after it's been moved, then you're stuck trawling everywhere. The metadata-hash idea exists in medical imaging (DICOM) as various GUIDs (in a different format), so you can track things and updates using that key. But if you have to visit 30 TB of files just to find one that's been renamed or updated, it's basically impossible.

[1] http://fmri.ucsd.edu/pdf/BIDS_Presentation_14NOV2018.pdf


Couldn't that still be stored in a traditional folder hierarchy, just all in one folder, with the tag-to-filename mapping kept in a SQLite database?
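A rough sketch of what that could look like: blobs in one flat folder, tags in SQLite. The schema and file names here are hypothetical, just to show the shape of the idea:

```python
import sqlite3

# One flat folder of files; all organization lives in the database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tags  (file_id INTEGER REFERENCES files(id), tag TEXT);
""")

def add_file(name, *tags):
    fid = db.execute("INSERT INTO files(name) VALUES (?)", (name,)).lastrowid
    db.executemany("INSERT INTO tags VALUES (?, ?)", [(fid, t) for t in tags])

def find(tag):
    rows = db.execute(
        "SELECT name FROM files JOIN tags ON files.id = tags.file_id "
        "WHERE tag = ? ORDER BY name", (tag,))
    return [name for (name,) in rows]

add_file("a1b2c3.jpg", "vacation", "2022", "beach")
add_file("d4e5f6.jpg", "vacation", "2021")
print(find("vacation"))  # ['a1b2c3.jpg', 'd4e5f6.jpg']
```

The catch, as the replies below note, is that every application then has to go through this database instead of the filesystem.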


I think the point is that if you're just using the filesystem as blob storage and interacting with a layer database, then you've actually moved beyond the filesystem. Just replace filenames with inodes and that's what you've done.


But that's exactly what many applications have been doing for a long time, they're using S3-like services as CAS or quasi-CAS and generate views for different workflows on the fly from a separate database.


Why shouldn't that functionality be more user-friendly and accessible to users?


It may be worth taking a step back and asking who the users are.

I know that like most folks on HN, I am not your average computer user, so my view is probably a bit different. With that out of the way, I can say that I firmly prefer folder hierarchies for organizing my files over tagging. I switched to MacOS when I started my new job, and the fact that tagging is such an integral feature out of the box over plain folder navigation irks me. To me, it's easier to save a file to the correct spot in the hierarchy and find it there later than to search by tags.

Perhaps my view isn't consistent with a lot of computer users, but I will say that most people understand how hierarchical filing works better than tagging as one is more spatial than the other. After all, there is a reason that the memory palace technique for recall relies on imagining a physical space.


A hierarchy is actually a tag, so really what you're saying is that you firmly believe in having only a single tag for each file, with a specific ad-hoc schema for those tags. I think once you've used a filesystem that behaves in more complex ways you start to realize that restriction isn't necessary. For example, with a snapshotting filesystem you add dates and times to the path in order to access older versions of files.

I don't really use MacOS, but I have noticed the tagging in my limited experience with it, which seemed reminiscent of what the old MacOS used to do where you could set icon colors. So whatever they're doing for tags isn't really the whole story. MacOS is a good example, though, because that is a filesystem that maintains separate metadata in addition to the file contents (they used to be called resource forks, but I'm not up on modern MacOS). Many Linux filesystems can also support that sort of thing. But those just flavor the files; they don't add different ways to index into the filesystem.

A hierarchical index such as nested folders is probably a minimally viable solution and is very useful. But that doesn't mean it isn't limiting, or that there aren't larger solutions that achieve more useful results. Once you start thinking about snapshots, transparent de-duplication, and other don't-repeat-yourself ideas, things wind up becoming far less clear.


I'm not sure any of this really addresses the question - which is how do people really use files.

IME I have a number of live projects which can contain various numbers of source files, images, web links, PDFs and other documents, text files, and so on.

Then there are a number of files I access regularly which may not be associated with a project (like favourite music).

Then there's a mountain of data which is just there in case I ever need it. It includes backups of old projects, documents, music and art I keep because I think it's interesting but haven't read yet, web links that are filed and then (sadly...) forgotten, and so on.

I don't know how typical this is, and it doesn't matter. Because neither a tag based nor a tree based system address the real issue - which is designing a custom file workflow that collects related references of all kinds, doesn't confuse working data with long-term storage, allows off-site backups, allows collaboration, supports versioning on demand, and also makes it easy to find things.

I suppose all of that means some kind of process API which does a lot more than file.open() and file.close().

It could be built on tags, it could be built on trees, it could be built on some combination. Or on something else entirely.

The implementation matters a lot less than a set of available features which streamline common tasks in some fairly standardised and effective way.


> The concept of having multiple references to the same file already exists.

It does and it's really bad IMO. The author's suggestion of unique identifiers though would introduce all sorts of new problems, primarily it would make the transparency problems of existing systems worse.

Most applications rely on the location of a file, relative or otherwise to load data (e.g. configuration). That reliance is exploited by software engineers to implement configuration swaps, event processing, and many other features. Referencing files based on UIDs, or a series of tags that aren't guaranteed to be unique or not known to be off limits to regular users, would introduce all manner of complications.

I could also see it being terribly easy to introduce bugs loading files using filtered tags. Would applications need to have relative tags to mitigate these problems? Having unique paths works both as a filter for the user and an encapsulation for a system that allows you to localize your concern. Without that encapsulation by default, you will be spending a lot more time and concern dealing with files and tags.


There is a lot of stuff that should NEVER appear in a mixed view. (Google is full of examples of that.)

Tag clouds and other metadata can still be very useful. The challenge is creating useful tags/metadata automatically. For example, a timestamp for every modification, or a label for every application that created, modified, or loaded the file. Perhaps even the applications you were using when the file was created/modified, and the names of the files loaded into those applications. Train some AI to show you files you probably want given your current activity.


> And you are right where you started: paths.

A big difference is that one can naturally have multiple tags, and an entity could share tags with other entities.

Sure you can use hardlinking when it comes to files, but it's tedious and you can't have multiple files hardlinked to the same path.


Directories of links.

And yes, even with a layer on top of the FS to provide an abstraction (as an API, for instance) so you can build shells and applications, a tag + search based system would quite possibly be tedious to use.

I also don't think 64 bit ints provide a good way to definitively name things. Most people can't make sense of a list of 10 ints, but they will be able to remember at least where to look if you give them full paths.


> Directories of links.

Well you'd still have to guarantee uniqueness of the filename within the directory. For example, I have several files like DSC00005.JPG which are not identical, because the camera reset the counter every now and then.

> I also don't think 64 bit ints provide a good way to definitively name things. Most people can't make sense of a list of 10 ints, but they will be able to remember at least where to look if you give them full paths.

I agree that 64-bit ints are not a stellar solution. If anything it should be something like a UUID, so it can be unique across filesystems, and something users shouldn't normally have to deal with.


The easiest way to do this is to use the inode for naming the link.


what's wrong with tags on top of a tree? Do tags even need to be part of the filesystem?


Maybe not part of the filesystem, but part of the OS API, so that every filebrowser can support it.


I think it could be solved outside the OS, but the challenge is that you would need to define some common APIs and get them into the standard libraries of programming languages. You would need a service API so that you can plug in the tagging service backend of your choice (or something that comes with the OS).

There is nothing wrong with having a userland tag management service. In fact, you'd probably want it in userspace if possible.

Implementing a proof of concept for this would be easy, if it weren't for the fact that getting from dirent to inode is fast while getting from inode to dirent(s) is very much not (since the file may have been renamed in the meantime).
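To make that asymmetry concrete, here's a sketch (paths are hypothetical): path-to-inode is a single stat() call, while inode-to-path has no syscall at all and means walking the whole tree:

```python
import os
import tempfile

root = tempfile.mkdtemp()
path = os.path.join(root, "notes.txt")
open(path, "w").close()

# path -> inode: one stat() call.
ino = os.stat(path).st_ino

# inode -> path(s): no direct lookup exists; scan everything.
def paths_for_inode(top, ino):
    hits = []
    for dirpath, _, filenames in os.walk(top):
        for name in filenames:
            p = os.path.join(dirpath, name)
            if os.stat(p).st_ino == ino:
                hits.append(p)
    return hits

# The inode survives a rename, but finding it again costs a full scan.
os.rename(path, os.path.join(root, "renamed.txt"))
print(paths_for_inode(root, ino))  # the new path, found the hard way
```

A tagging service keyed on inodes would pay that full-scan cost every time it needs to show the user an actual path.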


Part of common GUI toolkit, you mean?

The filebrowsers are not part of the OS.


I was just pointing out that tags bring you something filenames don't.

Sure, you could put tags on top of a filesystem like we do now. It's slow and requires per-application support.


> but it's tedious and you can't have multiple files hardlinked to the same path.

Little trick I learned to help sort images: Make a copy of the file in as many locations as you like, then run something like borg backup. One file, hardlinked in as many directories as you want.
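The same effect can be had directly with hardlinks, no backup tool required; this sketch just uses os.link (directory and file names are made up):

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "beach"))
os.makedirs(os.path.join(root, "2022"))

original = os.path.join(root, "beach", "IMG_0001.jpg")
with open(original, "wb") as f:
    f.write(b"\xff\xd8fake jpeg bytes")

# Hard-link the same file into a second "category" directory.
alias = os.path.join(root, "2022", "IMG_0001.jpg")
os.link(original, alias)

# One inode, two directory entries, no duplicated storage.
print(os.stat(original).st_ino == os.stat(alias).st_ino)  # True
print(os.stat(original).st_nlink)                          # 2
```

The borg approach in the comment above gets you the same end state from plain copies, by deduplicating after the fact; either way, hardlinks only work within one filesystem.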


The problem of hierarchical file systems and data location is a really old problem that has had many implementations (I even tried building one many years ago).

Somewhat related:

Tagsistant https://news.ycombinator.com/item?id=14537650

TMSU https://news.ycombinator.com/item?id=11660492

BeOS File System https://news.ycombinator.com/item?id=17468920

TagSpaces https://news.ycombinator.com/item?id=12679597

git-annex https://news.ycombinator.com/item?id=29942796

Names should mean what, not where https://dl.acm.org/doi/10.1145/506378.506399

Unfortunately it's not easy to get a real solution, and many people don't think there is a problem at all (based on some comments in this thread).

Nowadays I use git-annex; though it has its quirks, it seems a step in the right direction.


The pattern that has worked out really well for me, is to just organize my data into specialized collections, and not worry too much about the underlying filesystem.

I can't believe how much trouble I went through trying to find the filesystem that could do everything.

I mostly use git-annex, and various tagging systems, or just git repos. Now my data is much more portable and flexible. None of these tools are perfect, but I'm using tools that are mostly good at the job.

Whatever problem you are trying to solve, you probably don't need to solve it for your entire filesystem.


Also related: https://news.ycombinator.com/item?id=29141800 (discussion of differences between hierarchical and tag based file systems)


Also related (learned this from HN a couple of weeks ago):

SuperTag https://amoffat.github.io/supertag/


Systems that try to get rid of the "files & folders" abstraction of a file system tend to have much worse usability, in my opinion. I have an iPad Pro, and the lack of a true file system abstraction is so painful. Every app has its own way to store and retrieve data, there's almost zero interoperability, and it's super painful to copy, paste and move stuff around (I know it has gotten better but it's still so much worse than on any desktop OS).

I'm all for enriching the concept of a file system with additional meta-data (in fact many files do that) but I don't think that needs to happen in the file system itself. For example, software like Picasa leveraged meta-data contained in files to provide a new way of interacting with large number of photos. The author basically proposes to put such functionality directly into the file system, but I'm really not sure if that's a good idea. Right now it's easy to move files between different systems, e.g. from Mac to Windows or Linux. If file systems become meta-data management databases that will become much more difficult.


> Systems that try to get rid of the "files & folders" abstraction of a file system tend to have much worse usability,

IMHO it's because those systems just simplify, but don't move very deep in the space they opened up. If you don't offer power, then it's irrelevant which system you offer, they will all suck fast.

> Every app has its own way to store and retrieve data, there's almost zero interoperability and it's super painful to copy, paste and move stuff around (I know it has gotten better but it's still so much worse than on any desktop OS).

Which is kind of a surprise, I would think Apple would be interested to unify that space and offer a good user experience.

> Right now it's easy to move files between different systems, e.g. from Mac to Windows or Linux. If file systems become meta-data management databases that will become much more difficult.

Theoretically, it could be solved by using a meta-file container: something like a tar container holding a file for the metadata plus the actual content. We have this with specialized container formats in media and office file types. Making a universal format which would work equally well for any kind of file could solve this interoperability problem. This would even open up ways to improve files without changing them directly, like adding subtitles or notes to a file by just adding them to the container, not the file itself.


App sandboxing on iOS is supposed to be a security feature, I think? But it makes it impossible for apps like Obsidian to work with apps like Dropbox, which benefits Apple; they force people to use iCloud.


macOS has so much of the prep work for this kind of thing, but Apple has completely dropped the ball on the UI.

Spotlight parses and indexes all the existing metadata in your files (music ID3 tags, photo EXIF tags, etc; run `mdls` on a file in a terminal to see everything it has extracted), and all of that could power some pretty compelling UIs. But all Apple has done is build one very handy universal search UI, one very poorly designed specific search UI, and stored searches (which are also useful, but limited in practicality by how bad the UI to create them is).


I have 394,175 photos and videos that I have personally taken since 1997. They are organized by a simple hierarchical system in 5,356 folders.

D:\masterarchive\source\YYYY\YYYYMMDD\photo file name

If I want to find a person, in a photo, I've used Google Picasa (when it was an offline product) and lately digiKam to do face matching, and tagging them with IPTC metadata tags in the photo files. Thus they survive moves across filesystems, etc.

I'm up for seeing alternatives, but there's a very high bar to clear here. People have been using directories and file storage since the middle ages.


You, as a data point (and I, and I imagine many people on HN), are strong evidence of the rule "sufficiently motivated and clever end users will always find a way to do what they want, regardless of the interface".

But that isn't really saying anything about if the interface sucks, or could be improved, just that you're motivated and clever enough to find a good and scalable solution for what you want to do given the limitations of the interface.


I do the same. This is the only way to organize things, by date.

I do the same with documents. I don't even want to think about categories, ontologies are always wrong. But today is 2022-02-24, no two ways about it. It's automatic, there's no need to think or decide anything, so it's not a big deal, you just do it. You can't make a mistake.

The thought of needing to properly tag every document I file is enough to make it a task I want to postpone. So it won't get done. That's a worse filesystem right there, because it doesn't exist.


I agree that sorting by date/time is the 90%-correct default view of the data. Most file managers get this wrong.

It should be especially easy to have a full sub-tree view sorted by date, but it typically isn't.


None of the files in the 1997 tree have a file date anywhere near that old, the drive they are on wasn't created until decades later.

For me, there is only one durable, universally supported tag for those files, which is the folder structure. Due to the way cameras number photos, for any given photo file name, there are likely 5-10 other different photos with the same name.

You might be tempted to then call for a standard tag that would be supported, but what about files relating to things of unknown dates? Fossils, antiques, draft x of the Declaration of Independence, or of things planned in the future, with dates still in flux?

Having one canonical path and filename for a given collection of bits is a really effective tool, that I doubt will be surpassed any time soon.

However, the next best thing, in my humble opinion, is to use a cryptographic hash of the file in question, as Git does internally. You could map a filesystem interface onto a Git-based data store, as long as you don't expect high-speed writes to perform well (because a new checksum requires hashing the entire file, even if only 1 bit changed).
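For reference, Git's content addressing is just SHA-1 over a short header plus the raw bytes, so the whole scheme fits in a few lines:

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Content-address a file the way Git does: sha1 over a
    'blob <size>\\0' header followed by the raw bytes."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

# Matches `git hash-object` for the same content:
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

This is also why the write-performance caveat above holds: the header includes the total length and the digest covers every byte, so there is no way to update the ID incrementally for a small in-place edit.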


But look at the path components: they're not organizing things just by date. The files are first organized by purpose (long-term storage), then by origin, and only then by date.

So they've already made the decision to commit the files to long-term storage, and to keep the original photos separate from subsequent edits, and to keep them separate from other image sources (e.g. downloads). That "tagging" required very little effort because they could just navigate to the existing tag in the filesystem, and put the new files there.


> This is the only way to organize things, by date.

I am terrible at dates - if I had to find photos by date I'd never find them. For anything older than a month that isn't on a known anniversary like a birthday, 90% of the time I find the photos using the map view in iCloud Photo Library. If I was limited to a filesystem view, my photo library would be far less useful.


What you've done here (and it's no bad thing given filesystems!) is define dates as an ad hoc index over a primary key which is the pair (date, filename).

What I'd like, personally, is a way to expose any sortable EXIF data as a 'filesystem', for example `~/Photos/Longitude/122-123/Latitude/36-37/`.

Most of the commentary on a tag system seems predicated on the idea that we can't derive a large volume of tags automatically from the circumstances and provenance of the data. For instance, instead of a Downloads folder (per se, it would be a view) we could have a "downloaded" tag, which could have "downloaded-by" = "Chrome" and "downloaded-from" = "https://example.com/a-url/".

That's a lot more useful to me than a Downloads folder, especially if those tags endure when I add further metadata of the "canonical folder" variety, also known as "moving" the file.


I actually started coding a way to automatically extract EXIF data from all imported jpeg files and automatically attach them as tags to the file. That way you could search for photos taken with a specific camera, or only photos where the flash was used, or location data (if your camera had GPS), etc.. I just haven't had the bandwidth to get back to that feature and finish it.


If you don't mind some drive-by advice: I suspect you have some kind of good idea here, and I'm having trouble really seeing what it is.

I can also read from your site and comments the frustration you're experiencing in getting your product shipped, and also in explaining the benefits of it. The Internet sucks, it's a hostile place, and unfortunately leaking the bad feelings this invokes in you is off-putting to your audience.

I've had an unpublished blog post sitting around called "file systems suck" so I'm about as sympathetic an audience as you'll find. Good luck with your implementation; I'll be keeping an eye on your project, and I hope to understand it better in some later iteration of the docs.


People have been using tables of contents and indices for books and other printed materials for a long time, but those are redundant in the era of CTRL-F. At best, they're useful only as an adjunct to searching digital content.

Likewise, imposing archaic methods of organisation on modern storage is sub-optimal. The argument is to move toward something that makes more sense given the capabilities of the medium.


What tags and searching do not provide is context. One affordance that putting files into a folder provides is reminding oneself that when I look at file A, I should probably remember about file B too. Later I may even forget about the existence of B, but when I go searching for A I'm going to see B as well.

Search and tagging are not contradictory to cataloguing. They're complementary.

It's also not true that old media didn't have any search facilities. Old technical books would each have an index of keywords at the end. That's search, just analog and requiring a bit more work from the publisher. This index didn't make the table of contents redundant.


Listen here, a hot take incoming.

There are two absolute genius inventions in computers so good and timeless that the sliced bread pales in comparison like a stupid troll comment on HN.

1. The keyboard

2. The hierarchical filesystem

Everything and anything else in input devices and data storage builds on these and the best solutions ever always are going to augment these, never replace them.

A good tag system will build on top of a filesystem and coexist with it, and offer value like stupidly fast search. Anything else will be lucky to survive a weekend of dubious fame on twitter, or up to a few months if you actively market it.


I've never seen a way to organize files that feels like it prioritizes files based on how much they mean to the user.

By that I mean pictures I take, papers I write, things I really wouldn't want to lose, vs 10,000 random system files.

For downloaded files sometimes the history of when it was downloaded, and from where, is almost as important as the contents.

Backups from old computers, and old phones start to pile up, and the chaos of trying to find that picture you took 3 phones ago, or the notes you took, or the recording you made, or the pdf you downloaded, or that code you wrote, or that map you made, is a real pain.

Digital clutter is one of my biggest problems.

I really need a good way to deduplicate and organize ALL my digital stuff. Tags might play a role, but I don't think they quite solve the problem.


It feels like the author just doesn't like tree-based file structures? The software shown in the screencast reminds me of iTunes, which I dismiss because it doesn't provide a logical *tree-like structure*. And it is not a filesystem replacement either; it is a database, which adds a lot of complexity and hides the actual data. Furthermore, this assumes someone is maintaining the metadata (remember the MP3 taggers?) rather than the files. Metadata itself is useful, but the creator of the file should add it, not the user. Regarding file manipulation, the proven answer is file permissions, though I think cgroups are the flexible, modern approach.

Because I'm seeing "Windows Explorer" in background:

Windows Explorer has degraded in recent years; it is even hard to open your "home directory", and the UI is confusing. Look at the one from NT 4.0, which came much closer to fulfilling the task.

And Apple:

I think they nowadays regret the whole iTunes thing? Instead they are pushing hard on apps which contain the data. Now you always have to look inside a single app and use its facilities to retrieve a file. Android failed here too. But using iOS is hard.


>it is even hard to open your "home directory"

Windows clearly doesn't want people to GET to their home directory, for some reason. That seems goofy. If people don't understand that $user contains the rest of those folders (Documents, Downloads, Pictures, etc) they'll never be able to navigate on their own. That's bad.

In a sane tool that features an address bar, clicking any given directory would show, in the address bar, the path to that location. WinExp only rarely does this. If you click on, say, Desktop, it shows you This PC > Desktop, implying a relationship that is incorrect. Getting to your home folder without typing requires you to start with C: and drill down, which is objectively insane.

Even MORE bananas is that if you start at C: and drill down to Desktop, you DO get the correct path in the address bar. But if you then make a WinExp shortcut of that location, it goes back to the other behavior. WTF.


> Windows clearly doesn't want people to GET to their home directory, for some reason. That seems goofy. If people don't understand that $user contains the rest of those folders (Documents, Downloads, Pictures, etc) they'll never be able to navigate on their own. That's bad.

Yes. Windows nowadays prevents users from understanding a straightforward thing: file systems. I mean, it was always a bit clumsy with A:, C: and [D-Z]:, and the weird desktop metaphor did harm as well.

Now I'm looking at the often-criticized GNOME and the actually venerable Nautilus. They got it! Everything lives below /, and in addition devices are directly usable (actually still somewhere below /run). The location bar reflects the current position. The desktop was removed because it never fit the computer or the file system.

Some actions of Google within Chrome are also questionable. "There is no address entry field, because we don't want you to understand how the web is structured." What? File systems are a simple thing: hierarchical. And guess what, the web is similar. Compare that to "right click", "double click" and their new friends "long press", "hard press", "swipe from somewhere", and "guess what the voice assistant can interpret".


The idea of building “indexing” into the file system means either the file system directly understands all file types, ignores those it doesn’t understand (thus requiring an out of fs indexer), or requires the file system itself to be able to dynamically load logic to handle different file types. By the time you get to the last one all you’ve done is build spotlight(or the ms equivalent) into your file system, so now you’ve got all the cost of the indexer only now it’s in the process reading and writing the raw bits, and of course doesn’t index contents of any other filesystem (so you’re still going to be running an indexer).

I also don’t understand how a filesystem is going to store this data in such a meaningfully different way that it uses less space and/or is faster to index.


I don't think file systems will be replaced anytime soon because of psychology. The human mind remembers things best by attaching them to a real or virtual location. That's how all the memory experts do it, they construct a virtual house in their mind. Virtual rooms, shelves, boxes, and folders are no different. So if anything, I'd give the Filesystem different folder icons based on their depth to reinforce this similarity with the real world.

Also, the article seems to use strawman arguments. Nobody needs to remember the exact image file extensions. You just click on the "search for images group" in windows and it'll search all image file extensions for you.

In effect, tags are already there. It's just that they are automatically generated.


I like the idea, but I think it will not change the world.

Filesystems have already been reduced to storage mechanisms for systems not people.

People just don’t organize files anymore. And that’s a good thing.

Most employees in relatively fresh organizations keep their files in OneDrive and Dropbox: 10-15 folders with random names, and a good search function that returns recent files on top. The older files just lie there, not bothering anyone, because nobody is looking.

Files from other departments are found via links in Mail and Slack search - not as attachments to Email.

People launch Powerpoint (online) and use the recent files menu instead of browsing from the ”C: drive”

To rethink storage ignoring that people don’t store files anymore is futile. It’s nice for organized geeks (like me), but in general file organization is a thing of the past.


People launch Powerpoint (online) and use the recent files menu instead of browsing from the ”C: drive”

Yes, I do that too. But that's because I have to, not because I want to. Onedrive, Sharepoint (and I guess Dropbox too) are impossible to navigate otherwise, so yes, even people that understand hierarchies are forced to use an application's LRU list to find old documents.

That's not a sustainable situation. I foresee huge storage bills for organisations because they won't be able to afford to curate their growing terabytes of disorganized file storage.


The tag-based location of user documents is nice, but why do people want to put it into filesystem layer? This seems like a bad fit.

- Tags in filesystem index too much. For example, if there is a program directory which happened to contain a .jpeg file, it should not be shown to user. Neither should user see files from browser's cache folder.

- Tags in filesystem index too little. Filesystems are device-specific, and a lot of times, you want to index across all devices in system. And maybe some files have no associated device at all, because they were transparently offloaded to cloud?

I think a much better fix would be to have an index database as a separate file, and filesystem providing a general support for it. Author says that the separate indexers might become out of sync or are slow -- but this is not inherent property of indexers, but rather the limitations of the filesystem design. So let's make filesystems more index-friendly:

- Make it fast & easy to detect individual file changes: every file has auto-updateable change time that user cannot mess with (linux already does this). Even nicer would be an extra timestamp which updates when content changes (not metadata) -- together with inode, this can detect renames easily and quickly.

- Make it fast & easy to detect past filesystem changes: There is a way to quickly find all changes made to the disk since some past moment: Merkle hash of directory + all contents is ideal (like ZFS maintains internally), or failing that, NTFS-style change journals can work too.

- Make it fast & easy to detect present filesystem changes: have powerful notification API that can detect all changes on disk. Perhaps also include first few kilobytes written to file for performance (so that file scanners do not have to open every just-written file)?

- Make it possible to "claim" subdirectory: something like a common attribute that advices common file browsers to avoid modifying the content. This way a software can use automatically generated names, and not worry about users just copying random files into arbitrary locations of structured hierarchy. (This should be bypassable by user with appropriate warnings -- this is UX mechanism, not security one)

- Perhaps a standard on how to store tags? All modern filesystems have attribute support, but AFAIK there is no clear consensus on how exactly it'd store the tags.

This way, one could have general tagging system, and winamp music database, and photo management app all looking at the same data and working together.
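A toy version of the rename-detection point above can be approximated today with (inode, mtime) snapshots; this sketch assumes stable inodes and nanosecond mtimes (Unix-only), and all paths are made up:

```python
import os
import tempfile

def snapshot(top):
    """Map inode -> (path, mtime_ns) for every regular file under top."""
    snap = {}
    for dirpath, _, filenames in os.walk(top):
        for name in filenames:
            p = os.path.join(dirpath, name)
            st = os.stat(p)
            snap[st.st_ino] = (p, st.st_mtime_ns)
    return snap

def diff(old, new):
    """Classify changes between two snapshots without reading contents."""
    changes = []
    for ino, (path, mtime) in new.items():
        if ino not in old:
            changes.append(("added", path))
        elif old[ino][0] != path:
            # Same inode, different name: a rename, not a delete+create.
            changes.append(("renamed", old[ino][0], path))
        elif old[ino][1] != mtime:
            changes.append(("modified", path))
    for ino, (path, _) in old.items():
        if ino not in new:
            changes.append(("deleted", path))
    return changes

root = tempfile.mkdtemp()
a = os.path.join(root, "a.txt")
open(a, "w").close()
before = snapshot(root)
os.rename(a, os.path.join(root, "b.txt"))
after = snapshot(root)
print(diff(before, after))  # one ('renamed', ...) entry
```

The snapshot step is still a full walk, which is exactly what a Merkle tree or an NTFS-style change journal would let the indexer skip.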


I think most of the friction of filesystems stems from wanting application layer features in a construct that has always been decidedly "systems" and just holds data, while applications themselves have gone the route of appending more and more features into files. Since there's no middle layer, files have increasingly become the Armstrongian "gorilla holding the jungle and a banana", often duplicating state and metadata to get the job done. And it terrifies me when development tools, as they so often do these days, spray around files, because it usually results in broken dependencies somewhere down the line.

Another approach that could get at addressing this is to define frontend protocols to filesystems that do targeted, application-y things. This is done in informal vernacular often enough through things like naming conventions, but what we could really aim for is a specification that's a "form-filler" for each category, one that consumes various document and data types and produces the desired kinds of metadata.

The difference between that and doing it as an indexer is that it could be seen in a bidirectional intermediation sense: if the protocol understands all the relevant formats well enough to parse them, it doesn't have to also hold a file, it could simply use internal structures and generate the file representation on demand if needed. But to do it properly these structures would have to have similar security and integrity guarantees to our current filesystems. And exposing a frontend like this does add surface area, with the silver lining of "if it's pushed down the stack, then fewer application coders will have to roll their own terrible version of this functionality".


Don't recent Androids do your "frontend protocol" idea? At least a file chooser on my phone, in addition to a regular file browser, also has an entry for "google drive", which seems to have no corresponding physical location.


The tag-based location of user documents is nice, but why do people want to put it into the filesystem layer?

I don't know if the filesystem layer is the best place, but I don't want to lose the tags when copying or moving files.

So somehow this metadata needs to be associated with the file, but it also should not be in the binary stream of the file. EXIF in images and other similar metadata systems are nice, but any change there invalidates checksums or other attempts to identify changes in the actual file content. (I want to easily be able to see if two images are identical even if they have different metadata, which I can now only do with specialized tools.)


Another thing you'll want for database-centric file stores, that should be table stakes for every desktop OS, is Amiga style datatypes. That is, allow applications to register readers and writers for their file formats. That will help the database parse files for important metadata.


I keep wanting to write a basic implementation of datatypes combined with a fuse filesystem to allow access to metadata and transcoding from unaware applications, then realise I don't have time and hope someone beats me to it. Please, someone, beat me to it...


This is one of these ideas that always float around. Files should be located by tags, not folders. Or file systems should be relational databases or file systems shouldn’t exist at all, etc.

But the fact is people are used to files and folders. Tools are built upon files and folders so changing everything is extremely difficult.

Plus all the tools that have tried to do things differently proved to be a pain:

1. Gmail tags: does anyone use tags any differently from folders/files? Having multiple tags on an email means it'll show up everywhere

2. iPhones didn't have files, but it was so inconvenient that they were added back

3. Microsoft's relational file system was never released (I think)


For GMail I often used "non-folder tags". For example, I would tag emails based on the To address, and they were clearly marked in my inbox. Or I would tag certain types of emails so that I could review them later. For example, SMS would be tagged, but I would read them in my inbox.

I just really wish GMail archiving was a tag. For example I get my video subscriptions into a tag called "Videos" but when I am done I remove the tag and that info was lost. It would be nice if Archiving was just adding an "Archived" tag and it was excluded from tag views by default. That way archiving doesn't forget all the tags. The only workaround I am aware of is making two tags for everything like Videos and Videos-Archive. Apply both in filters then just remove one once you are "done" with them.

Folders have the same problem. Of course trash systems work around this by explicitly recording the original location.


Tagging is work. Fiddly work that's surprisingly costly in effort if it's not trivially automatable stuff like time and date, location, application, and so on.

Naming is hard work, but tagging means creating and choosing shared names all the time, with the pressure that the combination needs to be reasonably unique, otherwise you won't find stuff.

Tagging is also fiddly if you don't have a really good bulk action UI. You can think of the user-controlled paths in a hierarchy as tags, and moving files is the action of untagging and tagging. By moving 100 files from one directory nested three/levels/deep to another, you are removing 300 "tags" and adding 300 different "tags". And you can rename the "tags". A single click and drag, 600 actions, and you can see the before and after trivially, and undo trivially too (at least in Windows).
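
The arithmetic above can be made concrete by modeling each directory component of a path as a tag (a modeling device only, not how any real tagging system works):

```python
def path_tags(path):
    """Model each directory component of a slash-delimited path as a tag."""
    return set(path.split("/")[:-1])

def move_cost(n_files, src_dir, dst_dir):
    """Tag operations implied by moving n_files from src_dir to dst_dir:
    (tags removed, tags added), counting only components that differ."""
    src = path_tags(src_dir + "/x")   # "/x" stands for any file name
    dst = path_tags(dst_dir + "/x")
    return n_files * len(src - dst), n_files * len(dst - src)
```

For example, `move_cost(100, "one/two/three", "a/b/c")` gives `(300, 300)`, matching the 600 total actions in the comment above; a good bulk-action UI hides exactly this cost.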

Tagging is more useful for ad-hoc "favourite" lists, and the occasional cross-reference (but it's work to hunt down the elements in the xref).


If you're running Windows then install Everything.


Boggles the mind how Windows search basically just doesn’t work at this point compared to what you have on other operating systems.

Feels like I can’t even search for a certain file type in a folder.

Really frustrating that Apple, the only company to truly master OS search doesn’t seem that interested in making the type of OS that has files anymore.


>> Boggles the mind how Windows search basically just doesn’t work at this point

Now that you've tried Windows Search on your desktop, please allow us to help (or coerce) you to use MS Bing to search the entire world-wide-web. </sarc>


Yep, Everything is a game changer. I don't worry much about where a file is any more, I just worry about giving it a good filename. Then I will always be able to find it, wherever it is.

Be sure to set a hotkey for it. I use Ctrl+Shift+Spacebar since it didn't seem to conflict with anything else.

Of course before you can use Everything, you have to find Everything. Here's where:

https://www.voidtools.com/


Only for the record, there is also Swiftsearch:

https://sourceforge.net/projects/swiftsearch/

that uses the NTFS $MFT directly and that (if needed) is fully portable, see:

http://reboot.pro/index.php?app=downloads&showfile=609


Everything is a lovely tool, but I'm continually amazed that it should have to exist at all. Why is the Windows built-in search so atrocious? You literally cannot use it as part of a getting-anything-done-at-all workflow. And it keeps getting worse with every update? Why would I want to mix Bing search results with stuff from my filesystem?


"But he has nothing on at all," said a little child at last. "Good heavens! listen to the voice of an innocent child," said the father, and one whispered to the other what the child had said. "But he has nothing on at all," cried at last the whole people.

I agree completely with your sentiment here and it truly boggles the mind.

Thank God for Search Everything.


This has completely changed the way I use files. I rarely ever open the explorer to navigate to a folder, but instead open everything to search for a file and then instantly jump to the file location. Naming files well becomes much more important than where they are located.


Spotlight on mac. I use recoll on my Linux machine and let it update every few days. Fast local file system indexing is amazing and abstracts away a lot of file system pain. You often don't even need to name files particularly well to find what you want if you index file contents as well.


> Naming files well becomes much more important than where they are located.

Aren't you just exchanging the location of relevant metadata from the path to the filename?


For me,

  > ls -rec . | ? name -match <substring_in_filename>
becomes muscle memory as a pwsh daily driver-type-person (=> PowerShell 7).


For simple matches against one pattern, you may prefer...

    > ls -rec . -filter *substring*
... as filtering can be offloaded to the "provider" when Get-ChildItem (ls) knows about it.

Even if the filesystem provider doesn't handle patterns any differently from Where-Object (?), you can save the cost of hydrating FileInfo objects only to query and discard most of them.

For multiple patterns or anything that you'd need a regular expression for, Where-Object is superior!


Thanks, I'm not usually doing performance required stuff but when I write a function I always want to know the best way to feed the pipeline, I appreciate it. #snoverville


aka

  find . -iname '*<substring_in_filename>*'
(=> V5 Unix from 1974, although the case-insensitive -iname was never standardized)


Have it pinned to my taskbar. There's also Windows PowerToys that has a quick launcher.


And if you are running Linux then fsearch is worth giving a try: https://cboxdoerfer.github.io/fsearch/


It's time to wrestle control over files away from users. /S


What's wrong with adding metadata to each file and indexing that? I thought this was essentially a solved problem.

Also, the OP's solution merely sounds like a slightly altered filesystem. I thought he was going to propose something akin to WinFS, Microsoft's ploy to merge an SQL database with a filesystem, but it turned out to be a dud.


This looks pretty neat (though I will not easily give up files). The author seems pretty frustrated in another post that few people are interested. I am willing to look over the design and play with it, but I struggled to even find the website, and on said website there is no way to download the software. I only found a sample data archive.

https://didgets.substack.com/p/what-is-wrong-with-you-people...


In the comments here, people have named dozens of similar systems. I am sure that in the previous discussions (which must have prompted that frustrated post) the same happened. The author must have read them... and then went on to write:

"I have invented an entirely new way to store and manage all kinds of data"

There are no references to other systems, no comparisons. Did he just ignore all the previous work? Having a "Didgets vs. X" table and a section on why it would work this time would do great things for this project's credibility.


I apologize for the missing download file (DidgetsBeta.zip) on the website www.Didgets.com. It seems the latest upload failed, but I fixed it.


Data organization is constrained by the worst system, because of interoperability. How do you move these tags across the internet through systems that don’t understand them?

For example, the Mac had “file types” and “creators” as separate metadata since the beginning. Because type wasn’t encoded in the filename, mistakes weren’t made that accidentally changed the type and you didn’t have multiple files of the same name differing only by extension. The file always opened in its creator but power users could easily change the creator. To make a successful round trip to another system, the file would need to be given the right extension and then another program would need to reassign the file type and creator on reentry. If you didn’t do it right, people would complain that they couldn’t open the document.

In addition, experience shows that organization must happen automatically or people will just let it do whatever. At this point, most users probably have all their documents in one folder and all their downloads in another. If they weren’t indexed automatically, they’d just give up and say they don’t have the documents anymore.

Come up with an intelligent way to organize automatically and it will be a real revolution. I'd like to be able to find that photo I saw a few weeks ago when I need it. I want all the documents that are similar to the one I found that isn't the exact version I wanted. I want all the photos taken in Brazil, as well as unlabeled photos that might be Brazil. I want the EPS version I have of this jpg logo.


I'm a huge fan of tags (as I promulgated the idea in the first place in the early 2000s) but they have a bunch of problems.

They're better understood as a memory extension system rather than a sole filing system. The idea being that it improves recall of objects if you add some attributes when saving as you are likely to use some of the same attributes when recalling.

But the vast majority of objects on a filesystem are mechanically generated and never touched by the human using it (assuming a sole user.)

The model gets much more complicated when many users are interacting with the same system.

As noted elsewhere, the flat namespace gets cluttered very quickly. I do think that there is use for a hierarchical separator since many times objects are fully inside some other concept. And when looking at massive userbases creating tags for memory, there is a distinct ordering of generic to specific when creating tags in order.

Also, filesystems allow a bunch of workflows that aren't completely obvious under tagging. For example, a business might copy their template folder and rename it for a new customer, and inside it has a bunch of documents with the same names. I think this is a bit like having a bunch of objects (in the programming sense), thus creating things that all have the same method names (except now they are files).


I am old enough to remember the talk of Microsoft Cairo (successor to Chicago aka Windows 95) and the Object File System which would do a lot of this.

If Microsoft at the heyday of its monopoly power could not pull off something like this, I don’t see it becoming widespread now.


It would’ve been really interesting if WinFS actually was released. It had a very interesting type system. Probably would’ve been more suited for servers than desktops/clients.


I wouldn't be surprised if Microsoft could have pulled it off if they had designed the system from the ground up. Instead they tried to do it on the cheap by taking two existing systems (NTFS and SQL Server) and combining them in ways they were never designed to handle. The product was either too slow or too fragile (or both).


This article seems to conflate two separate issues:

1) File systems could be modernised.

2) Files and folders as a metaphor break down when you have enough files, vs, say, search.

(1) is almost certainly true. (2) I don't think I agree with. Even if people use something like Confluence, things still get disorganised: if people (such as me) make documents and folders in a way that's a mess, then they'll be a mess no matter the metadata.


Related: can anyone please tell me why Google Drive search is so bad?

It’d be bad if it was built by some random tech company. But it’s built by Google who I believe have some reasonable capability in search…

It’s just absolutely useless. Like: I’m a half assed php developer and I could probably knock something up that returned more useful results from a drive search.

What gives?


The proverb for this is, "The cobbler's children go unshod." An organisation that has specialised expertise usually deploys that expertise where it has the largest impact, because it is limited. Hence, the cobbler spends all his time making shoes for paying customers, and he has no time to make shoes for his own children.

So, the Google Drive search implementation is probably closer to what you supposed than what you'd expect if they threw the full weight of their search expertise at it.


I'm sorry, but this sounds like my 13-year-old daughter rambling about why her desktop has become complex and why it isn't just like the simple iPad. However much we may not like it, there will always be a file system, for things to have a location.

Yes, we can encapsulate it and put a different layer on it -- tags, containers, types, relationships. However, understanding how it (the file system) works would be an immense help.

I have been teaching my daughter how files are located, added, and sized, and why these details matter.


You say that like the file system is the ground truth to how we use a disk. It is nothing but another abstraction we've invented to manage blocks on some persistent medium. The vast majority of people wouldn't notice if you replaced their FAT partition with a key-value store, where the keys were paths delimited by slashes (with some UX to traverse paths).
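
A toy sketch of that replacement, assuming nothing more than a dictionary whose keys are slash-delimited paths, with readdir emulated as a prefix scan:

```python
class KVFS:
    """Toy file store: keys are slash-delimited paths, values are bytes."""

    def __init__(self):
        self.kv = {}

    def write(self, path, data):
        self.kv[path] = data

    def read(self, path):
        return self.kv[path]

    def listdir(self, prefix):
        """Emulate readdir: the immediate children under prefix."""
        prefix = prefix.rstrip("/") + "/"
        seen = set()
        for key in self.kv:
            if key.startswith(prefix):
                seen.add(key[len(prefix):].split("/")[0])
        return sorted(seen)
```

"Directories" here are pure UX: they exist only as shared key prefixes, which is roughly what users perceive anyway.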


Are you confusing semantics (open, readdir, cwd, ".." etc..) and implementation (directory entries, ".." pointer, etc..)?

The blogpost talks about semantic change -- "open" would mean something else entirely.

You seem to be talking about implementation, and yes, users and programs don't care. In fact, many network filesystems like DAVFS and SMB treat the filesystem as a key-value store, where a single call can get any file by its full path. It would be interesting to see an on-disk filesystem based on a key/value store, but sadly the blog post's author does not talk about that at all.


The point I was trying to make was that the directory hierarchy isn't fundamental to how people reason about their data. That you could introduce a little UX over a KV store and those that wanted to mimic the old abstractions could continue to do so. But we could also change the abstractions (like the blog post says) to something like tags, and it wouldn't be any lower or higher level, compared to what we have now.


All the "tag" systems I have seen rely on some underlying store: in database-based methods, there is the main table data (a K-V store) and indices (another K-V store); in symlink-based tags, we use the underlying filesystem as the data store.

So the fundamental filesystem abstraction is a K-V store. For efficiency reasons, based on common access patterns, this is often (but not always) implemented as a special kind of tree.

The tags system builds on that and thus is higher level: even in Didgets, the main primitive is "open by id", an example of a K-V store operation. The whole tags thing is a way to query that store.

The hierarchical filesystem also builds on that, but this time the key is an (almost) arbitrary string, and the search operator does a /-delimited prefix match.

There is no real abstraction change when we move from hierarchical files to tags. There is still the same old underlying K-V store; we have just added an alternative to readdir() (and maybe also forced every file to live in the root directory).
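
A sketch of that layering: a primary id-to-data K-V store ("open by id") with an inverted tag index kept beside it as the readdir() alternative (purely illustrative; this is not how Didgets is actually implemented):

```python
class TagStore:
    """Primary K-V store (id -> data) plus an inverted tag index."""

    def __init__(self):
        self.objects = {}  # id -> data: the "open by id" primitive
        self.index = {}    # tag -> set of ids carrying that tag

    def put(self, oid, data, tags):
        self.objects[oid] = data
        for tag in tags:
            self.index.setdefault(tag, set()).add(oid)

    def get(self, oid):
        return self.objects[oid]

    def query(self, *tags):
        """Ids carrying ALL of the given tags (set intersection)."""
        sets = [self.index.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()
```

The query side is just index maintenance over the same underlying store, which is the point: tags change the lookup operator, not the abstraction.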


Files seem like a pretty natural abstraction: a blob of data, an array of bytes that forms a self-contained unit. There's something that seems natural about that. You can attach all kinds of meta-data to files, but then you run into the issue of which types of tags people care about or are supported by different OSes, etc. The idea of hierarchically separating files, and giving them names, and also types in the form of the extension, seems like a pretty good system to me.

In terms of tagging files, I feel like, with good machine learning, that can be done automatically. No human effort required. You could just search all your files with a vague natural language description of what you're looking for. Kind of like tagging webpages... Does a good search engine even care about your tags?


I think you're talking about file systems as an implementation detail, whereas the author seems to be talking about file systems as a user experience.

For example, iOS devices have a file system, but for 99%+ of iPhone/iPad users it effectively doesn't. Presumably, the author's "Didgets" object database is persisted via a file system.

Also, a user-facing file system isn't needed for things to have a location, and things can have a location without needing a user-facing file system.


iOS long resisted exposing the traditional file system but eventually caved in (with the Files app), as for non-trivial use cases it is simply the better way.


Well, on the one hand, I tend to agree, but on the other hand, we should explore these concepts more deeply. The Von Neumann computing architecture hasn't changed in like 80 years and is still pretty darn dominant. I am fine with people exploring this space without dismissing them out of hand. Sure, file systems don't seem like they are going anywhere in their present form, but this is how we advance computing. Paradigm shifts can and will happen, eventually.


To me, what makes hierarchical file systems compelling, and what a replacement needs to be able to achieve, is that it provides logical filtering. E.g. I rarely want to search all image files on my machine. I want to search screenshots or personal photos or desktop backgrounds. For some categories, like my source code, the hierarchy is deeper.

You can emulate that with tags, certainly, and for some things that's a better fit (a photo can be both a personal photo and a desktop background). But what matters is that the full set of operations, including "this is no longer an X, but it's a Y" aka move/rename, and "find Xs with these attributes" must be just as fast and easy to do.


Is Didgets a front-end for a SQLite system?

Because that would seem a great way to accomplish much of this functionality, with your choice of UX wrapper.


29 years ago, I worked for a little software company called PC DOCS that wrapped WordPerfect with the ability to have long file names, keywords, and full-text search. It was like sheer magic in the days of 8.3 DOS filenames. After I'd been there a while, I figured that's what the future would look like one day: no more hierarchical file systems or file names.

Current-day me would be embarrassed to tell 1993 me that things haven't really gotten much better. Sure, we expanded the 8 in 8.3, and the 3 grew to 4 or so, but I still spend an inordinate amount of time staring at a spinny widget whilst waiting for the OS to find a document.


The article and everyone in this discussion is talking about photo tagging and such.

What I want to know is how configuration management, source code organization, build systems, packaging systems, and other experts tools will work without a hierarchical filesystem.

If you're designing a camera application, sure, maybe you don't need a hierarchical filesystem if you have other means of organization and discovery. But who's going to be able to code that camera app in the first place? And when "it" is installed, what exactly happens?


These systems were designed for a hierarchical filesystem so it is natural that that is what they support. I'm sure that if tag-based filesystems were the default they would have an equally suitable design that worked well there. This could be as simple as specifying the "project" as a set of tags. For example project=camera and maybe compile a submodule with project=camera,module=ui. Then the files inside could be processed by the build tool.

Of course, for any tag-based filesystem to take over it will probably need a compatibility mode. I can imagine something as simple as a path=myproj/src/main.rs.


My point wasn't that it would be impossible to create a new kind of solution. It's enough that it would be a massive undertaking for unclear benefit, if any.

If we get rid of hierarchical filesystems, what replaces git, for instance? Perforce has some ability to project different filesystem hierarchies, but even if that were enough (it probably isn't), porting all those tools and projects to Perforce would be an incredible amount of work without a clear upside for users of version control systems.


This is the piece I've been struggling to get at. I understand why filesystems exist and how they help with interacting with the physical structures that underlie computers, but I don't understand what benefit a database alternative to directories can offer (and if you tell me a directory is just a specialized database please be prepared to explain using very clear language).

Maybe there is one, but even when I talk to experienced devs with engineering (not just software) backgrounds, I haven't gotten a good explanation. Perhaps because there isn't one?

But I've been curious about this for weeks.


'Replacing file systems with something better' does not mean you have to throw out the hierarchical folder structure that people are used to. Didgets does use folders to organize files if that is what the users want (when importing files, this is the default).


I've been seeing this shit for decades.

The problem is as soon as you end up with a tag/search system with any complexity, you end up recreating paths/folders/whatever again. As soon as you have a multiuser system, you're forced into it.

No one has ever justified to me why I need to stop referring to /etc/whatever/config or /users/username/whatever

We'd either need to scrap everything we use today, or emulate it anyways to use things like cp or mv.

The answer is both. Tags are useful, but not a replacement.


I never have issues finding files. I am not particularly disciplined and don't keep things very tidy, but I have folders for everything. Documents have their folder, software projects have their folders, photos have their folders, and so on. If I need to search for something I know where to look.

I wouldn't necessarily dislike another system for keeping files organized and finding them, besides a hierarchical one, but I have yet to meet one that is better than a file system.


I totally agree with this article - I've made similar points myself.

I would also add: how terrible is it if you have a file that applies to 2 or more areas that you distinguish by folders? E.g. say it's a photo: you might want to have a 'family occasions' and a '2019' folder, i.e. one photo might apply to 2 areas that you want to distinguish. Well, that level of organisation is effectively punished: you can either duplicate the photo in 2 places, or lose the reference in one. If you improve the photo using, say, Photoshop in one place, you had better remember that you have a copy and remember to copy it over there too!

For me file systems were designed wrongly. For personal files there is the data and the meta-data. But the only meta-data we can add is in the file attributes.

Data should go into a 'bucket' and be stored by the OS somewhere, never duplicated. Meta-data relating to the file should describe features of the data, where to get it (perhaps in multiple places). It should be possible to apply multiple labels to it too, in my example 'family occasions' and '2019'.


for this you have links, i.e. ln https://en.wikipedia.org/wiki/Ln_(Unix)


I know about links, and there are one or two other solutions I've looked at too.

But that's not really what I'm getting at.

Links are pretty cumbersome, get broken, etc. They sort of work in avoiding the duplication issue I mentioned. They don't work in the sense that what the (non-techy) user wants to do with data is an afterthought; i.e. the order of priority is computer first, then user.

What I'm talking about is that the meta-data describing personal files should be where the user adds their value to - they should be able to describe what the data means to them and that should only need to be done once. The data itself should be handled by the OS; where it is etc is not really a big concern for the user themselves.


This doesn't seem great. Soft links are not resilient to renames or removals of the "canonical tag". Hard links are not resilient to improving the file in photoshop.


I'm always excited to see and learn about new and different concepts, but I struggle with two main points:

1. It is not the implementation of a thing that I worry about, but rather how it will be misused.

We have a generally good understanding of how end users misuse the current file system hierarchy. Before adopting, or even advocating for, any alternative, it would be hugely beneficial to sit down and consider the ways in which such a system could be mishandled, abused, and used maliciously by end users or bad actors.

2. Do the gains outweigh the growing pains?

In the event that the potential gains of change seem particularly appealing - and the concerns of point 1 have already been thought out and addressed - are the gains of that change significant enough to follow through on actually making it? There are many cases where something has been vaguely improved on in some way or another across many industries, but it is rare that those improvements have justified the time and investment of the participants of that industry as a whole.

So: if the primary gain of Didgets is to make searching faster, does that gain in search speed outweigh the task of changing a very fundamental aspect of how computers operate today?

My personal opinion is "No, but..."

There is always room for improvement. I think that if such a system finds more ways to entice people to adopt it, to find more depth and measurable, tangible benefits of adopting it, it will have a much stronger case to make. Speed is great, but we need more than that.

---

From my perspective, I have very little issue finding files I need. I take pretty good care in keeping my files/folders well organized. I find it relaxing to organize things in a logical way. I even like organizing file structures at work, and for people in my personal life. I see this as a "non-problem" - it's a thing I actively take satisfaction from fixing.

In the event the author of the post reads this, I give you this challenge: give me a better alternative. Give me a system that can produce the same sort of satisfaction that I get from highly organized file system layouts. Justify why I should no longer need to do that. I'm not opposed to change entirely; I just don't see a very clear gain from it, in its current form.


I thought this was going to be about replacing the inherently race-ridden POSIX file system interface with something that has a well thought out, defensible design and API.

In the old days, that would mean I need to do one myself. But nowadays, you assume somebody else must have done that already, so you go looking for it. If it doesn't exist, it was probably too hard, so you give up.

Or maybe you find some, but all are unsatisfactory in some way. Then you might synthesize what you like from them all, and add some private improvements, and publish that.

Or maybe you find one you really like, and you download and try it. Then you write an article, something like "The Time Has Come to Replace File Systems" or "File Systems are Racist" on your blog. Somebody posts that to HN, and it generates passionate interest. Shortly after, it appears as a module for Linux, or maybe for some popular embedded kernel used to implement USB keys. And, it's off to the races. Or, off to no more races. Or something.


Maybe what needs to be replaced is the meme that people can't deal with hierarchical filesystems.

That may have been true 20 years ago when the average person didn't use a computer too much. Is that still true today? Will it be true in 10 years?

Is designing a filesystem the product people's equivalent of the developer's game engine?


If one has forgotten where they stored a file, then no database will help them either. A file system is in the first place also a database, and whether hierarchical or relational, databases are only as useful as the data they contain and the tools we throw at them.

The problems this aims to solve are at the user layer, but the majority of files are used by the system itself. The system does not need such solutions; it works well with the established ones. Or is there any research into potential benefits for distributions using a relational file system instead of hierarchical ones?

Another point is that this would probably bring up new issues. Is there any system with real experience on this that can say what will change, what will be better, what will be worse?

But it's also true that we should improve and innovate here at the user level. Having every app separately indexing files and handling them is a poor situation. Working toward a generalized solution would be beneficial. Maybe a universal relational database layer as a foundation for all apps would make sense? Something that can have tight integration with the apps and sits on top of the normal file system? Thus, it would be the job of the Desktop Environment to offer and integrate this.

KDE did try this at some point but seems to have failed for many reasons. Maybe they should restart the effort, but more pragmatically and with stricter enforcement, probably in a fork as a testing ground. I remember one problem they had was multiple systems being worked on in parallel, each limited to certain apps. There should be only one system, and it must be integrated into every app the same way to succeed. KDE, with its centralized and mature libs, would be well positioned to enforce this. GNOME might be able to pull off something similar, but I don't know what their state is today. Of course this would only be a first step, but if some major DE solved it for themselves, others could join, and they could start building an independent solution. And maybe, from there, they could work toward a specialized file system that improves things even more, if really necessary.


File systems are pretty intrinsic to how modern computers work; replacing them would be a monumental task, not just for users but for the architecture of computers.

Of course you don't necessarily need to replace it; you can just use new tools on top of it, as iOS and to some extent Android do. Both of those use file systems even if you never see them. But in any of those cases you're still using a file system, with all the apparent problems that come with it; you just have an abstraction layer on top to make it easier (or at least different) to navigate.

For me, file systems are about as tried and true as you can get. They're not perfect, but they do the job well enough that the cost of replacing them is too great.


> File systems are pretty intrinsic to how modern computers work; replacing them would be a monumental task, not just for users but for the architecture of computers.

Is this true for spinning disks only or does this apply to SSDs as well? I understand that APFS tried to shoehorn SSD-like operations into the FS layer.


I’d love this. Been wanting something better forever.

A bit related: I want to click on a project name and it should set up my environment ideally for that project. Filter the files, include/exclude the apps I see, filter bookmarks. Just show me what I need for this one project, not all the crud across my whole life. Where anything is linked to multiple projects, let me include/exclude at will. This would all be especially useful when I haven’t touched a project in a year or two and can’t remember exactly where everything is.

Two monitors? Project view on each, drag between them.


I'd be perfectly happy if all file systems were flattened into a sql interface. The relational model is the only thing flexible enough to give all users what they want.

Imagine being able to define custom triggers that update cross reference tables when certain file system operations take place. Or, building custom views of storage system for various use cases.

Having the user shell tightly integrated would be nice too. I'd like to be able to write a query like:

  SELECT foobar2000_enqueue(t.id)
  FROM vTechnoBunkerTracks t
  WHERE t.PublishedYear > 2021
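The trigger idea is easy to prototype on top of SQLite (all table and column names below are made up for illustration): an AFTER INSERT trigger keeps a per-extension cross-reference table in sync, and the year filter from the query above becomes an ordinary WHERE clause.

```python
import sqlite3

# In-memory sketch: a hypothetical "files" table plus a cross-reference
# table that a trigger keeps in sync on every insert.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, ext TEXT, year INTEGER);
CREATE TABLE ext_counts (ext TEXT PRIMARY KEY, n INTEGER NOT NULL DEFAULT 0);

CREATE TRIGGER files_ai AFTER INSERT ON files BEGIN
  INSERT OR IGNORE INTO ext_counts (ext, n) VALUES (NEW.ext, 0);
  UPDATE ext_counts SET n = n + 1 WHERE ext = NEW.ext;
END;
""")

db.executemany("INSERT INTO files (name, ext, year) VALUES (?, ?, ?)",
               [("mix01.flac", "flac", 2022), ("mix02.flac", "flac", 2021),
                ("cover.png", "png", 2022)])

# The query from the comment, minus the player integration.
rows = db.execute("SELECT name FROM files WHERE year > 2021").fetchall()
counts = dict(db.execute("SELECT ext, n FROM ext_counts"))
print(rows, counts)
```

The cross-reference table never has to be rebuilt; it is updated transactionally with the "file system operation" itself, which is exactly what external indexers can't guarantee.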


So what you want is the circa-1970's-era Pick OS which had a database (albeit not relational) for storage rather than a conventional filesystem.

All that's old is new again...


Literally BeOS and now Haiku.


Sheesh, organization and hierarchy are not the problems with file systems. It's the APIs and the network awareness. Can someone name a network filesystem that doesn't suck horribly in some way? Samba? NFS? CephFS? GlusterFS? They all have some glaring problem that holds them back.

We need storage that works everywhere and that doesn't suck. DropBox, et al, tries to make it happen but they're held back by mediating everything through the filesystem layer. Traditional filesystem concepts aren't going to make that happen.


it'd be cool to see an information theoretical explanation of the complexity of hierarchical -vs- non-hierarchical file systems, and why human beings have such an initially easy time with hierarchies.

there's the obvious inheritance factor: the parent of a child in a hierarchy can provide a _heuristic_ about properties of the child. you sort of get this with a tagging system, but the inheritance can really only go one level (unless you make a hierarchy of tags, which brings you back to tree structures)


Folders impose a tree shape on file organization, which then has to be awkwardly worked around with symlinks, indexes, and whatever the heck those "library" things on Windows used to be.

That said folders are useful as a concept, but maybe flip things so that the folder is a tag on the file, and I can "move" or "copy" a file by changing its tags. And still present a visual folder structure by default in the file browser.

The biggest loss with this approach is namespacing... Not sure what the best option there is.
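A minimal sketch of that folder-as-tag model, assuming a SQLite backing store and hypothetical table names: membership in a "folder" is just a row in a many-to-many table, so a file can carry several folder-tags at once and a "move" is a tag update.

```python
import sqlite3

# Files and tags are rows; membership is many-to-many, so one file can
# appear under several "folders" without symlinks or copies.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE file_tags (file_id INTEGER, tag_id INTEGER,
                        PRIMARY KEY (file_id, tag_id));
""")

def tag(file_id, tag_name):
    # Create the tag on first use, then attach it to the file.
    db.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag_name,))
    db.execute("INSERT OR IGNORE INTO file_tags "
               "SELECT ?, id FROM tags WHERE name = ?", (file_id, tag_name))

db.execute("INSERT INTO files VALUES (1, 'report.pdf')")
tag(1, "work/2022")   # the "folder" the browser shows by default
tag(1, "taxes")       # a second, overlapping category

# Listing a "folder" is just a join:
names = [r[0] for r in db.execute(
    "SELECT f.name FROM files f "
    "JOIN file_tags ft ON ft.file_id = f.id "
    "JOIN tags t ON t.id = ft.tag_id WHERE t.name = 'taxes'")]
print(names)
```

The namespacing loss the comment mentions is visible here too: two distinct `report.pdf` files under different tags need disambiguation by ID, not by path.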


Files give you ownership of the data. Tags make you slave to an intermediate layer between the data and you.

Let me know when you have replaced your filing cabinet with just tags.


If filesystems were replaced with a new paradigm today it would be a centralized SaaS where every operation requires a round trip to the cloud and ads show up alongside your data unless you pay a monthly subscription. Every single thing you do on your computer would be streamed to advertising analytics farms. Without an Internet connection it would be impossible to access your data.

... being sarcastic ... I think ... maybe ...


Didn’t the resource fork for Macs enable some of this stuff decades ago? At least the file still being locatable even if it was moved or renamed.


heh, the tag names in that demo video follow a hierarchical structure - .didget.Name, .didget.SHA1, .fileSystem.Ancestor, .fileSystem.Folder, ....

So now not only do you have a hierarchical (but optional, which is worse imo) structure for files, you also have a hierarchy for tag names (which, unlike with folders, you can't really rename if they conflict because they're singular keys).


Keep it simple, stupid. Hierarchical FS's work. Just don't be lazy; tidy up after yourself.

Most innovation in tech is not needed. Would you redesign a spoon or a fork after centuries? No. It works.

Something like BeOS/Haiku's file system would make more sense, but it is still useless for sharing files with the rest of the FS world. Just look at what happened with resource forks on Classic Mac.


Disappointed the article never mentioned WinFS, which suggests to me that not much research went into it.


The article was never meant to fully describe the Didget system or all the issues that went into its design. One of the reasons I started working on it years ago was because Microsoft cancelled its WinFS product and I wanted to see it in action. I decided I had to try and build something on my own. I can assure you that I put a lot of thought and research into it.


> The unique identifier for a Didget is a 64 bit number. It remains consistent throughout the life of that Didget.

How are safe-saves handled in Didgets? That's usually a case where a unique file-ID gets changed.

(Is "safe-save" the canonical name for this? Or atomic-save maybe? Where you save to a new filename, then move that file into place.)


In Windows, when replacing a file with another using ReplaceFile the replacement inherits the original file's NTFS OBJECTID.

I would expect that a similar operation would be supported.


The hierarchical folders-and-files metaphor/implementation is second on my list of software paradigms that need to die. And I say that as an "if it ain't broke, don't fix it" guy who does not jump on bandwagons and want to replace everything every other season with the new new thing.


Why? Hierarchies seem like a natural fit for the human mind. We can know only the relevant parts, which might change according to what we're working on and confidently exclude entire large complex branches like the contents of C:\Windows or some program's temporary working files or whatever from our awareness.

They also combine in an obvious way. If you mount another filesystem somewhere, it's contained in that somewhere, not mixed in with the existing one in some unexpected complex way.

They're not a natural fit for real world data, but there's a tradeoff between serving humans and serving nature.


> Hierarchies seem like a natural fit for the human mind

I'd argue they only seem that way because we've been raised and taught to use them but haven't been taught sufficiently to think about other ways to organize knowledge.

> They're not a natural fit for real world data

In "A City is not a Tree"[1], Christopher Alexander discusses the semilattice in relation to city planning and human societies. Clay Shirky has previously noted[2] that categories and ontologies are too brittle to serve human thinking.

As an example of an ongoing attempt to create a hierarchical ontology that isn't helpful, look at any list of music genres. Also note that social networks are not organized hierarchically – humans have the ability to handle interconnected structures just fine.

This also relates to why many people don't understand or can't quite relate to true distributed peer-to-peer computing, or systems with no central controller, or fail to grasp emergent behavior. But nature is fine with completely decentralized systems: see, for example, foraging ants, or Emergence: The Connected Lives of Ants, Brains, Cities, and Software by Steven Johnson.

1. https://blogs.ischool.berkeley.edu/i103su12/files/2011/07/19... 2. https://oc.ac.ge/file.php/16/_1_Shirky_2005_Ontology_is_Over...


If hierarchies were adequate we wouldn’t have tags or symbolic links/shortcuts.


Small nitpick:

>The size of this record in popular file systems can range from 256 bytes (Ext4) to 1024 bytes (NTFS).

On 512-byte and 512e disks the NTFS record is two sectors; on 4K-native disks the NTFS record becomes 4K, i.e. a single sector.


Good catch. When I ran 'fsutil fsinfo ntfsinfo c:' it reported the bytes per FRS as 1024, but then I realized my c: drive was an SSD which must have 512 byte block emulation on. The hard drive shows 4096 bytes per FRS. I forgot about this when writing the article. That makes the difference between my record size and NTFS's even greater (1/64 vs 1/16).


As a side note, for extremely small files the actual file content is written directly into the $MFT entry; the size limit is around 700 bytes for 1024-byte $MFT records and around 3750 bytes for 4096-byte $MFT records.

Some reference, JFYI:

https://www.forensicfocus.com/forums/general/mft-resident-da...


This is what we have information systems and databases for, on top of filesystems.


The article breezes past the obvious (and widely implemented) idea of indexing on top of existing file systems.

I think it would be a lot more useful to improve on that than pursue yet another quixotic file system idea.


The time has come to replace <enter whatever you dislike just now>


This post is full of interesting new ideas. I'd love to see these implemented in a mainstream Linux distro (or Windows/Mac), but the backward compatibility considerations are a huge impediment. Literally all the software in the past 4 decades (at least) has used directory and file based file systems.

One way to encourage usage and let developers find advantages of the new system could be to offer this as an alternative file system that can be used in parallel with existing ones. That would also uncover bugs/problems in the new system which could be fixed.

PS: I do think the name "didget" could be improved. It is not as natural to write or speak as "file".


Is it really full of new ideas? This looks like a fairly generic tagging system, which seems to pop up on HN every few months.

If you are interested in tagging systems, I recommend looking at Microsoft WinFS (part of Longhorn). It took the idea of tags very deep: not only did it have user-specified tags, it would parse many existing file formats and automatically generate tags based on contents. For example, if it saw a Word doc and detected it was of "resume" type, it would extract "Name, Educational Qualification, Experience" values from it.

Sadly, there is no good single overview I know of. You can start from wikipedia (https://en.wikipedia.org/wiki/WinFS) and follow the links.


I use the name Didget (short for Data Widget) because it encompasses much more than file data. A 'File Didget' is an object where its data stream has unstructured data and is unmanaged by the system. But there are other Didgets (Schema Didgets, Set Didgets, Tag Didgets, etc.) where their data stream contents are completely managed.

Like NTFS, which uses files for its internal data structures, and like the many databases that use relational tables to manage themselves, I chose to enclose all my internal structures (e.g. tags, file tables, allocation bitmaps, etc.) within other Didgets.


I am not sure OP knows what a filesystem means in the first place.


What is the difference between a tag and a filesystem directory?

I guess a file can have more than one "tag", but multiple hard links have been a thing in file systems since forever.


> but multiple hard links have been a thing in file systems since forever.

The UI around creating/managing/deleting them is worse than with a properly supported tagging system, though. Plus they're prone to breakage by any program that attempts to do atomic saves (i.e. renames the old file first).

While I certainly wouldn't want to give up hierarchical file systems as the default, there are a few things where I'd really wish for some OS/file-system-level support for tagging, too.
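A quick Python demonstration of that breakage (the file names are arbitrary): the second hard link keeps the old inode, so after a program does an atomic save the two names silently diverge.

```python
import os
import tempfile

# Two hard links to one inode act like two "tags" on the same file --
# until a program does an atomic save, which rebinds only one name.
d = tempfile.mkdtemp()
a, b = os.path.join(d, "a.txt"), os.path.join(d, "b.txt")
with open(a, "w") as f:
    f.write("old")
os.link(a, b)  # b is now a second name for a's inode
same_before = os.path.samefile(a, b)

# Atomic save of "a.txt": write a temp file, rename it over a.
tmp = os.path.join(d, "a.txt.tmp")
with open(tmp, "w") as f:
    f.write("new")
os.replace(tmp, a)

# a points at a fresh inode; b still holds the old contents.
a_text, b_text = open(a).read(), open(b).read()
print(same_before, a_text, b_text)
```

Nothing warns the user that the "tag" represented by `b` has quietly fallen out of date, which is exactly the UI gap the comment describes.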


What’s the state-of-the-art of command line file searching? I have a macOS machine and mdfind is awesome but something similar for Linux is not yet in my toolbox.


fzf is nice, but I'm not even sure it is what you're searching for (pardon the pun).


I badly want to try it out and see how it fares in practice, together with some integration with open() calls for interoperability. But where to get it?



Still waiting for tabs in Windows Explorer...


You can use http://qttabbar.wikidot.com for native tabs and loads of other useful features. Even works on Windows 11.


Yes, I've been using this solution for a few years now; it works great and has a ton of options.


I'm fine with the filesystem, but I would like each file to have a unique identifier instead of a path.


His numbered list of issues makes perfect sense. And apparently he has solved all of those problems?

This feels like one of those projects that corrects some structural flaws in the status quo. Therefore it should be commended and adopted.

However, since the judgement and behavioral patterns of most humans are very similar to those of ruminants, the likely response will be indifference or hostility.


Not sure if you are joking or not, but no, his numbered list of issues does not make perfect sense.

Item 1 is irrelevant: one very rarely cares about the fixed-size metadata record for a file, because one doesn't normally cache metadata for every file on a filesystem in memory. Items 1 and 3 contradict each other: if the file record has an arbitrary number of tags, it cannot be fixed-size. Item 4 is solved by macOS (via access-by-inode) and Windows (via the file-tracking service). I cannot quite understand what he means by item 5, but maybe he is talking about Windows not having a trusted "ctime" field? If so, this is just Windows' problem; all Unixes solved it a long time ago.

You may want to check the technical merits before recommending the project for adoption.


Every attempt to replace it has just been proof of how great the file system is IMHO.


Just no


This person conflates file and file system in ways that reveal him to be a rank amateur. This is a textbook example of Chesterton's fence.


His CV says he worked on NFS for Novell for 7 years and then PowerQuest (who made PartitionMagic, DriveCopy, DriveImage, and ServerMagic) for 6 years. He's clearly not a rank amateur.


While true, it's not impossible for engineers to make mistakes, even experienced senior engineers.

I think the mistake here is perhaps the idea that what is proposed are actually problems that need solving. The idea of indexes being managed by the OS is sort of handwaved away in the article because: "they have to store their indexing information in a separate database. It is easy for the database to become out of synchronization with the file system. Also, to speed up the indexing process, users often only index a portion of the file system so using the index might not turn up the file(s) you were looking for."

A second mistake is thinking it's possible to solve the "what is this file" problem:

    The metadata record does not have a file classification system. To determine what is in a file, the file name or the data stream must be examined.
This will always be true. You can make a best effort based on some indicators, but it will always be prone to problems. Whatever system you create that you believe will address this problem will be prone to failures, almost by necessity, because the "type" of a file is largely a philosophical thing that everyone pretends is technical. Even if you solve 90% of cases, you're still going to have enough problem cases that it won't be reliable. It may work for simpler cases like text files, videos, zip archives, and images. How do you classify a docx though? Is it a zip file? A "document" file? Is something JavaScript just because it ends in .js?
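A tiny sniffer illustrates the docx case: by magic bytes it really is just a ZIP archive. The signatures below are the real ones, but the function is only a toy classifier, not anyone's production detector.

```python
import io
import zipfile

def sniff(data: bytes) -> str:
    """Classify a blob by its magic bytes alone."""
    if data[:4] == b"PK\x03\x04":          # ZIP local-file-header signature
        return "zip"
    if data[:8] == b"\x89PNG\r\n\x1a\n":   # PNG signature
        return "png"
    return "unknown"

# Build a minimal ZIP in memory, standing in for a .docx: Word documents
# are ZIP containers holding XML parts like word/document.xml.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<w:document/>")
docx_bytes = buf.getvalue()

print(sniff(docx_bytes))  # the sniffer can only say "zip"
```

To call it a "document" you must open the archive and inspect its contents, which is precisely the convention-over-structure problem the comment describes.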

Is adding this level of overhead to writes useful? You can see in the video his test framework sets up "hundreds of millions" in seconds and yet copying actual live data from his Program Files directory takes 1m10s for 24,431 files. This suggests to me that at the minimum the "hundreds of millions" of objects are not representative of real live data, since that takes approximately 2 seconds to create.

I'd also suggest that if this is truly innovative and these problematic things from the presentation are actually just artificial and that it is doing things under the hood, it should be using a true OS filesystem browser to do this same work. To convince me it could replace standard filesystems, demo me an implementation of the OS running on it, or at the very least using it as a storage drive.


What you've written is very different from the knee-jerk "rank amateur" dismissal, which added no value to HN, that I was replying to.


I'm a little confused too; Didgets appears to just be a database engine + UI.

And this application of it acts as a literal indexing service.

It's a *long* way from :

> ...time has come to completely replace file systems with something better!


What in particular?


Oh boy, web3 really is back!



man locate


The pessimism here is ridiculous. I thought the demonstration video was pretty cool, seems like the didgets concept is more sophisticated than meets the eye. Open your mind a little, people.


Sometimes a bad idea comes back from the 1970-90s and people here are on average young enough they start falling for it again.

But the last serious shot at this, Longhorn's WinFS, was recent enough and a big enough disaster. And it's only been a few years since iPad owners started wanting to use it as a more serious computing device, with an awkward stretch until it got "file management" (it always had an FS). Both are too recent. Try again in another 10 years.


Replacing something that works with an unknown just because "the time has come" is like saying let's replace transportation with teleportation, except that it's 2022 and we don't even know if that's possible.


First off, this is clearly just a promotion for "Didgets"... which seems like a cool idea, but the problem is the article pitches a solution by framing normal-computer-user file organization as a problem to be solved (it really isn't), instead of identifying an actual use case. OP, you are losing people this way. Normal users do not have "200 million records" they are trying to search by file type or name. Get a better pitch than this for whatever you are trying to solve.

> If you have ever forgotten where you stored a file in a file system

Nope, never happened to me.

> File systems let you store any file in any folder, regardless of whether the folder path is appropriate for the file

YES, exactly: that is a feature, not a problem. If it is not appropriate for the file, you have only yourself to blame for putting it there in the first place.

> Even if you remember some parts of the file name

> searching by file extension may or may not turn up the file you were looking for

Who searches for files by name? Names mean nothing. You find the file by navigating down the relevant hierarchy based on what you are looking for. By file type? Are you kidding me?

A hierarchical file system is ideal as the basis of file organization. Tags and search can be added on top of that (and admittedly have room for improvement) if you really struggle to handle organizing your files by yourself (or some unstated use case the OP has in mind).

I'm struggling to figure out how the OP operates... at some level they seem like a computer-illiterate person who saves everything to the desktop and then struggles to find things. But I'm sure that is not actually the case... so do they just not attempt to organize? I'd be interested to better understand what type of files they have so many of that they have to resort to constantly searching by file name or type and need a faster way to do it.


I'm sorry if the article came off as such. I meant to start a discussion about the shortcomings of filesystems and get people to look at possible alternatives. I think what I built (so far) has some great features, but it is entirely possible that the eventual replacement for file systems will be something much different. I spent years working with filesystems and databases and came up with a long list of shortcomings I saw in both. I don't pretend to know all the answers, but I also don't like people dismissing ideas without taking a serious look at them.


I don't really have any of those problems.

And I can rip `find -name` through a 4TB SSD pretty quickly. Usually only a subset is required, because I remember roughly where it is.

For my NAS I spend a bit of time on keeping files organized.

I don't generally want full-text search either, it tends to be cluttered with crap and I don't use full-text indexers (invariably they seem to decide to reindex whenever I least want them to).



