
This would work if you only visited like 5 sites. If you visit thousands, and more like tens of thousands when you consider 3rd party embedded content, you can't figure out which sites a user visits, but rather only which ones they don't.


It worked pretty effectively.

Think about the 10,000s of JavaScript libraries out there, and the 100s of versions of each.


number of libraries * number of versions ~= upper bound on the bits of information available to classify you (each file is either cached or not, so it gives at most one bit)
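To put rough numbers on that estimate, here is a minimal sketch, assuming each library file is probed independently and yields one cached/not-cached answer. The function names and the probabilities in the usage note are made up for illustration, not measured data.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a yes/no outcome that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def max_bits(num_libraries: int, versions_per_library: int) -> int:
    """Upper bound: one bit per distinct cacheable file (hypothetical helper)."""
    return num_libraries * versions_per_library

def expected_bits(cache_probabilities) -> float:
    """Expected information from a set of probes, assuming they are independent."""
    return sum(binary_entropy(p) for p in cache_probabilities)
```

For example, `max_bits(10_000, 100)` gives a ceiling of 1,000,000 bits, but a file that only 0.1% of users have cached contributes about 0.011 bits per probe (`binary_entropy(0.001)`), so the realistic total is far below the ceiling.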

A good intuition is that although there's low certainty about which sites you have visited, there's high certainty about which sites you HAVE NOT visited.

I think it's easier to see how you can fingerprint someone based on the set of sites they HAVE NOT visited.

Edit: I'm not so sure anymore. I think that requires testing lots of libraries, so it's not practical unless you are a really nasty ad company that tests hundreds of libraries in the background, really sucking up your bandwidth.


I've got a better idea.

Imagine ad companies choose libraries that roughly 50% of users have cached. If they test you with 10 of those, and the results are roughly independent, they learn ~10 bits of entropy to classify you, i.e. which of 1024 buckets you belong to.
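A minimal sketch of that bucketing, assuming the probe results are already available as a list of booleans (how the probing itself is done is not covered here, and the function names are hypothetical):

```python
def num_buckets(num_probes: int) -> int:
    """Number of distinct classes that num_probes yes/no answers can distinguish."""
    return 2 ** num_probes

def bucket_from_probes(probe_results) -> int:
    """Pack probe results (True = library was cached) into an integer bucket id.

    The first probe ends up as the most significant bit.
    """
    bucket = 0
    for hit in probe_results:
        bucket = (bucket << 1) | int(hit)
    return bucket
```

With 10 probes, `num_buckets(10)` is 1024, and each user lands in a bucket between 0 and 1023. The buckets are only evenly filled if each library really is cached by about half of users and the probes are independent; correlated or lopsided probes give fewer effective bits.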



