The game continues. Back in 2010 when I was writing the first in-browser bot det...

narag · on Feb 19, 2023

What are bots used for? I can think of a few reasons; I wrote a scraper/submitter myself in the '90s for a cooperative of subcontractors that was being forced to use an extremely sluggish web app by the big company that provided their gigs.

But I guess there are all kinds of purposes, some benign, some nefarious, and that they somehow influence how bots operate and how they are detected.

323 · on Feb 19, 2023

People are paying $500 for bots used to buy the latest Nike/Adidas/... limited edition sneakers. Or video cards a few years ago (for crypto mining).

It's a whole industry.

> If we consider a user base of ~175 users, and a minimum bot price of 200 euros (175 users x 200 euros), then the bot developers made at least 35K euros (~$37K USD) in initial bot sales.

https://datadome.co/threat-research/inside-sneaker-bot-busin...

waynesonfire · on Feb 19, 2023

Artificial scarcity in sneakers is their design decision. These shenanigans should have zero impact on browser policy.

20after4 · on Feb 19, 2023

I thought about building something like that for photographers to get gigs from large real-estate photography contractors who sub-contract the work to independent photographers. Automated tools would benefit the photographers greatly. The benefit comes at the expense of those not using automated tools, so the morality of such a tool is at least somewhat questionable.

_jpys · on Feb 19, 2023

"The signals have to be well protected, otherwise bot authors will just read your JS to see what they have to patch next. Signal collection and obfuscation work best when the two are tightly integrated together."

JS sounds like a bad match for this task. I perform similar checks from the backend with HTTP headers and Python.

Is there a compelling reason to stick with JS despite the added complexity of obfuscation?

Edit: My use case is different than yours as it's part of a pid-free analytics application. However, bot detection is still an important component of that product.

mh- · on Feb 19, 2023

If you're only relying on HTTP headers, you're missing all but the most trivial of "bots". There are other things you could do with a backend-only approach, but if your code doesn't run on the host the device connects to (e.g. you're behind a load balancer or other reverse proxy), those are largely unworkable.

_jpys · on Feb 19, 2023

"If you're only relying on http headers, you're missing all but the most trivial of bots"

Very true. Capturing, processing, and storing analytics data long-term is expensive. If I eliminate even 50% of that noise, the savings will be worth it.

I'm attempting to identify the bulk of bots with HTTP headers and real-time session monitoring. I also have an unauthorized list (known bad actors) and an ignore list (search bots, etc.). It works pretty well but definitely doesn't begin to address the problem as a whole (from a security perspective).

It's an interesting and complex topic.
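
To make the header-plus-lists approach above concrete, here is a minimal sketch of what such a backend check might look like. It is not the commenter's actual code: the function name, the list contents, and the specific header heuristics are all assumptions chosen for illustration, and (as noted upthread) checks like these only catch the most trivial bots.

```python
from typing import Mapping

# Hypothetical lists; a real deployment would load and maintain these elsewhere.
DENY_IPS = {"203.0.113.7"}                      # "unauthorized list" (known bad actors)
IGNORE_UA_TOKENS = ("googlebot", "bingbot")     # "ignore list" (search bots, etc.)
AUTOMATION_UA_TOKENS = ("python-requests", "curl", "headlesschrome")


def classify_request(ip: str, headers: Mapping[str, str]) -> str:
    """Return 'deny', 'ignore', 'bot', or 'unknown' using only the IP and headers."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    user_agent = h.get("user-agent", "").lower()

    if ip in DENY_IPS:
        return "deny"
    if any(token in user_agent for token in IGNORE_UA_TOKENS):
        return "ignore"
    # A missing User-Agent or a well-known automation client is an easy tell.
    if not user_agent or any(token in user_agent for token in AUTOMATION_UA_TOKENS):
        return "bot"
    # Mainstream browsers normally send Accept-Language; its absence is suspicious.
    if "accept-language" not in h:
        return "bot"
    # Nothing conclusive: leave it to session monitoring or other signals.
    return "unknown"
```

Anything that comes back as "unknown" still needs the session-level monitoring mentioned above, since a bot driving a real browser sends perfectly ordinary headers.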

20after4 · on Feb 19, 2023

Re: your ad.

This sounds like a solid product / startup idea to me. I worked on spambot detection in a previous job and it's not at all trivial to solve. Though we were specifically interested in detecting the abusive use of bots, not bots in general, so I focused simply on detecting unusual resource consumption rather than fingerprinting.
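
As a rough illustration of "detecting unusual resource consumption rather than fingerprinting", one simple option is a per-client sliding-window request counter. This is a sketch under assumptions, not the system described above: the class name, the window length, and the threshold are all hypothetical.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flags clients whose request count within a sliding window exceeds a limit."""

    def __init__(self, window_seconds: float = 60.0, max_requests: int = 100):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def record(self, client_id: str, now: float) -> bool:
        """Record one request at time `now`; return True if the client looks abusive."""
        hits = self._hits[client_id]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

The caller passes the timestamp in explicitly (for example from `time.monotonic()`), which keeps the class easy to test. A real system would likely weight requests by cost (CPU, bandwidth, writes) rather than just counting them.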

mike_hearn · on Feb 19, 2023

There are startups doing this sort of thing already; the article is written by the head of research at one. But tech firms often like to have their own in-house stack with the source code.

ospider · on Feb 20, 2023

Yeah, but non-tech companies, like Nike/Adidas as mentioned in other comments, will need this kind of bot-detection service.

asdadsdad · on Feb 20, 2023

they already have that too.

https://www.nike.com/lWnPP-y6g/W/y8Puj8lw/p7XikLGhL5uf/ITULb...

https://www.adidas.co.uk/tm65VzrFd0bv0U_g70I302afoiE/L75QhJk...

kbuck · on Feb 19, 2023

What do you mean by a "mesh-oriented obfuscation"? My best guess is: serving a different subset of the VM detection code to each client?

mike_hearn · on Feb 21, 2023

There are lots of techniques that can fall under that heading. The idea is to tie together your logic and obfuscation so that the things you have to do to undo the obfuscation end up breaking access to other parts of the program. Using the output of hash functions as decryption keys is one famous approach but there are others.
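
One possible reading of the "hash output as decryption key" idea, sketched in Python for brevity (in practice this would live inside the obfuscated JS): the key that unlocks the next stage is derived from a hash of the code being protected, so patching that code changes the key and the next stage no longer decrypts. All names here are hypothetical, and the XOR keystream is a toy stand-in for a real cipher, not something to rely on for security.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustrative only, not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_with_code_hash(data: bytes, guarded_code: bytes) -> bytes:
    """Encrypt or decrypt `data` with a key derived from the hash of `guarded_code`."""
    key = hashlib.sha256(guarded_code).digest()
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Hypothetical example: the detector's own source guards the next stage.
detector_source = b"def check(): return navigator_is_consistent()"
next_stage = b"next-stage signal collection logic"
encrypted_stage = xor_with_code_hash(next_stage, detector_source)

# Untouched code derives the right key and recovers the next stage.
recovered = xor_with_code_hash(encrypted_stage, detector_source)

# A bot author who patches the check derives a different key and gets garbage.
patched_source = detector_source.replace(b"navigator_is_consistent()", b"True")
garbage = xor_with_code_hash(encrypted_stage, patched_source)
```

The point is the coupling: removing or rewriting the guarded check is exactly the edit that destroys the key, so undoing the obfuscation breaks access to the rest of the program.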

ggambetta · on Feb 19, 2023

Heh, I had a feeling you'd show up here. Hi, Mike :)

mike_hearn · on Feb 19, 2023

Long time no see mate :)