Is Google reCAPTCHA GDPR Compliant?

Lornedon · on June 22, 2023

> We’re using “cookies” as a shorthand for any technologies that can access or store information on a person’s device. This can also include beacons, pixels, scripts, and other technologies.

That is a weird use of the word "cookie". In normal usage, it doesn't mean a technology that accesses or stores the information, it is that information itself.

"Pixels" is also weird. I think that they mean tracking pixels, which are one-pixel images that are just there so that the browser has to request them from the server and the server can notice that request. They are a subclass of "Beacons". Calling them a "technology that can access or store information on a person's device" seems misleading. Also, reCaptcha wouldn't need them. They already have Javascript running on my PC, they don't need a tracking pixel to contact the server.

jefftk · on June 22, 2023

I actually thought it was a pretty decent caveat when writing for a non-technical audience. They're listing a bunch of different technologies that are (almost all the time) legally equivalent to cookies in the EU. And in EU privacy writing it's already common to use "cookies" to describe this whole area.

(The main exception to their list is scripts, which are only cookie-like to the extent that they use cookies or other client side storage)

awad · on June 22, 2023

While "pixel" was originally the 1x1 way back in the day, it's a generic term now used by industry to refer to analytics, often in tracking of user behavior and conversion events.

ahoka · on June 22, 2023

Cookies mean exactly that. You are just being overly pedantic.

And yes, running JS in a user's browser to store or access information is exactly the same. Taking a browser fingerprint and storing it server side is also the same, just harder to get caught doing. This whole business of maliciously taking these laws literally needs to stop.

Lornedon · on June 22, 2023

Yes, I may be overly pedantic. This part just made me suspect that the author didn't have a technological background. That isn't bad per se, as this is mostly a legal topic.

Also, I'm not trying to "maliciously take these laws literally". The law isn't limited to cookies, so you can't get around it by using a narrow definition of the word "cookie".

lucb1e · on June 22, 2023

Nothing in the text makes me think the authors do or do not have a technical background. Everything sounds correct both technically and legally to me (me being a technical person and not a lawyer).

tredre3 · on June 22, 2023

> This part just made me suspect that the author didn't have a technological background.

Or perhaps the author does have a technical background, which is why they attempted to give a clear yet simple explanation for the non-technical?

People hear cookies are bad, they get pestered by cookie banners, so it makes sense to use cookie as an umbrella term. Their definition of tracking matches the law which is the important part.

duffmancd · on June 22, 2023

I've heard "pixels" used generically to refer to the bundle of tracking code from a particular vendor in the marketing department at work. e.g. "Have you enabled the Facebook pixel?" means have you embedded the JavaScript snippet (usually with a fallback 1x1 pixel) that Facebook provides for tracking.

Akronymus · on June 22, 2023

Pixels likely refers to 1x1 tracking images (image fetched from a server, and through that, they can get some data from you, like IP address or visited before)

Lornedon · on June 22, 2023

Sorry, I sent my comment before it was done. It's edited now.

Akronymus · on June 22, 2023

Yeah, I see your edit. With JS you can also detect whether it was fetched before, so that is a way of storing information about whether you have visited something before. Which can then be sent to the server. So, it kinda is storing data, kinda not.

almostnormal · on June 22, 2023

> "Pixels" is also weird.

Could be a unique pixel that does or does not exist in the browser cache (is or isn't fetched (fought?)).

A pixel can store information as long as the cache works as expected.

smeagull · on June 22, 2023

In the end, you just use the cookie to store a unique id, and store everything on the server.

jwr · on June 22, 2023

CAPTCHAs are overused because of groupthink and fashions/fads. Before you use a CAPTCHA of any kind, consider very carefully if you really need one.

I've seen this a number of times in design meetings: someone will say "oh, an account registration form, we will of course need a CAPTCHA there", everyone will nod their heads and move on. In reality, in most of those cases, no one will ever conceivably even try to automate/script the thing being designed.

janpieterz · on June 22, 2023

Thought the same, had a pleasant signup form for a small SaaS platform nobody really knows about, with no captcha. Then someone or some group found it and there's been a barrage of attacks varying in intensity, vectors etc. Cost us so much money in vendor costs the small company is now in danger of going bankrupt.

I appreciate the sentiment, as I had it, but rest assured any future publicly accessible form I build will get at least a CAPTCHA in front of it.

newaccount74 · on June 22, 2023

I have a bunch of publicly accessible forms and none of them have captchas.

I did once run into an issue where a signup form was abused by a spammer, but that was a simple fix (tip: in verification emails, do not include any information that the user typed in the form).

If you are careful with your forms, you don't need captchas. Captchas add a lot of friction for some users, so if they can be avoided, they should be.

mpalmer · on June 22, 2023

Many captchas add friction for some users, but some types don't; there are relatively fast "proof of work" captchas that aren't surfaced to the user at all.
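For readers who haven't seen one, here is a minimal sketch of the "proof of work" idea, assuming a hashcash-style scheme: the server hands out a random challenge, the client burns CPU finding a nonce, and the server verifies it with a single hash. The function names, the SHA-256 choice, and the difficulty value are illustrative assumptions, not any particular vendor's protocol.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical helper)."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce; cost doubles per difficulty bit."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=16)  # assumed difficulty, tune to taste
    print(challenge, nonce, verify(challenge, nonce, 16))
```

Note that this only makes bulk submissions more expensive; it does not try to tell humans and computers apart.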

AnthonyMouse · on June 22, 2023

CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart

Proof of work isn't a CAPTCHA.

throwawaymobule · on June 23, 2023

can you explain what you mean by that tip? was this spammer using your verification emails to send spam or something?

or was it more complicated, like not needing to store which fake account had which details?

newaccount74 · on June 23, 2023

The registration form had a name and an email, and I sent a message similar to the following:

Hi <name>, thank you for signing up...

The spammers put their spam message in the name field, so my server started sending messages like this:

Hi Get free cialis now http://example.com, thank you for signing up...
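A minimal sketch of the fix described above, assuming Python's standard `email` package: the verification message is built only from fixed text and a server-generated token, so nothing typed into the form ends up in the body. The sender address, the URL, and the function name are placeholders for illustration.

```python
from email.message import EmailMessage


def build_verification_email(recipient: str, token: str) -> EmailMessage:
    """Build a verification mail that contains no user-typed text.

    `token` is assumed to be generated by the server, not taken from the form.
    The submitted name is deliberately not a parameter of this function.
    """
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"  # placeholder sender
    msg["To"] = recipient
    msg["Subject"] = "Confirm your sign-up"
    msg.set_content(
        "Thank you for signing up.\n"
        f"Confirm your address: https://example.com/verify?token={token}\n"
    )
    return msg
```

The personalised greeting can wait until the address has been confirmed.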

lapser · on June 22, 2023

A long time ago, I was still in college (UK college, i.e., pre-university), and still learning.

I discovered a classmate was involved in some event, and found the event's website. They didn't have a captcha. By your logic, this was the right choice.

In reality, my dumb ass decided it would be fun to script something that would register millions of users (another classmate ran the script with me). After a few hundred thousand registrations, the website was brought to its knees. I was a bit shook, but didn't think much of it.

Next morning I come into class, and was reprimanded by my teacher. Turns out, the owner of said event had threatened to sue the school and me, among other things. What had happened was their servers were down, their email server was brought to its knees, their web servers had died, and generally I had caused a lot of damage without even thinking about it. It caused them to potentially lose some money. None of this was my intention, of course, but I didn't know much better.

Point is, kids will kid, and spammers will spam. There are plenty of bots that just scrape the internet and fill out forms indiscriminately.

Captcha may or may not be the best option here (I'm always of the opinion it's not, especially not reCAPTCHA), but something has to be put in place, even if to stop the majority of bad actors.

asddubs · on June 22, 2023

you can also just limit the amount of sign ups from one IP each day. There are other simple heuristics to prevent unsophisticated abuse like that

Quarrel · on June 22, 2023

You can, but then you discover that places like Bangladesh and Cambodia, that do a fair bit of freelance work on the 'net use a surprisingly tiny number of IPv4 addresses to do it.

For lots of these countries their total allocation of IPv4 addresses is < 20 per 1000 people and the nature of their access (through glorified internet cafes) means that you will have some IP addresses that really are totally legit, yet have LOTS of users.

One size fits all is very dangerous on the Internet.

lucb1e · on June 22, 2023

How is the IPv6 roll-out over there?

On the one hand, I assume bad due to cheap equipment. On the other, it's not like v6 addresses are expensive and you need some way of addressing every subscriber anyway. As more people sign up (as the country gets more people with internet access), you need more equipment which could support v6 out of the box, and the excuse for CGNAT I've always heard is old equipment that is harder to upgrade than to put a NAT router in front of. Could go either way from my POV.

If the roll-out is good, then all those people are already taken care of and the minority left on v4 CGNAT aren't bothered by the collective rate limit.

(To preempt the eventual remark that users can generate a billion addresses in v6: rate limiting on v6 works by limiting whatever prefix the ISP gives out to subscribers, like /56, not individual addresses the way it's often done with v4.)
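As a rough sketch of that prefix-based keying, using Python's standard `ipaddress` module: IPv4 addresses are counted individually, IPv6 addresses are collapsed to their enclosing prefix. The /56 default and the function name are assumptions for illustration; ISPs hand out different prefix sizes.

```python
import ipaddress


def rate_limit_key(address: str, v6_prefix_len: int = 56) -> str:
    """Return the bucket a client address should be rate limited under."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        # v4: one bucket per address (CGNAT users will share it).
        return str(ip)
    # v6: one bucket per subscriber prefix, not per individual address.
    network = ipaddress.ip_network(f"{ip}/{v6_prefix_len}", strict=False)
    return str(network)


if __name__ == "__main__":
    print(rate_limit_key("203.0.113.7"))
    print(rate_limit_key("2001:db8:1234:5678::1"))
    print(rate_limit_key("2001:db8:1234:56ff:abcd::2"))  # same bucket as above
```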

As an aside, it should also be kept in mind that not every use case involves signing entire countries up for their service, even in an ideal case.

ornitorrincos · on June 22, 2023

To give another example, in Spain most mobile carriers will place everyone behind a CGNAT with no IPv6.

In fiber some do the same, although thankfully most place v4 behind a CGNAT while offering IPv6.

The whole "1 IP = 1 user" assumption, even with dynamic addresses, is quite false and is a mess.

lucb1e · on June 22, 2023

That has been my experience on any mobile network, also in 2007 or so when v4 addresses were still available (because my 15-year-old self wanted to seed torrents with my unlimited data bundle ...on GPRS). It's a fair point that one has to consider this part of the market, though I was primarily thinking of wired connections.

Quarrel · on June 22, 2023

It isn't good and purely being on IPv6 is still a terrible web experience in any event. Huge % of major websites don't properly support IPv6 yet. It's ridiculous.

lucb1e · on June 22, 2023

I know, but this is about hosting a service, not about trying to use existing services that got v4 addresses before it was cool

FateOfNations · on June 22, 2023

IPv6 doesn’t address disambiguating people using public computers at places like Internet cafes.

raverbashing · on June 22, 2023

Is your site even relevant to Bangladesh and Cambodia?

If you're collecting sign-up data for something local, then most likely not.

Quarrel · on June 23, 2023

Yes.

FWIW, I learnt about this the hard way.

raverbashing · on June 23, 2023

Yes, if it is relevant then by all means make it work for them

CWuestefeld · on June 22, 2023

No way. In the B2B world at least, I expect hundreds of users coming from behind the same corporate proxies.

distcs · on June 22, 2023

> you can also just limit the amount of sign ups from one IP each day

This is a classic example of how "just do this" kind of thinking can lead to terrible results.

Do you now see how "just limiting the sign ups from one IP each day" can go very very wrong?

AnthonyMouse · on June 22, 2023

What you could do is, use both. One sign up from each IP per day before you get a CAPTCHA. Then you're not subjecting 99% of your users to training Google's AI for free but the people at a cafe in Bangladesh can still sign up.
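A minimal sketch of that combination, assuming an in-memory counter keyed by IP and day (a real deployment would presumably use a shared store with expiry). The class and method names are made up for illustration.

```python
from collections import defaultdict
from datetime import date


class SignupGate:
    """Allow a few CAPTCHA-free sign-ups per IP per day, then ask for one."""

    def __init__(self, free_signups_per_day: int = 1) -> None:
        self.free_signups_per_day = free_signups_per_day
        self._counts: dict[tuple[str, date], int] = defaultdict(int)

    def needs_captcha(self, ip: str, today: date) -> bool:
        # .get() avoids creating an entry for every IP we merely look up.
        return self._counts.get((ip, today), 0) >= self.free_signups_per_day

    def record_signup(self, ip: str, today: date) -> None:
        self._counts[(ip, today)] += 1


if __name__ == "__main__":
    gate = SignupGate()
    today = date.today()
    print(gate.needs_captcha("203.0.113.7", today))  # first sign-up: no CAPTCHA
    gate.record_signup("203.0.113.7", today)
    print(gate.needs_captcha("203.0.113.7", today))  # second one gets a CAPTCHA
```

Nobody is blocked outright here; a shared IP just falls back to the CAPTCHA path.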

xeromal · on June 22, 2023

This sounds like extra work to solve the problem you said didn't exist.

AnthonyMouse · on June 23, 2023

It's extra work because it's better. You're not subjecting 99% of your users to training Google's AI for free.

hansvm · on June 22, 2023

> limit the amount of sign ups from one IP each day

one per library per day...

wongarsu · on June 22, 2023

one per coworking space, one per office location for each company

chollida1 · on June 22, 2023

Life as a developer has taught me to take the other side of your argument. I'd disagree on this.

Once you release something to the wild you need to have robust controls in place to prevent one person or group of people from using all your resources.

I wouldn't release a product that doesn't have rate limiting of some kind, of which a captcha is one way to rate limit.

Always trust people to push the boundaries of your app as far as they possibly can. I have yet to build a system where someone doesn't. And that includes tools I've built for inhouse users:(

Whether intentionally or not, they always find a way to push the boundaries:)

danuker · on June 22, 2023

> no one will ever conceivably even try to automate/script the thing being designed.

Spammers will spam everywhere they can. My minuscule personal site suffers from it very rarely, but I can imagine anyone getting a lot of page views making it worth it.

yonixw · on June 22, 2023

On my custom built site I have none of those. But, on my WordPress site, I had to install captcha the second day. Spammers are just using scripts, which cost next to nothing...

eli · on June 22, 2023

If a site gets even a little popular it attracts spam and security scanners and other nonsense no matter how it's built.

heipei · on June 22, 2023

I don't know. I run a SaaS that allows free user signup and significantly more than 50% of my daily signups are just signup "spam", without any visible motivation for doing so. The user name or information doesn't show up anywhere publicly and there is no inherent value in having a free user account. I've implemented some basic countermeasures (dummy form fields which reject the submission) which wasn't enough. I've added reCaptcha, and I'm still getting 50% spam signups from working (!) gmail addresses, meaning someone is able to receive emails on these. The majority of these are from places like India, Bangladesh, Vietnam, etc.

I don't even want to know what my site would look like without my own countermeasures + reCaptcha + if it was a service where a user account has any kind of "value"...

daveoc64 · on June 22, 2023

Is there a particular problem if someone signs up for an account on your system and doesn't use it?

Is such an account using a lot of resources?

philote · on June 22, 2023

Blocking others from signing up or using the username they'd prefer. For example, the bad actor could spam the registration with valid email addresses and depending on how your system handles registration it could either send validation emails to those addresses or block the person that owns the address from signing up.

heipei · on June 22, 2023

It doesn't use up too many resources, but I still don't like having more than 50% of the user database be essentially spam. If you ever want to sell your company you want to have somewhat accurate numbers of actual user registrations. Ever since I realised the extent of the issue I have become very doubtful about reported user counts from startups. In our case the only reason we've realised this is spam is because we annotate signups with GeoIP and also allow users to fill in a name and title which will then be something like "find escorts in Chennai" and the signup will be from India etc. If you only look at the email addresses all you'll see are many gmail addresses with western names, so you might be fooled into thinking that all of these are legitimate users.

iudqnolq · on June 23, 2023

Is it possible that the emails are real, non-spammers and the spammers are abusing your registration form to send a short message entered in the name field to those emails like this person described? https://news.ycombinator.com/item?id=36446532

SiempreViernes · on June 22, 2023

Well, you can't use raw signup numbers as a tracker of how well you are doing?

vincnetas · on June 22, 2023

CAPTCHA on registration page removed quite a bit of automated registrations. What are other options to prevent/reduce automated registrations? (one off the top of my head: email/phone verification)

realusername · on June 22, 2023

hidden fields will remove most of the non targeted attacks.

And if they really are targeted, I don't think CAPTCHA will help much.

revicon · on June 22, 2023

Google reCAPTCHA v3 has worked pretty well for us, we saw many instances in our logs where v2 was solved by scammers on our site using some automated tools that had a 25% success rate (plenty for automated scammer scripts) but upgrading to v3 stopped all the automated attacks. So far we haven’t seen successful solves of v3 in the wild and we’re a payments company so we see a lot of attempts.

giancarlostoro · on June 22, 2023

We use Auth0 which determines when to show a captcha, I think "Smarter Captcha" should be the industry standard. If you don't suspect the end-user of being a bad actor, why show them a captcha every time? In fact, Google's Captcha is awful for literally almost always showing it, which tells you they don't care about stopping bots, only about the data they get from user inputs.

Edit: And come to think of it, A TON of websites do "smarter captcha" or whatever you want to call it, because on one of my computers I enabled the resist fingerprinting setting on Firefox, and I get a captcha every visit on some sites that NEVER show a captcha (I think it might be cloudflare driven, but unsure). Like Walmart comes to mind, it shows me a pill looking thing where I have to hold the mouse click until it fills.

a_c · on June 22, 2023

It took me one incident to turn from "no one will ever conceivably even try to..." to "everyone will nod their heads and move on"

staringback · on June 22, 2023

Years ago I had a blood test taken at a local pathology place, the form they were submitting had a CAPTCHA and pictures they were given weren't easy by any means. I'm talking the kind of stuff you get trying to go to google.com on Tor browser.

As far as I could tell this was an internal form that wasn't publicly accessible

mschuster91 · on June 22, 2023

> In reality, in most of those cases, no one will ever conceivably even try to automate/script the thing being designed.

There are more than enough people running automated crawlers, probably fed from Google "inurl: contact-form" searches or whatever, and just blanket spam you.

efields · on June 22, 2023

We ignored them until we needed them. Then we needed them.

V__ · on June 22, 2023

This is in line with my experience as well. For most sites, CAPTCHAs are overkill and an accessibility problem. Hidden honeypot, maybe a simple “How much is 5 + 2” keeps 99% of spam out. I had a few more difficult cases, which were solved by blocking some geographic IP regions and adding blacklists for certain words, like “crypto” for example.

frodowtf · on June 22, 2023

I'm not an expert on honeypot inputs but wouldn't it be super easy to check for type=hidden or opacity=0 if you'd like to spam?

V__ · on June 22, 2023

Yes, but most bots don't. There are also some more elaborate methods. Giving the input a tabindex="-1", aria-hidden="true" and then moving it left: -100vh works pretty well.
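The server-side half of a honeypot is tiny. A sketch, assuming the form data arrives as a plain dict and the decoy input is named `website` (both are assumptions for illustration): real users never see the field, so any value in it suggests an automated submission.

```python
def is_probable_bot(form: dict[str, str], honeypot_field: str = "website") -> bool:
    """Treat a submission as automated if the hidden decoy field was filled in."""
    return bool(form.get(honeypot_field, "").strip())


if __name__ == "__main__":
    human = {"email": "a@example.com", "website": ""}
    bot = {"email": "b@example.com", "website": "http://example.com"}
    print(is_probable_bot(human), is_probable_bot(bot))
```

A bot that skips hidden inputs sails straight through, which is why this only covers the untargeted majority.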

revicon · on June 22, 2023

Remember that Google reCAPTCHA v3 is invisible to the user. No accessibility issues.

Nullabillity · on June 22, 2023

It's invisible until it isn't.

revicon · on June 29, 2023

Hows that?

Pxtl · on June 22, 2023

If you're writing your own account registration form instead of using something off-the-shelf that provides captcha service for you, or even better are just using an oAuth or similar technology so users don't have to manage yet-another-password? I already hate you.

smeagull · on June 22, 2023

Spam is ever present, and Captchas protect from the massive torrent of trash.

Zardoz84 · on June 23, 2023

I had to put a CAPTCHA system on a public registration form for digital libraries, because they were getting spammed by bots.

TekMol · on June 22, 2023

The mistake here in Europe is that an ip address is considered personal information:

https://www.ra-plutte.de/lg-muenchen-dynamische-einbindung-g...

This makes it impossible to use any components hosted by third parties without getting consent by the users. And for components hosted in a different country than the visitor, even consent might not make using those external components legal.

This is bad and not in line with reality. The IP can only be turned into personal information via cooperation of the user's internet provider.

So in Europe, the whole internet is made illegal based on a wrong assumption.

mk89 · on June 22, 2023

As a European, I don't consider this a mistake, for the simple reason that the IP address is so easily abused by trackers and people with bad intentions - the extent of abuse that we have experienced until now is absolutely ridiculous.

Hell, even a small startup with a few thousand euros can start to track and trace user behaviour on a massive scale that in reality you wouldn't or shouldn't be able to do.

The tooling (free, cheap and not) at our disposal nowadays makes everything so easy that even something that in theory should not serve as identifier can be used to identify you - so let's start with the most common ones: IP, email, etc.

The Internet in Europe is not illegal - it's just pure BS that a simple page like reuters.com contains references to 14 external scripts when loaded, when actually all you need is 2 maybe 3 scripts (the CDN to load images and videos + the page itself) - the rest is crap used explicitly to identify and market people - that's it: Ads, Ads and probably uglier things to do just to profile people online.

jansan · on June 22, 2023

As another European I consider this a grave mistake and I am not surprised that we do not see many successful startups in Europe. It is part of the narrow-minded micro managing mindset that too many European politicians (especially the greens) have.

Instead of finding an innovative solution (how about mandating ISPs to make IP addresses unmappable to a user?) they only know one solution: Making things illegal, even if in almost all cases the use is benign or even makes a lot of sense.

mk89 · on June 22, 2023

> It is part of the narrow-minded micro managing mindset that too many European politicians (especially the greens) have.

Ironically, even in the USA with the CCPA they are trying to regulate this somehow. This is just not something you can leave untouched. Without even mentioning other non-EU countries (China/Russia/India, etc.) that have even stricter data regulations.

> Making things illegal, even if in almost all cases the use is benign or even makes a lot of sense.

What is there benign about sending your IP address to 10-15 external services that have nothing to do with providing you the service in the first place? Try to install the NoScript extension and you'll see that 90% of the websites only require the main domain to be unblocked to be used. A few more websites require allowing 1-2 additional scripts (CDN or ... payment systems/redirects, etc..). The rest is just your data being treated like toilet paper for marketing purposes. And I wouldn't even mind if they stopped there (marketing website A buys the data, and they target you with some crappy ads you won't even click on): no, they sell, and the buyer resells, etc.

Is this really what makes a country innovative? Advertisements? I thought that innovation was self-driving cars, electro cars, new ways to fight diseases, digitalizing bureaucracy (which can be done without giving away your personal data to random online companies), etc.

jansan · on June 22, 2023

No, advertisements do not make a country innovative, and I never said that, and I have no idea why you would even bring this up.

But if you are a small company in Europe that, for example, is producing a better and lighter cardboard box, you will have to hire a data privacy specialist because you may get sued by some random idiot if you accidentally link to Google Fonts as a convenient way to display fonts. This certainly binds a lot of valuable mindshare and resources and therefore is bad for innovation. You cannot believe how much startups here have to think about GDPR even if they do not collect any data at all. Read the other answers in this thread. There are others who say that linking to Google Fonts is equal to selling your customers' data to Google. What a joke!

Tryk · on June 22, 2023

Of course we should just give up and give all our data to anyone who wants it, lest we burden the developers in any way.

jacquesm · on June 22, 2023

If you don't understand why including third party resources for developer convenience is a bad idea then you probably shouldn't be building commercial websites.

Of course startups here - and everywhere else! - have to think about applicable law. Just like if you want to start a transportation company you'll have to think about applicable law. If a company does not collect any data at all then the GDPR is a complete no-brainer.

Try this and see if you can make it work for you: to identify an individual you need ~33 bits. An IP address, even though it doesn't quite get you there gives you a very large fraction of the number of bits required. Adding a few more makes the individual identified. So therefore it is a very good idea not to allow access to the IP address to unrelated parties from a privacy point of view.
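The ~33 bits figure is just the base-2 logarithm of the world population, and each attribute contributes bits according to how rare its value is. A small sketch of that arithmetic; the 8 billion population figure and the "1 in 1000" example are assumptions for illustration, not measurements of how many bits an IP address actually carries.

```python
import math


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one person in a population."""
    return math.log2(population)


def surprisal_bits(fraction_sharing_value: float) -> float:
    """Bits revealed by an attribute value shared by this fraction of people."""
    return -math.log2(fraction_sharing_value)


if __name__ == "__main__":
    print(bits_to_identify(8_000_000_000))  # just under 33
    print(surprisal_bits(1 / 1000))         # roughly 10 bits for a 1-in-1000 value
```

Bits from independent attributes add up, which is why a handful of individually harmless values can be enough.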

jansan · on June 22, 2023

That ad hominem attack was a bit uncalled for, especially since tablomonto.nl is sending their users' IP addresses to Google by using Google Fonts, too.

jacquesm · on June 22, 2023

Ah good old 'et tu, Brute'. But you know what: I don't have any influence on how tablomonto.nl is implemented. But if you check out jacquesmattheij.com you'll get a much better idea of my attitude towards the privacy of the visitors to my website(s).

hnfong · on June 23, 2023

You probably should send an unsolicited email to your colleague with this then:

<quote> If you don't understand why including third party resources for developer convenience is a bad idea then you probably shouldn't be building commercial websites.

Tryk · on June 22, 2023

Well said. I'd recommend Surveillance Capitalism by Shoshana Zuboff for an in-depth study of this unprecedented and spectacular surveillance that is becoming so commonplace.

makeitdouble · on June 22, 2023

> how about mandating ISPs to make IP addresses unmappable to a user?

I can't see how this would work in practice: you'd still have the same IP when visiting site A where your real identity is known, and site B where you didn't give them consent to access your information. The only reasonable course of action is to classify your IP as an identifying info, as site A and B exchanging info would result in your exposure (or site A and B being the same site, but you logged as different users...)

Perhaps your point was on giving user different IPs for every site, potentially for every request, but then the consequences would have to be handled worldwide.

jansan · on June 22, 2023

Well, let's just take a look at the most benign case where a site owner was sued and convicted for simply using Google Fonts without any bad intentions and without selling or transferring any other data to Google. To Google your IP address will be worthless unless they get any link to your identity, so it would solve those really simple cases where site owners accidentally violate the GDPR without any bad intentions and have to fear being sued by some random idiot in court.

martin_a · on June 22, 2023

> To Google your IP address will be worthless unless they get any link to your identity

Not true. Google is used so widely that they can track and profile me all over the internet, because they can easily combine the pages on which I've requested fonts from them. And once I log into YouTube, they can link all that information about medical conditions I googled directly to me personally.

makeitdouble · on June 22, 2023

I see it from two angles:

- on the benign intent and accidental violation: is it natural to go load fonts from a third party service in the first place ? I get why we arrived at this point, but hosting yourself the base resources you're using for your site shouldn't feel like some huge burden or leap from the norm.

If we really see a shared benefit to having fonts on some common platform, I'd also wish it wasn't Google. Perhaps Cloudflare or Fastly ?

- on IP being worthless to Google: is the "unless they get any link to your identity" even hypothetical ? I am a paying Google customer and extensively use their service. And we all do to some extent; trying the "let me live my daily life but block and avoid anything that touches Google" game would still be as critically punishing today as it was 4 or 5 years ago.

smeagull · on June 22, 2023

IP isn't personal, isn't unique, and isn't identifying.

theshrike79 · on June 23, 2023

The point of the law is that anything that can be easily (technically easily, not contractually or legally) combined with another database to make the first bit of data PII - then it's PII.

An IP address can be trivially combined with the ISPs data and provide the exact user of an IP during any point in time.

smeagull · on July 6, 2023

So anything is PII.

> An IP address can be trivially combined with the ISPs data and provide the exact user of an IP during any point in time.

Not even then.

dutchbrit · on June 22, 2023

> This is bad, because the IP can only be turned into personal information via cooperation of the users internet provider.

Not 100% true, you can often trace back users by IP using leaked databases and through companies that sell user data. Might not be legal, but you definitely don't need cooperation from an ISP.

ericpauley · on June 22, 2023

In that case the database is the PII, not the IP.

michaelt · on June 22, 2023

If Person A has a document saying that Joe Biden lives at 1600 Pennsylvania Avenue

and Person B has a document saying an 80 year old male living at 1600 Pennsylvania Avenue has Chlamydia

do you think only Person A holds private information?

ericpauley · on June 22, 2023

Users use the same passwords everywhere. By cross-referencing user passwords through excessive brute force you could find accounts on other sites that link to a user’s personal data.

Is the password personal data?

You have to draw the line of correlation difficulty somewhere.

michaelt · on June 22, 2023

I would expect a company to guard customers' passwords every bit as carefully as they guard customers' e-mail addresses, and probably moreso, yes.

jdietrich · on June 22, 2023

The term "personally identifiable information" does not occur anywhere within the text of GDPR. GDPR regulates the use of personal data, which is conceptually much broader than PII. Any data that relates to a natural living person is potentially within the scope of GDPR, including data that is insufficient in isolation to identify a natural living person. For example, pseudonymised data from an employee database or medical records may still constitute personal data if it would be possible to reconstruct the identity of that individual by inference, even if all direct identifiers have been removed.

https://commission.europa.eu/law/law-topic/data-protection/r...

cccbbbaaa · on June 22, 2023

So someone can be identified, directly or indirectly, with an IP address, making it personal data under GDPR, art. 4(1).

joycian · on June 22, 2023

What does "indirectly" include legally?

cccbbbaaa · on June 22, 2023

Any piece of information that can be related to someone using supplementary information. E.g. my personal email address contains my name, so I can be identified directly; my phone number doesn’t, but my operator and contacts know who is behind it, so I can be identified indirectly.

themitigating · on June 22, 2023

If the IP is dynamic then how would you know who had it at the time?

agos · on June 22, 2023

you make a join between the "user session" and the "user profile" tables
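A minimal sketch of that join, using Python's built-in `sqlite3`. The table names `user_session` and `user_profile`, their columns, and the sample rows are all hypothetical. It shows that a dynamic IP still resolves to one account once the time of the request is known.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE user_session (
            user_id INTEGER, ip TEXT, started_at TEXT, ended_at TEXT
        );
        INSERT INTO user_profile VALUES (1, 'Alice'), (2, 'Bob');
        -- The same dynamic IP is leased to two users at different times.
        INSERT INTO user_session VALUES
            (1, '203.0.113.7', '2023-06-22T08:00', '2023-06-22T09:00'),
            (2, '203.0.113.7', '2023-06-22T10:00', '2023-06-22T11:00');
        """
    )
    return conn


def link_ip_to_profile(conn: sqlite3.Connection, ip: str, seen_at: str):
    """Return the names of users whose session held `ip` at `seen_at`."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM user_session AS s
        JOIN user_profile AS p ON p.user_id = s.user_id
        WHERE s.ip = ? AND s.started_at <= ? AND ? < s.ended_at
        ORDER BY p.name
        """,
        (ip, seen_at, seen_at),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(link_ip_to_profile(db, "203.0.113.7", "2023-06-22T08:30"))
```

The ISO-style timestamps compare correctly as plain strings, which keeps the sketch free of date parsing.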

jacquesm · on June 22, 2023

Because ISPs are keeping lease logs.

themitigating · on June 22, 2023

How would you get that information?

jacquesm · on June 22, 2023

Court order, security incident, insider access.

diffeomorphism · on June 22, 2023

Easily

zpeti · on June 22, 2023

You might be able to trace an IP back to a user, but you're absolutely not guaranteed that that IP was only used by that particular user.

Therefore even on a technical level this EU legal interpretation is insane, hundreds or thousands of people can potentially use the same IP address, how is that personal information then?

prepend · on June 22, 2023

I don’t think the goal is to find a single IP for a crime or something. In that case, law enforcement just subpoenas and they’ll sort out the thousands using the IP to the actual person (possibly with ISP participation).

I don’t think it’s insane because the more common use is just to pattern match to identify the individuals. While lots of people may share, many do not (eg, everyone in my home shared an external IP but that is frequently just one person).

Google can use this IP and browser traffic to separate out individuals (eg, I don’t watch YouTube and my kid never checks vanguard) to the level they convince advertisers that they know the individual. I expect this is why my kindergartner sees ads for car insurance.

msm_ · on June 22, 2023

> Therefore even on a technical level this EU legal interpretation is insane, hundreds or thousands of people can potentially use the same IP address, how is that personal information then?

The ISP knows exactly who used an IP at that point in time (they are even obliged by law to log this). Therefore the IP from which a request was sent can be used to uniquely identify the device, with 100% certainty. Therefore it is (usually) personal information.

mcpackieh · on June 22, 2023

> This makes it impossible to use any components hosted by third parties without getting consent by the users.

Sounds good to me, that's the way it should be. I shouldn't have to use third party extensions to stop my browser from automatically loading facebook crap every time I visit websites that aren't facebook. Companies should only include 3rd party components in their websites if there is a very good reason for it, and only then after the user has explicitly consented to it.

prepend · on June 22, 2023

> The IP can only be turned into personal information via cooperation of the users internet provider.

It’s not a direct identifier but with geo-ip or other data, it can identify an individual (eg, have 100 possibilities and geoip narrows it down to only 1 in that region based on IP).

The PII aspect isn’t based on getting a link from the provider. The PII aspect is based on the IP itself standing out in data and allowing reidentification. It’s not 100% accurate, but accurate enough to make money off advertising.

tremon · on June 22, 2023

> It’s not a direct identifier

Except when it is. I have a semi-permanent home IP (it only changes when the MAC address on my router changes and I get assigned a new lease) and only one user in my home. My IP address pretty uniquely identifies me.

tredre3 · on June 22, 2023

Most cable modem users in North America are in the same situation. And small shops usually have a business package with a static IP too.

I don't understand the point keyboard warriors who insist IP doesn't identify a person and IP isn't PII are trying to make.

Are they just being pedantic? Because yes, often an IP is shared. Yes, obviously the legal system shouldn't assume an IP equals a person. That would be very problematic.

But when it comes to mere tracking, an IP can absolutely directly identify a person. So why not just treat it as such all the time? What's the downside of considering IP to be PII?

smeagull · on June 22, 2023

They're not stable in time either. And they can be misleading if you try to use them to geolocate a user. The ARIN for my IPs makes me appear 500km away.

RobotToaster · on June 22, 2023

>This makes it impossible to use any components hosted by third parties without getting consent by the users.

Good.

There's very few legitimate uses for third party hosted proprietary components.

Why it became standard to load simple things like scripts or fonts from third parties, that can be trivially hosted locally, is beyond me.

jansan · on June 22, 2023

So hosting an ad is now illegitimate? How about embedding a video? And why should you not embed stuff from a CDN?

mcpackieh · on June 22, 2023

Ads (particularly ads served by third parties) have never been legitimate, that's why most users who know what they're about use an adblocker to stop them. Videos should only be embedded once the user consents to that (which could be implemented using a first-party button that the user clicks to affirm their consent and which then loads the third party video component).

smeagull · on June 22, 2023

> Ads (particularly ads served by third parties) have never been legitimate

That's a very extreme take. They're how many services actually exist.

Also, consent to an embedded video? My brother in Christ, your browser requested the video.

account42 · on June 23, 2023

> Also, consent to an embedded video? My brother in Christ, your browser requested the video.

This kind of attitude is why I block JS and cross-origin requests by default. Unfortunately this is not a reasonable option for the less technically inclined who deserve protection too.

mcpackieh · on June 23, 2023

It's not an extreme take, it's the way most people with a technical clue feel, evidenced by widespread use of adblockers by anybody who actually knows how to use their computer.

martin_a · on June 22, 2023

Videos can easily be embedded and loaded from local resources, too. No need to shove the whole of YouTube down somebody's throat, if a simple <video>-element with an mp4 file does the same. You want the comfort and features of YouTube? Ok, then let the user decide if they want that too, ask for their consent.

jansan · on June 22, 2023

Have you ever heard of the concept of copyright?

martin_a · on June 22, 2023

Taking (kind of) care of that is part of the featureset and "comfort" of YouTube, yes. Nevertheless, a page can work wonderfully without that.

pph · on June 22, 2023

It is not a problem to load a static (not changing based on user data) ad from your own server. Using a CDN is likely legitimate interest when it is not using private data for other reasons than serving the content.

(I'm not a lawyer.)

jansan · on June 22, 2023

That would require small companies to implement their own ad business. Sorry, your idea is cute, but completely unrealistic.

jacquesm · on June 22, 2023

> 'So in Europe, the whole internet is made illegal based on a wrong assumption.'

Is based on a wrong assumption.

Semaphor · on June 22, 2023

It should be mentioned that we don’t have case law the same way the US does. So one court deciding this does not mean that this is now law.

jdsnape · on June 22, 2023

That is not quite completely true - in many cases you can associate it to a user based on their activity too. For example, they might log on, which would link the IP to an identity.

janosett · on June 22, 2023

Yes, but the IP itself is not personal without the connection to other information. I do think considering IP address personal is a bit of a reach, especially given the common case of ephemeral addresses.

jeroenhd · on June 22, 2023

All my residential IPv4 addresses have stayed the same for years, until I switched ISP or moved. Ever since I've lived alone that IP address 100% maps back to me as a person, and I'm not the only person in Europe living on my own. Pretending this situation doesn't exist, or that it doesn't form a privacy risk, would be madness.

There's nothing wrong with receiving IP addresses on your website, though. You can log IP addresses and use them for detecting fraud and other kinds of abuse without requiring consent. Third parties can do the same, as long as they follow the law and as long as you clearly document what information you're sharing/making users share in your privacy policy.

You can't use personal information for tracking and ad purposes without consent, though, and you can't partner up with other companies that do it for you. It doesn't matter if you're tracking IP addresses, cookies, passive fingerprints, or some kind of supercookie; you need a legitimate reason or explicit consent to process that kind of information.

BaseballPhysics · on June 22, 2023

> Yes, but the IP itself is not personal without the connection to other information.

By this argument, isn't the same true of a physical home address?

> I do think considering IP address personal is a bit of a reach, especially given the common case of ephemeral addresses.

Except that isn't strictly "the common case". DHCP leases are often for long-ish periods of time on fixed line broadband services. The IP for my home router, for example, has been the same for weeks or months at a time.

daveoc64 · on June 22, 2023

The concept of being able to identify individuals by combining data sources is a key part of GDPR.

If you can look up a user account by IP address, then the IP address is personal data.

jdietrich · on June 22, 2023

>This makes it impossible to use any components hosted by third parties without getting consent by the users.

There are six lawful grounds for processing personal data under GDPR; only one of those grounds is consent. Consent is not always necessary, nor is it always sufficient.

An IP address is potentially personal data, because it could relate to a natural living person. There are all sorts of legitimate reasons to use that data without consent, the most obvious being to fulfil a request by the user. You will run into issues if you're using that data in ways that aren't strictly necessary - keeping logs indefinitely, using that data for marketing purposes, sharing that data with third parties without good reason and without adequate safeguards etc.

https://gdpr-info.eu/art-6-gdpr/

olivierduval · on June 22, 2023

Yeah... it could be nice if people stop spreading FUD on GDPR ;-)

All GDPR is asking mostly is: you only gather minimal PII to provide a service (if needed at all). If you use PII for another purpose than providing the service or meeting operational purposes (like fraud detection or monitoring your infra), then you must obtain the consent of the user (for marketing or selling your users' data for example). This extends to your providers too (like Google Analytics...)

The problem is that a lot of "internet services" take for granted that they can do whatever they want with the data they got for a specific purpose... even without informing the user! And that's not good... so GDPR has been created.

But if your service is "fair" to the user (meaning: you only use the data to provide the service), then there's no problem...

jansan · on June 22, 2023

Apparently a German court ruled that embedding fonts from Google Fonts is not "fair" to the user. You either have to ignore parts of the GDPR or you accept to not be able to do many useful things.

usr1106 · on June 22, 2023

You can do useful things without selling user data to Google.

Yes it's selling: You provide the data, Google gives you hosted services back. If you think embedded fonts are useful for you, you can host them. Then you pay the bill and not your users.

j16sdiz · on June 22, 2023

> (like fraud detection or monitoring your infra)

reCAPTCHA, in this case, can be considered "fraud detection".

Can we do it without PII? Yes. Maybe. With lots of effort and a less optimal result.

Does this fit the "necessary" provision of GDPR? This depends on which court you ask.

reCAPTCHA does not add "direct" value to the user, it even costs them some hassle... but it is a life saver if your service is a big spam target.

account42 · on June 23, 2023

Like any US-based service (or one controlled by a US company), reCAPTCHA is subject to US laws, which means they CANNOT guarantee that your PII is only used for fraud detection purposes because the government can order them at any time to hand over that data while also ordering them to keep silent about that.

TekMol · on June 22, 2023

Unfortunately, a judge can always say that using an external resource was not necessary to fulfil a user request.

As they did in the judgement I linked to.

A judge can always claim you could have used a local version of whatever external resource you used.

jdietrich · on June 22, 2023

A judge can say anything they like, but that doesn't mean it'll be upheld at appeal.

If you can use a local version of an external resource, you should. Minimisation is an integral principle of GDPR. There are plenty of circumstances where it would be impractical or impossible to perform a necessary function without sharing data with third parties, but that still needs to be done with appropriate thought and care.

I would have a great deal of confidence in embedding a resource provided by Stripe; I would have absolutely no confidence in embedding a resource provided by Google.

josefx · on June 22, 2023

And if that is the first GDPR related offense you get away with a slap on the wrist.

olivierduval · on June 22, 2023

Of course... or he can say that you could use another resource that is GDPR compliant and/or privacy preserving.

YOU provide the service so YOU are responsible for the way your user's data are processed and must ensure that it's processed according to the rules (requiring consent when necessary)

yonixw · on June 22, 2023

I was wondering about it too, but I guess that for some customers in rural places (anywhere big, like the USA and EU) an IP address is as good as a home address. Combine it with some providers that will not change your IP until manually requested (and not on router restart) and you have real PII on your hands.

kalleboo · on June 22, 2023

I used to live somewhere where the reverse-DNS for my IP was literally my home address (student housing network)

that_guy_iain · on June 22, 2023

> This makes it impossible to use any components hosted by third parties without getting consent by the users. And for components hosted in a different country than the visitor, even consent might not make using those external components legal.

This is based on some very faulty knowledge of GDPR and the law.

You are allowed to process data without the consent of the user for various things. This would include their IP address. You're allowed to have third party data processors process data on your behalf without user consent for various things.

The Google Font ruling was partially due to who Google is. Google data mines, they're famous for it. So giving Google data they can use to map to your internet persona, which may even be linked to your name directly, is obviously something many people want to do only when they consent. The fact Google Fonts could be self hosted was another part of the reason for the ruling. That is, sharing the information wasn't required to perform the action they wanted to perform: using a font.

Data processing done by US companies is not currently GDPR compliant. However, no one is enforcing that. It would be a complete mess and there are far too many. In reality, everyone is ignoring it waiting for the new laws to be created to make it legal. The reason the US companies are an issue is a US court can issue a judgement to a US company and they are forced to comply no matter where the data is.

> This is bad and not in line with reality. The IP can only be turned into personal information via cooperation of the users internet provider.

This is also not true. If you visit a website that sells B2B accounting software and your IP is identifiable to a company, the site operator could phone up the company and ask to talk to the person who is responsible for finance. If there is only one person, boom, easily identified. There are also various other ways.

> So in Europe, the whole internet is made illegal based on a wrong assumption.

Really, your comment is wrong based on multiple wrong assumptions.

red_trumpet · on June 22, 2023

> So in Europe, the whole internet is made illegal based on a wrong assumption.

No, just websites which include components hosted by third parties. This is not the whole internet (e.g. HN doesn't include third party components).

TekMol · on June 22, 2023

Use the search function on the bottom of HN. It is provided by a different company.

HN is also hosted by a different company. They get your IP too.

LelouBil · on June 22, 2023

> Use the search function on the bottom of HN. It is provided by a different company.

HN doesn't communicate my IP to this service.

> HN is also hosted by a different company. They get your IP too.

If they store my IP, they should treat it as PII to be compliant

whstl · on June 22, 2023

What they do with the IP matters. Using your IP address to serve the website, or even to do fraud/DDOS prevention, is perfectly fine.

zyx321 · on June 22, 2023

Yes, and companies doing business in the EU may only use GDPR-compliant hosting.

All European providers as well as major international providers like AWS [1] have compliance statements. The EU companies are more likely to take it seriously, e.g. Hetzner only logs the first three segments of your IP (i.e. 24 bits of an IPv4 address or 48 bits of an IPv6 address). Every other EU-based provider I've used at least offers it as an option. If you run your own server, there are nginx and apache modules [2] that anonymize your logs.

It's real. We're 100% serious.

[1] https://d1.awsstatic.com/legal/aws-gdpr/AWS_GDPR_DPA.pdf [2] https://www.supertechcrew.com/anonymizing-logs-nginx-apache/
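For anyone rolling their own logging, the truncation described above (keep the first 24 bits of IPv4 or 48 bits of IPv6) can be sketched with Python's standard `ipaddress` module. The function name `truncate_ip` is made up for this example and is not taken from Hetzner or from the nginx/apache modules linked above. Whether truncation alone is enough to count as anonymisation is a legal question this sketch does not answer.

```python
import ipaddress


def truncate_ip(addr: str) -> str:
    """Zero out the host part: keep /24 for IPv4 and /48 for IPv6."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    for sample in ("198.51.100.77", "2001:db8:abcd:1234::1"):
        print(sample, "->", truncate_ip(sample))
```

The addresses used here come from the ranges reserved for documentation, so they do not belong to any real user.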

sfg · on June 22, 2023

If there was a proxy service that acted as an ip mask, and there was a list of the ip addresses of such masking proxies, then could EU customers using such services solve the issue?

cccbbbaaa · on June 22, 2023

Yes, this is what the CNIL suggests for people that want to use Google Analytics.

jansan · on June 22, 2023

Yes, but making stuff illegal is much easier than being inventive.

y7 · on June 22, 2023

Indeed, the GDPR defines "personal data" as follows (Article 4 sub 1):

> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

This is not out of step with reality nor a wrong assumption, it is simply a definition. It is motivated somewhat in the considerations of the GDPR.

> (26) The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.

> (30) Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

martin_a · on June 22, 2023

> This makes it impossible to use any components hosted by third parties without getting consent by the users.

That is simply not true, please do not spread misinformation like this.

Using/Embedding third-party resources is allowed IF it is e.g. technically necessary to provide the service or core functionality at all.

Collecting personal information and using a third-party service to do so in a shop checkout? That's okay.

Collecting personal information and shoving everything into Google Analytics because you want to know how many people visited your site? Not so okay, there are less intrusive ways to do that.

pabe · on June 22, 2023

Hahaha, so EU service providers can't protect their website using Google reCAPTCHA because it'd likely need user's consent and they're not allowed to restrict their offering in case a user doesn't consent. So, bots simply deny consent and send the form without captcha :'-D

Did I get that right?

That's nuts.

spiderfarmer · on June 22, 2023

Lots of alternatives who claim to be GDPR compliant though.

littlestymaar · on June 22, 2023

> and they're not allowed to restrict their offering in case a user doesn't consent.

French newspapers don't seem to agree with your interpretation of GDPR, as you are almost always facing a “consent or pay”-wall…

siilats · on June 22, 2023

I would say that you need to call a phone number to verify you as a user if you don't consent to captcha. And then that number is a voicemail: "leave your number, we will call you back". And then we have a backlog so it takes us a while to call back. I don't think that's illegal.

frozenlettuce · on June 22, 2023

My favorite thing is going into a non-us government site and having to solve a recaptcha with a US-centric thing: "Mark all the school buses", "Mark all the stop signs"

xeromal · on June 22, 2023

I'd love it if I started getting captchas for some weird local food like "Select all the pão de queijo" or something like that. lol

throwawaymobule · on June 23, 2023

Like most Google things, recaptcha is localised based on your location, so I've had to deal with French, German, and occasional Spanish instructions while travelling before, with no easy way to change it. (Adding &hl=en to the captcha URL fixes it, but good luck with that.)

Hitting refresh until I got one with an example picture was pretty common until I learned some vocab.

I should really make a browser extension for this.
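The `hl=en` trick mentioned above boils down to rewriting one query parameter. Here is a minimal Python sketch of that rewrite. The function name `force_language` and the `captcha.example` URL are placeholders, and this is only the URL manipulation, not a browser extension.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_language(url: str, lang: str = "en") -> str:
    """Return `url` with its `hl` query parameter set to `lang`."""
    parts = urlsplit(url)
    # Drop any existing hl value, keep every other parameter in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "hl"
    ]
    query.append(("hl", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(force_language("https://captcha.example/anchor?k=SITEKEY&hl=fr"))
```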

sylware · on June 22, 2023

It is a grey/not well known area: as far as I know it is not working with noscript/basic (x)html browsers, which is paramount for past/present/future interoperability between big tech and small tech.

londons_explore · on June 22, 2023

Note that reCAPTCHA's use of hashing and encryption might mean that the private data this article refers to, such as device type, is not actually sent to Google.

A quick check with the network inspector doesn't obviously show this data being sent.

And obviously to punish a company, the EU would need to prove this data is sent - a hard thing to do when the code of recaptcha is deliberately designed to prevent reverse engineering and analysis.

jeroenhd · on June 22, 2023

My French isn't exactly great but based on the machine translation of the linked court case, I believe the problem with reCAPTCHA is that Google uses it for more than just authenticating users.

The generated fingerprint for these scripts is personal data, it pretty much directly refers to you as a person, that's the intention of the system. That's not necessarily a problem, though. These types of detection algorithms are perfectly allowed without explicit consent, just like other types of fraud and abuse detection.

whiplash451 · on June 22, 2023

The real question is why would a service like recaptcha need the ip of the user to operate?

jonas-w · on June 22, 2023

IP reputation/blocking. Often datacenters have a bad IP reputation and if you access a site with a captcha from a datacenter IP, you will often get more/harder/slower captchas to solve. Some IPs are completely restricted from accessing sites which have captchas, if they are known to be used for spamming.

butlerm · on June 22, 2023

Embedded references to external javascript libraries, fonts, images, and so on on a web page are resolved by the browser by making a connection to the service that hosts the resource, and connections over the Internet necessarily involve two IP addresses of some sort, the IP address of the requester (or a requester) and the IP address of the responder. Every web browser works that way by default.

There are two basic choices to avoid that - in some cases you can just host the resources yourself so the client browser does not connect to hosts operated by others, or you can operate a proxy so that the client browser's requests are relayed and anonymized before going to a third party.
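As a rough sketch of the proxy option in Python: the proxy opens its own connection to the third party, so the third party sees the proxy's IP rather than the visitor's. What remains is making sure the relayed request does not carry the visitor's identity in its headers. The header list and the function name `anonymize_headers` are assumptions for illustration; a real deployment would need its own review of which headers to drop.

```python
from typing import Dict, Mapping

# Headers that commonly carry identifying information (assumed list).
IDENTIFYING_HEADERS = frozenset(
    {
        "cookie",
        "authorization",
        "referer",
        "user-agent",
        "x-forwarded-for",
        "forwarded",
        "x-real-ip",
    }
)


def anonymize_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` without the identifying ones."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }


if __name__ == "__main__":
    incoming = {
        "Accept": "text/css",
        "Cookie": "sid=1",
        "X-Forwarded-For": "198.51.100.77",
    }
    print(anonymize_headers(incoming))
```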

nness · on June 22, 2023

The article title is "Is Google reCAPTCHA GDPR Compliant?"

And it's a good point: broad data collection has always attracted the ire of European regulators, and in the decision they state that they find that reCaptcha serves as both a security and analytics tool (due to its broad data capture). I can't argue with that definition.

The solution, for Google, is to only conduct telemetry after the user has authorised that telemetry, allowing reCaptcha to function without the data collection consent. They already have such functionality in Google Analytics, but arguably, might be less valuable for Google without that data.

For the businesses using reCaptcha, it's a problem. The article makes a fair point that you can't use the service if the user declines consent. But it is a reminder that any business operating in the EU at this scale must incorporate a data privacy specialist into their requirements gathering and review processes. It's just the price of the ticket to play in the EU.

miohtama · on June 22, 2023

You can argue the data collection is legit and does not need the user consent, because it is needed in order to perform the core function to separate bots from humans. Thus, no special consent is needed.

A different question is whether Google uses this data for purposes other than the intended one. In this case the service might still be GDPR compliant from the website implementor's point of view, but Google would be committing fraud by breaking its Terms of Service on how the data is handled.

croes · on June 22, 2023

If a service doesn't work while respecting user rights, the service can't be used.

What's next? Capture a picture of your webcam to check if a real person is sitting in front of the PC?

mozman · on June 22, 2023

This is already happening, with some remote working platforms requiring the camera on during working hours

Twitter also requires the employee to be in a dedicated room with a door that closes. At least they used to.

manmal · on June 22, 2023

Some KYC processes in the EU (and likely elsewhere) do involve a webcam interview with a 3rd party who checks the customer's face and passport.

krzyk · on June 22, 2023

The question is also, why separate humans from bots? It makes creating useful scripts harder. Doesn't it fall under the "fair use" case, which in the EU is enforced even for software, where you can modify it to be able to run it on your platform?

dageshi · on June 22, 2023

To prevent or at least cut down on spam? Pretty much the entire reason captcha systems were created in the first place and an entirely legitimate reason?

fkyoureadthedoc · on June 22, 2023

In addition to spam, preventing bots from buying up all the GPUs and PS5s in the recent past would have been nice.

pph · on June 22, 2023

Though in the end you're the one clicking busses and bikes and bridges and busses again and when you're through that charade, everything is gone anyway because the bots were faster/better at that..

miohtama · on June 22, 2023

It is up to the website owner to decide how they want to distribute the content, not the reader. What you see as fair is often not fair from the website owner point of view.

If you disagree with this you are always free to create a competing business without such captcha limitations for bots, and put your money where your mouth is.

mozman · on June 22, 2023

Think about why the bots exist - it's almost always data theft or other schemes to abuse your API to profit. Taking money from your pocket.

rypskar · on June 22, 2023

>>because it is needed in order to perform the core function to separate bots from humans

The core function for most sites using recaptcha is not to separate bots from humans, so a consent is needed before sending data to a 3rd party not related to the core functions of the site or app

cassianoleal · on June 22, 2023

> You can argue the data collection is legit and does not need the user consent, because it is needed in order to perform the core function to separate bots from humans.

You would have a hard time arguing that. The core function to separate bots from humans is done by requiring the user to identify certain images. The only data that needs to be "collected" (and even that doesn't need to be kept) is whether they clicked the correct squares.

jefftk · on June 22, 2023

That is not most of how it identifies bot traffic. The order you click them, how quickly you click, and where within each square you click are all ways humans and computers can be different.

But the real work is not about your interaction at all: this is how the invisible version can perform almost as well as the one where you click things. This involves comparing information about your computer's JavaScript environment with what they have seen elsewhere, and if you are running a bot farm it's pretty hard to keep your statistical distribution for all of these different attributes from looking odd.

cassianoleal · on June 22, 2023

> how the invisible version can perform almost as well as the one where you click things

I don't think it does for me. I run most websites in temporary containers, I do a lot of tracker blocking on DNS, uBo, etc., I clean cookies frequently.

Either those CAPTCHAs are really bad and are considering me human when they shouldn't, or all those things you mentioned are not necessary for their core functionality.

jefftk · on June 22, 2023

Several of the non-interactive signals would still pass through in those sorts of situations. I don't know exactly what reCAPTCHA collects, but if you look at something like https://amiunique.org/fingerprint you can get a similar idea of what's possible.

cassianoleal · on June 22, 2023

Yeah I get what you're saying. What I'm saying is that it's not required for its core functionality. Without all that, it still works by getting the user to click on images. Sure, maybe it's not just about validating the correct squares were clicked on. Maybe it analyses timing, how much the mouse moves around and whatever else. Nothing there requires collection and storage of user data. All it needs is to process it there and then. It can throw it all away as soon as the user clicks the button.

jefftk · on June 22, 2023

> Without all that, it still works by getting the user to click on images.

I don't see where you've shown that?

ReCAPTCHA v3 doesn't include any clicking on images, as far as I can tell because that doesn't actually add that much in terms of identifying bots?

> All it needs is to process it there and then. It can throw it all away as soon as the user clicks the button.

Even if they limited themselves to tracking how users clicked the button they'd still need to store it, so they could compare this user to other users and build models of what human/bot traffic looked like.

cassianoleal · on June 22, 2023

> ReCAPTCHA v3 doesn't include any clicking on images, as far as I can tell because that doesn't actually add that much in terms of identifying bots?

Perhaps I'm being considered human and just being generally unaware. It's possible. Do you have an example website where I can try it? I have uBo block recaptcha by default, so whenever a website uses it, it takes me a few clicks and page reloads until I get the prompt. I can't remember a single instance where I didn't have to go through the challenge but maybe my own biases are in the way of me seeing it.

> Even if they limited themselves to tracking how users clicked the button they'd still need to store it, so they could compare this user to other users and build models of what human/bot traffic looked like.

This gives me the creeps. In any case, does this need to be accompanied by PII? And how can I validate that it's not?

jeroenhd · on June 22, 2023

This does seem like a good legitimate use case.

However, if Google transfers the data collected to its American servers or daughter companies, that would still make for a massive GDPR violation, both for Google for breaking the law and, if the situation does not get resolved, possibly for the companies using Google's services while it knowingly violates the law.

jdietrich · on June 22, 2023

It's almost certainly possible to defend the use of a reCAPTCHA-like tool without user consent under article 6 para 1(f) (legitimate interests). My concern with using reCAPTCHA would be the track record of Google - they have a long history of testing the boundaries of data protection law and accepting fines as a cost of doing business.

Google don't appear to even mention GDPR in the marketing materials or docs for reCAPTCHA; they do claim that reCAPTCHA Enterprise (a separate, paid-for product) can be GDPR compliant, but I'd take that with a big pinch of salt. Competing CAPTCHA services make much stronger claims regarding GDPR compliance and are much more transparent about how the service uses personal information.

hnbad · on June 22, 2023

> But it is a reminder that any business operating in the EU at this scale must incorporate a data privacy specialist into their requirements gathering and review processes. It's just the price of the ticket to play in the EU.

This sounds very American. You don't need a data privacy specialist to operate in the EU. You need to develop with privacy first by design. Treat all PII as radioactive. Literally. Yes, if you need to bolt this onto existing US software to make it "compliant", you're screwed and you'll need to call in a containment team like when you find radioactive cargo in your business that is not normally expected to handle it.

A lot of times when topics like GDPR and privacy are brought up on HN I'm seeing comments that act like it's black magic. It really isn't. It's trivial to be "good enough" as far as legislation is concerned. The problem is just that over the past decades and especially in the US we've seen myriads of online services sprout up that now often seem integral but were built with a complete disregard for privacy and now need to either be retrofitted or somehow contained to become compliant. It's like finding out paint is radioactive after it has been marketed for decades with no regulation or oversight.

Internet companies have been playing it fast and loose with privacy well past the point that people started pointing out it's an (ethical if not legal yet) problem. That is now coming back to bite them. I'm okay with that. It's just a shame so many businesses are caught in the crossfire because those companies have also tried their best to make themselves integral and unavoidable. Good luck trying to migrate away from AWS/Azure/GCP for example.

nness · on June 22, 2023

Funnily enough, I work in technology in Europe and have consulted on GDPR compliance.

Organisations either care a lot about their data obligations and have dedicated teams and reviews, or simply don't care, and don't want that cost passed on to them. And service providers, not wanting to lose a sale, just go ahead with whatever is easier. In this case, they shot themselves in the foot by not conducting due diligence for what would have been a fairly easy to recognise issue.

(or worse, were advised, and that advice was incorrect. But based on what I can translate in the original document, that may not have been the case.)

mbork_pl · on June 22, 2023

> It's trivial to be "good enough" as far as legislation is concerned.

I'd be very interested to read any tutorials/howtos/faqs/etc. about that. I might be tempted to create a (very small) side SaaS-type project (and I'm located in Europe), and the main reason I haven't done it yet is that GDPR compliance looks really, really scary to me.

mozman · on June 22, 2023

Being compliant with regulations such as the GDPR is expensive. Lawyers, security, and annual third party testing. I don't profit off EU users so I block all non-USA traffic and avoid the issue entirely.

ahoka · on June 22, 2023

It’s like saying complying with criminal law is expensive because lawyers are expensive.

mozman · on June 22, 2023

If you've been responsible for implementing GDPR compliance I think you may have a different perspective.

hnbad · on June 26, 2023

It's like saying complying with truth in advertising laws is expensive: most companies barely have to think about it because they comply by default but for some companies it is extremely expensive.

Of course it's extremely expensive for them because they're trying to get as close to breaking the law as they can without actually breaking it. That requires expensive lawyers, constant monitoring and extremely fast response cycles. E.g. there are a lot of big companies making good money off exaggerated but legal health claims and their claims are all not just vetted by a team of expensive lawyers but also documented and tracked in such a way that if they do end up getting sued they can immediately find out where they are using that particular claim and withdraw all advertising material using it to comply with a cease and desist.

So, yes, if you want to run a business that is either intended to be willfully negligent for no good reason or exploit users with as little informed consent as you can get away with (likely because what you want to do is not in their best interest), you'll need a team of expensive lawyers.

But compared to actual nuclear storage (which is highly regulated for good reasons), or storing certain financial data (which requires PCI compliance), or storing medical records (which in the US requires HIPAA compliance) or filing your taxes correctly, GDPR compliance does not actually require an expensive external audit and certainly not a regular one.

Of course SOC2 compliance or ISO compliance are different matters and they may be involved in demonstrating GDPR compliance to business customers but they're neither necessary nor sufficient to comply with the GDPR or the ePrivacy directive.

hnbad · on June 22, 2023

Well, you could have had that for cheap: being US-based currently makes you non-compliant by default thanks to your government's warrantless surveillance (look up Privacy Shield and why it died).

But, no, none of these things are required to be GDPR compliant. Of course if you want to build a business on processing PII (and especially if it's any of the protected categories, e.g. you want to process personally identifiable medical data) the GDPR requires more effort from you because it's harder to maintain your users' privacy while doing this. And if you actually have no business doing this but still need to find a way to coerce your users into surrendering their data against their own interests (e.g. behavioral analytics, insurance risk scoring, etc), it's even harder to do this in a compliant way and opens you up to more scrutiny (rightfully so, I might add).

These things are not required to be compliant. These things may be involved in demonstrating compliance. But the lengths you have to go to in order to demonstrate compliance are very much a function of how privacy invasive your business is. A nuclear power plant will have more detailed radioactive waste management and radioactive material containment plans than a watch repair shop that occasionally handles radium-coated mechanical parts.

Specifically, other companies may insist you go to greater lengths to demonstrate your compliance to them if they want to do business with you, the same way you don't just buy nuclear waste containment equipment from some guy on eBay.

I did mean it literally when I said treat PII as radioactive.

nness · on June 22, 2023

I do love this argument against GDPR (or any data protection) because it often comes down to "Screw the US consumer, I'm profitable because they have no protections, so no one should have any."

... which is kinda the US business sentiment in general, I suppose.

sam0x17 · on June 22, 2023

captchas aren't going to be effective for much longer anyway with the recent dramatic improvements in AI

manmal · on June 22, 2023

Maybe proof of work based captchas will be a viable alternative, like https://github.com/mCaptcha/mCaptcha:

> mCaptcha makes interacting with websites (computationally) expensive for the user. A well-behaving user will experience a slight delay (no delay when under moderate load to 2s when under attack; PoW difficulty is variable) but if someone wants to hammer your site, they will have to do more work to send requests than your server will have to do to respond to their request.
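For readers unfamiliar with the idea, a minimal hashcash-style sketch of proof of work follows. This is not mCaptcha's actual algorithm or API; the function names, the SHA-256 choice, and the `difficulty_bits` parameter are assumptions made for illustration. The point is the asymmetry: the client has to try many nonces, while the server checks the result with a single hash.

```python
import hashlib
from itertools import count


def _digest_value(challenge: str, nonce: int) -> int:
    """Hash the challenge together with a nonce and read it as an integer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce whose hash falls below the target.

    Each extra bit of difficulty doubles the expected number of attempts.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _digest_value(challenge, nonce) < target:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))


nonce = solve("example-challenge", 12)  # about 4096 attempts on average
assert verify("example-challenge", nonce, 12)
```

Raising `difficulty_bits` when the server is under load is one way a variable difficulty like the one described above could be implemented.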

slondr · on June 22, 2023

So, banning low-compute-capacity clients in the worst way possible. Awesome.

manmal · on June 22, 2023

That should only be an issue if the server is under attack.

According to the docs (https://github.com/mCaptcha/mCaptcha/blob/master/docs/CONFIG...), you can set three difficulty levels:

MCAPTCHA_CAPTCHA_AVG_TRAFFIC_DIFFICULTY

MCAPTCHA_CAPTCHA_PEAK_TRAFFIC_DIFFICULTY

MCAPTCHA_CAPTCHA_BROKE_MY_SITE_TRAFFIC_DIFFICULTY

The defaults are set such that avg traffic takes ca 0.02s on an average system. Even if you have a really really slow system, I don’t think you’ll ever spend more than 2s there.
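As a rough sketch of how three such levels could be applied, the snippet below picks a difficulty from the current visitor count. Only the three environment variable names come from the docs linked above; the fallback values, the visitor thresholds, and the `pick_difficulty` function are hypothetical and do not reflect mCaptcha's real defaults or selection logic.

```python
import os
from collections.abc import Mapping


def pick_difficulty(
    visitors: int,
    env: Mapping[str, str] = os.environ,
    peak_threshold: int = 1_000,      # hypothetical: visitors per window at peak
    overload_threshold: int = 10_000, # hypothetical: visitors per window under attack
) -> int:
    """Choose a PoW difficulty for the current traffic level.

    The fallback numbers are placeholders for illustration, not mCaptcha's defaults.
    """
    avg = int(env.get("MCAPTCHA_CAPTCHA_AVG_TRAFFIC_DIFFICULTY", "50000"))
    peak = int(env.get("MCAPTCHA_CAPTCHA_PEAK_TRAFFIC_DIFFICULTY", "3000000"))
    broke = int(env.get("MCAPTCHA_CAPTCHA_BROKE_MY_SITE_TRAFFIC_DIFFICULTY", "5000000"))
    if visitors >= overload_threshold:
        return broke
    if visitors >= peak_threshold:
        return peak
    return avg


# With an empty environment the placeholder fallbacks are used.
print(pick_difficulty(10, env={}))
print(pick_difficulty(50_000, env={}))
```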

MrBruh · on June 22, 2023

Nah you need to think about it in terms of computing power compared to each other.

According to the screenshots of XMRig for android you only get about ~35H/s while my laptop does ~2400. That's 68x faster so if it took my laptop 2 seconds it would take a mobile device ~130 seconds.

It screws with mobile users and makes the whole crypto PoW thing about it using too much energy many times worse. Not to mention botnets could make use of enough computing power to easily outpace any captchas thrown at it.

jefftk · on June 22, 2023

> According to the screenshots of XMRig for android you only get about ~35H/s while my laptop does ~2400.

Is that for the specific proof of work algorithm mCaptcha uses? While I don't think you're going to get something that runs equally quickly on a low-end phone and high-end desktop, if it depends entirely on sequential operations and is not optimization-friendly you should be able to get much closer than 68x?

manmal · on June 22, 2023

The screenshot here shows 117H/s, and I guess the Android version hasn’t been as heavily optimized: https://github.com/XMRig-for-Android/xmrig-for-android

I think we’d need to compare apples to apples, and not use Monero mining as a benchmark for mCaptcha. Also, as I wrote in another comment, the average case (server is not under attack) is 0.02 seconds on a laptop, and probably 0.4s on an Android device even if we do use xmrig-android as comparison. Compare that to manually identifying stairs on pictures with crappy quality (10 seconds?).

manmal · on June 22, 2023

The 2s are the default setting for when the server is under attack. In a normal scenario, it’s ca 100x less. It would take your laptop 0.02s and your hypothetical phone ca 1s. But the screenshot for xmrig-android shows 117H/s, so it would take such a phone only 0.4s, going by that logic (I don’t think the performance penalty of mCaptcha is x20 on mobile though).
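The numbers in this subthread can be checked with a few lines of arithmetic. The sketch assumes, as the comments above do, that solve time scales inversely with the XMRig hash rate, which may well not hold for mCaptcha's own algorithm; `scaled_time` is a hypothetical helper.

```python
def scaled_time(reference_seconds: float, reference_rate: float, device_rate: float) -> float:
    """Estimate solve time on a device, assuming time is inversely proportional to hash rate."""
    return reference_seconds * (reference_rate / device_rate)


LAPTOP = 2400  # H/s, laptop figure quoted in the thread
PHONE_SLOW = 35   # H/s, first XMRig-for-Android figure
PHONE_FAST = 117  # H/s, figure from the project's screenshot

print(round(scaled_time(2.0, LAPTOP, PHONE_SLOW), 2))   # 137.14 (under attack, slow phone)
print(round(scaled_time(0.02, LAPTOP, PHONE_SLOW), 2))  # 1.37  (normal load, slow phone)
print(round(scaled_time(0.02, LAPTOP, PHONE_FAST), 2))  # 0.41  (normal load, faster phone)
```

So the "~130 seconds" and "0.4s" figures both follow from the same assumption; they differ because one uses the under-attack setting and the other the normal-load setting, and because of the different phone hash rates.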

ahoka · on June 22, 2023

If you want to register to my site from your smart thermometer, then it’s your problem.