Is Google reCAPTCHA GDPR Compliant? (wideangle.co)
143 points by openplatypus on June 22, 2023 | 227 comments


> We’re using “cookies” as a shorthand for any technologies that can access or store information on a person’s device. This can also include beacons, pixels, scripts, and other technologies.

That is a weird use of the word "cookie". In normal usage, it doesn't mean a technology that accesses or stores the information, it is that information itself.

"Pixels" is also weird. I think that they mean tracking pixels, which are one-pixel images that are just there so that the browser has to request them from the server and the server can notice that request. They are a subclass of "Beacons". Calling them a "technology that can access or store information on a person's device" seems misleading. Also, reCaptcha wouldn't need them. They already have Javascript running on my PC, they don't need a tracking pixel to contact the server.


I actually thought it was a pretty decent caveat when writing for a non-technical audience. They're listing a bunch of different technologies that are (almost all the time) legally equivalent to cookies in the EU. And in EU privacy writing it's already common to use "cookies" to describe this whole area.

(The main exception to their list is scripts, which are only cookie-like to the extent that they use cookies or other client side storage)


While "pixel" was originally the 1x1 way back in the day, it's a generic term now used by industry to refer to analytics, often in tracking of user behavior and conversion events.


Cookies mean exactly that. You are just being overly pedantic.

And yes, running JS in a user's browser to store or access information is exactly the same. Taking a browser fingerprint and storing it server side is also the same, just harder to get caught doing. This whole business of trying to maliciously take these laws literally needs to stop.


Yes, I may be overly pedantic. This part just made me suspect that the author didn't have a technological background. That isn't bad per se, as this is mostly a legal topic.

Also, I'm not trying to "maliciously take these laws literally". The law isn't limited to cookies, so you can't get around it by using a narrow definition of the word "cookie".


Nothing in the text makes me think the authors do or do not have a technical background. Everything sounds correct both technically and legally to me (me being a technical person and not a lawyer).


> This part just made me suspect that the author didn't have a technological background.

Or perhaps the author does have a technical background, which is why they attempted to give a clear yet simple explanation for the non-technical?

People hear cookies are bad, they get pestered by cookie banners, so it makes sense to use cookie as an umbrella term. Their definition of tracking matches the law which is the important part.


I've heard "pixels" used generically to refer to the bundle of tracking code from a particular vendor in the marketing department at work. e.g. "Have you enabled the Facebook pixel?" means have you embedded the JavaScript snippet (usually with a fallback 1x1 pixel) that Facebook provides for tracking.


Pixels likely refers to 1x1 tracking images (image fetched from a server, and through that, they can get some data from you, like IP address or visited before)


Sorry, I sent my comment before it was done. It's edited now.


Yeah, I see your edit. With JS you can also detect whether it was fetched before, so that is a way of storing information about whether you have visited something before. Which can then be sent to the server. So, it kinda is storing data, kinda not.


> "Pixels" is also weird.

Could be a unique pixel that does or does not exist in the browser cache (is or isn't fetched (fought?)).

A pixel can store information as long as the cache works as expected.


In the end, you just use the cookie to store a unique id, and store everything on the server.


CAPTCHAs are overused because of groupthink and fashions/fads. Before you use a CAPTCHA of any kind, consider very carefully if you really need one.

I've seen this a number of times in design meetings: someone will say "oh, an account registration form, we will of course need a CAPTCHA there", everyone will nod their heads and move on. In reality, in most of those cases, no one will ever conceivably even try to automate/script the thing being designed.


Thought the same, had a pleasant signup form for a small SaaS platform nobody really knows about, with no captcha. Then someone or some group found it and there's been a barrage of attacks varying in intensity, vectors etc. Cost us so much money in vendor costs the small company is now in danger of going bankrupt.

I appreciate the sentiment, as I had it, but rest assured any future publicly accessible form I build will get at least a CAPTCHA in front of it.


I have a bunch of publicly accessible forms and none of them have captchas.

I did once run into an issue where a signup form was abused by a spammer, but that was a simple fix (tip: in verification emails, do not include any information that the user typed in the form).

If you are careful with your forms, you don't need captchas. Captchas add a lot of friction for some users, so if they can be avoided, they should be.


Many captchas add friction for some users, but some types don't; there are relatively fast "proof of work" captchas that aren't surfaced to the user at all.


CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart

Proof of work isn't a CAPTCHA.


can you explain what you mean by that tip? was this spammer using your verification emails to send spam or something?

or was it more complicated, like not needing to store which fake account had which details?


The registration form had a name and an email, and I sent a message similar to the following:

Hi <name>, thank you for signing up...

The spammers put their spam message in the name field, so my server started sending messages like this:

Hi Get free cialis now http://example.com, thank you for signing up...
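
A minimal sketch of that tip in Python. The helper name `build_verification_email`, the sender address, and the server-generated `verify_url` are assumptions for illustration, not details from the comment above. The body is a fixed template, so whatever was typed into the name field is never echoed back to the recipient.

```python
from email.message import EmailMessage


def build_verification_email(recipient: str, verify_url: str) -> EmailMessage:
    """Build a sign-up verification email that contains no user-typed text."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = "noreply@example.com"  # hypothetical sender address
    msg["Subject"] = "Please confirm your sign-up"
    # Fixed template: the submitted name (or any other form field) is
    # deliberately not interpolated, so the form cannot relay a spam message.
    msg.set_content(
        "Hi, thank you for signing up.\n"
        f"Confirm your address here: {verify_url}\n"
    )
    return msg
```

The only variable parts are the destination address and a link the server generated itself, which leaves a spammer nothing to write into the message.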


A long time ago, I was still in college (UK college, i.e., pre-university), and still learning.

I discovered a classmate was involved in some event, and found the event's website. They didn't have a captcha. By your logic, this was the right choice.

In reality, my dumb ass decided it would be fun to script something that would register millions of users (another classmate ran the script with me). After a few hundred thousand registrations, the website was brought to its knees. I was a bit shook, but didn't think much of it.

Next morning I come into class, and was reprimanded by my teacher. Turns out, the owner of said event had threatened to sue the school and me, among other things. What had happened was their servers were down, their email server was brought to its knees, their web servers had died, and generally I had caused a lot of damage without even thinking about it. It caused them to potentially lose some money. None of this was my intention, of course, but I didn't know much better.

Point is, kids will kid, and spammers will spam. There are plenty of bots that just scrape the internet and fill out forms indiscriminately.

Captcha may or may not be the best option here (I'm always of the opinion it's not, especially not reCAPTCHA), but something has to be put in place, even if to stop the majority of bad actors.


you can also just limit the amount of sign ups from one IP each day. There are simpler heuristics to prevent unsophisticated abuse like that


You can, but then you discover that places like Bangladesh and Cambodia, that do a fair bit of freelance work on the 'net use a surprisingly tiny number of IPv4 addresses to do it.

For lots of these countries their total allocation of IPv4 addresses is < 20 per 1000 people and the nature of their access (through glorified internet cafes) mean that you will have some IP addresses that really are totally legit, yet have LOTS of users.

One size fits all is very dangerous on the Internet.


How is the IPv6 roll-out over there?

On the one hand, I assume bad due to cheap equipment. On the other, it's not like v6 addresses are expensive and you need some way of addressing every subscriber anyway. As more people sign up (as the country gets more people with internet access), you need more equipment which could support v6 out of the box, and the excuse for CGNAT I've always heard is old equipment that is harder to upgrade than to put a NAT router in front of. Could go either way from my POV.

If the roll-out is good, then all those people are already taken care of and the minority left on v4 CGNAT aren't bothered by the collective rate limit.

(To preempt the eventual remark that users can generate a billion addresses in v6: rate limiting on v6 works by limiting whatever prefix the ISP gives out to subscribers, like /56, not individual addresses the way it's often done with v4.)
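
As a rough sketch of that prefix-based approach in Python, using the standard `ipaddress` module: the function name `rate_limit_key` and the default of /56 are assumptions for illustration, and the right prefix length depends on what ISPs actually hand out.

```python
import ipaddress


def rate_limit_key(addr: str, v6_prefix_len: int = 56) -> str:
    """Return the bucket a client address is counted under for rate limiting."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # IPv4: count per individual address, as is commonly done.
        return str(ip)
    # IPv6: collapse the address to its enclosing prefix, so every address
    # inside one subscriber's allocation shares a single bucket.
    network = ipaddress.ip_network(f"{ip}/{v6_prefix_len}", strict=False)
    return str(network)
```

With this, `2001:db8:1234:5678::1` and `2001:db8:1234:56ff::abcd` land in the same bucket, while an address in a neighbouring /56 does not.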

As an aside, it should also be kept in mind that not every use case involves signing entire countries up for their service, even in an ideal case.


To give another example, in Spain most mobile carriers will place everyone behind a CGNAT with no IPv6.

On fiber some do the same, although thankfully most place v4 behind a CGNAT while offering IPv6.

The whole "1 IP, 1 user" idea, even with dynamic IPs, is quite false and is a mess.


That has been my experience on any mobile network, also in 2007 or so when v4 addresses were still available (because my 15-year-old self wanted to seed torrents with my unlimited data bundle ...on GPRS). It's a fair point that one has to consider this part of the market, though I was primarily thinking of wired connections.


It isn't good and purely being on IPv6 is still a terrible web experience in any event. Huge % of major websites don't properly support IPv6 yet. It's ridiculous.


I know, but this is about hosting a service, not about trying to use existing services that got v4 addresses before it was cool


IPv6 doesn’t address disambiguating people using public computers at places like Internet cafes.


Is your site even relevant to Bangladesh and Cambodia?

If you're collecting sign-up data for something local, then most likely not.


Yes.

FWIW, I learnt about this the hard way.


Yes, if it is relevant then by all means make it work for them


No way. In the B2B world at least, I expect hundreds of users coming from behind the same corporate proxies.


> you can also just limit the amount of sign ups from one IP each day

This is a classic example of how "just do this" kind of thinking can lead to terrible results.

Do you now see how "just limiting the sign ups from one IP each day" can go very very wrong?


What you could do is, use both. One sign up from each IP per day before you get a CAPTCHA. Then you're not subjecting 99% of your users to training Google's AI for free but the people at a cafe in Bangladesh can still sign up.
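
A minimal sketch of that combined approach in Python. The class name `SignupGate` and the in-memory dictionary are assumptions for illustration; a real deployment would need a store shared across servers and some expiry of old counters.

```python
from datetime import date


class SignupGate:
    """Allow a few CAPTCHA-free sign-ups per IP per day, then require a CAPTCHA."""

    def __init__(self, free_signups_per_day: int = 1):
        self.free_signups_per_day = free_signups_per_day
        self._counts = {}  # maps (ip, day) to the number of sign-ups seen

    def needs_captcha(self, ip: str, today: date) -> bool:
        # Below the free allowance: let the user through without a challenge.
        return self._counts.get((ip, today), 0) >= self.free_signups_per_day

    def record_signup(self, ip: str, today: date) -> None:
        key = (ip, today)
        self._counts[key] = self._counts.get(key, 0) + 1
```

Nobody is blocked outright: the second person at a shared address on a given day simply gets the challenge that the first one skipped.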


This sounds like extra work to solve the problem you said didn't exist.


It's extra work because it's better. You're not subjecting 99% of your users to training Google's AI for free.


> limit the amount of sign ups from one IP each day

one per library per day...


one per coworking space, one per office location for each company


Life as a developer has taught me to take the other side of your argument. I'd disagree on this.

Once you release something to the wild you need to have robust controls in place to prevent one person or group of people from using all your resources.

I wouldn't release a product that doesn't have rate limiting of some kind, of which a captcha is one way to rate limit.

Always trust people to push the boundaries of your app as far as they possibly can. I have yet to build a system where someone doesn't. And that includes tools I've built for inhouse users:(

Whether intentionally or not, they always find a way to push the boundaries:)


> no one will ever conceivably even try to automate/script the thing being designed.

Spammers will spam everywhere they can. My minuscule personal site suffers from it very rarely, but I can imagine anyone getting a lot of page views making it worth it.


On my custom built site I have none of those. But, on my WordPress site, I had to install captcha the second day. Spammers are just using scripts, which cost next to nothing...


If a site gets even a little popular it attracts spam and security scanners and other nonsense no matter how it's built.


I don't know. I run a SaaS that allows free user signup and significantly more than 50% of my daily signups are just signup "spam", without any visible motivation for doing so. The user name or information doesn't show up anywhere publicly and there is no inherent value in having a free user account. I've implemented some basic countermeasures (dummy form fields which reject the submission) which wasn't enough. I've added reCaptcha, and I'm still getting 50% spam signups from working (!) gmail addresses, meaning someone is able to receive emails on these. The majority of these are from places like India, Bangladesh, Vietnam, etc.

I don't even want to know what my site would look like without my own countermeasures + reCaptcha + if it was a service where a user account has any kind of "value"...


Is there a particular problem if someone signs up for an account on your system and doesn't use it?

Is such an account using a lot of resources?


Blocking others from signing up or using the username they'd prefer. For example, the bad actor could spam the registration with valid email addresses and depending on how your system handles registration it could either send validation emails to those addresses or block the person that owns the address from signing up.


It doesn't use up too many resources, but I still don't like having more than 50% of the user database be essentially spam. If you ever want to sell your company you want to have somewhat accurate numbers of actual user registrations. Ever since I realised the extent of the issue I have become very doubtful about reported user counts from startups. In our case the only reason we've realised this is spam is because we annotate signups with GeoIP and also allow users to fill in a name and title which will then be something like "find escorts in Chennai" and the signup will be from India etc. If you only look at the email addresses all you'll see are many gmail addresses with western names, so you might be fooled into thinking that all of these are legitimate users.


Is it possible that the emails are real, non-spammers and the spammers are abusing your registration form to send a short message entered in the name field to those emails like this person described? https://news.ycombinator.com/item?id=36446532


Well, you can't use raw signup numbers as a tracker of how well you are doing?


A CAPTCHA on the registration page removed quite a lot of automated registrations. What are other options to prevent or reduce automated registrations? (One off the top of my head: email/phone verification.)


hidden fields will remove most of the non targeted attacks.

And if they really are targeted, I don't think CAPTCHA will help much.


Google reCAPTCHA v3 has worked pretty well for us, we saw many instances in our logs where v2 was solved by scammers on our site using some automated tools that had a 25% success rate (plenty for automated scammer scripts) but upgrading to v3 stopped all the automated attacks. So far we haven’t seen successful solves of v3 in the wild and we’re a payments company so we see a lot of attempts.


We use Auth0, which determines when to show a captcha. I think "Smarter Captcha" should be the industry standard. If you don't suspect the end user of being a bad actor, why show them a captcha every time? In fact, Google's captcha is awful for almost always showing it, which tells you they don't care about stopping bots, only about the data they get from user inputs.

Edit: And come to think of it, A TON of websites do "smarter captcha" or whatever you want to call it, because on one of my computers I enabled the resist fingerprinting setting in Firefox, and I get a captcha every visit on some sites that otherwise NEVER show a captcha (I think it might be Cloudflare-driven, but unsure). Walmart comes to mind: it shows me a pill-looking thing where I have to hold the mouse click until it fills.


It took me one incident to turn from "no one will ever conceivably even try to..." to "everyone will nod their heads and move on"


Years ago I had a blood test taken at a local pathology place, the form they were submitting had a CAPTCHA and pictures they were given weren't easy by any means. I'm talking the kind of stuff you get trying to go to google.com on Tor browser.

As far as I could tell this was an internal form that wasn't publicly accessible


> In reality, in most of those cases, no one will ever conceivably even try to automate/script the thing being designed.

There are more than enough people running automated crawlers, probably fed from Google "inurl: contact-form" searches or whatever, and just blanket spam you.


We ignored them until we needed them. Then we needed them.


This is in line with my experience as well. For most sites, CAPTCHAs are overkill and an accessibility problem. Hidden honeypot, maybe a simple “How much is 5 + 2” keeps 99% of spam out. I had a few more difficult cases, which were solved by blocking some geographic IP regions and adding blacklists for certain words, like “crypto” for example.


I'm not an expert on honeypot inputs but wouldn't it be super easy to check for type=hidden or opacity=0 if you'd like to spam?


Yes, but most bots don't. There are also some more elaborate methods. Giving the input a tabindex="-1", aria-hidden="true" and then moving it left: -100vh works pretty well.
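
A rough sketch of how such a decoy field might be checked server-side, in Python. The field name `website`, the `position: absolute` styling, and the helper `looks_like_bot` are assumptions for illustration; only the `tabindex`, `aria-hidden` and `left: -100vh` details come from the comment above.

```python
# Hypothetical markup for the decoy input. It is taken out of the tab order,
# hidden from assistive technology, and moved off-screen rather than using
# type="hidden", which simple bots are more likely to skip.
HONEYPOT_HTML = (
    '<input type="text" name="website" tabindex="-1" aria-hidden="true" '
    'autocomplete="off" style="position: absolute; left: -100vh">'
)


def looks_like_bot(form: dict, honeypot_field: str = "website") -> bool:
    """Return True when the decoy field, which humans never see, was filled in."""
    value = form.get(honeypot_field, "")
    return bool(value.strip())
```

A submission with the decoy filled in can be dropped silently, so the bot gets no signal that it was detected.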


Remember that Google reCAPTCHA v3 is invisible to the user. No accessibility issues.


It's invisible until it isn't.


How's that?


If you're writing your own account registration form instead of using something off-the-shelf that provides a captcha service for you, or even better just using OAuth or a similar technology so users don't have to manage yet-another-password? I already hate you.


Spam is ever present, and Captchas protect from the massive torrent of trash.


I had to put a CAPTCHA system on a public register form for digital libraries, because they were getting spammed by bots.


The mistake here in Europe is that an IP address is considered personal information:

https://www.ra-plutte.de/lg-muenchen-dynamische-einbindung-g...

This makes it impossible to use any components hosted by third parties without getting consent by the users. And for components hosted in a different country than the visitor, even consent might not make using those external components legal.

This is bad and not in line with reality. The IP can only be turned into personal information via cooperation of the users internet provider.

So in Europe, the whole internet is made illegal based on a wrong assumption.


As a European, I don't consider this a mistake, for the simple reason that the IP address is so easily abused by trackers and people with bad intentions - the extent of abuse that we have experienced until now is absolutely ridiculous.

Hell, even a small startup with a few thousand euros can start to track and trace user behaviour on a massive scale that in reality you wouldn't or shouldn't be able to do.

The tooling (free, cheap and not) at our disposal nowadays makes everything so easy that even something that in theory should not serve as identifier can be used to identify you - so let's start with the most common ones: IP, email, etc.

The Internet in Europe is not illegal - it's just pure BS that a simple page like reuters.com contains references to 14 external scripts when loaded, when actually all you need is 2 maybe 3 scripts (the CDN to load images and videos + the page itself) - the rest is crap used explicitly to identify and market people - that's it: Ads, Ads and probably uglier things to do just to profile people online.


As another European I consider this a grave mistake and I am not surprised that we do not see many successful startups in Europe. It is part of the narrow-minded micro managing mindset that too many European politicians (especially the greens) have.

Instead of finding an innovative solution (how about mandating ISPs to make IP addresses unmappable to a user?) they only know one solution: Making things illegal, even if in almost all cases the use is benign or even makes a lot of sense.


> It is part of the narrow-minded micro managing mindset that too many European politicians (especially the greens) have.

Ironically, even in the USA with the CCPA they are trying to regulate this somehow. This is just not something you can leave untouched. Without even mentioning other non-EU countries (China/Russia/India, etc.) that have even stricter data regulations.

> Making things illegal, even if in almost all cases the use is benign or even makes a lot of sense.

What is there benign about sending your IP address to 10-15 external services that have nothing to do with providing you the service in the first place? Try to install the NoScript extension and you'll see that 90% of the websites only require the main domain to be unblocked to be used. A few more websites require allowing 1-2 additional scripts (CDN or ... payment systems/redirects, etc..). The rest is just your data being treated like toilet paper for marketing purposes. And I wouldn't even mind if they stopped there (marketing website A buys the data, and they target you with some crappy ads you won't even click on): no, they sell, and the buyer resells, etc.

Is this really what makes a country innovative? Advertisements? I thought that innovation was self-driving cars, electro cars, new ways to fight diseases, digitalizing bureaucracy (which can be done without giving away your personal data to random online companies), etc.


No, advertisements do not make a country innovative, and I never said that, and I have no idea why you would even bring this up.

But if you are a small company in Europe that for example is producing a better and lighter cardboard box, you will have to hire a data privacy specialist because you may get sued by some random idiot if you accidentally link to Google Fonts as a convenient way to display fonts. This certainly binds a lot of valuable mindshare and resources and therefore is bad for innovation. You cannot believe how much startups here have to think about GDPR even if they do not collect any data at all. Read the other answers in this thread. There are others who say that linking to Google Fonts is equal to selling your customers' data to Google. What a joke!


Of course we should just give up and give all our data to anyone who wants it, lest we burden the developers in any way.


If you don't understand why including third party resources for developer convenience is a bad idea then you probably shouldn't be building commercial websites.

Of course startups here - and everywhere else! - have to think about applicable law. Just like if you want to start a transportation company you'll have to think about applicable law. If a company does not collect any data at all then the GDPR is a complete no-brainer.

Try this and see if you can make it work for you: to identify an individual you need ~33 bits. An IP address, even though it doesn't quite get you there gives you a very large fraction of the number of bits required. Adding a few more makes the individual identified. So therefore it is a very good idea not to allow access to the IP address to unrelated parties from a privacy point of view.
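
The arithmetic behind the ~33 bits can be sketched in a few lines of Python. The population figure and the example of an IP address narrowing the pool to 1,000 people are assumptions for illustration, not measurements.

```python
import math

WORLD_POPULATION = 8_000_000_000  # rough figure, assumed for illustration


def bits_to_single_out(population: int) -> float:
    """Bits of information needed to pick one person out of `population`."""
    return math.log2(population)


def bits_remaining(candidates: int) -> float:
    """Bits still needed once the pool is narrowed to `candidates` people."""
    return math.log2(candidates)


total = bits_to_single_out(WORLD_POPULATION)  # just under 33 bits

# Hypothetical case: an IP address narrows the pool to 1,000 people.
left = bits_remaining(1_000)                  # roughly 10 bits
gained = total - left                         # what the IP address contributed
```

Under that assumption the IP address alone supplies most of the required bits, and a handful of extra signals (browser, time zone, screen size) would cover the rest.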


That ad hominem attack was a bit uncalled for, especially since tablomonto.nl is sending their users' IP addresses to Google by using Google Fonts, too.


Ah good old 'et tu, Brute'. But you know what: I don't have any influence on how tablomonto.nl is implemented. But if you check out jacquesmattheij.com you'll get a much better idea of my attitude towards the privacy of the visitors to my website(s).


You probably should send an unsolicited email to your colleague with this then:

<quote> If you don't understand why including third party resources for developer convenience is a bad idea then you probably shouldn't be building commercial websites.


Well said. I'd recommend Surveillance Capitalism by Shoshana Zuboff for an in-depth study of this unprecedented and spectacular surveillance that is becoming so commonplace.


> how about mandating ISPs to make IP addresses unmappable to a user?

I can't see how this would work in practice: you'd still have the same IP when visiting site A where your real identity is known, and site B where you didn't give them consent to access your information. The only reasonable course of action is to classify your IP as an identifying info, as site A and B exchanging info would result in your exposure (or site A and B being the same site, but you logged as different users...)

Perhaps your point was on giving user different IPs for every site, potentially for every request, but then the consequences would have to be handled worldwide.


Well, let's just take a look at the most benign case where a site owner was sued and convicted for simply using Google Fonts without any bad intentions and without selling or transferring any other data to Google. To Google your IP address will be worthless unless they get any link to your identity, so it would solve those really simple cases where site owners accidentally violate the GDPR without any bad intentions and have to fear being sued by some random idiot in court.


> To Google your IP address will be worthless unless they get any link to your identity

Not true. Google is used so widely that they can track and profile me all over the internet, because they can easily combine the pages on which I've requested fonts from them. And once I log into YouTube, they can link all that information about medical conditions I googled directly to me personally.


I see it from two angles:

- on the benign intent and accidental violation: is it natural to go load fonts from a third party service in the first place ? I get why we arrived at this point, but hosting yourself the base resources you're using for your site shouldn't feel like some huge burden or leap from the norm.

If we really see a shared benefit to having fonts on some common platform, I'd also wish it wasn't Google. Perhaps Cloudflare or Fastly ?

- on IP being worthless to Google: is the "unless they get any link to your identity" even hypothetical ? I am a paying Google customer and extensively use their service. And we all do to some extent; trying the "let me live my daily life but block and avoid anything that touches Google" game would still be as critically punishing today as it was 4 or 5 years ago.


IP isn't personal, isn't unique, and isn't identifying.


The point of the law is that anything that can be easily (technically easily, not contractually or legally) combined with another database to make the first bit of data PII - then it's PII.

An IP address can be trivially combined with the ISPs data and provide the exact user of an IP during any point in time.


So anything is PII.

> An IP address can be trivially combined with the ISPs data and provide the exact user of an IP during any point in time.

Not even then.


> This is bad, because the IP can only be turned into personal information via cooperation of the users internet provider.

Not 100% true, you can often trace back users by IP using leaked databases and through companies that sell user data. Might not be legal, but you definitely don't need cooperation from an ISP.


In that case the database is the PII, not the IP.


If Person A has a document saying that Joe Biden lives at 1600 Pennsylvania Avenue

and Person B has a document saying an 80 year old male living at 1600 Pennsylvania Avenue has Chlamydia

do you think only Person A holds private information?


Users use the same passwords everywhere. By cross-referencing user passwords through excessive brute force you could find accounts on other sites that link to a user’s personal data.

Is the password personal data?

You have to draw the line of correlation difficulty somewhere.


I would expect a company to guard customers' passwords every bit as carefully as they guard customers' e-mail addresses, and probably moreso, yes.


The term "personally identifiable information" does not occur anywhere within the text of GDPR. GDPR regulates the use of personal data, which is conceptually much broader than PII. Any data that relates to a natural living person is potentially within the scope of GDPR, including data that is insufficient in isolation to identify a natural living person. For example, pseudonymised data from an employee database or medical records may still constitute personal data if it would be possible to reconstruct the identity of that individual by inference, even if all direct identifiers have been removed.

https://commission.europa.eu/law/law-topic/data-protection/r...


So someone can be identified, directly or indirectly, with an IP address, making it personal data under GDPR, art. 4(1).


What does "indirectly" include legally?


Any piece of information that can be related to someone using supplementary information. E.g., my personal email address contains my name, so I can be identified directly; my phone number doesn’t, but my operator and contacts know who is behind it, so I can be identified indirectly.


If the IP is dynamic then how would you know who had it at the time?


you make a join between the "user session" and the "user profile" tables
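
A small sketch of such a join, using Python's built-in `sqlite3` with an in-memory database. The table names, column names and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_session (user_id INTEGER, ip TEXT, started_at TEXT);
    INSERT INTO user_profile VALUES (1, 'Alice Example');
    INSERT INTO user_session VALUES (1, '203.0.113.7', '2023-06-22T10:00:00');
    """
)


def who_used_ip(ip: str) -> list:
    """Return the profile names of accounts that had a session from `ip`."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM user_session AS s
        JOIN user_profile AS p ON p.user_id = s.user_id
        WHERE s.ip = ?
        """,
        (ip,),
    ).fetchall()
    return [row[0] for row in rows]
```

Any site where the user is logged in already holds both tables, so the IP address in its logs maps to a person without any help from the ISP.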


Because ISPs are keeping lease logs.


How would you get that information?


Court order, security incident, insider access.


Easily


You might be able to trace an IP back to a user, but you're absolutely not guaranteed that that IP was only used by that particular user.

Therefore even on a technical level this EU legal interpretation is insane, hundreds or thousands of people can potentially use the same IP address, how is that personal information then?


I don’t think the goal is to find a single IP for a crime or something. In that case, law enforcement just subpoenas and they’ll sort out the thousands using the IP to the actual person (possible with ISP participation).

I don’t think it’s insane because the more common use is just to pattern match to identify the individuals. While lots of people may share, many do not (eg, everyone in my home shared an external IP but that is frequently just one person).

Google can use this IP and browser traffic to separate out individuals (eg, I don’t watch YouTube and my kid never checks vanguard) to the level they convince advertisers that they know the individual. I expect this is why my kindergartner sees ads for car insurance.


> Therefore even on a technical level this EU legal interpretation is insane, hundreds or thousands of people can potentially use the same IP address, how is that personal information then?

ISP knows exactly who used IP at that point in time (they are even obliged by law to log them). Therefore IP from which a request was sent can be used to uniquely identify the device, with 100% certainty. Therefore it is (usually) a personal information.


> This makes it impossible to use any components hosted by third parties without getting consent by the users.

Sounds good to me, that's the way it should be. I shouldn't have to use third party extensions to stop my browser from automatically loading facebook crap every time I visit websites that aren't facebook. Companies should only include 3rd party components in their websites if there is a very good reason for it, and only then after the user has explicitly consented to it.


> The IP can only be turned into personal information via cooperation of the users internet provider.

It’s not a direct identifier but with geo-ip or other data, it can identify an individual (eg, have 100 possibilities and geoip narrows it down to only 1 in that region based on IP).

The PII aspect isn’t based on getting a link from the provider. The PII aspect is based on the IP itself standing out in data and allowing reidentification. It’s not 100% accurate, but accurate enough to make money off advertising.


It’s not a direct identifier

Except when it is. I have a semi-permanent home IP (it only changes when the MAC address on my router changes and I get assigned a new lease) and only one user in my home. My IP address pretty uniquely identifies me.


Most cable modem users in north america are in the same situation. And small shops usually have a business package with a static IP too.

I don't understand the point keyboard warriors who insist IP doesn't identify a person and IP isn't PII are trying to make.

Are they just being pedantic? Because yes, often an IP is shared. Yes, obviously the legal system shouldn't assume an IP equals a person. That would be very problematic.

But when it comes to mere tracking, an IP can absolutely identify directly a person. So why not just treat it as such all the time? What's the downside of considering IP to be PII?


They're not stable in time either. And they can be misleading if you try to use them to geolocate a user. The ARIN for my IPs makes me appear 500km away.


>This makes it impossible to use any components hosted by third parties without getting consent by the users.

Good.

There's very few legitimate uses for third party hosted proprietary components.

Why it became standard to load simple things like scripts or fonts from third parties, that can be trivially hosted locally, is beyond me.


So hosting an ad is now illegitimate? How about embedding a video? And why should you not embed stuff from a CDN?


Ads (particularly ads served by third parties) have never been legitimate, that's why most users who know what they're about use an adblocker to stop them. Videos should only be embedded once the user consents to that (which could be implemented using a first-party button that the user clicks to affirm their consent and which then loads the third party video component).


> Ads (particularly ads served by third parties) have never been legitimate

That's a very extreme take. They're how many services actually exist.

Also, consent to an embedded video? My brother in Christ, your browser requested the video.


> Also, consent to an embedded video? My brother in Christ, your browser requested the video.

This kind of attitude is why I block JS and cross-origin requests by default. Unfortunately this is not a reasonable option for the less technically inclined who deserve protection too.


It's not an extreme take, it's the way most people with a technical clue feel, evidenced by widespread use of adblockers by anybody who actually knows how to use their computer.


Videos can easily be embedded and loaded from local resources, too. No need to shove the whole of YouTube down somebody's throat, if a simple <video>-element with an mp4 file does the same. You want the comfort and features of YouTube? Ok, then let the user decide if they want that too, ask for their consent.


Have you ever heard of the concept of copyright?


Taking (kind of) care of that is part of the featureset and "comfort" of YouTube, yes. Nevertheless, a page can work wonderfully without that.


It is not a problem to load a static (not changing based on user data) ad from your own server. Using a CDN is likely legitimate interest when it is not using private data for other reasons than serving the content.

(I'm not a lawyer.)


That would require small companies to implement their own ad business. Sorry, your idea is cute, but completely unrealistic.


> 'So in Europe, the whole internet is made illegal based on a wrong assumption.'

Is based on a wrong assumption.


It should be mentioned that we don’t have case law the same way the US does. So one court deciding this does not mean that this is now law.


That is not completely true: in many cases you can associate it with a user based on their activity too. For example, they might log on, which would link the IP to an identity.


Yes, but the IP itself is not personal without the connection to other information. I do think considering IP address personal is a bit of a reach, especially given the common case of ephemeral addresses.


All the residential IPv4 addresses have been the same for years, until I switched ISP or moved. Ever since I've lived alone that IP address 100% maps back to me as a person, and I'm not the only person in Europe living on my own. Pretending this situation doesn't exist and forms a privacy risk would be madness.

There's nothing wrong with receiving IP addresses on your website, though. You can log IP addresses and use them for detecting fraud and other kinds of abuse without requiring consent. Third parties can do the same, as long as they follow the law and as long as you clearly document what information you're sharing/making users share in your privacy policy.

You can't use personal information for tracking and ad purposes without consent, though, and you can't partner up with other companies that do it for you. It doesn't matter if you're tracking IP addresses, cookies, passive fingerprints, or some kind of supercookie; you need a legitimate reason or explicit consent to process that kind of information.


> Yes, but the IP itself is not personal without the connection to other information.

By this argument, isn't the same true of a physical home address?

> I do think considering IP address personal is a bit of a reach, especially given the common case of ephemeral addresses.

Except that isn't strictly "the common case". DHCP leases are often for long-ish periods of time on fixed line broadband services. The IP for my home router, for example, has been the same for weeks or months at a time.


The concept of being able to identify individuals by combining data sources is a key part of GDPR.

If you can look up a user account by IP address, then the IP address is personal data.


>This makes it impossible to use any components hosted by third parties without getting consent by the users.

There are six lawful grounds for processing personal data under GDPR; only one of those grounds is consent. Consent is not always necessary, nor is it always sufficient.

An IP address is potentially personal data, because it could relate to a natural living person. There are all sorts of legitimate reasons to use that data without consent, the most obvious being to fulfil a request by the user. You will run into issues if you're using that data in ways that aren't strictly necessary - keeping logs indefinitely, using that data for marketing purposes, sharing that data with third parties without good reason and without adequate safeguards etc.

https://gdpr-info.eu/art-6-gdpr/


Yeah... it could be nice if people stop spreading FUD on GDPR ;-)

All GDPR is asking mostly is: you only gather minimal PII to provide a service (if needed at all). If you use PII for another purpose than providing the service or meeting operational purposes (like fraud detection or monitoring your infra), then you must obtain the consent of the user (for marketing or selling your users data for example). This extends to your providers too (like Google Analytics...)

The problem is that a lot of "internet services" take for granted that they can do whatever they want with the data they got for a specific purpose... even without informing the user! And that's not good... so GDPR has been created.

But if your service is "fair" to the user (meaning: you only use the data to provide the service), then there's no problem...


Apparently a German court ruled that embedding fonts from Google Fonts is not "fair" to the user. You either have to ignore parts of the GDPR or you accept to not be able to do many useful things.


You can do useful things without selling user data to Google.

Yes it's selling: You provide the data, Google gives you hosted services back. If you think embedded fonts are useful for you, you can host them. Then you pay the bill and not your users.


> (like fraud detection or monitoring your infra)

reCAPTCHA, in this case, can be considered "fraud detection".

Can we do it without PII? Yes. Maybe. With lots of effort and less optimal result.

Does this fit the "necessary" provision of GDPR? This depends on which court you ask.

reCAPTCHA does not add "direct" value to the user, it even costs them some hassle... but it is a life saver if your service is a big spam target.


Like any US-based service (or one controlled by a US company), reCAPTCHA is subject to US laws, which means they CANNOT guarantee that your PII is only used for fraud detection purposes because the government can order them at any time to hand over that data while also ordering them to keep silent about that.


Unfortunately, a judge can always say that using an external resource was not necessary to fulfil a user request.

As they did in the judgement I linked to.

A judge can always claim you could have used a local version of whatever external resource you used.


A judge can say anything they like, but that doesn't mean it'll be upheld at appeal.

If you can use a local version of an external resource, you should. Minimisation is an integral principle of GDPR. There are plenty of circumstances where it would be impractical or impossible to perform a necessary function without sharing data with third parties, but that still needs to be done with appropriate thought and care.

I would have a great deal of confidence in embedding a resource provided by Stripe; I would have absolutely no confidence in embedding a resource provided by Google.


And if that is the first GDPR related offense you get away with a slap on the wrist.


Of course... or he can say that you could use another resource that is GDPR compliant and/or privacy preserving.

YOU provide the service so YOU are responsible for the way your user's data are processed and must ensure that it's processed according to the rules (requiring consent when necessary)


I was wondering about it too, but I guess that for some customers in rural places (everywhere big like the USA and EU) an IP address is as good as a home address. Combine it with some providers that will not change your IP until manually requested (and not until router restart) and you have real PII on your hands.


I used to live somewhere where the reverse-DNS for my IP was literally my home address (student housing network)


> This makes it impossible to use any components hosted by third parties without getting consent by the users. And for components hosted in a different country than the visitor, even consent might not make using those external components legal.

This is based on some very faulty knowledge of GDPR and the law.

You are allowed to process data without the consent of the user for various things. This would include their IP address. You're allowed to have third party data processors process data on your behalf without user consent for various things.

The Google Font ruling was partially due to who Google is. Google data mines, they're famous for it. So giving Google data they can use to map to your internet persona which may even be linked to your name directly is obviously something many people want to do only when they consent. The fact Google Fonts could be self hosted was another part of the reason for the ruling. That is, sharing the information wasn't required to perform the action they wanted to perform: using a font.

Data processing done by US companies is not currently GDPR compliant. However, no one is enforcing that. It would be a complete mess and there are far too many. In reality, everyone is ignoring it waiting for the new laws to be created to make it legal. The reason the US companies are an issue is a US court can issue a judgement to a US company and they are forced to comply no matter where the data is.

> This is bad and not in line with reality. The IP can only be turned into personal information via cooperation of the users internet provider.

This is also not true. If you visit a website that sells B2B accounting software and your IP is identifiable to a company, you could phone up the company and ask to talk to the person who is responsible for finance. If there is only one person, boom, easily identified. There are also various other ways.

> So in Europe, the whole internet is made illegal based on a wrong assumption.

Really, your comment is wrong based on multiple wrong assumptions.


> So in Europe, the whole internet is made illegal based on a wrong assumption.

No, just websites which include components hosted by third parties. This is not the whole internet (e.g. HN doesn't include third party components).


Use the search function on the bottom of HN. It is provided by a different company.

HN is also hosted by a different company. They get your IP too.


> Use the search function on the bottom of HN. It is provided by a different company.

HN doesn't communicate my IP to this service.

> HN is also hosted by a different company. They get your IP too.

If they store my IP, they should treat it as PII to be compliant


What they do with the IP matters. Using your IP address to serve the website, or even to do fraud/DDOS prevention, is perfectly fine.


Yes, and companies doing business in the EU may only use GDPR-compliant hosting.

All European providers as well as major international providers like AWS [1] have compliance statements. The EU companies are more likely to take it seriously, e.g. Hetzner only logs the first three segments of your IP (i.e. 24bit IPv4 or 48bit IPv6). Every other EU-based provider I've used at least offers it as an option. If you run your own server, there are nginx and apache modules [2] that anonymize your logs.

It's real. We're 100% serious.

[1] https://d1.awsstatic.com/legal/aws-gdpr/AWS_GDPR_DPA.pdf [2] https://www.supertechcrew.com/anonymizing-logs-nginx-apache/
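
For anyone curious what that kind of truncation looks like in practice, here is a minimal sketch using Python's standard `ipaddress` module. The function name `anonymize_ip` is made up for illustration, and the 24/48 bit prefixes are simply the ones mentioned above; this is not how Hetzner or the linked nginx/apache modules actually implement it.

```python
import ipaddress


def anonymize_ip(raw: str) -> str:
    """Keep the first 24 bits of an IPv4 address or 48 bits of an IPv6 address.

    The remaining (host) bits are zeroed, so many clients map to one value.
    """
    addr = ipaddress.ip_address(raw)
    prefix = 24 if addr.version == 4 else 48
    # strict=False accepts a host address and returns its enclosing network
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # Addresses from the reserved documentation ranges, not real clients
    print(anonymize_ip("203.0.113.77"))           # 203.0.113.0
    print(anonymize_ip("2001:db8:1234:5678::1"))  # 2001:db8:1234::
```

You would call this on the client address before it ever hits the log file, so the full address is never stored.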


If there was a proxy service that acted as an ip mask, and there was a list of the ip addresses of such masking proxies, then could EU customers using such services solve the issue?


Yes, this is what the Cnil suggests for people that want to use Google Analytics.


Yes, but making stuff illegal is much easier than being inventive.


Indeed, the GDPR defines "personal information" as (Article 4 sub 1)

> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

This is not out of step with reality nor a wrong assumption, it is simply a definition. It is motivated somewhat in the considerations of the GDPR.

> (26) The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.

> (30) Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.


> This makes it impossible to use any components hosted by third parties without getting consent by the users.

That is simply not true, please do not spread misinformation like this.

Using/Embedding third-party resources is allowed IF it is e.g. technically necessary to provide the service or core functionality at all.

Collecting personal information and using a third-party service to do so in a shop checkout? That's okay.

Collecting personal information and shoving everything into Google Analytics because you want to know how many people visited your site? Not so okay, there are less intrusive ways to do that.


Hahaha, so EU service providers can't protect their website using Google reCAPTCHA because it'd likely need user's consent and they're not allowed to restrict their offering in case a user doesn't consent. So, bots simply deny consent and send the form without captcha :'-D

Did I get that right?

That's nuts.


Lots of alternatives who claim to be GDPR compliant though.


> and they're not allowed to restrict their offering in case a user doesn't consent.

French newspapers don't seem to agree with your interpretation of GDPR, as you are almost always facing a “consent or pay”-wall…


I would say that you need to call a phone number to verify you as a user if you don't consent to captcha. And then that number is a voicemail: leave your number, we will call you back. And then we have a backlog so it takes us a while to call back. I don't think that's illegal.


My favorite thing is going into a non-US government site and having to solve a recaptcha with a US-centric thing: "Mark all the school buses", "Mark all the stop signs"


I'd love if I started getting captchas for some weird local food like "Select all the pao de quechos" or something like that. lol


like most Google things, recaptcha is localised based on your location, so I've had to deal with French, German, and occasional Spanish instructions while travelling before. with no easy way to change it. (adding &hl=en to the captcha URL fixes it, but good luck with that)

Hitting refresh until I got one with an example picture was pretty common until I learned some vocab.

I should really make a browser extension for this.
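
A rough sketch of the URL rewrite such an extension would do, in Python just to show the logic. The only thing taken from above is the `hl=en` parameter; the helper name `with_language` and the example URL are made up, and a real extension would do the same thing in JavaScript.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_language(url: str, lang: str = "en") -> str:
    """Return the URL with its `hl` query parameter set to `lang`."""
    parts = urlsplit(url)
    # Drop any existing hl value so we do not end up with two of them
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "hl"
    ]
    query.append(("hl", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical captcha frame URL, used for illustration only
    print(with_language("https://captcha.example/frame?k=abc&hl=fr"))
```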


It is a grey/not well known area: as far as I know it is not working with noscript/basic (x)html browsers, which is paramount for past/present/future interoperability between big tech and small tech.


Note that reCAPTCHA's use of hashing and encryption might mean that the private data this article refers to, such as device type, is not actually sent to google.

A quick check with the network inspector doesn't obviously show this data being sent.

And obviously to punish a company, the EU would need to prove this data is sent - a hard thing to do when the code of recaptcha is deliberately designed to prevent reverse engineering and analysis.


My French isn't exactly great but based on the machine translation of the linked court case, I believe the problem with reCAPTCHA is that Google uses it for more than just authenticating users.

The generated fingerprint for these scripts is personal data, it pretty much directly refers to you as a person, that's the intention of the system. That's not necessarily a problem, though. These types of detection algorithms are perfectly allowed without explicit consent, just like other types of fraud and abuse detection.


The real question is why would a service like recaptcha need the ip of the user to operate?


IP reputation/blocking. Often datacenters have a bad IP reputation and if you access a site with a captcha from a datacenter IP, you will often get more/harder/slower captchas to solve. Some IPs are completely restricted from accessing sites which have captchas, if they are known to be used for spamming.
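
As a toy illustration of the idea (not how reCAPTCHA or any real reputation service works, which is not public), a check like this only needs the client IP and a list of networks. The lists below are hypothetical and use reserved documentation ranges as stand-ins for datacenter and known-spam networks.

```python
import ipaddress

# Hypothetical reputation data: documentation ranges stand in for real networks
DATACENTER_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def challenge_level(raw_ip: str) -> str:
    """Pick a challenge difficulty from the client IP alone."""
    addr = ipaddress.ip_address(raw_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return "blocked"   # known spam source: no access at all
    if any(addr in net for net in DATACENTER_NETWORKS):
        return "hard"      # datacenter IP: more or harder captchas
    return "normal"


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.7", "203.0.113.99"):
        print(ip, challenge_level(ip))
```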


Embedded references to external javascript libraries, fonts, images, and so on on a web page are resolved by the browser by making a connection to the service that hosts the resource, and connections over the Internet necessarily involve two IP addresses of some sort, the IP address of the requester (or a requester) and the IP address of the responder. Every web browser works that way by default.

There are two basic choices to avoid that - in some cases you can just host the resources yourself so the client browser does not connect to hosts operated by others, or you can operate a proxy so that the client browsers requests are relayed and anonymized before going to a third party.
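
For the proxy option, the third party only ever sees the proxy's IP at the connection level, but the proxy also has to avoid passing identifying headers along. A minimal sketch of that filtering step, assuming a hypothetical helper `headers_for_upstream` and a header list that is my own guess at what counts as identifying:

```python
# Headers that can identify or help fingerprint the original client.
# This particular list is an assumption, not a legal or complete definition.
IDENTIFYING_HEADERS = {
    "cookie",
    "authorization",
    "referer",
    "user-agent",
    "forwarded",
    "x-forwarded-for",
    "x-real-ip",
}


def headers_for_upstream(client_headers: dict[str, str]) -> dict[str, str]:
    """Return only the headers that are safe to relay to the third party."""
    return {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }


if __name__ == "__main__":
    incoming = {
        "Accept": "text/css",
        "Cookie": "sid=1",
        "User-Agent": "ExampleBrowser/1.0",
        "X-Forwarded-For": "203.0.113.5",
    }
    print(headers_for_upstream(incoming))  # {'Accept': 'text/css'}
```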


The article title is "Is Google reCAPTCHA GDPR Compliant?"

And it's a good point: broad data collection has always attracted the ire of European regulators, and in the decision they state that they find that reCaptcha serves as both a security and analytics tool (due to its broad data capture.) I can't argue with that definition.

The solution, for Google, is to only conduct telemetry after the user has authorised that telemetry, allowing reCaptcha to function without the data collection consent. They already have such functionality in Google Analytics, but arguably, might be less valuable for Google without that data.

For the businesses using reCaptcha, it's a problem. The article makes a fair point that you can't use the service if the user declines consent. But it is a reminder that any business operating in the EU at this scale must incorporate a data privacy specialist into their requirements gathering and review processes. It's just the price of the ticket to play in the EU.


You can argue the data collection is legit and does not need the user consent, because it is needed in order to perform the core function to separate bots from humans. Thus, no special consent is needed.

A different question is whether Google uses this data for purposes it was not intended for. In this case the service might still be GDPR compliant from the website implementor's point of view, but Google would be committing fraud by breaking the Terms of Service on how the data is handled.


If a service doesn't work without respecting user rights, the service can't be used.

What's next? Capture a picture of your webcam to check if a real person is sitting in front of the PC?


This is already happening, with some remote working platforms requiring the camera on during working hours

Twitter also requires the employee to be in a dedicated room with a door that closes. At least they used to.


Some KYC processes in the EU (and likely elsewhere) do involve a webcam interview with a 3rd party who checks the customer‘s face and passport.


The question is also, why separate humans from bots? It makes creating useful scripts harder. Doesn't that fall under a "fair use" case, which in the EU is enforced even for software, where you can modify it to be able to run it on your platform?


To prevent or at least cut down on spam? Pretty much the entire reason captcha systems were created in the first place and an entirely legitimate reason?


In addition to spam, preventing bots from buying up all the GPUs and PS5s in the recent past would have been nice.


Though in the end you're the one clicking busses and bikes and bridges and busses again and when you're through that charade, everything is gone anyway because the bots were faster/better at that..


It is up to the website owner to decide how they want to distribute the content, not the reader. What you see as fair is often not fair from the website owner point of view.

If you disagree with this you are always free to create a competing business without such captcha limitations for bots, and put your money where your mouth is.


Think about why the bots exist - it's almost always data theft or other schemes to abuse your API to profit. Taking money from your pocket.


>>because it is needed in order to perform the core function to separate bots from humans

The core function for most sites using recaptcha is not to separate bots from humans, so a consent is needed before sending data to a 3rd party not related to the core functions of the site or app


> You can argue the data collection is legit and does not need the user consent, because it is needed in order to perform the core function to separate bots from humans.

You would have a hard time arguing that. The core function to separate bots from humans is done by requiring the user to identify certain images. The only data that needs to be "collected" (and even that doesn't need to be kept) is whether they clicked the correct squares.


That is not most of how it identifies bot traffic. The order you click them, how quickly you click, and where within each square you click are all ways humans and computers can be different.

But the real work is not about your interaction at all: this is how the invisible version can perform almost as well as the one where you click things. This involves comparing information about your computer's JavaScript environment with what they have seen elsewhere, and if you are running a bot farm it's pretty hard to keep your statistical distribution for all of these different attributes from looking odd.


> how the invisible version can perform almost as well as the one where you click things

I don't think it does for me. I run most websites in temporary containers, I do a lot of tracker blocking on DNS, uBo, etc., I clean cookies frequently.

Either those CAPTCHAs are really bad and are considering me human when they shouldn't, or all those things you mentioned are not necessary for their core functionality.


Several of the non-interactive signals would still pass through in those sorts of situations. I don't know exactly what reCAPTCHA collects, but if you look at something like https://amiunique.org/fingerprint you can get a similar idea of what's possible.
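
To make the idea concrete: a passive fingerprint is basically a stable hash over whatever attributes the script can read. This is only a sketch of the general technique, with made-up attribute names; it is not a description of what reCAPTCHA or amiunique.org actually collect.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # sort_keys makes the result independent of the order attributes were read in
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical attribute values for illustration
    visitor = {
        "timezone": "Europe/Berlin",
        "screen": "2560x1440",
        "languages": ["de-DE", "en"],
        "canvas_hash": "a1b2c3",
    }
    print(fingerprint(visitor))
```

None of these values is a cookie, and clearing cookies changes none of them, which is why tracker blocking and temporary containers only get you so far.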


Yeah I get what you're saying. What I'm saying is that it's not required for its core functionality. Without all that, it still works by getting the user to click on images. Sure, maybe it's not just about validating the correct squares were clicked on. Maybe it analyses timing, how much the mouse moves around and whatever else. Nothing there requires collection and storage of user data. All it needs is to process it there and then. It can throw it all away as soon as the user clicks the button.


> Without all that, it still works by getting the user to click on images.

I don't see where you've shown that?

ReCAPTCHA v3 doesn't include any clicking on images, as far as I can tell because that doesn't actually add that much in terms of identifying bots?

> All it needs is to process it there and then. It can throw it all away as soon as the user clicks the button.

Even if they limited themselves to tracking how users clicked the button they'd still need to store it, so they could compare this user to other users and build models of what human/bot traffic looked like.


> ReCAPTCHA v3 doesn't include any clicking on images, as far as I can tell because that doesn't actually add that much in terms of identifying bots?

Perhaps I'm being considered human and just being generally unaware. It's possible. Do you have an example website where I can try it? I have uBo block recaptcha by default, so whenever a website uses it, it takes me a few clicks and page reloads until I get the prompt. I can't remember of a single instance where I didn't have to go through the challenge but maybe my own biases are in the way of me seeing it.

> Even if they limited themselves to tracking how users clicked the button they'd still need to store it, so they could compare this user to other users and build models of what human/bot traffic looked like.

This gives me the creeps. In any case, does this need to be accompanied by PII? And how can I validate that it's not?


This does seem like a good legitimate use case.

However, if Google transfers the data collected to its American servers or daughter companies, that would still make for a massive GDPR violation, both for Google for breaking the law and, if the situation does not get resolved, possibly for the companies using Google's services while it knowingly violates the law.


It's almost certainly possible to defend the use of a reCAPTCHA-like tool without user consent under article 6 para 1(f) (legitimate interests). My concern with using reCAPTCHA would be the track record of Google - they have a long history of testing the boundaries of data protection law and accepting fines as a cost of doing business.

Google don't appear to even mention GDPR in the marketing materials or docs for reCAPTCHA; they do claim that reCAPTCHA Enterprise (a separate, paid-for product) can be GDPR compliant, but I'd take that with a big pinch of salt. Competing CAPTCHA services make much stronger claims regarding GDPR compliance and are much more transparent about how the service uses personal information.


> But it is a reminder that any business operating in the EU at this scale must incorporate a data privacy specialist into their requirements gathering and review processes. It's just the price of the ticket to play in the EU.

This sounds very American. You don't need a data privacy specialist to operate in the EU. You need to develop with privacy first by design. Treat all PII as radioactive. Literally. Yes, if you need to bolt this onto existing US software to make it "compliant", you're screwed and you'll need to call in a containment team like when you find radioactive cargo in your business that is not normally expected to handle it.

A lot of times when topics like GDPR and privacy are brought up on HN I see comments that act like it's black magic. It really isn't. It's trivial to be "good enough" as far as legislation is concerned. The problem is just that over the past decades and especially in the US we've seen myriads of online services sprout up that now often seem integral but were built with a complete disregard for privacy and now need to either be retrofitted or somehow contained to become compliant. It's like finding out paint is radioactive after it has been marketed for decades with no regulation or oversight.

Internet companies have been playing it fast and loose with privacy well past the point that people started pointing out it's a (ethical if not legal yet) problem. That is now coming back to bite them. I'm okay with that. It's just a shame so many businesses are caught in the crossfire because those companies have also tried their best to make themselves integral and unavoidable. Good luck trying to migrate away from AWS/Azure/GCP for example.


Funnily enough, I work in technology in Europe and have consulted on GDPR compliance.

Organisations either care a lot about their data obligations and have dedicated teams and reviews, or simply don't care, and don't want that cost passed on to them. And service providers, not wanting to lose a sale, just go ahead with whatever is easier. In this case, they shot themselves in the foot by not conducting due diligence for what would have been a fairly easy to recognise issue.

(or worse, were advised, and that advise was incorrect. But based on what I can translate in the original document, that is may not have been the case.)


> It's trivial to be "good enough" as far as legislation is concerned.

I'd be very interested to read any tutorials/howtos/faqs/etc. about that. I might be tempted to create a (very small) side SaaS-type project (and I'm located in Europe), and the main reason I haven't done it yet is that GDPR compliance looks really, really scary to me.


Being compliant with regulations such as the GDPR is expensive. Lawyers, security, and annual third party testing. I don't profit off EU users so I block all non-USA traffic and avoid the issue entirely.


It’s like saying complying with criminal law is expensive because lawyers are expensive.


If you've been responsible for implementing GDPR compliance I think you may have a different perspective.


It's like saying complying with truth in advertising laws is expensive: most companies barely have to think about it because they comply by default but for some companies it is extremely expensive.

Of course it's extremely expensive for them because they're trying to get as close to breaking the law as they can without actually breaking it. That requires expensive lawyers, constant monitoring and extremely fast response cycles. E.g. there are a lot of big companies making good money off exaggerated but legal health claims and their claims are all not just vetted by a team of expensive lawyers but also documented and tracked in such a way that if they do end up getting sued they can immediately find out where they are using that particular claim and withdraw all advertising material using it to comply with a cease and desist.

So, yes, if you want to run a business that is either intended to be willfully negligent for no good reason or exploit users with as little informed consent as you can get away with (likely because what you want to do is not in their best interest), you'll need a team of expensive lawyers.

But compared to actual nuclear storage (which is highly regulated for good reasons), or storing certain financial data (which requires PCI compliance), or storing medical records (which in the US requires HIPAA compliance) or filing your taxes correctly, GDPR compliance does not actually require an expensive external audit and certainly not a regular one.

Of course SOC2 compliance or ISO compliance are different matters and they may be involved in demonstrating GDPR compliance to business customers but they're neither necessary nor sufficient to comply with the GDPR or the ePrivacy directive.


Well, you could have had that for cheap: being US-based currently makes you non-compliant by default thanks to your government's warrantless surveillance (look up Privacy Shield and why it died).

But, no, none of these things are required to be GDPR compliant. Of course if you want to build a business on processing PII (and especially if it's any of the protected categories, e.g. you want to process personally identifiable medical data) the GDPR requires more effort from you because it's harder to maintain your users' privacy while doing this. And if you actually have no business doing this but still need to find a way to coerce your users into surrendering their data against their own interests (e.g. behavioral analytics, insurance risk scoring, etc), it's even harder to do this in a compliant way and opens you up to more scrutiny (rightfully so, I might add).

These things are not required to be compliant. These things may be involved in demonstrating compliance. But the lengths you have to go to in order to demonstrate compliance are very much a function of how privacy invasive your business is. A nuclear power plant will have more detailed radioactive waste management and radioactive material containment plans than a watch repair shop that occasionally handles radium-coated mechanical parts.

Specifically, other companies may insist you go to greater lengths to demonstrate your compliance to them if they want to do business with you, the same way you don't just buy nuclear waste containment equipment from some guy on eBay.

I did mean it literally when I said treat PII as radioactive.


I do love this argument against GDPR (or any data protection) because it often comes down to "Screw the US consumer, I'm profitable because they have no protections, so no one should have any."

... which is kinda the US business sentiment in general, I suppose.


captchas aren't going to be effective for much longer anyway with the recent dramatic improvements in AI


Maybe proof of work based captchas will be a viable alternative, like https://github.com/mCaptcha/mCaptcha:

> mCaptcha makes interacting with websites (computationally) expensive for the user. A well-behaving user will experience a slight delay (no delay when under moderate load to 2s when under attack; PoW difficulty is variable) but if someone wants to hammer your site, they will have to do more work to send requests than your server will have to do to respond to their request.
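To make the "more work to send than to respond" idea concrete, here is a minimal hash-based proof-of-work sketch. It is not mCaptcha's actual algorithm or API; SHA-256, the leading-zero-bits rule, and the names `solve` and `verify` are all assumptions chosen for illustration. The client has to try many nonces, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # 8 - bit_length() is the number of zero bits at the top of this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    for nonce in count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce


# Hypothetical challenge string and difficulty, picked so the demo finishes quickly.
nonce = solve(b"example-challenge", 12)
print(verify(b"example-challenge", nonce, 12))
```

Each extra difficulty bit doubles the expected client work but leaves the server's check at one hash, which is the asymmetry the quote describes.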


So, banning low-compute-capacity clients in the worst way possible. Awesome.


That should only be an issue if the server is under attack.

According to the docs (https://github.com/mCaptcha/mCaptcha/blob/master/docs/CONFIG...), you can set three difficulty levels:

MCAPTCHA_CAPTCHA_AVG_TRAFFIC_DIFFICULTY

MCAPTCHA_CAPTCHA_PEAK_TRAFFIC_DIFFICULTY

MCAPTCHA_CAPTCHA_BROKE_MY_SITE_TRAFFIC_DIFFICULTY

The defaults are set such that avg traffic takes ca 0.02s on an average system. Even if you have a really really slow system, I don’t think you’ll ever spend more than 2s there.
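A rough sketch of how a three-tier setup like this could be applied. The thresholds and difficulty numbers below are made-up placeholders, not mCaptcha's defaults, and `pick_difficulty` is a hypothetical helper rather than anything from the project:

```python
# Hypothetical tiers: (minimum requests per minute, difficulty).
# Placeholder numbers only; they are not taken from mCaptcha's configuration.
TIERS = [
    (0, 1),          # average traffic
    (1_000, 10),     # peak traffic
    (10_000, 100),   # "broke my site" traffic
]


def pick_difficulty(requests_per_minute: int) -> int:
    """Return the difficulty of the highest tier whose threshold is reached."""
    difficulty = TIERS[0][1]
    for threshold, tier_difficulty in TIERS:
        if requests_per_minute >= threshold:
            difficulty = tier_difficulty
    return difficulty


for load in (50, 2_500, 40_000):
    print(load, pick_difficulty(load))
```

The point is only that legitimate visitors see the cheap tier most of the time, and the expensive tier kicks in when request volume spikes.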


Nah you need to think about it in terms of computing power compared to each other.

According to the screenshots of XMRig for android you only get about ~35H/s while my laptop does ~2400. That's 68x faster so if it took my laptop 2 seconds it would take a mobile device ~130 seconds.

It screws with mobile users and makes the whole crypto PoW problem of using too much energy many times worse. Not to mention botnets could muster enough computing power to easily outpace any captchas thrown at them.


> According to the screenshots of XMRig for android you only get about ~35H/s while my laptop does ~2400.

Is that for the specific proof of work algorithm mCaptcha uses? While I don't think you're going to get something that runs equally quickly on a low-end phone and high-end desktop, if it depends entirely on sequential operations and is not optimization-friendly you should be able to get much closer than 68x?


The screenshot here shows 117H/s, and I guess the Android version hasn’t been as heavily optimized: https://github.com/XMRig-for-Android/xmrig-for-android

I think we’d need to compare apples to apples, and not use Monero mining as a benchmark for mCaptcha. Also, as I wrote in another comment, the average case (server is not under attack) is 0.02 seconds on a laptop, and probably 0.4s on an Android device even if we do use xmrig-android as comparison. Compare that to manually identifying stairs on pictures with crappy quality (10 seconds?).


The 2s are the default setting for when the server is under attack. In a normal scenario, it’s ca 100x less. It would take your laptop 0.02s and your hypothetical phone ca 1s. But the screenshot for xmrig-android shows 117H/s, so it would take such a phone only 0.4s, going by that logic (I don’t think the performance penalty of mCaptcha is x20 on mobile though).
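For anyone who wants to redo the arithmetic, here is a small sketch. It assumes solve time scales inversely with hash rate and that the XMRig figures quoted in this thread (2400, 35 and 117 H/s) carry over to mCaptcha, which is exactly the part being disputed; `scaled_solve_time` is a hypothetical helper.

```python
def scaled_solve_time(reference_seconds: float,
                      reference_hashrate: float,
                      device_hashrate: float) -> float:
    """Scale a solve time from a reference device to a slower or faster one.

    Assumes time is inversely proportional to hash rate.
    """
    return reference_seconds * reference_hashrate / device_hashrate


LAPTOP_HASHRATE = 2400  # H/s, figure quoted in the thread

for phone_hashrate in (35, 117):          # H/s, figures quoted in the thread
    for label, laptop_seconds in (("average", 0.02), ("under attack", 2.0)):
        phone_seconds = scaled_solve_time(laptop_seconds, LAPTOP_HASHRATE, phone_hashrate)
        print(f"{phone_hashrate} H/s, {label}: {phone_seconds:.2f}s")
```

Under those assumptions the 35 H/s phone lands at roughly 1.4s in the average case and a bit over two minutes under attack, while the 117 H/s phone lands at roughly 0.4s and 41s.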


If you want to register to my site from your smart thermometer, then it’s your problem.


Nah you need to think about it in terms of computing power compared to each other.

For example a Raspberry Pi (used here as a substitute for a phone) can do ~100 hashes a second while my laptop does ~2400. That's 24x faster so if it took my laptop 2 seconds it would take a mobile device ~48 seconds.

See the issue yet? :P

EDIT: According to the screenshots of XMRig for android you only get about ~35H/s which means it would take over 2 minutes to pass a captcha on a phone.


The real issue I see is wasting energy…


0.02s of CPU time (average case) won’t make a measurable difference.


Proof of annoyance


They're not effective anyway. When Google's reCAPTCHAs got particularly aggressive for a while, I downloaded an addon to solve them automatically. It wasn't great (took two or three times) but it worked well enough.

If you've got a bot farm tunneling traffic through residential IP addresses (hello, free VPN clients!) then those extra tries aren't such a problem.

Hell, some spammers are paying actual people to solve reCAPTCHAs. Those people do nothing but click fire hydrants all day. There's no way to prevent those clickfarms from working without some advanced traffic analysis that will break the internet for a significant number of people behind weird carrier middleboxes.

reCAPTCHAs are excellent at keeping away very basic bots, like Python scripts that just call HTTP endpoints. If you're trying to fight bots using browsers (WebDriver and friends), blocking bots becomes significantly harder, to the point your normal users will start to suffer if you set an effective bot prevention limit.


> I downloaded an addon to solve them automatically. It wasn't great (took two or three times) but it worked well enough

reCAPTCHA v3 doesn't involve clicking on images.


Honestly captchas haven’t been effective for a while. They’ve been pretty trivial to bypass.


It may be trivial, but it costs money at scale, which is the entire point of PoW.


TL;DR: no³

Slightly longer summary: if consent is granted, then you can use reCAPTCHA³, but a bot programmer would just choose to have the bot click no on the consent prompt. Denying access after nonconsent is allegedly² illegal because attaching negative consequences to the 'no' button makes the 'yes' button no longer be considered a freely given consent¹.

¹ https://autoriteitpersoonsgegevens.nl/themas/internet-slimme... "Mag een website of app mij toegang weigeren als ik geen tracking cookies accepteer? \n\n Nee. U moet de mogelijkheid krijgen om tracking cookies te weigeren. Zonder dat dit nadelige gevolgen voor u heeft." In English: "May a website or app deny me access if I don't accept tracking cookies? No. You must be given the option to refuse tracking cookies, without this having adverse consequences for you." (Don't y'all love national authorities' information about EU-wide legislation? So much fun translating for English audiences)

² because a judge can overturn what the data protection authority (DPA) claims. I am not aware of case law on this, but I am not a lawyer.

³ assuming this French DPA speaks for all of the EU, which is not an automatic truth, but almost always the case. There may be implementation differences but I am not aware of any significant ones. The only annoying part is that each country's DPA has to put their stamp on it to reaffirm it for that country, and humans tend to have a bit of RNG in what opinion they end up forming based on the same text.


Well, it is also unnecessary, as simple OSS solutions exist.

Users however tend to be greatly annoyed and inconvenienced by reCAPTCHA, which has grown increasingly difficult to decipher for real users. It is basically insulting to present real users with a CAPTCHA.


When everything is illegal, nothing is.


We've assumed this for a while now. We've stopped using reCAPTCHA together with all other services that may be sending our users' data outside of the EU.

As a developer, personally I am very happy with this decision, and am thankful that the GDPR is finally making management take user privacy into account.


An alternative would be to use something like https://friendlycaptcha.com/privacy/gdpr/ I guess?


As a simple engineer who just wants to create and not spend time with lawyers, it is time to update my list of technologies made too dangerous to use by GDPR:

1. Analytics

2. Third-party resources

3. CDNs

4. DDOS protection services

5. reCAPTCHA

Anything else?


Servers


This is taking the piss now


This is the problem with the catch-all EU regulation. They needlessly create regulations that are so vague and broad that even completely benign activities are prohibited and you need your own law department to understand what exactly is required. Can't wait for yet another button-clicking exercise when you will have to accept all the ToS of reCAPTCHA. A similar situation is happening with the AI regulations. They are not sure what AI is or what it can do wrong, so they just slap a bunch of red tape on top of it and call it a day.


Stop tracking people and you need no consent and banner.

BTW the sites could simply respect the Do-Not-Track flag but somehow they prefer to annoy their users, almost as if on purpose, to put the blame on privacy laws.
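Honouring the flag server-side takes very little. A minimal sketch, assuming the request headers arrive as a plain mapping; `tracking_allowed` is a hypothetical helper and not part of any framework. Browsers that have the setting enabled send the header `DNT: 1`.

```python
from typing import Mapping


def tracking_allowed(headers: Mapping[str, str]) -> bool:
    """Return False when the client sent `DNT: 1`, True otherwise.

    Header names are case-insensitive, so they are normalised first.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


print(tracking_allowed({"DNT": "1"}))           # False: user opted out
print(tracking_allowed({"User-Agent": "x"}))    # True: no preference sent
```

Note that a missing header only means no preference was expressed. It is not consent, so this check can skip tracking but cannot replace asking.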


I see no mention of reCAPTCHA tracking users. They are "sending data about device and application data", which seems like a completely natural thing considering what reCAPTCHA is used for.


Since the reCAPTCHA code is loaded from the google.com domain, among others, the tool automatically gains access to cookies that are set for logged-in Google users. One of the cookies is called NID and contains a unique user ID that is also used for Google Signals to recognize users even across devices. In this respect, it is almost irrelevant from a data protection perspective whether reCAPTCHA (situationally) sets further cookies or not.

In addition, reCAPTCHA also accesses the domain gstatic.com. As can be read on Google websites, this domain is also used by other tools. Thus, cookies can potentially be exchanged via this domain.


I'm pretty sure that's not the case. The EU law is pretty vague, specifying only that data is stored on your terminal. Local storage also counts. You might not be doing anything with it, but you still need the consent.


If you don't do anything with it, it's not necessary, and therefore it needs consent.

If you store a cookie to remember not to track someone, you don't need consent, as long as you don't track which users have this cookie.



They kill plenty of people based on nothing at all. Data’s not the problem. Governments being blood thirsty is.


But Google is thirsty for data too, and has to be stopped too.

I still believe that the first step to stop tracking should be done on the client side (block 3rd party cookies by default, delete cookies on tab/window close, except those manually whitelisted to stay logged in), but Google still collects way too much data on all of us.


"fully benign" yes I'm sure Google is providing the service for free and not relying on the data it collects...

Though you're right that captcha/fonts etc should have been benign uses, if it could be anonymized/proxied then maybe that would have been a good solution.


Ugh, yeah, why should private commuters have to adhere to traffic laws if we can just apply them to commercial drivers exclusively.

But more seriously, that's not how it works. The regulations are already very permissive for smaller businesses. The heavy fines exist for the worst offenders. If you can demonstrate you made a conscious effort and just fell short and have already started trying to make amends when notified, that's often good enough. Of course if you just throw your hands up and decide privacy is just too hard, you'll be treated like any other loose cannon.



