I've been playing with the TrueDepth Camera APIs on the iPhone X. Some things I've noticed:
1) The ARKit "Face Mesh" seems to be a standard model that is scaled and skewed to fit your face (for example, it ignores glasses, still works if you put your hand in front of your face, etc). It is _not_ a 3D scan.
2) The "TrueDepth" data is not really all that granular. It seems similar to the depth map you get from the rear-facing cameras on the "plus" sized models. Here's what the sensor data spits out: https://twitter.com/braddwyer/status/930682879977361408
3) Apple is really good at marketing. It's been shown that, even if you cover the TrueDepth camera, features that "require" it still work fine (including Animoji and the apps that I've been developing using the front-facing ARKit APIs).
3.1) The lack of Animoji and front-facing ARKit seems to be a software limitation made for business reasons rather than a hardware limitation. See: Google's Pixel 2 portrait mode photos done using a single front-facing camera that have stacked up well against the ones from the iPhone X.
4) The scary part, which is vast dystopian databases of facial fingerprints, is already being done with normal photographs. The depth data is not needed.
I agree with the author that the privacy implications of all-encompassing databases could be scary. But I disagree that this has anything to do with the iPhone X or its TrueDepth camera.
> 2) The "TrueDepth" data is not really all that granular. It seems similar to the depth map you get from the rear-facing cameras on the "plus" sized models. Here's what the sensor data spits out: https://twitter.com/braddwyer/status/930682879977361408
As @braddwyer himself notes, you can probably get a much better mesh integrating over time. It depends how long it takes to capture a single frame, but I imagine that's not long, so getting an order of magnitude improvement is probably quite easy.
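To put rough numbers on the integration idea: if each frame's depth reading is independently noisy and the scene holds still, averaging N frames shrinks the noise by about sqrt(N). A toy numpy sketch with simulated data (not actual sensor output):

```python
import numpy as np

rng = np.random.default_rng(0)

true_depth = np.full((60, 80), 40.0)   # flat surface 40 cm away
noise_sigma = 0.5                      # assumed per-frame sensor noise, in cm

# Simulate 15 frames, i.e. roughly one second of capture.
frames = true_depth + rng.normal(0, noise_sigma, size=(15, 60, 80))

single_err = np.abs(frames[0] - true_depth).mean()
merged_err = np.abs(frames.mean(axis=0) - true_depth).mean()

print(f"single-frame error: {single_err:.3f} cm")
print(f"15-frame average:   {merged_err:.3f} cm")  # roughly sigma / sqrt(15)
```

In practice you'd have to register the frames first (the phone moves), which is exactly where it gets harder than this sketch.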
> 4) The scary part, which is vast dystopian databases of facial fingerprints is already being done with normal photographs. The depth data is not needed.
And yes ... after all, humans are quite capable of identifying people with high accuracy from 2D photographs. Depth maps are not required for there to be serious privacy issues with such databases.
Given the small size of the laser projector, I imagine natural movement from the phone being hand-held would result in significant displacement of the projected dots over a 1s interval? Have you tried integrating the 15 frames to see what it looks like?
Minute, subpixel movements can ironically give you MORE resolution if you process them over time, though you'd probably need some sort of "anchor" points.
I think the irony being implied is that normally when you're shooting video and your camera is jittering, you're effectively losing resolution compared to a static camera because of motion blur, whereas this depth mapping benefits from minute movements. Though looking at individual frames of video is different than combining them into a single sharper image, I get the counterintuitive feeling they were driving at.
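That counterintuitive feeling matches how multi-frame super-resolution actually works: if you know (or can estimate, via those "anchor" points) each frame's sub-pixel shift, you can interleave the frames onto a finer grid. A deliberately contrived 1-D sketch where the shifts are known exactly:

```python
import numpy as np

# A high-resolution 1-D "scene" with detail the sensor cannot resolve alone.
fine = np.sin(np.linspace(0, 8 * np.pi, 400, endpoint=False))

def capture(shift):
    """One low-res frame: the sensor samples every 4th scene point,
    displaced by `shift` fine-grid positions (camera jitter)."""
    return np.roll(fine, -shift)[::4]   # 100-sample frame

shifts = [0, 1, 2, 3]                   # known sub-(low-res)-pixel shifts
frames = [capture(s) for s in shifts]

# Shift-and-add: interleave the jittered frames back onto the fine grid.
sr = np.empty_like(fine)
for s, frame in zip(shifts, frames):
    sr[s::4] = frame

err_single = np.abs(np.repeat(frames[0], 4) - fine).mean()  # one frame, upsampled
err_fused = np.abs(sr - fine).mean()
print(err_single, err_fused)
```

Real sensors average over a pixel rather than point-sampling, and real shifts are neither known nor integer multiples of the fine grid, so actual super-resolution pipelines solve a harder estimation problem; this just shows why the jitter carries extra information.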
> And yes ... after all, humans are quite capable of identifying people with high accuracy from 2D photographs. Depth maps are not required for there to be serious privacy issues with such databases.
Yes and no. We’re mostly good at that (with exceptions; I have to see someone a lot before I remember their face), but we evolved for small groups, and there are now enough people that doppelgänger is a profession.
On the other hand, databases are still a problem because collections of timestamped photos can reveal far too much about us once an identity is properly confirmed.
> And yes ... after all, humans are quite capable of identifying people with high accuracy from 2D photographs. Depth maps are not required for there to be serious privacy issues with such databases.
Which would be true if they were actually storing this on the cloud in a form they could access. As far as we know, they are not. That's the point of the "Secure Enclave".
> 1) The ARKit "Face Mesh" seems to be a standard model that is scaled and skewed to fit your face (for example, it ignores glasses, still works if you put your hand in front of your face, etc). It is _not_ a 3D scan.
This is how most (all?) state of the art acquisition methods for standard objects (faces, hands, etc) work. By warping a high res template to fit the data in some optimal manner, you get guarantees on the output topology without having to do tons of messy cleanup.
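As an illustration of the template idea (deliberately oversimplified; real systems solve for a non-rigid warp, often over a blendshape basis): even fitting just a global scale and translation by least squares keeps the template's vertex count and face topology intact, so no cleanup is needed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "template mesh": a fixed set of 3-D vertices with known topology.
template = rng.normal(size=(50, 3))

# Observed data: the same shape scaled and translated (a "different face"),
# plus sensor noise. Real warps are far richer than scale + translation.
true_scale, true_t = 1.3, np.array([0.1, -0.2, 0.5])
observed = true_scale * template + true_t + rng.normal(0, 0.01, size=(50, 3))

# Least squares: minimize ||s * template + t - observed||^2 over s and t.
centered_T = template - template.mean(axis=0)
centered_O = observed - observed.mean(axis=0)
s = (centered_T * centered_O).sum() / (centered_T ** 2).sum()
t = observed.mean(axis=0) - s * template.mean(axis=0)

fitted = s * template + t   # same vertices/faces as the template, warped to fit
print(s, t)
```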
I just tried using Animojis with the TrueDepth camera covered. After a second the frame rate drops significantly (to roughly 10 fps) and the character's eyes glitch out. I'm convinced Animojis are doing something with the TrueDepth hardware. It still tracks head movement with just the camera, but it's significantly slower and more error prone.
Initial reports suggested Animoji worked with the TrueDepth camera covered, but subsequent, more detailed experimentation has revealed that TrueDepth is required at intervals, just not 100% of the time.
Yes, but the parent comment was arguing that Animojis are a marketing gimmick, suggesting they could be enabled on other phones without the depth sensing hardware. I was sharing my experience as a counterpoint.
Re 3)
I don’t know any details of Apple's implementation, but typically computer vision algorithms integrate data from multiple sensors to generate a 3D model. The more data you have, the more robust the output will be.
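One standard way extra sensors buy robustness is inverse-variance weighting: fuse each sensor's estimate in proportion to how trustworthy it is. A toy sketch with two hypothetical depth estimates (noise levels made up for illustration, nothing Apple-specific):

```python
import numpy as np

rng = np.random.default_rng(2)
true_depth = 40.0   # cm

# Two hypothetical sensors measuring the same depth with different noise.
ir_sigma, rgb_sigma = 0.3, 0.6
ir = true_depth + rng.normal(0, ir_sigma, 10_000)
rgb = true_depth + rng.normal(0, rgb_sigma, 10_000)

# Inverse-variance weighting: the optimal linear fusion of independent estimates.
w_ir, w_rgb = 1 / ir_sigma**2, 1 / rgb_sigma**2
fused = (w_ir * ir + w_rgb * rgb) / (w_ir + w_rgb)

# The fused spread is below either sensor's on its own.
print(ir.std(), rgb.std(), fused.std())
```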
It’s possible to generate reasonable 3D models of faces from a single photograph. [1]
The highest resolution 3D scans I’ve seen are produced by aligning data from multiple high resolution photographs.
The big problem with that approach is that it requires a lot of detail in the source material. Smooth surfaces, blurry images, or noise from poor lighting can make it impossible for the algorithm to find features to align.
This is where the dot matrix projector comes in: by projecting a bunch of dots on your face, you get features that the algorithm can align, making the scan faster and more robust in low light.
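Once a projected dot is matched between the projector and the camera, depth drops out of ordinary triangulation. A minimal sketch (all numbers assumed for illustration, not Apple's hardware parameters):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Standard structured-light / stereo relation: Z = f * b / d."""
    return focal_px * baseline_m / disparity_px

f = 600.0   # camera focal length in pixels (assumed)
b = 0.02    # projector-camera baseline in meters (assumed)

# A nearer face displaces each dot further in the image.
for d in (40.0, 30.0, 24.0):   # dot displacement in pixels
    print(f"disparity {d:5.1f}px -> depth {depth_from_disparity(f, b, d):.3f} m")
```

The projector's dot pattern is what makes the matching step tractable on textureless skin; the geometry itself is the same as two-camera stereo.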
And if you're interested in building 3D models from multiple photographs, try Helicon Focus. You take a focus stacked set of images (basically get a macro lens, open it up pretty wide, and take a picture with the focal plane 1cm (etc.) apart until every part of your subject is in sharp focus), and it will look for the sharply-focused parts to infer depth information for the stack. It can then build you a 3D model.
Pretty neat stuff, though I've never found any actual artistic or practical use for it.
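I don't know Helicon Focus's actual algorithm, but the generic depth-from-focus idea it describes can be sketched in a few lines: score each slice of the focal stack for per-pixel sharpness (e.g., with a Laplacian) and take, per pixel, the slice where sharpness peaks. A toy numpy version with a synthetic two-plane scene:

```python
import numpy as np

def laplacian(img):
    """4-neighbour Laplacian magnitude as a crude per-pixel sharpness score."""
    return np.abs(np.roll(img, 1, 0) + np.roll(img, -1, 0)
                  + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)

def blur(img):
    """Simple 5-point smoothing, standing in for out-of-focus blur."""
    return (img + np.roll(img, 1, 0) + np.roll(img, -1, 0)
            + np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 5

# Toy focal stack: a textured scene on two depth planes. Each slice is
# sharp only at its own plane and blurred everywhere else.
rng = np.random.default_rng(3)
texture = rng.normal(size=(32, 32))
depth_true = np.zeros((32, 32), dtype=int)
depth_true[:, 16:] = 1   # right half sits at plane 1

stack = [np.where(depth_true == plane, texture, blur(texture))
         for plane in (0, 1)]

# Depth from focus: each pixel's depth is the slice where it is sharpest.
sharpness = np.stack([laplacian(s) for s in stack])
depth_est = sharpness.argmax(axis=0)

accuracy = (depth_est == depth_true).mean()
print(f"recovered depth labels correctly at {accuracy:.0%} of pixels")
```

Real stacks have many slices and gradual defocus, so production tools also smooth the per-pixel decision spatially; this only shows the core sharpness-voting step.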
> 3) Apple is really good at marketing. It's been shown that, even if you cover the TrueDepth camera, features that "require" it still work fine (including Animoji and the apps that I've been developing using the front-facing ARKit APIs).
> 3.1) The lack of Animoji and front-facing ARKit seems to be a software limitation made for business reasons rather than a hardware limitation. See: Google's Pixel 2 portrait mode photos done using a single front-facing camera that have stacked up well against the ones from the iPhone X.
Does it also work in the dark without the depth camera?
I take anything Rene says with a very large grain of salt.
> The reason for the misconception comes from the implementation: The IR system only (currently) fires periodically to create and update the depth mask. The RGB camera has to capture persistently to track movements and match expressions. In other words, cover the IR system and the depth mask will simply stop updating and likely, over time, degrade. Cover the RGB, and the tracking and matching stops dead.
"...likely, over time, degrade."
1) He doesn't know.
2) It's Animoji, so why would it matter if it did degrade? There is already a stock 3D image of the Poop. It simply needs the RGB camera to track where your facial features are.
There's a big difference between what Apple is using to perform FaceID scans and the APIs it exposes to developers. Apple has historically been very cautious about the access it gives to developers, especially where matters of privacy are concerned.
> .. could be scary. But I disagree that this has anything to do with the iPhone X or its TrueDepth camera.
Well, let's just say there's the scary fact that nobody trustworthy has audited this thing.
Like, should a company that didn't run a "doesn't let root login on first-try" test be allowed to be making such wide-ranging decisions as face-scanning?
What if I don't want to have my face scanned, but nevertheless need to pick up somebody's lost-phone/detonation-device? Shall I just wear a mask?
The point is that we have moved beyond a zone where 'agree/disagree' means anything, any more. Our data is out there.
Not so sure I want my face involved where, preferably, my hands should be..
If you are an apologist for what is the equivalent of having hundreds of people in trenchcoats following every person on the planet, detailing their every public move and storing it forever, you don't have any reasonable expectation of your part in the documentary being underlaid with anything but sinister music.
I don't know if I'm an "apologist" for anything, but what you're describing has been the case for decades by now; it's an inherent property of cell phone networks. By 2005 we already all carried personally identifiable devices with microphones, geolocation, and cameras and a persistent connection to a network.
> what you're describing has been the case for decades by now;
Does that make it even the tiniest bit more acceptable, or does that mean it's really high time to stop it? Being an apologist kind of hinges on that, and no need to put anything in quotes.
What does this have to do with "out in public"? I have by law, custom, and common sense an expectation of not being 3D-scanned in a gas station bathroom, even if I pick up the phone that the previous visitor had dropped on the floor.
And if the previous visitor was on a Skype call with someone while in that bathroom, dropped the phone on the floor, and walked out, and you walked in and picked up the phone, you would have the same issue.
Many privacy-conscious settings that I've been in prohibit phones entirely.
In most countries you have no right not to be photographed (by anyone) when in public (by definition being in public is not being in private). I fully support this with the exception of the homeless (and I think it does somewhat support the right to cover your face in public if you wish).
> The scary part, which is vast dystopian databases of facial fingerprints is already being done with normal photographs. The depth data is not needed
Exactly what I've said from day one - that Apple's "FaceID is 50x more secure than Touch ID" claim, based on the False Acceptance Rate, was total bullshit. That only works if you're going to throw random data at the authentication mechanism.
But someone who's going to target you isn't going to do that. They're going to use a 3D profile of your face built from your online photos or from CCTV cameras (to which not only the government has access, but hackers, too).
In practice, it's much more difficult to gain a clone of someone's fingerprint than it is to gain a clone of their 3D face.