Hacker News

I've been playing with the TrueDepth Camera APIs on the iPhone X. Some things I've noticed:

1) The ARKit "Face Mesh" seems to be a standard model that is scaled and skewed to fit your face (for example, it ignores glasses, still works if you put your hand in front of your face, etc). It is _not_ a 3D scan.

2) The "TrueDepth" data is not really all that granular. It seems similar to the depth map you get from the rear-facing cameras on the "plus" sized models. Here's what the sensor data spits out: https://twitter.com/braddwyer/status/930682879977361408

3) Apple is really good at marketing. It's been shown that, even if you cover the TrueDepth camera, features that "require" it still work fine (including Animoji and the apps that I've been developing using the front-facing ARKit APIs).

3.1) The lack of Animoji and front-facing ARKit seems to be a software limitation made for business reasons rather than a hardware limitation. See: Google's Pixel 2 portrait mode photos done using a single front-facing camera that have stacked up well against the ones from the iPhone X.

4) The scary part, the vast dystopian databases of facial fingerprints, is already being built from normal photographs. The depth data is not needed.

I agree with the author that the privacy implications of all-encompassing databases could be scary. But I disagree that this has anything to do with the iPhone X or its TrueDepth camera.



> 2) The "TrueDepth" data is not really all that granular. It seems similar to the depth map you get from the rear-facing cameras on the "plus" sized models. Here's what the sensor data spits out: https://twitter.com/braddwyer/status/930682879977361408

As @braddwyer himself notes, you can probably get a much better mesh by integrating over time. It depends on how long it takes to capture a single frame, but I imagine that's not long, so an order-of-magnitude improvement is probably quite easy.
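As a toy sketch of what integrating over time buys you (plain per-pixel averaging, which assumes the frames are already registered to each other; any real pipeline would have to align them first):

```python
import random

def integrate_depth(frames):
    """Average registered depth frames pixel-wise to cut noise.

    With zero-mean noise, averaging N frames reduces the noise
    standard deviation by sqrt(N): one second of 15 fps capture
    gives roughly a 3.9x improvement.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]

# Simulate 15 noisy captures of a flat surface 0.5 m away
# (1 cm depth noise per frame is a made-up figure for illustration).
random.seed(0)
frames = [[[0.5 + random.gauss(0, 0.01) for _ in range(10)]
           for _ in range(10)] for _ in range(15)]
fused = integrate_depth(frames)
```

The fused map's average error ends up well below any single frame's, which is the whole point of capturing at 15 fps for a second instead of grabbing one frame.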

> 4) The scary part, the vast dystopian databases of facial fingerprints, is already being built from normal photographs. The depth data is not needed.

And yes ... after all, humans are quite capable of identifying people with high accuracy from 2D photographs. Depth maps are not required for there to be serious privacy issues with such databases.


@braddwyer is me :)

It gives you about 15 fps of depth data


Haha, cool! Hi :)

Given the small size of the laser projector, I imagine natural movement from the phone being hand-held would result in significant displacement of the projected dots over a 1s interval? Have you tried integrating the 15 frames to see what it looks like?


I haven't yet.

We submitted a game about 3 weeks ago using front-facing ARKit as its core game mechanic and it hasn't been approved by Apple yet.

I'm waiting to see if they're going to allow us to use the new technology in novel ways or not before I invest a lot more time in it.


Getting minute, subpixel movements can ironically give you MORE resolution if you process it over time, though you'd probably need some sort of "anchor" points


That doesn’t seem ironic to me.


I think the irony being implied is that normally when you're shooting video and your camera is jittering, you're effectively losing resolution compared to a static camera because of motion blur, whereas this depth mapping benefits from minute movements. Though looking at individual frames of video is different than combining them into a single sharper image, I get the counterintuitive feeling they were driving at.


Could you stabilize this before integrating? Using feature points and matching them up, perhaps?


I imagine something like that would be necessary. The techniques would probably be ones related to those used in SLAM [1].

[1] https://en.wikipedia.org/wiki/Simultaneous_localization_and_...
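For intuition, here's a toy version of that alignment step (nothing like a real SLAM system, which estimates full 6-DoF pose; this just brute-forces the best integer pixel shift between two frames before they would be fused):

```python
def best_shift(ref, frame, max_shift=3):
    """Exhaustively search small integer shifts (dy, dx) and return
    the one minimizing the sum of squared differences between `ref`
    and the shifted `frame` over their overlapping region."""
    rows, cols = len(ref), len(ref[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = 0.0
            for r in range(rows):
                for c in range(cols):
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < rows and 0 <= cc < cols:
                        err += (ref[r][c] - frame[rr][cc]) ** 2
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

# A synthetic depth gradient, and a copy whose content has moved up
# by one pixel, as hand shake between two captures might cause.
ref = [[r * 10 + c for c in range(6)] for r in range(6)]
moved = [[ref[min(r + 1, 5)][c] for c in range(6)] for r in range(6)]
```

Once each frame's shift against a reference is known, you undo it before averaging; real feature-point matching does this sub-pixel and in 3D.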


> And yes ... after all, humans are quite capable of identifying people with high accuracy from 2D photographs. Depth maps are not required for there to be serious privacy issues with such databases.

Yes and no. We're mostly good at that (with exceptions: I have to see someone a lot before I remember their face), but we evolved for small groups, and there are now enough people that doppelgänger is a profession.

On the other hand, databases are still a problem because collections of timestamped photos can reveal far too much about us once an identity is properly confirmed.


> And yes ... after all, humans are quite capable of identifying people with high accuracy from 2D photographs. Depth maps are not required for there to be serious privacy issues with such databases.

Which would be true if they were actually storing this on the cloud in a form they could access. As far as we know, they are not. That's the point of the "Secure Enclave".


The Secure Enclave doesn’t do much good if you are providing the face map to any developer that wants it.


> 1) The ARKit "Face Mesh" seems to be a standard model that is scaled and skewed to fit your face (for example, it ignores glasses, still works if you put your hand in front of your face, etc). It is _not_ a 3D scan.

This is how most (all?) state-of-the-art acquisition methods for standard objects (faces, hands, etc.) work: by warping a high-res template to fit the data in some optimal manner, you get guarantees on the output topology without having to do tons of messy cleanup.
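A one-dimensional caricature of the idea (the real methods solve a non-rigid 3D registration; this just fits a scale and offset, but the principle of deforming a fixed template toward measured data is the same):

```python
def fit_template(template, data):
    """Least-squares scale `a` and offset `b` mapping the fixed
    template profile onto the measured data; the warped output
    always keeps the template's point count and ordering."""
    n = len(template)
    mt = sum(template) / n
    md = sum(data) / n
    cov = sum((t - mt) * (d - md) for t, d in zip(template, data))
    var = sum((t - mt) ** 2 for t in template)
    a = cov / var
    b = md - a * mt
    return [a * t + b for t in template]
```

However noisy or incomplete the data, the output has exactly the template's structure, which is the topology guarantee the comment describes.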


How does that work for people with non-standard features? Think glass eyes, acid burns, missing fingers, etc.


It doesn't.

I'd be curious to know if face unlock had problems with vision impairment, i.e. no gaze vector.


The "require attention" feature can be disabled.


I just tried using Animojis with the TrueDepth camera covered. After a second the frame rate drops significantly (to roughly 10 fps) and the character's eyes glitch out. I'm convinced Animojis are doing something with the TrueDepth hardware. It still tracks head movement with just the camera, but it's significantly slower and more error-prone.


Initial reports suggested Animoji worked with the TrueDepth camera covered, but subsequent experimentation has revealed that TrueDepth is required at intervals, just not 100% of the time.


> I'm convinced Animojis are doing something with the TrueDepth hardware

Weren't they specially created to make use of that 3D cam?


Yes, but the parent comment was arguing that Animojis are a marketing gimmick, suggesting they could be enabled on other phones without the depth sensing hardware. I was sharing my experience as a counterpoint.


Re 3) I don't know any details of Apple's implementation, but typically computer vision algorithms integrate data from multiple sensors to generate a 3D model. The more data you have, the more robust the output will be.

It’s possible to generate reasonable 3D models of faces from a single photograph. [1]

The highest-resolution 3D scans I've seen are produced by aligning data from multiple high-resolution photographs.

The big problem with that approach is that it requires a lot of detail in the source material. Smooth surfaces, blurry images, or noise from poor lighting make it impossible for the algorithm to find features to align.

This is where the dot matrix projector comes in: by projecting a bunch of dots on your face, you get features that the algorithm can align, making the scan faster and more robust in low light.

[1]: http://kunzhou.net/2015/hqhairmodeling.pdf
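A toy illustration of why projected dots help (this is not Apple's pipeline, just the underlying intuition): feature matchers need local intensity variation to latch onto, and a smooth patch offers none until dots are projected onto it.

```python
def local_variance(patch):
    """Variance of pixel intensities in a patch: a crude proxy for
    how much trackable texture it offers a feature matcher."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    return sum((v - mean) ** 2 for v in flat) / len(flat)

# A featureless patch (smooth skin, flat wall) vs. the same patch
# with a sparse grid of bright projected dots.
smooth = [[128] * 8 for _ in range(8)]
dotted = [[255 if r % 4 == 0 and c % 4 == 0 else 128
           for c in range(8)] for r in range(8)]
```

The smooth patch scores zero (nothing to align on), while the dotted one scores high, and crucially it does so regardless of ambient lighting, since the dots are emitted rather than reflected daylight.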


And if you're interested in building 3D models from multiple photographs, try Helicon Focus. You take a focus-stacked set of images (basically: get a macro lens, open it up pretty wide, and take pictures with the focal plane moved about 1 cm at a time until every part of your subject has been in sharp focus somewhere in the stack), and it will look for the sharply focused parts to infer depth information for the stack. It can then build you a 3D model.

Pretty neat stuff, though I've never found any actual artistic or practical use for it.
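A rough sketch of the depth-from-focus idea (not Helicon Focus's actual algorithm, just the principle): score each pixel's sharpness in every frame of the stack, and use the index of the sharpest frame as that pixel's coarse depth.

```python
def sharpness(img, r, c):
    """Local sharpness proxy: summed absolute gradient at (r, c)."""
    rows, cols = len(img), len(img[0])
    s = 0.0
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            s += abs(img[r][c] - img[rr][cc])
    return s

def depth_from_focus(stack):
    """For each pixel, return the index of the frame in which it is
    sharpest; that index serves as a coarse depth value."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(range(len(stack)),
                 key=lambda i: sharpness(stack[i], r, c))
             for c in range(cols)]
            for r in range(rows)]
```

In a synthetic stack where frame i renders rows 2i..2i+1 as sharp texture and everything else as defocused flat gray, the recovered depth map is exactly the band index at each pixel.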


Is there an iPhone app to capture such images, e.g. using the dual camera?


A skilled human with a decent tool can also make a pretty good 3D model from one face image in less than 5 minutes. https://youtu.be/Eq0tTzCwXNI


That is not a 3D model, much less a pretty good one.


That is a wireframe, but Blender can turn a wireframe into a mesh and vice versa, as the video title states.


Can it really? I thought he intended to use this as a reference when (manually) doing the actual modelling.


It looks like one of those old conversation pieces that had the grid of needles you could push your face or hand into to make a 3D image.


> 3) Apple is really good at marketing. It's been shown that, even if you cover the TrueDepth camera, features that "require" it still work fine (including Animoji and the apps that I've been developing using the front-facing ARKit APIs).

> 3.1) The lack of Animoji and front-facing ARKit seems to be a software limitation made for business reasons rather than a hardware limitation. See: Google's Pixel 2 portrait mode photos done using a single front-facing camera that have stacked up well against the ones from the iPhone X.

Does it also work in the dark without the depth camera?



I've done some experimenting with it in my own apps.

I was really surprised how well it does even when covering up the IR sensor prior to opening the app.

I don't doubt that they are using the IR data to improve things. But it does "good enough" without.


I take anything Rene says with a very large grain of salt.

> The reason for the misconception comes from the implementation: The IR system only (currently) fires periodically to create and update the depth mask. The RGB camera has to capture persistently to track movements and match expressions. In other words, cover the IR system and the depth mask will simply stop updating and likely, over time, degrade. Cover the RGB, and the tracking and matching stops dead.

"...likely, over time, degrade."

1) He doesn't know.

2) It's Animoji, so why would it matter if it did degrade? There is already a stock 3D model of the poop emoji. It simply needs the RGB camera to track where your facial features are.


>The lack of Animoji and front-facing ARKit seems to be a software limitation made for business reasons rather than a hardware limitation.

The A11 chip has dedicated “neural engine” hardware which is used for Animoji and other facial recognition tasks.

How much could be done in the standard CPU on other devices I’m not sure.


The iPhone 8 and 8+ have the same A11 chip as the iPhone X.


Well that's not what I was sold at the keynote. Has this been verified? I thought my face was being scanned with 30k dots and all that.


There's a big difference between what Apple is using to perform FaceID scans and the APIs it exposes to developers. Apple has historically been very cautious about the access it gives to developers, especially where matters of privacy are concerned.


Ahh. I overreacted. This must be it.


Sure. But 30,000 is only 150x200. And not all of them are going to hit your face.
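A back-of-the-envelope check on that (the one-third face-coverage fraction below is purely an assumed number for illustration, not anything Apple has published):

```python
# 30,000 projected dots arranged in a grid is only about 150 x 200
# points of raw spatial resolution.
total_dots = 150 * 200
assert total_dots == 30_000

# If, say, a third of the pattern actually lands on the face (an
# assumed fraction, purely for illustration), the usable grid is
# roughly 100 x 100 dots.
dots_on_face = total_dots // 3
```

That's a far coarser sampling than the "30,000 dots" marketing figure suggests at first hearing.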


> .. could be scary. But I disagree that this has anything to do with the iPhone X or its TrueDepth camera.

Well, let's just say there's the scary fact that nobody trustworthy has audited this thing.

Like, should a company that didn't run a "doesn't let root login on first-try" test be allowed to be making such wide-ranging decisions as face-scanning?

What if I don't want to have my face scanned, but nevertheless need to pick up somebody's lost-phone/detonation-device? Shall I just wear a mask?

The point is that we have moved beyond a zone where 'disagree/-agree' means anything, any more. Our data is out there.

Not so sure I want my face involved where, preferably, my hands should be..


> What if I don't want to have my face scanned, but nevertheless need to pick up somebody's lost-phone/detonation-device? Shall I just wear a mask?

Well...yeah. If you're out in public and your face is visible, you don't have a reasonable expectation of privacy.


If you are an apologist for what is the equivalent of having hundreds of people in trenchcoats following every person on the planet, detailing their every public move and storing it forever, you don't have any reasonable expectation of your part in the documentary being underlaid with anything but sinister music.


I don't know if I'm an "apologist" for anything, but what you're describing has been the case for decades by now; it's an inherent property of cell phone networks. By 2005 we already all carried personally identifiable devices with microphones, geolocation, and cameras and a persistent connection to a network.


> what you're describing has been the case for decades by now;

Does that make it even the tiniest bit more acceptable, or does that mean it's really high time to stop it? Being an apologist kind of hinges on that, and no need to put anything in quotes.


"People should stop carrying mobile phones" is an interesting proposition, but a fairly tangential one.



So... How do I turn it off?


What does this have to do with "out in public"? I have by law, custom, and common sense an expectation of not being 3D-scanned in a gas station bathroom, even if I pick up the phone that the previous visitor had dropped on the floor.


And if the previous visitor was on a Skype call with someone while in that bathroom, dropped the phone on the floor, and walked out, and you walked in and picked up the phone, you would have the same issue.

Many privacy-conscious settings that I've been in prohibit phones entirely.


You don't have anything of the sort by law, and not even by custom. It's just that such technology wasn't available in a popular device before.


Honestly, I can't fathom why you're being downvoted, and this is exactly the predicament I'm most concerned about, personally.

Like, it's a cool technology, sure. But has nobody thought of the militarisation of it? Sheesh.


In most countries you have no right not to be photographed (by anyone) when in public (by definition being in public is not being in private). I fully support this with the exception of the homeless (and I think it does somewhat support the right to cover your face in public if you wish).


And in some states, like Virginia, it's not legal (for the most part) to wear a mask in public if you're over the age of 16.


While I love the idea of everyone walking around in masks, do that at the moment and you are likely to be arrested or shot.


> The scary part, the vast dystopian databases of facial fingerprints, is already being built from normal photographs. The depth data is not needed

Exactly what I've said from day one - that Apple's "FaceID is 50x more secure than Touch ID" claim, based on the False Acceptance Rate, was total bullshit. That only works if you're going to throw random data at the authentication mechanism.

But someone who's going to target you isn't going to do that. They're going to use a 3D profile of your face built from your online photos or from CCTV cameras (to which not only the government has access, but hackers, too).

In practice, it's much more difficult to obtain a clone of someone's fingerprint than a clone of their 3D face.


> They're going to use a 3D profile of your face built from your online photos or from CCTV cameras

Good thing I'm not Jason Bourne, and the most likely scenario for someone trying to get into my phone involves my sister's kids.



