Gaussian Head Avatar: Ultra High-Fidelity Head Avatar via Dynamic Gaussians (yuelangx.github.io)
175 points by phil9l on Dec 8, 2023 | 44 comments


This is excellent! A similar paper on avatars from Meta was published two days ago too:

https://shunsukesaito.github.io/rgca/

Here's the discussion for it (empty as of right now): https://news.ycombinator.com/item?id=38554537


It is. I fully expect that at some point we'll get "in game" MMORPG player characters that are emotively very effective. The immersion level will be pretty intense at that point, especially in VR, but even a third-person view like WoW could use a webcam to process your facial expression.

Another interesting application would be "zoom" meetings where everyone is shown around the table or in the audience and it processes their emotive state in real time. That could help speakers engage with the audience in a better way.

Of course the "bad" uses of this tech are pretty out there too, from porn apps to ripping people off with a fake "zoom" call from a relative.


> Another interesting application would be "zoom" meetings where everyone is shown around the table or in the audience and it processes their emotive state in real time. That could help speakers engage with the audience in a better way.

See Permutation City[0]. Another application is actually masking reactions selectively. There are some interesting aspects that play out with respect to this and some other tech that people will now view as near sci-fi. The VR identity cloning is a common sci-fi plot, not specifically a Permutation City thing. Great book, highly recommend.

[0] https://en.wikipedia.org/wiki/Permutation_City


Star Citizen has a very basic version of FoIP; it does help convey emotive state, even if it's a little janky.

Add in stuff like head tracking and you get a very natural feeling to interacting. Give me haptics and I'll be super happy.

Hand gestures go a long way too


I think the tech in that paper was demoed for this podcast: https://youtu.be/MVYrJJNdrEg

The big roadblock to commercialisation for the moment is the original capture — for the paper they used a 110-camera capture rig under ideal lighting conditions.

In the above podcast Zuckerberg mentions that in the future people will be able to use their phones to do that same capture, but I don’t think that tech is coming next year.

I wonder if there’ll be an interim period where people who want high-quality avatars will have to book an appointment.


There was also this project that was posted a couple days ago: https://blog.metaphysic.ai/controllable-deepfakes-with-gauss...

Why they call a virtual avatar a deepfake beats me though....


> Why they call a virtual avatar a deepfake beats me though....

What's the difference? In both cases you're simulating somebody's face in a way that doesn't actually require having the original to drive it.


If you can "wear" the face of someone else then that seems like deepfake territory.


I didn’t expect Gaussian splats to be so good at approximating geometry. It’s cool when you see a new foundational approach to something that’s been done a certain way for decades.


Someone mentioned previously that Gaussian splats are the ideal extension of AI image generation into 3D, and they might have turned out to be correct.


They’re basically voxels after we abandon Cartesian 3-dimensionality (which imposes cubes)


More like point clouds since they are not constrained to a grid.

NeRF is more like voxels though.


What do you mean, NeRF is like voxels? It is very much NOT like voxels.


The original NeRF paper had a voxel-like grid, but instead of regular voxels with a single color per grid cell, they had an MLP in each cell that models the color as a function of position and angle.


The literal definitions of a NeRF and its rendering function in the original paper are a continuous, non-discretized vector field and a sampling-based rendering function.
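Roughly, it works like the sketch below (a paraphrase in NumPy, not the authors' code; the `mlp` call is a stand-in for the trained network returning density and color): you query one continuous function at sampled positions along each camera ray and alpha-composite the results, with no grid or cells anywhere.

    import numpy as np

    def render_ray(mlp, origin, direction, t_near=0.0, t_far=4.0, n_samples=64):
        # Sample continuous depths along the ray -- no voxel grid involved.
        t = np.linspace(t_near, t_far, n_samples)
        points = origin + t[:, None] * direction            # (n_samples, 3)

        # One network for the whole scene, queried at arbitrary positions;
        # returns density sigma and view-dependent RGB per sample (assumed API).
        sigma, rgb = mlp(points, direction)                  # (n,), (n, 3)

        # Standard volume-rendering quadrature (alpha compositing).
        delta = np.append(np.diff(t), 1e10)                  # spacing between samples
        alpha = 1.0 - np.exp(-sigma * delta)
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
        weights = trans * alpha
        return (weights[:, None] * rgb).sum(axis=0)          # final pixel color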


I'm sorry, you're right. I got confused with some other papers like Plenoxels that use voxels.


Most of the time they aren't that good at approximating geometry. They are good at approximating the "appearance" of geometry. However, many regularization techniques and priors can be introduced to make the Gaussian splatting technique better at geometry approximation.


The rotating avatars have some uncanniness to them due to the eye gaze not fixating on a target as the camera moves around. I think slowly rotating the camera around the model would have worked better. Perhaps also add some background elements so it's clear that the avatar isn't moving, the camera is.

I'm hoping this technique can be used in video games because it's significantly better than what we have now.


Why would they even include a "Code" link if it's just an empty GitHub repo with a README?


It will get updated. This is pretty common in research papers. They're just getting the link attached because it's easier to update the repo than the paper.


Stills look great, fidelity is there ...

The actual rotating avatar videos still have extremely poor approximations of human musculature, especially at the eyes and jaw (because these are hollow surface meshes, naturally).

Is there research to overlay these models on more representative facial muscles?


And now:
- the 4-hr Work Week toolkit is complete.
- deepfakes are now just regular fakes.


When can we send our gaussian heads to the Zoom meeting?


Excited for the next generation of Personas for the Vision Pro, etc :-)


How can we trust photos now? Fiction or reality, it's becoming harder to differentiate


I think that point has come and gone. Not sure how society will adapt - I really hope smart people are out there working on this sort of stuff.


At which point in the future (or past) will it be accepted theory that the singularity already happened some 20 years ago?


We will get a whole new level of fake news.

Online meetings for important things are now insecure because you can't be sure the other people are who they claim to be.


People can lie, and people can send emails that look like they're from your boss. People now know deepfakes are a thing and have some immunity against trusting suspicious online meetings where your boss acts differently than usual. Etc. It's not as big of a threat as people want to make it out to be.


People routinely fall for lies and get phished by those emails that look like they're from your boss. Every year a handful of high-profile tech companies get hacked because someone you would think should know better falls for a phishing scam. I think this is a bigger threat than people are making it out to be.


We will get "recordings" where people plot the great reset.

We already got fake audio of news anchors apologizing for years of lies.

We will get a lot more of that.


Security for a video call comes from the user account, not visual verification.


People get their company accounts compromised all the time.

It's one thing to get a poorly worded email from your CFO asking for company bank info, but it's a whole other thing to be asked over a Zoom video call by someone you think is the right person but who is actually a fake Gaussian splat avatar.

There's already precedent for scammers doing similar things using voice deepfaking over phone calls. This could be a whole new level of phishing.


Should be, but people can easily be fooled.


"represented by controllable 3D Gaussians"

They just assume everyone knows what they mean with "3D Gaussian".


I don't think a research paper is meant to be understood by everyone, and I imagine the authors don't have that expectation either.


No, they assume their peers do.


It is academia ;-)


I actually have the same gripe. Unfortunately there is a long history of academic fields naming things like this (there's an entire Wikipedia page of things named after Gauss: https://en.wikipedia.org/wiki/List_of_things_named_after_Car...)


Not really. There are many things named after Gauss, but a "Gaussian" almost always means a probability distribution / density function that is very well understood and defined (and common).


Seems like they just mean the vector version of a Gaussian function: f(r) = exp(-r•r). Basically a "bell curve" except in 3D so it's a ball that's dense at the center and dies off. Then the optimizer might learn to produce an intensity, offset, and width for each point in a cloud, so the A,B,C for A*f((r-B)/C) at each point or something.


This is in fact what the optimizer does. At least in the original paper, the model learns to scale and rotate the Gaussians.
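A minimal sketch of evaluating one such anisotropic 3D Gaussian (parameter names are mine, not from the paper): the covariance is built from a learned rotation and per-axis scales, and the density falls off with the Mahalanobis distance from the learned center.

    import numpy as np
    from scipy.spatial.transform import Rotation

    def splat_density(x, center, quat, scales, opacity):
        # Covariance = R S S^T R^T, with R from a learned quaternion and
        # S a diagonal matrix of learned per-axis widths.
        R = Rotation.from_quat(quat).as_matrix()
        S = np.diag(scales)
        cov = R @ S @ S.T @ R.T
        d = np.atleast_2d(x) - center                            # offset from mean
        m = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)   # squared Mahalanobis distance
        return opacity * np.exp(-0.5 * m)                        # learned opacity scales the falloff

    # Example: a blob stretched along x, rotated 45 degrees about z.
    quat = Rotation.from_euler('z', 45, degrees=True).as_quat()
    print(splat_density(np.array([0.1, 0.0, 0.0]),
                        center=np.zeros(3), quat=quat,
                        scales=np.array([0.3, 0.05, 0.05]), opacity=0.8))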


Take a linear algebra course or read a textbook before trying to read and understand cutting edge ML research!


Rude!



