Things are going to get really weird in the next decade or two. Right now AI is able to "communicate" with us mostly via text, and indirectly via algorithms.
With human-like agents that have human-like voices, appearances, and mannerisms, they'll be able to communicate with millions of average Joes, and all in a way particular to that specific Joe. AI being able to do infinite amounts of "ambient work" and expose us to the results via human-like entities makes for a very different world of possibilities.
And right now the big AIs that most people directly interface with are clumsy and abstract, but they're also tied in the backend to large pools of closed data owned by mostly well-meaning entities. But how will things change when and if the AI Joe interacts with isn't directed and contained by wealthy US-based corporations?
You might be interested in the work of Soul Machines [0] (online demo on website), a company formed by Mark Sagar [1][2]. There are a few interesting talks by him if you search on Youtube.
Their AI-driven avatars are already in use in a handful of places, and fairly well received.
"The future is already here – it's just not evenly distributed." — William Gibson
— Appearance-wise, I think MetaHuman's stuff looks better than Soul Machines' — but SM's avatars are backed by a conversational AI and aren't available as a UE plugin, so it's not an apples-to-apples comparison, of course.
Nice, this is going to become a commodity technology in a few years. I can see Unity following, and then many other 3D design tool makers shipping similar capabilities.
I'm really confused about what we (as humans) really want. Here's this incredible technology to make human features, and video games already have excellent technology, yet the entities in video games don't _act_ like humans. It's like an incredible holographic cardboard cutout. The characters don't act like aliens either! They are just backdrops with scripted cut scenes.
I don't feel like this is an accident. But I don't know what to think of it, I'm still a bit confused about what we're really trying to get out of these facsimiles.
An ad inside your game that is a male/female/whatever of your preferred attributes selling you things with a seductive voice. Or a company selling that avatar to you as an in-game companion/skin. Or propagating services through that avatar.
Looking at this, it fits into Tencent's master plan.
Sell avatars to users as companions. Add it to Snapchat and see it in AR! Let it read a Spotify audiobook to you! Buy it clothes and skins, play with it in Fortnite or Roblox! Imagine all the possibilities of upselling, cross-selling, and overselling virtually generated items (your only costs are data and electricity, so the potential is unlimited) for virtual characters that people will start to feel emotional ties to, since they'll have grown up surrounded by them or will use them to fill an emotional void left by a lack of socialisation.
>yet the entities in video games don't _act_ like humans
This was the source of a lot of complaints about Cyberpunk 2077. They made a lavishly detailed world, but it was all a millimeter of paint over nothing. Food stalls that don't sell food, cars that disappear when you're not looking at them. As it turns out, graphics are a lot easier than behavior! We should have guessed this, but while you can see all the complicated visual detail that should be hard to replicate, behavior is invisible.
Shamus Young had a good post drilling allllll the way down into the nitty gritty details in one small AI subsystem of a game, Thief 3, and what it would mean to upgrade it: https://www.shamusyoung.com/twentysidedtale/?p=270
>In the game, you can sneak up on anyone (servants, nobility, or guards) and whack them on the back of the head to knock them out. (You can also stab someone in the back, but that’s noisy and bloody, and why kill them when you can just knock them out?) Once they are knocked out, you usually need to hide them. If you leave them laying in the middle of the room or hallway, someone else is likely to come along and discover your work. When this happens, they always assume the victim is dead. Then they start with the running and the screaming and everyone searching for you.
>This can be amusing. I was in a large manor, working my way through a sleeping area for servants. One of them was already asleep in the bunk beds, but a few others were still wandering around. I zonked one of them and placed the sleeping victim into one of the beds. I thought I was being clever. Another servant came in, saw their compatriot in the bed and exclaimed, “Dead!? But who could have killed him? I’ll go tell the guards!” Then he ran off.
>Blast it all.
>I should have known better. The AI was just looking for knocked-out people. It didn’t care where the body was. I was so into the game I stopped the metagame thinking about AI and started thinking about what I’d do in the given situation. In that situation, placing a zonked person in a bed made a lot more sense than dumping them in a corner. However, to the AI it was just a poorly hidden body. Sigh.
>But fixing this problem would be tricky, and would involve a lot of extra work. The level designers would have to designate certain areas or objects as “beds”, and the programmers would need to make it so that bodies laying on beds rouse less suspicion than bodies found elsewhere. Then they would need to add some new dialog and behavior: If an NPC sees someone “sleeping” on a bed (most likely not in their own bed) while fully clothed and while they should be working, he shouldn’t ignore them, but he also shouldn’t run away screaming about murders and dead bodies. You need some new behavior along the lines of “try to wake someone up and then discover they have been knocked out”.
>But even with that extra effort, you can still have some amusing failure modes. Placing a servant girl on a bed in the priest’s quarters or the barracks should raise some eyebrows. Likewise, stacking two or more people in the same bed should tip off guards and servants that something is amiss. It wouldn’t make sense for them to just assume they all decided to take a nap together.
>It would be annoying to code in such a way that it works right. The programmer would probably need to take the victim’s position into account as well. If I just toss somebody on the bed so that their upper body hangs over the side and their head is resting on the floor, it’s going to look pretty stupid if someone comes along and assumes they’re asleep.
>Also, it seems like the length of time since the NPC’s last saw each other should be taken into account as well. If you greet your fellow housekeeper, walk out of the room, and come back a few seconds later to find them motionless in someone else’s bed, you are not going to think they are sleeping.
And that's just one tiny detail, resulting in a vast increase of work. Our old friend, the combinatorial explosion.
A goomba in Super Mario Brothers just walks back and forth. A couple of lines of code. Doing better than that is not ten times harder, it's millions of times harder! Who would pay tens of thousands of AI programmers for a decade to make a version of Cyberpunk 2077 where each noodle vendor had a name, a home, a routine, a simulated economy... If you could do it, which you can't, it would be an incredible shining gem of technical achievement... but would it make the game more interesting to play? Spend a thousand times as much to make the game 1.4 times more interesting?
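The goomba-vs-Thief contrast can be sketched in a few lines. Everything here is hypothetical — the function names, the rule set, and the dict keys are mine, not any game's actual code — but it shows how each new piece of context (beds, poses, who saw whom when) multiplies the cases instead of adding one:

```python
def goomba_step(x, direction, left=0, right=10):
    """The whole goomba 'AI': walk until a wall, then turn around."""
    x += direction
    if x <= left or x >= right:
        direction = -direction
    return x, direction

def react_to_body(observer, body):
    """A fraction of the checks the Thief example implies. Every new
    context dimension multiplies the branches rather than adding one."""
    if body["location"] == "bed":
        if body["room_owner"] not in (body["name"], None):
            return "suspicious"      # wrong person's bed
        if observer["last_saw_alive_secs"] < 30:
            return "suspicious"      # they were awake moments ago
        if body["pose"] != "lying_flat":
            return "suspicious"      # head hanging off the side
        return "ignore"              # plausibly just asleep
    return "raise_alarm"             # body in the open: panic

# The "clever" Thief player from the story: servant in their own bed,
# lying flat, last seen alive five minutes ago.
result = react_to_body(
    {"last_saw_alive_secs": 300},
    {"name": "maid", "location": "bed",
     "room_owner": "maid", "pose": "lying_flat"})
print(result)  # → ignore
```

Even this toy version needs level designers to tag beds with owners and animators to author a "try to wake them" behavior, exactly as the quoted post describes; the goomba needs none of that.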
"Metagame thinking" is really interesting. There is a definite sense in which you get inside the implementor's mind to try to figure out a game, including the balance they programmed into the game, and the shortcuts they must have taken.
But there are other facets besides this description of what Thief might do. A lot of a game is aesthetics, visual details that don't affect gameplay at all. There are behavioral aesthetics as well, which can be more incremental. They don't have to interact in clever ways with gameplay. They don't have to be resistant to the player hacking the system.
Small cases might be the way bystanders react. Or conversations you might overhear. The backstories of Dwarf Fortress are like this.
But there is a danger you are creating a pretty but ultimately annoying facade on a very fixed system. Games that put fancy conversation in front of a store interaction aren't fooling anyone. Still, if you can invoke an emotional background to a game I think it would be meaningful even if it doesn't significantly affect the optimal gameplay.
Absolutely. Games need to be consumable in a meaningful timespan. Complexity gets in the way. Do we really want a second life? Nothing beats reality.
Thanks for bringing up combinatorial explosion. I think it's also a huge thing with autonomous driving...
Chloe, the digital avatar on the menu of the video game Detroit: Become Human, is sort of a sneak peek into what that might be like. (Potential spoilers! Chloe has her own story line, and if you want to experience that, don't watch the whole video.)
I don't see how animated faces bring anything to the table aside from novelty. People are already plenty comfortable communicating with just voice and text. Faces can help with rapport, but if you know it's a machine, why would you care? The biggest value here will be in entertainment (movies and games) or deception.
>but if you know it's a machine, why would you care?
It won't be a machine in all cases. You have recorded performances from mo-cap stages, and maybe somehow you'd consider that a machine you don't care about while still caring about a movie (I'm not sure why you'd draw that distinction).
But aside from that, VR headsets are also getting mouth cameras and eye tracking cameras that can track facial expressions. You can also facetrack from things like an iphone with face id while playing a game in 2D.
This (and tech like it) will be used for real time streaming and chatting, and we'll see a sharp increase in the number of virtual streamers/youtubers/etc. as well. Some may be run by teams, some a single person, these virtual personas are getting easier and easier to make now, and more realistic as we see with this new service.
It's very impressive. The innovation here is the generator, which is able to generate realistically constrained geometry and materials. It's probably based on a massive amount of 3D scans and GANs, or something similar.
Notice that they're realtime rendered, so they don't look completely realistic, but that is the easy part that has been solved already. It's possible to render those characters with offline renderers and they will look photorealistic.
I strongly doubt there are any GANs involved, they are rarely used in production entertainment, and a solution doesn't really exist for vector geometry afaik. Likely very few scans as well. Just lots and lots of artist hours.
I'm with you that I doubt there are GANs running in real time here; however, I don't see why they wouldn't be using similar models for offline static asset generation.
As far as scans go, it's much cheaper than you would imagine to get high quality face scans. Rigging them for animation is the real challenge, and I'd be really impressed if they were using GANs for rigging.
Yeah, getting face scans is cheap and easy. But cleaning them up for real-time rendering, skinning, and rigging as you mention is a huge pain. And then going from there to a generalized model is pretty crazy.
I can imagine they might be using scans that they fit their model to in order to generate some blendshapes, but it's just as likely that they don't.
Also, nice paper. An initial read of it suggests it's only capable of building a set of textures and blendshapes, which implies that the actual topology is still in the realm of "artists". We really have no idea how to deal with geometry in ML.
I don't think you can create a constraint-based system with 'artist hours'. Just like you can't create a realistic 2D face generator without a GAN, it has to be based on samples of real people. There exist lots of generators, such as Daz3D, but you can't create realistic results with them because of their simplistic models. I haven't tested this yet though, so I might be assuming things.
I think you are just assuming things. I think most likely this is exactly like a lot of existing generators, such as Daz3D, but with a lot more work put into them. Especially regarding the material model, hair, animation morphs and the like.
Just look at the rig controls: it's exactly a better, more polished version of the rigging controls that have been used in animation studios for years. I think they're just doing what everyone else is doing, but with a ton more money.
The companies that Epic acquired to do this do use GANs to create the character. It's not hand crafted. This is a lot of training data feeding an algorithm that decomposes captures down into the constituent parts that you see here.
I think they have been close for a few years already. We are edging closer every year, but it still looks somewhat digital, which is sort of the end of the S-curve taking a very long time to reach perfection. Maybe in five years' time I'll barely be able to tell the difference.
Edit: Maybe it's the real-time rendering.
But the real innovation is achieving this in hours and not weeks or months.
Sometimes I wonder if we are also getting better at distinguishing artificial humans as time goes on. I remember as a kid thinking that G-man in halflife-2 was almost photorealistic.
They need to work on randomizing the left right symmetry a bit in their demos. Humans aren't perfectly symmetrical, and it adds a lot of character to faces.
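A minimal sketch of what that randomization could look like, assuming the face starts out as one side mirrored across the midline. All the names and numbers here are illustrative, not anything from MetaHuman's actual pipeline:

```python
import random

def mirror(points):
    """Mirror right-side landmarks across the x=0 midline to get
    a perfectly symmetrical left side."""
    return [(-x, y, z) for (x, y, z) in points]

def desymmetrize(points, scale=0.02, seed=42):
    """Nudge each coordinate by a small random offset so the two
    halves of the face no longer match exactly."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-scale, scale),
             y + rng.uniform(-scale, scale),
             z + rng.uniform(-scale, scale)) for (x, y, z) in points]

right = [(1.0, 0.5, 0.2), (0.8, -0.1, 0.3)]  # hypothetical landmarks
left = desymmetrize(mirror(right))
# left is now close to, but not exactly, the mirror of the right side
```

The offsets are tiny relative to the feature positions, which matches the point above: you don't want visible lopsidedness, just enough deviation that the face stops reading as computer-perfect.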
Many years ago, someone told me a little heuristic to determine if any new technology is likely to be successful. It doesn't always pick winners, but it rarely picks losers:
The first time I saw something like this was when Elder Scrolls IV: Oblivion used FaceGen (https://facegen.com/) for character customization. Realism has improved a tiny bit in the intervening 15 years.
You can't copy the customizer (assuming it rebakes new occlusion maps and stuff based on the shape customization), but the final result is a bit more open ended than that:
"You’ll also get the source data in the form of a Maya file, including meshes, skeleton, facial rig, animation controls, and materials."
Impressive. There is a cartoonlike quality. I wonder if that is there to keep the models from looking creepy or if they just wanted their models to look cartoony.
This is an awkward thing to demo because it has to look good, but it also has to look "good". I didn't think the second video, with the black woman and the Asian man, was very impressive. They look airbrushed, which makes sense because you don't want to say "hey, look at how believable these ugly, craggy people are", but on the other hand I think it plays it too safe by not showing off what they can do.
Of course, that's not really the point. The point is that the metahuman creator tool apparently makes it very fast-and-easy-ish to create "metahuman" characters that look a notch or two better than the Final Fantasy movie from 20 years ago.
The older faces that they made in the video that demoed the tool itself, rather than the result, were pretty impressive.
Very impressive shaders, though I wonder what the performance will be. I guess these might be fine for single player games where there are 10 to 15 characters on screen at a time but I doubt they are ready for use cases in MMORPGs or similar games.
Having seen only the two videos on the article's page, I actually thought they'd passed the valley. To me, they successfully looked artificial (though there were moments where they could have been real) but almost never creepy.
It's probably an 80:20 problem of sorts. More like 99:1. Externally, people mostly appear to be composed of flesh and hair. Pick any random square inch and the odds are that you will land on flesh or hair. That gets different with eyeballs, but they are still just bags of goo connected to muscle. The devil there is in the details of the blood vessels and the iris and things.
And then you get to teeth, which are like wet rocks. You're building this thing that is going to bomb if it can't do soft, meaty, hairy things convincingly, and then there's this itty-bitty section you're only going to see sometimes that is a dark membranous cave full of wet rocks.
I think one thing is they are translucent/subsurface scattering and the enamel->dentin layer shows through underneath in kind of volumetric ways. Even lower teeth will show through behind the tips of upper teeth a bit (with a standard overbite).
I'm not a figure modeller but I heavily use the Genesis 8 digital human from Daz 3D [0]
My guess is that teeth haven't been that interesting to date and that the modellers don't have access to hundreds of reference pictures / geometries of teeth for their modelling. As opposed to reference images of skin textures, etc. Tongues are probably the same. In fact I don't think I have ever created a render of a figure with their tongue out.
Once they take the time, the teeth should be easy to improve.
I would say that modeling teeth with geometry is very easy. The material, lighting, and rendering is the challenge, especially when you consider all of the occlusion (and hard/soft body dynamics) from your lips that is inevitable with teeth.
Some form of subsurface scattering[1] is a must, and not easily done in games/apps because you can't really "bake" it like you do for most other textures to run performantly.
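To illustrate why scattering behaves differently from ordinary diffuse shading, here's a sketch of "wrap" lighting, one of the classic cheap real-time approximations of SSS. The constants are illustrative and this isn't any particular engine's implementation; it just shows the idea that light should bleed past the terminator instead of cutting off hard:

```python
import math

def lambert(n_dot_l):
    """Standard diffuse: a hard cutoff at the terminator (N.L = 0)."""
    return max(n_dot_l, 0.0)

def wrap_diffuse(n_dot_l, wrap=0.5):
    """Wrapped diffuse: shift and rescale N.L so surfaces facing
    slightly away from the light still receive some, mimicking
    light scattering through skin or teeth."""
    return max((n_dot_l + wrap) / (1.0 + wrap), 0.0)

angle = math.radians(100)   # a point just past the terminator
ndl = math.cos(angle)       # negative: facing away from the light
print(lambert(ndl))         # 0.0 — hard black
print(wrap_diffuse(ndl))    # small positive — soft falloff
```

It's per-pixel and view-independent, which is why it can run in real time, but it can't be baked into a static texture: the softening depends on the live light direction, which is the point made above.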
Technically speaking, an animator might know. A lighter would have a better idea. And finally, a shader writer could tell you exactly.
If you were interested in the more technical breakdown of responsibilities in the industry: (of course it'll change from studio to studio, but this is pretty good)
As a 3D artist, I remember it being very, very difficult to model a realistic human being. Then came ZBrush and made it a little bit easier. I just saw the IGN video for this and I've got to say, it's really impressive.