|
Post by Sue Butcher on Jul 29, 2023 3:11:49 GMT
It's OK, if slightly wooden, for talking heads, but even then I think we need a bit of human thought along the lines of "Is that how that particular actor would play that scene?" The problem would be worse when body language is important, or there's a group interacting; there wouldn't be enough information from the stills to make good video completely automatically. However, the climax of "Fury From the Deep" would be a good project for AI reconstruction, because there we have both the telesnaps and some of the alternate film takes to work with. There are stories where both the telerecording and telesnaps survive. That gives the opportunity to feed in the telesnaps and, probably, the audio as well, and compare the output with the telerecording. That is the kind of "real world" test that's needed. Perhaps the system could pick up an actor's tone of voice and translate it into the appropriate facial expression, perhaps not. It would be easier to achieve that with Pertwee than Troughton, given their different acting styles.
|
|
|
Post by John Wall on Jul 29, 2023 8:49:34 GMT
It’s Troughton where it’s most needed. Perhaps expressions could be “learnt” from other material and “pasted” in?
|
|
|
Post by George D on Jul 29, 2023 15:36:19 GMT
As women are better at body language than men, Sue brings a valuable perspective that us men don't have.
Half of making recons is picking the correct facial expression for the scene. Many overlook this. It's not the AI's fault but the reconstructionist's, or a limit of what he has to work with.
Hopefully, as AI improves, it might discover how to do this better through tonality. I noticed some facial change in the Evil clip. The facial expression in the Savages clip was poorly matched.
The easiest places to add AI are distance shots, where there is little focus on faces, body language, etc.
This is not just the clips here. Even in LC, while it does bring it alive for us, often the facial expression picked may not match the action which is missing.
AI appears a valuable tool that can make reconstructions better, but the human factor still has value.
|
|
|
Post by John Wall on Jul 29, 2023 16:21:59 GMT
What AI potentially provides is a super deluxe VidFIRE in that it interpolates between images that have a greater temporal separation. It is, however, extremely unlikely to do anything more than the donkey work. Things like Colour Recovery and (re)colourisation use the increased processing power now available for what is basically brute force and ignorance, but human oversight and intervention is still necessary.
I suspect that the best solution to sorting out facial expressions would be to first capture these from elsewhere, there must be plenty of examples of Troughton, Hines, etc, etc smiling, frowning, grimacing, etc, etc. Then see what the AI comes up with and, if necessary, “tell” it to use a particular expression.
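To illustrate the interpolation idea in the post above: VidFIRE-style processing synthesises in-between frames, and the very crudest version of that is a linear cross-fade between two stills. This is a hypothetical toy sketch (real tools also estimate motion), using made-up blank frames rather than actual telesnaps:

```python
import numpy as np

def interpolate(frame_a, frame_b, n_between):
    """Return n_between linearly blended frames between two keyframes."""
    return [frame_a + (frame_b - frame_a) * t
            for t in np.linspace(0, 1, n_between + 2)[1:-1]]

# Stand-ins for two temporally separated stills: all-black and all-white.
a = np.zeros((4, 4))
b = np.ones((4, 4))

mid = interpolate(a, b, 3)[1]  # the middle of three in-between frames
print(mid[0, 0])               # halfway between the two stills: 0.5
```

A cross-fade like this is exactly the "donkey work" part; anything involving occlusion or a turning head needs the motion-aware interpolation (and human oversight) the post describes.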
|
|
|
Post by jcoleman on Jul 29, 2023 22:36:50 GMT
I'm no expert, but as I understand it you have to train an AI and it's an iterative process with the AI constantly improving as it learns.
Presumably you could start by feeding an AI information about existing footage (e.g. audio, camera script, telesnaps, set plans, set photos, etc.) for a short sequence, get it to try and generate that sequence and then have it compare its results with the actual footage. Repeat this process thousands of times and the AI's output should, in theory, get closer and closer to matching the real footage.
Another way of teaching the AI would be to feed it hundreds of pieces of footage of a particular character, each tagged with keywords describing the emotion being expressed, so that it can learn how the actor expressed that emotion.
Then when it's producing a visual that closely resembles the actual footage you can let it loose on sequences that are missing, providing it with the same information it has learned from.
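The iterative loop described above can be sketched in miniature. This is purely illustrative (random arrays standing in for surviving footage, and a simple nudge-toward-target update standing in for a real training step), but it shows the generate / compare / repeat cycle:

```python
import numpy as np

# Stand-ins: "target" plays the role of surviving real footage,
# "generated" is the AI's first, crude attempt at the same sequence.
rng = np.random.default_rng(0)
target = rng.random((4, 8, 8))
generated = np.zeros_like(target)

def loss(attempt, real):
    """Mean squared difference between the attempt and the real footage."""
    return float(np.mean((attempt - real) ** 2))

history = [loss(generated, target)]
for step in range(200):
    # Each pass, compare with the real footage and move the output closer.
    generated += 0.1 * (target - generated)
    history.append(loss(generated, target))

print(f"loss fell from {history[0]:.4f} to {history[-1]:.2e}")
```

Real systems obviously use far more elaborate models and losses, but the shape is the same: the mismatch with known footage drives every refinement, which is why stories with both telesnaps and a surviving telerecording are such useful test cases.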
|
|
|
Post by John Wall on Jul 29, 2023 22:53:47 GMT
I’m also no expert but that sounds reasonable to me. I think you’d have to do it scene-by-scene to avoid problems with discontinuities.
|
|
|
Post by Sue Butcher on Jul 30, 2023 8:13:09 GMT
That's the gist of it. AI isn't a magic wand; it can recognise patterns and manipulate images very rapidly, but its "intelligence" in terms of understanding the real world of complex three-dimensional spaces and human interaction is limited. Producing a really convincing facsimile of a human actor is difficult. You are going to need a lot of visual reference material and a human operator who understands acting to guide the production. But it's much harder than conventional animation with its artist-drawn actors, because if the results fall short you're in an "uncanny valley" populated by things that look like androids.
|
|
|
Post by stevehoare61 on Jul 30, 2023 9:05:33 GMT
The technology used to create ABBA's Voyage show, despite costing many hundreds of millions of pounds and taking years of creation with cutting-edge techniques, works amazingly, as some of you will know. If that kind of tech could be combined with AI, using the same procedures that they used, would anyone know if that would be a feasible way of creating really realistic images?
|
|
|
Post by Sue Butcher on Jul 30, 2023 12:37:06 GMT
The virtual ABBA show used motion capture to animate digital avatars. Perhaps they used some AI techniques to create the basic avatars (I'm not certain), but the movements of the figures are telemetric recordings of the actual band members. So this technique requires physical measurements of human actors performing. Not a quick or cheap process!
|
|
|
Post by jcoleman on Jul 30, 2023 12:40:52 GMT
A different technique, but here's an old demonstration from HJ Robins showing the way deep fake improves with 'training':
|
|
|
Post by jcoleman on Jul 30, 2023 12:58:15 GMT
And here's a brief recreation from Mr Robins:
And a couple from Gav Rymill:
These are not AI as such, but give an indication of what one day might be possible.
The last uses wav2lip to change the mouth movements to match the dialogue, which is interesting. It would be theoretically possible to rotoscope shots of a character expressing different emotions and build up a bank of assets that could then be lip-synced and placed against different backgrounds to recreate some shots. The more movement there is, the less effective the lip-syncing, so footage would have to be carefully selected, but it would retain that element of the actor's performance. It would take an awful lot of time and effort! If you can train an AI to do those tasks, though...
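The "bank of assets" idea above, combined with the earlier suggestion of picking expressions from an actor's tone of voice, could be sketched as a nearest-neighbour lookup. Everything here is invented for illustration: the expression labels, and the (pitch, loudness) feature values standing in for real audio analysis:

```python
import numpy as np

# Hypothetical bank of rotoscoped clips, each tagged with crude audio
# features: (mean pitch in Hz, loudness 0-1). Values are made up.
expression_bank = {
    "smiling":    np.array([220.0, 0.4]),
    "frowning":   np.array([150.0, 0.3]),
    "shouting":   np.array([260.0, 0.9]),
    "whispering": np.array([180.0, 0.1]),
}

def pick_expression(pitch_hz, loudness):
    """Return the banked expression nearest to the line's audio features."""
    query = np.array([pitch_hz, loudness])
    return min(expression_bank,
               key=lambda name: np.linalg.norm(expression_bank[name] - query))

print(pick_expression(255.0, 0.85))  # a loud, high-pitched line -> shouting
```

A real system would need far richer audio features (and per-actor banks, given how differently Troughton and Pertwee played things), but the select-from-labelled-assets structure is the same.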
|
|
|
Post by awpeacock on Jul 30, 2023 13:47:59 GMT
I think we're confusing DeepFake with AI here. The video on the first page is done using AI. The rest (I'm assuming by looking at them) were all done using DeepFake (hence all just being lip movements and no moving parts). To recreate any MEs using DeepFake would indeed require motion capture and getting a bunch of actors in to basically recreate every scene for the computer to then overlay the correct graphics, a la ABBA, but the advent of AI takes away a lot of the need for this I think.
Given the pace at which AI is proceeding, I wouldn't be surprised if we had a believable 5-10 second clip of a missing scene before the year is out. We've already got AI having 30 minute support telephone conversations with unknowing customers, and songs being realistically sung by dead artists. When whole episodes can be done however is anyone's guess - there needs to be an inclination from the Beeb to do it and the money put into it.
|
|
|
Post by lousingh on Jul 30, 2023 14:20:50 GMT
Has anyone thought of merging telesnaps with AI and deep fakes? The telesnaps and the sound would give an approximation of what the action would look like and what the actors would be doing, and then the AI could use that as a template to remake the episode.
|
|
|
Post by jcoleman on Jul 30, 2023 14:58:03 GMT
I think we're confusing DeepFake with AI here. No confusion. As clearly stated, the examples above are not AI. The first simply shows how more 'training' improves the results, which is true of both deep fake and the AI that started this thread. The other examples were just to give an indication of what creative individuals have been able to achieve using technology to show the types of results that are already possible. AI is the next tool on the horizon.
|
|
|
Post by John Wall on Jul 30, 2023 18:34:30 GMT
Well, it’s clear that the answer ain’t 42, but all of the above 👍
|
|