So one of the big uses of AI as a tool is filling in the blanks in an image, aka inpainting (say, if you did a Stalin on it), or extending the image outward, aka outpainting.
(Oddly enough, one of the most famous images of the space program is extended. Neil cut off the top of Buzz's head, but since the surrounding sky was black anyhow, they added more black at the top and re-cropped it. And straightened it, too.)
So there's a stupid trick you can do with AI video generators: command a change of pose or a camera orbit and let the AI interpolate the new image in three dimensions. With work, you can get what (the AI thinks) your model looks like when seen from a different angle. Pretty much, you can turn around the guy in the photograph. It's just that the AI will make up a new face on the fly.
In practical terms, it will probably require some rework. But it is a fast-and-dirty way to get a different starter pose or camera angle on the same basic set/model/composition.
***
As I posted a bit back, I think the limit on long renders is not actually a problem. Sure, there are shots where you want a long tracking shot or a walk-and-talk, and there are formats, like the talking-heads interview or the podcast, where the camera setup stays the same for minutes at a time. But especially if you are trying to tell a story, intercuts not only don't hurt, they may even be necessary.
But back to that longer shot. Depending on which models you are using, how strong the prompting is, whether you have useful LoRAs, etc., the image can lose cohesion in as little as three seconds. Especially if that outpainting effect kicks in: if the camera turns past the previously seen setting, the AI fills those previously un-imaged areas with elements that don't fit your vision.
In general, the video models are strongly biased towards taking the pixel patterns they see and mapping them to motions that are in their training data. It is a lot like the interpolation img2img does all the time, except that time/animation progression is added into the mix as a strong constraint.
Unfortunately, the AI really can't separate character moves from camera moves, and it is almost impossible to lock the camera. That active, steadicam-or-handheld camera language is baked into the models. It's the usual figure/ground, map/territory problem with AI. They don't know what a forest is, or what trees are. They get there by the fact that most forests have trees, and many trees are in forests.
So I've been messing around with extended videos.
The simplest solution is a cutaway, or change of angle or subject. I rendered a separate set of insets I could switch to whenever I needed to cover a break or change.
These still require observing the 180-degree rule, and preferably keeping the line of action consistent as well. The latter is particularly important when cutting between related views: if the vehicle was moving right to left, preserve that right-to-left even when you cut to a steering wheel. It makes the cut much smoother.
***
After that there is daisy-chaining. Especially since you can cut in and out using different angles and insets, you can go to a pretty arbitrary length while staying on model. Keeping the set coherent is a different matter, and I don't have solutions for that yet.
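If you want to see the stitching step without opening a video editor, here's a minimal sketch: it shells out to ffmpeg's concat demuxer to splice a chain of clips back into one file. It assumes ffmpeg is on your PATH and that the clips share codec, resolution, and frame rate; the filenames are just placeholders for however you've named your renders.

```python
import subprocess

# Clips in the order they should play: master, inset cutaway, continuation.
clips = ["shot01_master.mp4", "shot01_inset.mp4", "shot01_master_cont.mp4"]

# The concat demuxer reads a little text file listing the inputs.
with open("join_list.txt", "w") as f:
    for clip in clips:
        f.write(f"file '{clip}'\n")

# -c copy splices without re-encoding, so nothing degrades further.
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-i", "join_list.txt", "-c", "copy", "shot01_joined.mp4"],
    check=True,
)
```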
i2v is the workhorse. It takes a starter image that is on-model and animates from there. At some point it will diverge enough to become objectionable. In any case, the last-frame-extract node is great here; it pulls out a PNG of the last frame before compiling the video. (You can also pull the entire image stream and sift through the frames.)
Why? Because you can take the last image, clean it up, and run a new i2v on that. Or you can do an arbitrary "generation" animation to get a different starting point, pull an image off that, and clean that up.
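For anyone working outside ComfyUI, here's a rough stand-in for that last-frame-extract step, assuming imageio (with the imageio-ffmpeg backend) is installed; the filenames are placeholders.

```python
import imageio.v2 as imageio

# Walk the whole stream, keeping only the most recent frame.
reader = imageio.get_reader("clip_A.mp4")
last = None
for frame in reader:
    last = frame
reader.close()

# Save it out for cleanup, then use it to seed the next i2v run.
imageio.imwrite("clip_A_last.png", last)
```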
f2l has some advantages here. It is especially good for generating a join: you take as the first image the saved last frame of the first animation, and as the last image the starter image for the clip that will follow.
The AI will do some weird things getting from A to B, though. As with all things AI, it sees things in a different way. We didn't notice a subtle change in the background because we were watching the action. The AI did, and so it has the martial artists suddenly do a little moonwalk to get to where they can line up with the background in the final image.
The best one I've had yet: I had done a long daisy-chain, and texture and LoRA burn-in had made the back wall look like a set from Beckett. The AI had the answer; a dozen frames before the end of the splicing clip, buckets of mud appeared out of the air and threw themselves at the wall.
The odd one out here is s2v. I love sound-to-video because the presence of voice and sound effects makes the AI generate action. As with all things visual AI, it defaults towards static posing. "Model stands looking vaguely at the camera" is what you get so often even when you fill the prompt with action verbs.
I haven't learned that much about controlling the sound. A few experiments show that it is slightly better than Prisoner Zero at figuring out which mouth to work when given multi-character dialogue. I haven't tried it out yet on multiple musical instruments. It does seem to react to emotional content, though. Where it is extracting physical motion from, I don't know.
The other oddity of s2v is that it allows the use of an extension node that passes the latent on to the next node in the chain. It can get out to thirty seconds before the image degradation becomes too objectionable to continue.
So what is this about "clean up?"
Yeah, this is what many people are doing now, at least according to the subreddit. Unsurprisingly, everyone wants to let the AI do the work, or at least automate it. So throw it into a Qwen node at low denoise, possibly within the same workflow.
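I haven't gone the Qwen route myself, but the same idea can be sketched as a plain img2img pass in diffusers at low strength; the model id, prompt, and strength below are all assumptions to swap for whatever you actually run.

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Low-denoise img2img: repair texture burn-in without drifting off-model.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

init = load_image("clip_A_last.png")  # the extracted (and maybe hand-fixed) last frame
cleaned = pipe(
    prompt="clean, sharp photo of the character on the same set",
    image=init,
    strength=0.25,       # low denoise so composition and identity survive
    guidance_scale=6.0,
).images[0]
cleaned.save("clip_A_last_clean.png")
```

At a strength around 0.2 to 0.3 it mostly re-renders texture and lighting rather than composition, which is the point of the cleanup pass.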
I'm cheating right now, in that I haven't finished learning how to make a character LoRA. So instead of being able to plug and play, I drop the image back into AUTOMATIC1111 and flip back and forth between several different models, employing various LoRAs and changing the prompt to focus in on problems.
And, yes, not just inpainting, but looping through an external paint application to address problem areas more directly.
It is a bit more work than I strictly need just to address image degradation and get a clean, high-quality starter image that stays on model, but it also means that a frame from an animation that produced a new view or state can be repainted, manipulated, inpainted, and otherwise brought on-model.
All this does mean I have a whole scatter of files with no consistent naming scheme, and it can be a pain searching through clips to find the one that actually bridges two other clips properly. But it all sort of works.
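One cheap fix I keep meaning to adopt is a manifest: pick a filename pattern, then let a few lines of Python index whatever is in the renders folder so the bridging clip can be found without scrubbing through everything. The pattern and folder name here are just a suggestion, not what I actually use.

```python
import json
import re
from pathlib import Path

# Hypothetical naming scheme: scene_shot_kind_take, e.g. s01_shot03_f2l_t2.mp4
pattern = re.compile(
    r"(?P<scene>s\d+)_(?P<shot>shot\d+)_(?P<kind>i2v|f2l|s2v|inset)_(?P<take>t\d+)\.mp4"
)

manifest = []
for path in sorted(Path("renders").glob("*.mp4")):
    m = pattern.fullmatch(path.name)
    if m:
        manifest.append({**m.groupdict(), "file": path.name})

Path("manifest.json").write_text(json.dumps(manifest, indent=2))
```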
Now I want to explore more interesting story beats. Something to do with fixing spaceships.