
Sunday, October 12, 2025

Blue

Another plot bunny visited. This one is blue.

 

(From fanpop.com)

The first novel I finished came about while I was writing an article for a gaming magazine. This one was a little more straightforward; I was thinking about concepts as a way to grasp the cinematographic challenges of AI.

In that mix are thoughts I've had about wanting to do one of those Hornblower-esque career space navy things, about wanting to do an engineer (who is more of a hacker), and about how to help Penny survive those situations where being Mistress of Waif-Fu would really, really come in handy.

And I ended up with a nice demonstration of how one idea can snowball into elaborate world-building...if you follow the potential implications. Start with a "heuristic implant." We're not going to get into the specifics of the tech here. Practically speaking, the young would-be soldier sits under a helmet for a few hours, and when they get up, they know how to fight.


Except it's experimental, very think-tank, with all the McNamara that promises. What they get is a whole set of combat skills laid down as deep-level muscle memory, skills that fire basically automatically under the right stimuli. That right there is a whole host of problems. Bad enough your hands need to be licensed as deadly weapons -- now they are self-driving.

So that's a great character flaw, this killer instinct that could fire at the wrong moment, but at the same time, something that could help them through a sticky situation. Obviously (obviously, that is, when looked at from the needs of story, not through empty speculation on fantasy technology), they start meditating to at least control when it happens. And their relationship with this phantom driver...evolves.

(Yeah...Diadem from the Stars, The Stars My Destination, The Last Airbender -- not like this is exactly new ground.)

And that suggests an evolving situation, where a new kind of low-intensity warfare is challenging a military that has organized itself entirely around capital-ship combat. Of course that's a tired truism: "always fighting the last war."

But that leads to wondering if this is really a navy at all. Or more like the United Fruit Company: a massive corporate mercantile thing that works by rote and training and regulation, has largely been getting by on a huge industrial base and leading-edge tech, but has expanded into a sector of space where the rules are a little different.

And now we've got multiple parties in the mix: old-school frontier traders who have the wisdom of experience, a cadre of experienced officers who want to create an actual military with a good esprit de corps (not the same thing as free donut day at the office and "work smarter" posters in the cubicles), the friction between what is becoming an actual navy versus what is more like a merchant marine, the fresh-from-university theorists who pay far too much attention to how management thinks the world works (or should) and want quick-fix technological solutions over expensive training...

...and Ensign Blue in the middle of it, a trial run of one of the crazier outliers of the "super-soldier" package that various hard-liners have convinced themselves is the best way to win low-intensity conflict in an extremely politicized environment, a ship's engineer for a merchant ship who has no business at all getting hauled off to do dangerous missions on contested planets.

Oh, yeah. And as an engineer? Blue is the kind of engineer I've been seeing a lot of recently. Can (and often will) science the hell out of something, making the most amazing calculations. Then can't resist trying out an idea with ham-handed duct tape and rat's-nest wiring that too often breaks (and sometimes catches fire).

Where does this go in character and career progression? Is there some third party, some out-of-context threat lurking behind what are still seen as basically raiding parties on the company's mining outposts? Are the rules about to go through another paradigm shift, tipping everyone from new enemies to unexpected allies into a brutal war?

Yeah, I got other books to write.

Tuesday, October 7, 2025

Anaconda

My go-to ComfyUI workflow now has more spaghetti than my most recent factory.


(Not mine; some guy on Reddit.)

The VRAM crunch for long videos seems to rest primarily in the KSampler. There's an s2v workflow in the templates of a standard ComfyUI install that uses a tricky little module that picks up the latent and renders another chunk of video, with all the chunks stitched together at the end. With that thing, the major VRAM crunch is the size of the image.
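
(For the curious, the chunk-and-stitch idea boils down to something like the sketch below. This is plain illustrative Python, not ComfyUI's actual API; render_chunk is a made-up stand-in for whatever that tricky little module does with the carried-over latent.)

```python
# Illustrative only: render_chunk() is a hypothetical stand-in for the module
# that takes the previous chunk's latent and continues the video. The point is
# that VRAM scales with one chunk (and the image size), not the whole clip.

def render_long_video(prompt, total_frames, chunk_frames=77):
    chunks = []
    carry_latent = None  # latent handed from one chunk to the next
    rendered = 0
    while rendered < total_frames:
        n = min(chunk_frames, total_frames - rendered)
        frames, carry_latent = render_chunk(prompt, n, init_latent=carry_latent)
        chunks.append(frames)
        rendered += n
    # stitch every chunk back together at the end
    return [frame for chunk in chunks for frame in chunk]
```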

Of course there's still the decoherence issue. I've been running 40-second tests to see how badly the image decomposes over that many frames. I also found the quality is acceptable rendering at 720 and upscaling to 1024 via a simple frame-by-frame Lanczos upscaler (nothing AI about it). And I'm rather proud I figured that out all by myself. At 16 fps and with Steps set down at 4, I can get a second of video for every minute the floor heater is running.
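
(The upscale step is nothing fancier than this, assuming the frames have been dumped as PNGs and that the 720 and 1024 figures are frame widths; the folder names are made up.)

```python
# Frame-by-frame Lanczos upscale with Pillow -- no AI involved.
from pathlib import Path
from PIL import Image

SRC = Path("frames_720")    # assumed folder of rendered frames
DST = Path("frames_1024")   # assumed output folder
DST.mkdir(exist_ok=True)

for frame in sorted(SRC.glob("*.png")):
    img = Image.open(frame)
    scale = 1024 / img.width
    out = img.resize((1024, round(img.height * scale)), Image.LANCZOS)
    out.save(DST / frame.name)
```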

Scripting is still a big unknown. I've been experimenting with the s2v (sound to video) and as usual there are surprises. AI, after all, is an exercise in probabilities. "These things are often found with those things." It is, below the layers of agents and control nets and weighting, a next-word autocomplete.

That means it can seem to have an uncanny ability to extract emotional and semantic meaning from speech. But it is strictly associational; videos in the training material tended to show a person pointing when the vocal patterns of "look over there" occurred. More emergence. Cat logic, even.



So anyhow, I broke Automatic1111. Sure, it had a venv folder, but somehow Path got pointed in the wrong direction. Fortunately I was able to delete Python and do a clean install of 3.10.9 inside the SD folder; Automatic1111 came back up and ComfyUI was still safe in its own sandbox. And now to try to install Kohya.


Experimenting with the tech has led to thinking about shots, and that in turn has circled back to the same thing I identified earlier, a thing that becomes particularly visible when talking about AI.

We all have an urge to create. And we all have our desires and internal landscapes that, when given the chance, will attempt to shape the work. Well, okay, writing forums talk about the person who wants to have written a book; the book itself is of no import, just as the nature of the film they starred in has nothing to do with the desire to be a famous actor. It is the fame and fortune that is the object.

In any case, the difference between the stereotype of push-button art (paint by numeric control) and the application of actual skills that took time and effort to learn is, in relation to the process of creation itself, just a matter of how granular you are getting about it.

Music has long had chance music and aleatoric music. Some artists throw paint at a canvas. And some people hire or collaborate. Is a composer not a composer if they hire an arranger?

That said, I feel that in video, the approach taken by many in AI is getting in the way of achieving a meaningful goal. As it exists right now, AI video is poorly scriptable, and its cinematography -- the choice of shots and cutting in order to tell the story -- is lacking. This, as with all things AI, will change.

But right now a lot of people getting into AI are crowding the subreddits asking how to generate longer videos.

I'm sorry, but wrong approach. In today's cinematography, 15 seconds is considered a long take. Many movies are cut at a faster tempo than that. Now, there is the issue of coverage...but I'll get there. In any case, this is just another side of the AI approach that wants nothing more than to press buttons. In fact, it isn't even the time, effort, or artistic skills or tools that are being avoided. It is the burden of creativity. People are using AI to create the prompts to create AI images. And not just sometimes; there are workflows designed to automate this terribly challenging chore of getting ChatGPT to spit out a string of words that can be plugged into ComfyUI.

Art and purposes change. New forms arise. A sonnet is not a haiku. There is an argument for recognizing, as a form in its own right, the short-form AI video that stitches together semi-related clips in a montage style.

But even here, the AI is going to do poorly at generating it all in one go. It will do better if each shot is rendered separately, and something (a human editor, even!) splices the shots together. And, especially if the target is TikTok or the equivalent, the individual shots are rarely going to be more than five seconds in length.
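
(The splicing itself doesn't need anything exotic. Assuming the individual shots already share codec, fps, and resolution, ffmpeg's concat demuxer will do it; the file names here are placeholders.)

```python
# Splice separately rendered shots with ffmpeg's concat demuxer.
import subprocess
from pathlib import Path

shots = ["shot_01.mp4", "shot_02.mp4", "shot_03.mp4"]  # placeholder clip names

concat_list = Path("shots.txt")
concat_list.write_text("".join(f"file '{s}'\n" for s in shots))

subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0",
     "-i", str(concat_list), "-c", "copy", "montage.mp4"],
    check=True,
)
```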


Cutting to develop a story, using language similar to modern filmic language, is a different beast entirely. The challenge I'm thinking a lot about now is consistency. Consistency of character, consistency of set. There are also challenges in matching camera motions and angles if you want to apply the language correctly. For that shot-reverse-shot of which the OTS (over-the-shoulder) is often part, you have to obey the 180-degree rule or the results become confusing.

One basic approach is image to video. With i2v, every shot has the same starting point, although they diverge from there. As a specific example, imagine a render of a car driving off. In one render, the removal of the car reveals a fire hydrant. In the second render from the same start point, a mailbox. The AI rolled the dice each time because that part of the background wasn't in the original reference.

One weird problem as well. In editing, various kinds of buffer shots are inserted to hide the cuts made to the master shot. The interview subject coughed. If you just cut, there'd be a stutter in the film. So cut to the interviewer nodding as if listening (those are usually filmed at a different time, and without the subject present at all!). Then cut back.

In the case of an i2v workflow, a cutaway done like this would create a strange déjà vu; after the cut, the main shot seems to have reset in time.

So this might actually be an argument for a longer clip: not to be used as the final output, but as a master shot to be cut into for story beats.

Only we run into another problem. It is poorly scriptable at present. In the workflows I am currently using, there's essentially one idea per clip. So a simple idea such as "he sees the gun and starts talking rapidly" doesn't work with this process.

What you need is to create two clips with different prompts. And you need to steal the last frame from the first clip and use it as the starting image of the second clip. Only this too has problems; the degradation over the length of a clip means that even if you add a node to the workflow to automatically save the target frame, it will need to be cleaned up, corrected back to being on-model, and have its resolution increased back to the original.
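
(Grabbing that target frame is the easy part; something like the OpenCV snippet below, with made-up file names. The cleanup, on-model correction, and upscale described above still have to happen before it's fit to seed the second clip.)

```python
# Pull the last frame of clip one to use as the i2v starter for clip two.
import cv2

cap = cv2.VideoCapture("clip_01.mp4")
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_count - 1)  # jump to the final frame
ok, last_frame = cap.read()
cap.release()

if not ok:
    raise RuntimeError("could not read the last frame")
cv2.imwrite("clip_02_start.png", last_frame)  # starter image, pre-cleanup
```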

And, yes, I've seen a workflow that automates all of that, right down to a preset noise setting in the AI model that regenerates a fresh and higher-resolution image.

My, what a tangled web we weave.

Monday, October 6, 2025

Cryptic Triptych

I got the PC I built up and running, after the usual 22H2 hassle (tip: don't use the internal updater. Run the web installer at Microsoft. For as long as that lasts!)

ComfyUI is sandboxed (and a one-click install), and Automatic1111, though now an abandoned project, also installs a venv folder within the stable_diffusion folder, meaning it can run on Python 3.10.6. Now trying to get Kohya running, and learning venv so I can get that on 3.10.11 or higher...without breaking everything else.
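
(The venv trick, roughly: build each tool's venv with whichever interpreter you want it pinned to, and the other sandboxes never know. The paths below are assumptions, not what's actually on my machine.)

```python
# Create a Kohya venv pinned to a specific Python install, leaving the
# Automatic1111 (3.10.6) and ComfyUI sandboxes untouched.
import subprocess

PYTHON_31011 = r"C:\Python31011\python.exe"  # assumed separate 3.10.11 install
VENV_DIR = r"D:\AI\kohya_ss\venv"            # assumed target folder

# "<that interpreter> -m venv <dir>" builds the venv on that interpreter.
subprocess.run([PYTHON_31011, "-m", "venv", VENV_DIR], check=True)
```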

I still like the primitive but functional GUI of Automatic1111 for stills. But ComfyUI opens up video as well. Motion.

And that got me thinking about linear narrative.

There does exist a form called "non-linear narrative." But that refers to the relationship between the narrative and some other chronology. The latter may be shifted around. A writer can at any point refer to a different time, including such techniques as the flashback and flash-forward. But the narrative itself remains linear. One reads one word at a time.

(Arguably, from our understanding of the process of reading, we parse chunks of text, and thus multiple words may be included in what is experienced as a single unit of extracted meaning.)

This means it is extremely difficult to capture the near-simultaneous flow of information that a real person, not reading an account in a book, would experience. In our old gaming circles, the joke was the monster quietly waiting until the room description was finished. It is a basic problem in writing; you can't tell it all at the same time. And the order you choose influences the relative weight given.

Again, arguably, our attention can't be split too many ways. In most cases, the realization you had while you were in the middle of doing something else arrives as a discrete event. You may have heard the voice behind you, and been processing it, but the moment of understanding that cry of "Stop!" can be treated narratively at the moment it becomes the focus of attention. And the observations that led to that moment of realization are back-filled at that time, as they, too, rise to the top of the consciousness.

Or, to put it another way, a narrative is an alias of the stream of consciousness, and the order of presentation can be taken as the order of items brought into focus.

This idea of the sequential scroll of attention has been used in artwork. We normally absorb a piece of art by moving from one focus to another (in a matrix of probable interest including size, color, position, human faces, etc.). The artist can construct a narrative through this shifting of focus.


This one sneaks up in stages. The first impression is very calming. The next impressions are not. Especially in some periods, there could be subtler and subtler clues and symbols that you don't notice until you've been looking for a while.


Or there are forms, from the triptych to the Bayeux Tapestry, that arrange distinct framed panels in sequential order.

Motion controls this flow of narrative more tightly. Not to say there can't be the same slow realizations. But it means thinking sequentially.


In comic book terminology, the words "Closure" and "Encapsulation" are used to describe the concepts I've been talking about. "Closure" is the mental act of bringing together information that had been presented over a sequence of panels in order to extract the idea of a single thing or event. "Encapsulation" is a single panel that is both the highlight of and a reference pointer or stand-in for that event.

In text, narrative, especially immersive narrative that is keyed to a strong POV or, worse, a first-person POV, has a bias towards moving chronologically. Especially in first-person, this will lead the unwary writer into documenting every moment from waking to sleep (which is why I call it "Day-Planner Syndrome.")


I've been more and more conscious of the advantages and drawbacks of jumping into a scene at a more interesting point and rapidly back-filling (tell not show) the context of what that moment came out of. I don't like these little loops and how they disturb the illusion of a continuous consciousness that the reader is merely eavesdropping on as they go about their day, but I like even less spending pages on every breakfast.

And speaking of time. The best way to experience the passage of time is to have time pass. That is, if you want the reader to feel that long drive through the desert, you have to make them spend some time reading it. There's really no shortcut.

I decided for The Early Fox I wanted to present Penny as more of a blank slate, and to keep the focus within New Mexico. So no talking about her past experiences, comparisons to other places she's been, comparison or discussion of the histories of other places, technical discussions that bring in questions of where Penny learned geology or Latin or whatever, or quite so many pop-culture references.

And that means I am seriously running out of ways I can describe yucca.


In any case, I spent a chunk of the weekend doing test runs with WAN2.2 and 2.1 on the subject of "will it move?" Which is basically the process of interrogating an AI model to see what it understands and what form the answer will take.

My first test on any new install is the prompt "bird." Just the one word. Across a number of checkpoints the result is a bird on the ground, usually grass. A strange and yet almost specific and describable bird; it is sort of a combination of bluebird and puffin with a large hooked beak, black/white mask, blue plumage and yellow chicken legs.
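
(I run these through the GUIs, but scripted, the interrogation is roughly the loop below; a diffusers sketch with placeholder checkpoint names, not my actual setup.)

```python
# One-word "bird" test across several local checkpoints, via diffusers.
import torch
from diffusers import StableDiffusionPipeline

checkpoints = ["modelA.safetensors", "modelB.safetensors"]  # placeholder files

for ckpt in checkpoints:
    pipe = StableDiffusionPipeline.from_single_file(ckpt, torch_dtype=torch.float16)
    pipe.to("cuda")
    image = pipe("bird").images[0]  # the entire prompt: one word
    image.save(f"bird_{ckpt.rsplit('.', 1)[0]}.png")
```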

In investigating motion in video, I discovered there are two major things going on under the hood. The first is that when you get out of the mainstream ("person talking") and into a more specific motion ("person climbing a cliff") you run into the paucity of training data problem. When there is a variety of data, the AI can synthesize something that appears original. When the selection is too small, the AI recaps that bit of data in a way that becomes recognizable. Oh, that climbing move where he steps up with his left foot, then nods his head twice.

The other is subject-background detection. AI video works now (more-or-less) because of subject consistency. The person walking remains in the same clothing from the first frame to the last. It does interpolate, creating its own synthesized 3d version, but it can be thought of as, basically, detaching the subject then sliding it around on the background.

We've re-invented Flash.


Now, because the AI is detaching then interpolating, and the interpolation makes use of the training data of what the back of a coat or the rest of a shoe looks like (and, for video models, moves like), it does have the ability to animate things like hair appropriately when that subject is in motion. But AI is pretty good at not recognizing stuff, too. In this case, it takes the details it doesn't quite understand and basically turns them into a game skin.

Whether this is something the programmers intended, or an emergent behavior in which the AI is discovering ways of approximating reality similar to what game creators have been doing, the subject becomes basically a surface mesh that gets the large-scale movements right but can reveal that things like the pauldrons on a suit of armor are just surface details, parts of the "mesh."

It can help to think of AI animation as Flash in 3D. The identified subjects move around a background, with both given consistency from frame to frame. And think of the subject, whether it is a cat or a planet, as a single object that can be folded and stretched with the surface details more-or-less following.

But back to that consistency thing. For various reasons, video renders are limited to the low hundreds of frames (the default starter, depending on model, is 33 to 77 frames). And each render is a fresh roll of the dice. 

It is a strange paradox, possibly unavoidable in the way we are currently doing this thing we call "AI." In order to have something with the appearance of novelty, it has to fold in the larger bulk of training data. In order to have consistency, it has to ignore most of that data. And since we've decided to interrogate the black box of the engine with a text prompt, we are basically left with "make me a bird" and the engine spitting out a fresh interpretation every time.

That plays hell with making an actual narrative. Replace the comic-book panel with the film-terminology "shot," and have that "Closure" built on things developed over multiple shots, and you are confronted with the problem that the actors and setting are based on concepts, not on a stable model that exists outside the world of an individual render. If you construct "Bird walking," "Bird flies off," and "Bird in the sky," with each render interpreting the conceptual idea of "Bird" in a different way, it is going to be a harder story to understand.


That is going to change. There are going to be character turn-arounds or virtual set building soon enough. As I understand it, though, the necessary randomness means the paradox is baked into the process. No matter what the model or template, it is treated the same as a prompt or a LoRA or any other weighting: as a suggestion. One that gets interpreted in the light of what that roll of the dice spat out that run.

And that's why the majority of those AI videos currently clogging YouTube go for conceptual snippets arranged in a narrative order, not a tight sequence of shots in close chronological time. You can easily prompt the AI to render the hero walking into a spaceport, and the hero piloting his spacecraft...now wearing a spacesuit and with a visibly different haircut.

For now, the best work-around appears to be using the "I2V" subset. That generates a video from an image reference. The downside is that anything that isn't in the image -- the back of the head, say -- is interpolated, and thus will be different in every render. It also requires creating starter images that are themselves on-model.

A related trick is pulling the last frame of the first render and using that as the starter image for a second render. The problem this runs into is the Xerox Effect; the same problem that is part of why there is a soft limit on the number of frames of animation that can be rendered in a single run.


(The bigger problem in render length is memory management. I am not entirely clear why.)

As with most things AI, or 3D for that matter, it turns into the Compile Dance. Since each run is a roll of the dice, you often can't tell if there is a basic error of setup (bad prompt, a mistake in the reference image, a node connected backwards) or just a bad draw from the deck. You have to render a couple of times. Tweak a setting. Render a couple times to see if that change was in the right direction. Lather, rinse.
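
(Boiled down, the Compile Dance is just this loop; render() is a made-up stand-in for whatever workflow is under test.)

```python
# Re-render the same setup a few times before judging a tweak.
import random

def try_setup(prompt, settings, runs=3):
    results = []
    for _ in range(runs):
        seed = random.randrange(2**32)
        results.append(render(prompt, settings, seed=seed))  # hypothetical render()
    # If every draw looks wrong, suspect the setup (prompt, reference image,
    # a node wired backwards); if only some do, it was probably just the dice.
    return results
```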

With my new GPU and the convenient test size I have been working with, render times fall into the sour spot: 1-3 minutes. Not long enough to do something else, but long enough that it is annoying to wait it out.

I still don't have an application, but it is an amusing enough technical problem to keep chasing for a bit longer. The discussions on the main subreddit seem to show a majority of questioners who just want "longer video" and hope that by crafting the right prompt, they can build a narrative in an interesting way.

The small minority is there, however, explaining that cutting together shorter clips better approaches how the movies have been doing it for a long time; a narrative approach that seems to work for the viewer. But that really throws things back towards the problem of consistency between clips.

And that's why I'm neck-deep in Python, trying not to break the rest of the tool kit in adding a LoRA trainer to the mix.

Thursday, October 2, 2025

How does your garden grow

Sometimes scenes evolve. You realize things the outline didn't address, and addressing them opens up opportunities you hadn't spotted before.

And, I know. The fierce outliners and the dedicated pantsers (and especially the WriMos -- NaNoWriMo is dead but some people still try to do 50K over November) call you stupid if you delay in getting those words on the page. But I think that time comes due eventually. Fix it now, or fix it in rewrites. As long as you are capable of holding enough story in your head, you don't have to have it written down in order to realize it is wrong and needs to be re-written.

The drive from Alamogordo to the tiny census-designated place (it's too small to be a town) of Yah-ta-hey is about five hours. The original plan/outline/mind mapper diagram just said "Mary leads Penny to the rez to talk to a man who knows about the 'Sheep Ranch'."

The original spec was that Penny doesn't get friends in this one. She doesn't get people she can lean on for emotional support, or people who are too helpful. They all have agendas and problems, and they hide stuff from her. Mary is mostly...angry. I got her character and voice from a couple of different writers on the continuing problems with radiological contamination on tribal lands.

But can she and Penny take a five-hour drive without either getting some resolution, or killing each other? Am I better off having Penny go alone? Or...is it the better option to make this a longer scene, to go deeper into Mary's personality, and give them a mini character arc?

Sigh. The hard one, of course.


The world continues to change. I started this in 2018. An American tourist loose in the world. The world's view of America has changed since then. COVID has changed things, the economic slump has changed things, and tourism has changed for everyone; European tourists are in the same hot water with over-travelled destinations, and Venice is not the only place fighting back.

It always blind-sides you, change does. I had a minor bit about a minor character: a senior airman at the 49th with a shaving profile. I couldn't even name it, because that's the sort of thing that people who have been in the service would recognize but most of my readers would never have heard of.

Until it suddenly became a big thing to the pushup king, our current Secretary of War.


I did happen to think of a new silly idea, possibly for the "weird high-tech company in the wilds of Colorado" next adventure. Probably more productive: I've finally started making a proper vector of the new series logo so I can try out the new cover.

So there actually is a magical artifact. Sort of; not very. It is extremely valuable and holds secrets because it is a replica and there's a USB stick in it with a bunch of trade secrets or something.

And maybe, I thought, it isn't the product of a historical culture, but instead a replica prop. And comes out of an imaginary IP, some sprawling fantasy saga with a lot of borrowing from various bits of real history (like GOT, say).

I mentioned a while back that various authors have tried to create a Disneyworld based on an imagined IP. The tough part is communicating this IP to the reader; you can't just say, "Look, it's Elsa from Frozen!" The fun part is, of course, creating the IP in the first place.


And, yeah, finally broke down and built a new computer. Things have moved on there, as well. SSDs aren't being fitted with expansion rails so they can slot in where the old 3.5" drive bays are. Instead they go straight onto the motherboard in an M.2 slot riding the PCIe bus -- underneath a built-in heatsink, because modern gaming machines run hot.

Darkrock case, MSI board, i7 CPU, 64 GB starter RAM, 2 TB SSD as the "C" drive, 1000 W PSU, and of course a nice floor heater of an RTX 3090 with 24 GB of VRAM.

Mostly did it because I wanted to do it. Not because I had a need for it. But, oh boy, when I finally got the updates installed and running and tried out Satisfactory it looked so good...

The test render on WAN2.2 t2v took 160 seconds. Not bad.