Sunday, October 12, 2025

Blue

Another plot bunny visited. This one is blue.

 

(From fanpop.com)

The first novel I finished grew out of an article I had been writing for a gaming magazine. This one was a little more straightforward; I was thinking about story concepts as a way to grasp the cinematographic challenges of AI.

In that mix are thoughts I've had about wanting to do one of those Hornblower-esque career space navy things, and about wanting to do an engineer (who is more of a hacker), and thoughts about how to help Penny survive those situations where being Mistress of Waif-Fu would really, really come in handy.

And I ended up with a nice demonstration of how one idea can snowball into elaborate world-building...if you follow the potential implications. Start with a "heuristic implant." We're not going to get into the specifics of the tech here. Practically speaking, the young would-be soldier gets sat under a helmet for a few hours, and when they get up...


Except this one is experimental, very think-tank, with everything that McNamara-style thinking promises. What they get up with is a whole set of combat skills laid down as deep-level muscle memory, firing basically automatically under the right stimuli. That right there is a whole host of problems. Bad enough your hands need to be licensed as deadly weapons -- now they are self-driving.

So that's a great character flaw, this killer instinct that could fire at the wrong moment, but at the same time, something that could help them through a sticky situation. Obviously (obviously, that is, when looked at from the needs of story, not through empty speculation on fantasy technology), they start meditating to at least control when it happens. And their relationship with this phantom driver...evolves.

(Yeah...Diadem from the Stars, The Stars My Destination, The Last Airbender -- not like this is exactly new ground.)

And that suggests an evolving situation, where a new kind of low-intensity warfare is challenging a military that is organized entirely around capital-ship combat. Of course that's a tired truism: "always fighting the last war."

But that leads to wondering if this is really a navy at all. Or more like the United Fruit Company: a massive corporate mercantile thing that works by rote and training and regulation, has largely been getting by on a huge industrial base and leading-edge tech, but has expanded into a sector of space where the rules are a little different.

And now we've got multiple parties in the mix: old-school frontier traders who have the wisdom of experience, a cadre of experienced officers who want to create an actual military with a good esprit de corps (not the same thing as free donut day at the office and "work smarter" posters in the cubicles), the friction between what is becoming an actual navy versus what is more like a merchant marine, the fresh-from-university theorists who pay far too much attention to how management thinks the world works (or should) and want quick-fix technological solutions over expensive training...

...and Ensign Blue in the middle of it, a trial run of one of the crazier outliers of the "super-soldier" package that various hard-liners have convinced themselves is the best way to win low-intensity conflict in an extremely politicized environment, a ship's engineer for a merchant ship who has no business at all getting hauled off to do dangerous missions on contested planets.

Oh, yeah. And the engineer? Blue is the kind of engineer I've been seeing a lot of recently. Can (and often will) science the hell out of something, making the most amazing calculations. Then can't resist trying out an idea with ham-handed duct tape and rat's-nest wiring that too often breaks (and sometimes catches fire).

Where does this go in character and career progression? Is there some third party, some out-of-context threat lurking behind what are still seen as basically raiding parties on the company's mining outposts? Are the rules about to go through another paradigm shift, tipping everyone from new enemies to unexpected allies into a brutal war?

Yeah, I got other books to write.

Tuesday, October 7, 2025

Anaconda

My go-to ComfyUI workflow now has more spaghetti than my most recent factory.


(Not mine; some guy on Reddit.)

The VRAM crunch for long videos seems to rest primarily in the KSampler. There's an s2v workflow in the templates of a standard ComfyUI install that uses a tricky little module that picks up the latent and renders another chunk of video, with all the chunks stitched together at the end. With that module, the major VRAM crunch is the size of the image rather than the length of the video.
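
The stitching step at the end is nothing exotic. A minimal sketch, assuming the chunks have already come out as separate files (hypothetical names) and that imageio with its ffmpeg plugin is installed:

```python
import imageio

# hypothetical chunk files produced by the extend-video module
chunk_files = ["chunk_00.mp4", "chunk_01.mp4", "chunk_02.mp4"]

frames = []
for path in chunk_files:
    # mimread pulls every frame of the clip into memory as ndarrays
    frames.extend(imageio.mimread(path, memtest=False))

# write the concatenated frames back out at the project frame rate
imageio.mimwrite("stitched.mp4", frames, fps=16)
```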

Of course there's still the decoherence issue. I've been running 40-second tests to see how badly the image decomposes over that many frames. Also found the quality is acceptable rendering at 720 and upscaling to 1024 via a simple frame-by-frame Lanczos upscaler (nothing AI about it). And I'm rather proud I figured that out all by myself. At 16 fps and with Steps set down at 4, I can get a second of video for every minute the floor heater is running.
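
That upscaler is about ten lines of Pillow. A sketch, assuming the render was saved out as numbered PNG frames (the folder names are made up):

```python
from pathlib import Path
from PIL import Image

SRC = Path("frames_720")   # hypothetical folder of rendered frames
DST = Path("frames_1024")
DST.mkdir(exist_ok=True)

for frame in sorted(SRC.glob("*.png")):
    img = Image.open(frame)
    scale = 1024 / max(img.size)  # bring the long edge up to 1024
    new_size = (round(img.width * scale), round(img.height * scale))
    img.resize(new_size, Image.LANCZOS).save(DST / frame.name)
```

After that, the upscaled frames get reassembled into a clip the same way the chunks were stitched above.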

Scripting is still a big unknown. I've been experimenting with the s2v (sound to video) and as usual there are surprises. AI, after all, is an exercise in probabilities. "These things are often found with those things." It is, below the layers of agents and control nets and weighting, a next-word autocomplete.

Which means it can seem to have an uncanny ability to extract emotional and semantic meaning from speech. It is strictly associational: videos in the training material tended to show a person pointing when the vocal patterns of "look over there" occurred. More emergence. Cat logic, even.



So anyhow, I broke Automatic1111. Sure, it had a venv folder, but somehow PATH got pointed in the wrong direction. Fortunately I was able to delete Python and do a clean install of 3.10.9 inside the SD folder; Automatic1111 came back up and ComfyUI was still safe in its own sandbox. And now to try to install Kohya.


Experimenting with the tech has led to thinking about shots, and that in turn has circled back to the same thing I identified earlier, a thing that becomes particularly visible when talking about AI.

We all have an urge to create. And we all have our desires and internal landscapes that, when given the chance, will attempt to shape the work. Well, okay, writing forums talk about the person who wants to have written a book, the book itself being of no import, just as the nature of the film they starred in has nothing to do with the desire to be a famous actor. It is the fame and fortune that is the object.

In any case, the difference between the stereotype of push-button art (paint by numeric control) and the application of actual skills that took time and effort to learn is, in relation to the process of creation itself, just a matter of how granular you are getting about it.

Music has long had its chance operations and aleatoric compositions. Some artists throw paint at a canvas. And some people hire or collaborate. Is a composer not a composer if they hire an arranger?

That said, I feel that in video, the approach taken by many in AI is getting in the way of achieving a meaningful goal. As it exists right now, AI video is poorly scriptable, and its cinematography -- the choice of shots and cutting in order to tell the story -- is lacking. This, as with all things AI, will change.

But right now a lot of people getting into AI are crowding the subreddits asking how to generate longer videos.

I'm sorry, but wrong approach. In today's cinematography, 15 seconds is considered a long shot. Many movies are cut at a faster tempo than that. Now, there is the issue of coverage...but I'll get there. In any case, this is just another side of the AI approach that wants nothing more than to press buttons. In fact, it isn't even the time, effort, or artistic skills or tools that are being avoided. It is the burden of creativity. People are using AI to create the prompts to create AI images. And not just sometimes; there are workflows designed to automate this terribly challenging chore of getting ChatGPT to spit out a string of words that can be plugged into ComfyUI.

Art and purposes change. New forms arise. A sonnet is not a haiku. There is an argument for recognizing, as a form in its own right, the short AI video that stitches together semi-related clips in a montage style.

But even here, the AI is going to do poorly at generating it all in one go. It will do better if each shot is rendered separately, and something (a human editor, even!) splices the shots together. And, especially if the target is TikTok or the equivalent, the individual shots are rarely going to be more than five seconds in length.
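
The splicing itself is the easy part; ffmpeg's concat demuxer will do it without re-encoding, as in this sketch (file names are hypothetical, and it assumes the clips share codec and resolution, which they will if they came out of the same workflow):

```python
import subprocess

shots = ["shot_01.mp4", "shot_02.mp4", "shot_03.mp4"]  # hypothetical clip names

# the concat demuxer wants a small text manifest listing the clips in order
with open("shots.txt", "w") as f:
    f.writelines(f"file '{name}'\n" for name in shots)

subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-i", "shots.txt", "-c", "copy", "cut_together.mp4"],
    check=True,
)
```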


Cutting to develop a story, using language similar to modern filmic language, is a different beast entirely. The challenge I'm thinking a lot about now is consistency. Consistency of character, consistency of set. There are also challenges in matching camera motions and angles if you want to apply the language correctly. For that shot-reverse-shot of which the OTS is often part, you have to obey the 180 rule or the results become confusing.

One basic approach is image to video. With i2v, every shot has the same starting point, although they diverge from there. As a specific example, imagine a render of a car driving off. In one render, the removal of the car reveals a fire hydrant. In the second render from the same start point, a mailbox. The AI rolled the dice each time because that part of the background wasn't in the original reference.

One weird problem as well. In editing, various kinds of buffer shots are inserted to hide the cuts made to the master shot. The interview subject coughed. If you just cut, there'd be a stutter in the film. So cut to the interviewer nodding as if listening (those are usually filmed at a different time, and without the subject at all!) Then cut back.

In the case of an i2v workflow, a cutaway done like this would create a strange déjà vu; after the cut, the main shot seems to have reset in time.

So this might actually be an argument for a longer clip, but not one used as the final output; rather, a master shot to be cut into for story beats.

Only we run into another problem. It is poorly scriptable at present. In the workflows I am currently using, there's essentially one idea per clip. So a simple idea such as "he sees the gun and starts talking rapidly" doesn't work with this process.

What you need is to create two clips with different prompts. And you need to steal the last frame from the first clip and use it as the starting image of the second clip. Only this too has problems; the degradation over the length of a clip means that even if you add a node in the workflow to automatically save the target frame, it will need to be cleaned up, corrected back to being on-model, and have the resolution increased back to the original.
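
The frame-stealing part is easy enough to do by hand outside the workflow. A sketch with OpenCV (my choice here, not necessarily what any given ComfyUI node does under the hood):

```python
import cv2

cap = cv2.VideoCapture("clip_01.mp4")                     # hypothetical first clip
last_index = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
cap.set(cv2.CAP_PROP_POS_FRAMES, last_index)              # jump to the final frame
ok, frame = cap.read()
cap.release()

if ok:
    # this still needs the cleanup / upscale pass before it can serve
    # as the starting image of the second clip
    cv2.imwrite("clip_02_start.png", frame)
```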

And, yes, I've seen a workflow that automates all of that, right down to a preset noise setting in the AI model that regenerates a fresh and higher-resolution image.

My, what a tangled web we weave.

Monday, October 6, 2025

Cryptic Triptych

I got the PC I built up and running, after the usual 22H2 hassle (tip: don't use the internal updater. Run the web installer from Microsoft. For as long as that lasts!)

ComfyUI is sandboxed (and a one-click install) and Automatic1111, though now an abandoned project, also installs a venv folder within the stable_diffusion folder, meaning it can run on Python 3.10.6. Now trying to get Kohya running, and learning venv so I can get that onto 3.10.11 or higher...without breaking everything else.
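
The principle, at least, is simple: one venv per tool, each pinned to the interpreter it wants, and nothing ever touches the system Python. A minimal sketch, assuming the Windows py launcher and folder names that may not match mine:

```python
import subprocess

# one venv per tool; the folder names here are illustrative
TOOLS = {
    r"stable-diffusion-webui\venv": "3.10",  # Automatic1111 wants 3.10.x
    r"kohya_ss\venv": "3.10",
}

for venv_path, version in TOOLS.items():
    # the py launcher picks the requested interpreter version
    subprocess.run(["py", f"-{version}", "-m", "venv", venv_path], check=True)
```

After that, each tool gets launched through its own venv\Scripts\python.exe, and a broken PATH can't take the whole kit down with it.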

I still like the primitive but functional GUI of Automatic1111 for stills. But ComfyUI opens up video as well. Motion.

And that got me thinking about linear narrative.

There does exist a form called "non-linear narrative." But that refers to the relationship between the narrative and some other chronology. The latter may be shifted around. A writer can at any point refer to a different time, including such techniques as the flashback and flash-forward. But the narrative itself remains linear. One reads one word at a time.

(Arguably, from our understanding of the process of reading, we parse chunks of text, and thus multiple words may be included in what is experienced as a single unit of extracted meaning.)

This means it is extremely difficult to capture the near-simultaneous flow of information that a real person not reading an account in a book would experience. In our old gaming circles, the joke was the monster quietly waiting until the room description was finished. It is a basic problem in writing; you can't tell it all at the same time. And the order you choose influences the relative weight given.

Again, arguably, our attention can't be split too many ways. In most cases, the realization you had while you were in the middle of doing something else arrives as a discrete event. You may have heard the voice behind you, and been processing it, but the moment of understanding that cry of "Stop!" can be treated narratively at the moment it becomes the focus of attention. And the observations that led to that moment of realization get back-filled at that time, as they, too, rise to the top of the consciousness.

Or in another way of putting it, a narrative is an alias of the stream of consciousness and the order of presentation can be taken as the order of items brought into focus.

This idea of the sequential scroll of attention has been used in artwork. We normally absorb a piece of art by moving from one focus to another (in a matrix of probable interest including size, color, position, human faces, etc.) The artist can construct a narrative through this shifting of focus.


This one sneaks up in stages. The first impression is very calming. The next impressions are not. Especially in some periods, there could be subtler and subtler clues and symbols that you don't notice until you've been looking for a while.


Or there are works, from the triptych to the Bayeux Tapestry, that arrange distinct framed panels in sequential order.

Motion controls this flow of narrative more tightly. Not to say there can't be the same slow realizations. But it means thinking sequentially.


In comic book terminology, the words "Closure" and "Encapsulation" are used to describe the concepts I've been talking about. "Closure" is the mental act of bringing together information that had been presented over a sequence of panels in order to extract the idea of a single thing or event. "Encapsulation" is a single panel that is both the highlight of and a reference pointer or stand-in for that event.

In text, narrative, especially immersive narrative that is keyed to a strong POV or, worse, a first-person POV, has a bias towards moving chronologically. Especially in first-person, this will lead the unwary writer into documenting every moment from waking to sleep (which is why I call it "Day-Planner Syndrome.")


I've been more and more conscious of the advantages and drawbacks of jumping into a scene at a more interesting point and rapidly back-filling (tell not show) the context of what that moment came out of. I don't like these little loops and how they disturb the illusion of a continuous consciousness that the reader is merely eavesdropping on as they go about their day, but I like even less spending pages on every breakfast.

And speaking of time. The best way to experience the passage of time is to have time pass. That is, if you want the reader to feel that long drive through the desert, you have to make them spend some time reading it. There's really no shortcut.

I decided for The Early Fox I wanted to present Penny as more of a blank slate, and to keep the focus within New Mexico. So no talking about her past experiences, comparisons to other places she's been, comparison or discussion of the histories of other places, technical discussions that bring in questions of where Penny learned geology or Latin or whatever, or quite so many pop-culture references.

And that means I am seriously running out of ways I can describe yucca.


In any case, spent a chunk of the weekend doing test runs with WAN2.2 and 2.1 on the subject of "will it move?" Which is basically the process of interrogating an AI model to see what it understands and what form the answer will take.

My first test on any new install is the prompt "bird." Just the one word. Across a number of checkpoints the result is a bird on the ground, usually grass. A strange and yet almost specific and describable bird; it is sort of a combination of bluebird and puffin with a large hooked beak, black/white mask, blue plumage and yellow chicken legs.
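
For what it's worth, that one-word smoke test doesn't even need the GUI. A sketch against the Automatic1111 web UI API (this assumes the UI was launched with the --api flag and is sitting on its default port):

```python
import base64
import requests

payload = {"prompt": "bird", "steps": 20, "width": 512, "height": 512}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()

# the API returns base64-encoded PNGs
with open("bird_test.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```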

In investigating motion in video, I discovered there are two major things going on under the hood. The first is that when you get out of the mainstream ("person talking") and into a more specific motion ("person climbing a cliff") you run into the paucity of training data problem. When there is a variety of data, the AI can synthesize something that appears original. When the selection is too small, the AI recaps that bit of data in a way that becomes recognizable. Oh, that climbing move where he steps up with his left foot, then nods his head twice.

The other is subject-background detection. AI video works now (more-or-less) because of subject consistency. The person walking remains in the same clothing from the first frame to the last. It does interpolate, creating its own synthesized 3d version, but it can be thought of as, basically, detaching the subject then sliding it around on the background.

We've re-invented Flash.


Now, because the AI is detaching then interpolating, and the interpolation makes use of the training data of what the back of a coat or the rest of a shoe looks like (and, for video models, moves like), it does have the ability to animate things like hair appropriately when that subject is in motion. But AI is pretty good at not recognizing stuff, too. In this case, it takes the details it doesn't quite understand and basically turns them into a game skin.

Whether this is something the programmers intended, or an emergent behavior in which the AI is discovering ways of approximating reality similar to what game creators have been doing, the subject becomes basically a surface mesh that gets the large-scale movements right but can reveal that things like the pauldrons on a suit of armor are just surface details, parts of the "mesh."

It can help to think of AI animation as Flash in 3D. The identified subjects move around a background, with both given consistency from frame to frame. And think of the subject, whether it is a cat or a planet, as a single object that can be folded and stretched with the surface details more-or-less following.

But back to that consistency thing. For various reasons, video renders are limited to the low hundreds of frames (the default starter, depending on model, is 33 to 77 frames). And each render is a fresh roll of the dice. 
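
The "roll of the dice" is quite literal: each render starts from fresh latent noise, and everything downstream follows from it. A toy illustration (the latent shape here is just the usual example for a 512x512 Stable Diffusion image, not anything WAN-specific):

```python
import torch

def initial_latent(seed, shape=(1, 4, 64, 64)):
    # the starting noise every diffusion run denoises from
    g = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=g)

a = initial_latent(42)
b = initial_latent(42)    # same seed -> identical starting noise
c = initial_latent(1234)  # new seed -> a different "interpretation"
print(torch.equal(a, b), torch.equal(a, c))  # True False
```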

It is a strange paradox, possibly unavoidable in the way we are currently doing this thing we call "AI." In order to have something with the appearance of novelty, it has to fold in the larger bulk of training data. In order to have consistency, it has to ignore most of that data. And since we've decided to interrogate the black box of the engine with a text prompt, we are basically left with "make me a bird" and the engine spitting out a fresh interpretation every time.

That plays hell on making an actual narrative. Replace the comic-book panel with the film-terminology "shot," have that "Closure" built on things developed over multiple shots, and you are confronted with the problem that the actors and setting are based on concepts, not on a stable model that exists outside the world of an individual render. If you construct "Bird walking," "Bird flies off," and "Bird in the sky" with each render interpreting the conceptual idea of "Bird" in a different way, it is going to be a harder story to understand.


That is going to change. There are going to be character turn-arounds or virtual set building soon enough. As I understand it, though, the necessary randomness means the paradox is baked into the process. No matter what the model or template, it is treated the same as a prompt or a LoRA or any other weighting; as a suggestion. One that gets interpreted in the light of what that roll of the dice spat out that run.

And that's why the majority of those AI videos currently clogging YouTube go for conceptual snippets arranged in a narrative order, not a tight sequence of shots in close chronological time. You can easily prompt the AI to render the hero walking into a spaceport, and the hero piloting his spacecraft...now wearing a spacesuit and with a visibly different haircut.

For now, the best work-around appears to be using the "I2V" subset. That generates a video from an image reference. The downside is that anything that isn't in the image -- the back of the head, say -- is interpolated, and thus will be different in every render. It also requires creating starter images that are themselves on-model.

A related trick is pulling the last frame of the first render and using that as the starter image for a second render. The problem this runs into is the Xerox Effect; the same problem that is part of why there is a soft limit on the number of frames of animation that can be rendered in a single run.


(The bigger problem in render length is memory management. I am not entirely clear why...)

As with most things AI, or 3D for that matter, it turns into the Compile Dance. Since each run is a roll of the dice, you often can't tell if there is a basic error of setup (bad prompt, a mistake in the reference image, a node connected backwards) or just a bad draw from the deck. You have to render a couple of times. Tweak a setting. Render a couple times to see if that change was in the right direction. Lather, rinse.

With my new GPU and the convenient test size I have been working with, render times fall into the sour spot: 1-3 minutes. Not long enough to do something else, but long enough that it is annoying to wait it out.

I still don't have an application, but it is an amusing enough technical problem to keep chasing for a bit longer. The discussions on the main subreddit seem to show a majority of questioners who just want "longer video" and hope that by crafting the right prompt, they can build a narrative in an interesting way.

The small minority is there, however, explaining that cutting together shorter clips better approaches how the movies have been doing it for a long time; a narrative approach that seems to work for the viewer. But that really throws things back towards the problem of consistency between clips.

And that's why I'm neck-deep in Python, trying not to break the rest of the tool kit in adding a LoRA trainer to the mix.

Thursday, October 2, 2025

How does your garden grow

Sometimes scenes evolve. You realize things the outline didn't address, and addressing them opens up opportunities you hadn't spotted before.

And, I know. The fierce outliners and the dedicated pantsers (and especially the WriMos -- NaNoWriMo is dead but some people still try to do 50K over November) call you stupid if you delay in getting those words on the page. But I think that time comes due eventually. Fix it now, or fix it in rewrites. As long as you are capable of holding enough story in your head, you don't have to have it written down in order to realize it is wrong and needs to be re-written.

The drive from Alamogordo to the tiny census-designated place (it's too small to be a town) of Yah-ta-hey is about five hours. The original plan/outline/mind mapper diagram just said "Mary leads Penny to the rez to talk to a man who knows about the 'Sheep Ranch'."

The original spec was that Penny doesn't get friends in this one. She doesn't get people she can lean on for emotional support, or people who are too helpful. They all have agendas and problems, and they hide stuff from her. Mary is mostly...angry. I got her character and voice from a couple of different writers on the continuing problems with radiological contamination on tribal lands.

But can she and Penny take a five-hour drive without either getting some resolution, or killing each other? Am I better off having Penny go alone? Or...is it the better option to make this a longer scene, to go deeper into Mary's personality, and give them a mini character arc?

Sigh. The hard one, of course.


The world continues to change. I started this in 2018. An American tourist loose in the world. The world's view of America has changed since then. COVID has changed things, the economic slump has changed things, and tourism has changed for everyone; European tourists are in the same hot water with over-travelled destinations and Venice is not the only place fighting back.

It always blind-sides you, change does. I had a minor bit about a minor character; a senior airman at the 49th with a shaving profile. I couldn't even name it, because that's the sort of thing that people who have been in the service would recognize but most of my readers would never have heard of.

Until it suddenly became a big thing to the pushup king, our current Secretary of War.


I did happen to think of a new silly idea, possibly for the "weird high-tech company in the wilds of Colorado" next adventure. Probably more productive is I've finally started making a proper vector of the new series logo so I can try out the new cover.

So there actually is a magical artifact. Sort of not very. It is extremely valuable and holds secrets because it is a replica and there's a USB stick in it with a bunch of trade secrets or something.

And maybe, I thought, it isn't the product of a historical culture, but instead a replica prop. And comes out of an imaginary IP, some sprawling fantasy saga with a lot of borrowing from various bits of real history (like GOT, say).

I mentioned a while back that various authors have tried to create a Disneyworld based on an imagined IP. The tough part is communicating this IP to the reader; you can't just say, "Look, it's Elsa from Frozen!" The fun part is, of course, creating the IP in the first place.


And, yeah, finally broke down and built a new computer. Things have moved on there, as well. SSDs aren't being fitted with expansion rails so they can slot in where the old 3.5" drive bays are. Instead they drop into a slot right on the motherboard, riding the PCIe bus -- underneath a built-in heatsink, because modern gaming machines run hot.

Darkrock case, MSI board, i7 CPU, 64 GB of RAM to start, 2 TB SSD as the "C" drive, 1000 W PSU, and of course a nice floor heater of an RTX 3090 with 24 GB of VRAM.

Mostly did it because I wanted to do it. Not because I had a need for it. But, oh boy, when I finally got the updates installed and running and tried out Satisfactory it looked so good...

The test render on WAN2.2 t2v took 160 seconds. Not bad.

Tuesday, September 23, 2025

Remember the Espirito Santo

I kinda want to send Penny to Texas next. They got oil, some very old towns, the Alamo, and about a million interesting wrecks off the coast. They also got more dinosaurs than you can shake an animatronic thagomizer at.


Goings-on at a fancy dinosaur exhibit are fun. Everybody loves some amusement park shenanigans, and adding dinosaurs is only further insurance that something will go worng.

Of course archaeologists famously don't do dinosaurs.


But I'm not all that enthralled by doing archaeology on missions, old towns, or the Alamo, or even Washington-on-the-Brazos, the "birthplace of Texas."


I'm more intrigued by marine archaeology. Always wanted to work that in at some point. But that's a ton of research. And, really, the good way to phrase "Write what you know" is "Write what you want to know." Write about things that excite you, either through prior exposure or because you really want to write about them.

I've got this emotional stub of Penny being a low-wage worker at some soul-less tech center in some up-and-coming development on the outskirts of a national park...until something goes terribly right. (Wrong for her, right for getting a story out of it.)

Well, I really do have a prior novel to finish. Then I might clean the decks, stick a plastic fern in the corner and hang some paper lanterns, and get the hell out of First Person POV for a while.

Assembled a cart at NewEgg, too. I can only sorta write during downtime at work and looking up parts -- as annoying as it can be -- is a way to pass the time. Of course, the only graphics project I've got that is actually for anyone other than myself, is the new cover I'm trying...and there's no AI involved in that.


Thursday, September 18, 2025

Painting with a potato

There is an artist's method. Like scientific method or engineering method, it is a base process. This idea of processes that are largely application-independent underlies the Maker movement -- which has merged and morphed and now is more like the kind of "hacker" the Hackaday Blog was built for.


This is why there are so many multi-instrumentalists in the home studio music circles, and how visual artists change media. There's cross-over, too. Not to in any way diminish the immense investment in skills in the specific mediums and tools, but there are commonalities at more fractal levels than just the generalized artistic method.

In any case, the thinking is basically tools is tools. I've designed sound and lighting for the stage. I've designed props. I've done book covers. I've done music. I'm not saying I do any of these things well, but there is a hacker approach to it. Understand the new medium and tools and then apply the core skills to it. It all feels sort of the same, reaching for personal aesthetics and understanding of the real world and learning about the tools and the traditions of this specific media.

From a process point of view, from this higher level "making art" point of view, AI is just another tool set.

It tends to be a toolset that is relatively easy to learn, but also less responsive. It really is like painting with a potato (and not to make stamps, either!) You can't really repurpose the hand-and-eye skills learned in inking a comic book panel with a nibbed pen or painting a clean border on a set wall with a sash brush. Because it is a potato. You hold it, you dip it in paint, it makes a big messy mark on the canvas.

It is mostly the upper-level skills and that general artistic approach that you bring to bear. Which of the things I can imagine and want to visualize will this tool let me achieve? You can't get a melody out of a sculpture or paint the fluting on a hand-knapped obsidian point, and you learn quickly which subjects and styles and so forth the AI tools can support, and which are not really what they are meant for.

This will change. This will change quickly. Already, I've discovered even my potato computer could have done those short videos with animation synchronized to the music.

That's been the latest "Hey, I found this new kind of ink marker at the art store and I just had to try it out on something" for me. S2V with an audio-to-video model built around the WAN2.2 (14B) AI image model. In a spaghetti workflow hosted in ComfyUI.

On a potato. (An i5 with a 3060 Ti. I want to upgrade to a 3090 -- generations don't matter as much as that raw 24 GB of VRAM -- but can't stomach another thousand bucks on the proper host of an i9 machine with the bus for DDR5.)

Render times for ten seconds of video at HD (480x640) are around an hour. Hello, 1994; I'm back to Bryce3D!


Actually, right now longer clips lose coherence. That's the reason all that stuff on YouTube right now is based on 30 seconds and under. It's the same problem that underlies why GPT-4 loses the plot a chapter into a novel and ACE-Step forgets the tune or even the key.

I suspect there's a fix by adding another agent in there, so it won't surprise me to find this is a temporary problem. But it also won't surprise me if it proves more difficult than that, as it comes out of something that sort of underlies the entire generative approach. Even on a single still image, I've seen Stable Diffusion forget what it was rendering and start detailing up something other than what it started with. It is particularly a problem with inpainting. "Add a hand here." "Now detail that hand." Responds SD: "What hand?"

(This also may change with the move towards natural language prompting. The traditional approach was spelling things out, as if the AI was an idiot. This is the "Bear, brown, with fur, holding a spatula in one paw..." approach. The WAN (and I assume Flux) tools claim better results with "A bear tending a barbecue" and let the AI figure out what that entails.)

Anyhow. Been experimenting with a wacky pipeline, but this time using speech instead of music.

Rip the audio track off a video with VLC. Extract and clean up a voice sample via Audacity. Drop the voice sample and script into Chatterbox (I've been messing with the TTS and VC branches). The other pipeline uses basic Paint to compose, then Stable Diffusion (with Automatic1111 as a front end, because while it may be aging, the all-in-one canvas approach is much more suitable for inpainting; plus, I know this one pretty well now).
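
That first rip-the-audio step can also be scripted; here it is sketched with ffmpeg instead of VLC (my substitution, not part of the original pipeline):

```python
import subprocess

subprocess.run(
    ["ffmpeg", "-y", "-i", "source_video.mp4",  # hypothetical input file
     "-vn",                       # drop the video stream
     "-ar", "44100", "-ac", "1",  # mono 44.1 kHz is plenty for a voice sample
     "voice_sample.wav"],
    check=True,
)
```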

Throw both into the WAN2.2-base S2V running under ComfyUI. It does decent lip synch. I haven't tried it out yet for instrumentals but apparently it can parse some of that, too. It also has a different architecture than the WAN2.1 I was doing I2V on before. It has a similar model consistency, which is a nice surprise, but the workflow I'm using leverages an "extend video" hack that means that my potato -- as much as it struggles fitting a 20 GB model into 12 GB of VRAM -- can get out to at least 14 seconds at HD.

As usual, the fun part is trying out edge cases. Always, the artist wants to mess around and see what the tools can do. And at some point you are smacking that sculpture with a drum stick, recording the result and programming it into a sample keyboard just to see if it can be done.

There is an image-to-mesh template already on my ComfyUI install. The AI can already interpolate outside of the picture frame, which is what it is doing -- almost seamlessly! -- in the image2video workflow. So it makes sense that it can interpolate the back side of an object in an image. And then produce a surface and export it as an STL file you can send to your physical printer!

So there are many things the tools can do. The underlying paradox remains -- perhaps even more of one as the tools rely on the AI reading your intent and executing that; an understanding that relies on there being a substantial number of people that asked for the same thing using similar words.

It is the new digital version of the tragedy of the commons. It is the search algorithm turned into image generation. It has never been easier to look like everyone else (well, maybe in Paris in 1900, if you were trying to get selected for show by the Académie).

And that indeed is my new fear. I think there is this thought in some of the big companies that this could be a new way to stream content. Give everyone tools to "create art" so, like the nightmare the Incredibles faced, nobody is special. The actual artists starve out even faster than they have been already. The consumer believes they are now the creator, but they aren't; they are only the users of a service that spits out variations on what they have already been sold.

Right now, AI art is in the hacker domain, where academics and tinkerers and volunteers are creating the tools. Soon enough, the walls will close. Copyright and DMCA, aggressive "anti-pornography" stances. None of that actually wrong by itself, but applied surgically to make sure the tools are no longer in your hands but are only rented from the company that owns them.

The big software companies have often been oddly hostile to content creators. "Why would you need to create and share when we've made it so easy to watch DVDs?" This just accelerates in that direction, where the end-user doesn't create and doesn't own.

We're back to streaming content. It is just that the buttons have changed labels. Instead of "play me something that's like what I usually listen to" it will be "create me something that sounds like what I usually listen to."

Doesn't stop me from playing around with the stuff myself.


(Call to Power players will recognize this one.)

Monday, September 15, 2025

It's all about ethics in archaeology

I've a writing worklist right now.

Finishing The Early Fox is top of the list, of course. I didn't find that Civil War graveyard (there are a lot of "Billy the Kid kilt heah" locations but nothing near enough to US-54) but I did find an old Mission style (WPA project, actually) school in a gold boom ghost town. That will do for my "Ecstasy of Gold" scene.

Still not ready for the foray into Navajo lands. 

Contacted the cover artists I've worked with before, but they were uninterested in doing a cover clean-up. They have (most of these guys have) a listed "cover doctor" service, but what they mean is they'll design a new one to replace whatever it is you are using.

So I guess I'm doing my own cover art again. After my last couple of go-arounds, I'm tired of struggling to communicate. And first task on that list is doing my own vectors over the art my logo designer did. And probably changing a few things and trying out a few things, too.

As part of the rebranding, revisions to The Fox Knows Many Things. Including AI -- but not what you'd think! ProWritingAid has an AI development editor thing that is, of course, extra fees but they have a free trial. That's not in any way using AI to revise.

It is, perhaps, one of the best ways of using AI; a thousand-monkey version of crowdsourcing (on stolen data). Point the AI monkey at the manuscript and see if enough of it fell within the kinds of things readers are used to seeing (aka looks like other books) that the AI can figure out what it is looking at.

Always, as with any editing tool or feedback, take it as a suggestion and apply your own instincts and intelligence to those suggestions. But I'd be interested in seeing what it comes up with.

***

I've been thinking about archaeological issues to hang the next book on.

That is my conceit for the series; each book features a place and culture, a bit of history connected to said place and culture, and explores some aspect of modern archaeology.

For the current book, I backed off early from the idea of doing "bad experiences on digs." The sexism on the dig in this book is a plot point, but it isn't defining the dig or Penny's experience (which is largely positive...next book, maybe, I can get her the Job from Hell).

So The Early Fox is more-or-less NAGPRA and associated issues. I was barely able to touch on indigenous archaeology/community archaeology, the mound builder myth, and so on, but at that, it did better than the Paris book at having something to do with archaeology.


Over previous books, I've already delved into archaeological looting, nationalist revisionism and irredentism. And pseudo-archaeology, but there are too many easy pickings there. I could easily fill a series with just running around after the Kensington Runestone or Bosnian pyramids. The fake artifact trade is interesting but I've got a book saved for that. And repatriation issues show up again with the boat episode (assuming I get around to writing either of those).


And somehow that thinking led to something that was an actual excuse other than "this is the random small town I ended up in next" for Penny's involvement. Brought in to help the artifacts look more authentic (for extremely suspect values of "authentic") in some kind of weird high-tech VR sort of thing. Less gaming company (why would they need a fancy tech center?) more experimental direct-mind-connection stuff.

Not really directions I wanted to take the series, but then having gunfights with ISIL over some Buddhist statuary isn't where I want to go, either.