Tuesday, September 23, 2025

Remember the Espirito Santo

I kinda want to send Penny to Texas next. They got oil, some very old towns, the Alamo, and about a million interesting wrecks off the coast. They also got more dinosaurs than you can shake an animatronic thagomizer at.


Goings-on at a fancy dinosaur exhibit are fun. Everybody loves some amusement park shenanigans, and adding dinosaurs is only further insurance that something will go worng.

Of course archaeologists famously don't do dinosaurs.


But I'm not all that enthralled by doing archaeology on missions, old towns, or the Alamo, or even Washington-on-the-Brazos, the "birthplace of Texas."


I'm more intrigued by marine archaeology. Always wanted to work that in at some point. But that's a ton of research. And, really, the good way to phrase "Write what you know" is "Write what you want to know." Write about things that excite you, either through prior exposure or because you really want to write about them.

I've got this emotional stub of Penny being a low-wage worker at some soul-less tech center in some up-and-coming development on the outskirts of a national park...until something goes terribly right. (Wrong for her, right for getting a story out of it.)

Well, I really do have a prior novel to finish. Then I might clean the decks, stick a plastic fern in the corner and hang some paper lanterns, and get the hell out of First Person POV for a while.

Assembled a cart at NewEgg, too. I can only sorta write during downtime at work, and looking up parts -- as annoying as it can be -- is a way to pass the time. Of course, the only graphics project I've got that is actually for anyone other than myself is the new cover I'm trying...and there's no AI involved in that.


Thursday, September 18, 2025

Painting with a potato

There is an artist's method. Like the scientific method or the engineering method, it is a base process. This idea of processes that are largely application-independent underlies the Maker movement -- which has merged and morphed and now is more like the kind of "hacker" the Hackaday blog was built for.


This is why there are so many multi-instrumentalists in the home studio music circles, and how visual artists change media. There's cross-over, too. Not to in any way diminish the immense investment in skills in the specific mediums and tools, but there are commonalities at more fractal levels than just the generalized artistic method.

In any case, the thinking is basically tools is tools. I've designed sound and lighting for the stage. I've designed props. I've done book covers. I've done music. I'm not saying I do any of these things well, but there is a hacker approach to it. Understand the new medium and tools and then apply the core skills to it. It all feels sort of the same: reaching for personal aesthetics and understanding of the real world, and learning about the tools and the traditions of this specific medium.

From a process point of view, from this higher level "making art" point of view, AI is just another tool set.

It tends to be a toolset that is relatively easy to learn, but also less responsive. It really is like painting with a potato (and not to make stamps, either!) You can't really repurpose the hand-and-eye skills learned in inking a comic book panel with a nibbed pen or painting a clean border on a set wall with a sash brush. Because it is a potato. You hold it, you dip it in paint, it makes a big messy mark on the canvas.

It is mostly the upper-level skills and that general artistic approach that you bring to bear. Which of the things I can imagine and want to visualize will this tool let me achieve? You can't get a melody out of a sculpture or paint the fluting on a hand-knapped obsidian point, and you learn quickly what subjects and styles and so forth the AI tools can support, and which are not really what they are meant for.

This will change. This will change quickly. Already, I've discovered even my potato computer could have done those short videos with animation synchronized to the music.

That's been the latest "Hey, I found this new kind of ink marker at the art store and I just had to try it out on something" for me. S2V with an audio-to-video model built around the WAN2.2 (14B) AI image model. In a spaghetti workflow hosted in ComfyUI.

On a potato. (i5 with a 3060 Ti. I want to upgrade to a 3090 -- generations don't matter as much as that raw 24GB of VRAM -- but I can't stomach another thousand bucks on the proper host of an i9 machine with the bus for DDR5.)

Render time for ten seconds of video at HD (480x640) is around an hour. Hello, 1994; I'm back to Bryce3D!
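(Back-of-envelope, if you assume WAN's usual 16 fps output -- that frame rate is my assumption, the hour is what I measured:)

frames = 10 * 16      # ten seconds of clip at an assumed 16 fps = 160 frames
print(3600 / frames)  # roughly 22 seconds of render per frame on this card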


Actually, right now longer clips lose coherence. That's the reason all that stuff on YouTube right now is based on 30 seconds and under. It's the same problem that underlies why GPT-4 loses the plot a chapter into a novel and ACE-step forgets the tune or even the key.

I suspect there's a fix with adding another agent in there, so it won't surprise me to find this is a temporary problem. But it also won't surprise me if it proves more difficult than that, as it seems to come out of something that underlies the entire generative approach. Even on a single still image, I've seen Stable Diffusion forget what it was rendering and start detailing up something other than what it started with. Particularly a problem with inpainting. "Add a hand here." "Now detail that hand." Responds SD: "What hand?"

(This also may change with the move towards natural language prompting. The traditional approach was spelling things out, as if the AI was an idiot. This is the "Bear, brown, with fur, holding a spatula in one paw..." approach. The WAN (and I assume Flux) tools claim better results with "A bear tending a barbecue" and let the AI figure out what that entails.)

Anyhow. Been experimenting with a wacky pipeline, but this time using speech instead of music.

Rip the audio track off a video with VLC. Extract and clean up a voice sample via Audacity. Drop the voice sample and script into Chatterbox (I've been messing with the TTS and VC branches). The other pipeline uses basic paint to compose, then Stable Diffusion (with Automatic1111 as a front end, because while it may be aging, the all-in-one canvas approach is much more suitable for inpainting. Plus, I know this one pretty well now.)
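(If you'd rather script the voice-clone step than click through it, a rough sketch using the chatterbox-tts Python package looks something like this. The file names and the sample line are made up, and I'm going from memory of the package README, so treat it as a sketch and not gospel.)

import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# voice_sample.wav is the cleaned-up sample out of Audacity
model = ChatterboxTTS.from_pretrained(device="cuda")
wav = model.generate(
    "Penny had seen better dig sites. And worse ones.",  # made-up script line
    audio_prompt_path="voice_sample.wav",
)
ta.save("narration.wav", wav, model.sr)
# that wav, plus the Stable Diffusion still, is what feeds the next step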

Throw both into the WAN2.2-base S2V running under ComfyUI. It does decent lip sync. I haven't tried it out yet for instrumentals, but apparently it can parse some of that, too. It also has a different architecture from the WAN2.1 I was doing I2V on before. It has similar model consistency, which is a nice surprise, but the workflow I'm using leverages an "extend video" hack that means my potato -- as much as it struggles fitting a 20GB model into 12GB of VRAM -- can get out to at least 14 seconds at HD.

As usual, the fun part is trying out edge cases. Always, the artist wants to mess around and see what the tools can do. And at some point you are smacking that sculpture with a drum stick, recording the result and programming it into a sample keyboard just to see if it can be done.

There is an image-to-mesh template already on my ComfyUI install. The AI can already extrapolate outside of the picture frame, which is what it is doing -- almost seamlessly! -- in the image2video workflow. So it makes sense that it can infer the back side of an object in an image. And then produce a surface and export it as an STL file you can send to your physical printer!
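(The last hop is almost anticlimactic. Assuming the template hands you a GLB or OBJ -- file names invented here -- getting it into something a slicer will eat is a couple of lines with the trimesh library:)

import trimesh

# load whatever the image-to-mesh workflow produced and hand it to the slicer as an STL
mesh = trimesh.load("comfyui_mesh_output.glb", force="mesh")
mesh.export("printable.stl")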

So there are many things the tools can do. The underlying paradox remains -- perhaps even more of one as the tools rely on the AI reading your intent and executing that; an understanding that relies on there being a substantial number of people that asked for the same thing using similar words.

It is the new digital version of the tragedy of the commons. It is the search algorithm turned into image generation. It has never been easier to look like everyone else (well, maybe in Paris in 1900, if you were trying to get selected for show by the Académie).

And that indeed is my new fear. I think there is this thought in some of the big companies that this could be a new way to stream content. Give everyone tools to "create art" so, like the nightmare the Incredibles faced, nobody is special. The actual artists starve out even faster than they have been already. The consumer believes they are now the creator, but they aren't; they are only the users of a service that spits out variations on what they have already been sold.

Right now, AI art is in the hacker domain, where academics and tinkerers and volunteers are creating the tools. Soon enough, the walls will close. Copyright and DMCA, aggressive "anti-pornography" stances. None of that actually wrong by itself, but applied surgically to make sure the tools are no longer in your hands but are only rented from the company that owns them.

The big software companies have often been oddly hostile to content creators. "Why would you need to create and share when we've made it so easy to watch DVDs?" This just accelerates in that direction, where the end-user doesn't create and doesn't own.

We're back to streaming content. It is just that the buttons have changed labels. Instead of "play me something that's like what I usually listen to" it will be, "create me something that sounds like what I usually listen to."

Doesn't stop me from playing around with the stuff myself.


(Call to Power players will recognize this one.)

Monday, September 15, 2025

It's all about ethics in archaeology

I've a writing worklist right now.

Finishing The Early Fox is top of the list, of course. I didn't find that Civil War graveyard (there are a lot of "Billy the Kid kilt heah" locations but nothing near enough to US-54) but I did find an old Mission style (WPA project, actually) school in a gold boom ghost town. That will do for my "Ecstasy of Gold" scene.

Still not ready for the foray into Navajo lands. 

Contacted the cover artists I've worked with before but they were uninterested in doing a cover clean-up. They have (most of these guys have) a listed service of cover doctor, but what they mean is, they'll design one for you to replace whatever it is you are using.

So I guess I'm doing my own cover art again. After my last couple of go-arounds, I'm tired of struggling to communicate. And first task on that list is doing my own vectors over the art my logo designer did. And probably changing a few things and trying out a few things, too.

As part of the rebranding, revisions to The Fox Knows Many Things. Including AI -- but not what you'd think! ProWritingAid has an AI developmental editor thing that costs extra, of course, but they have a free trial. That's not in any way using AI to revise.

It is, perhaps, one of the best ways of using AI; a thousand-monkey version of crowdsourcing (on stolen data). Point the AI monkey at the manuscript and see if enough of it fell within the kinds of things readers are used to seeing (aka looks like other books) that the AI can figure out what it is looking at.

Always, as with any editing tool or feedback, take it as a suggestion and apply your own instincts and intelligence to those suggestions. But I'd be interested in seeing what it comes up with.

***

I've been thinking about archaeological issues to hang the next book on.

That is my conceit for the series; each book features a place and culture, a bit of history connected to said place and culture, and explores some aspect of modern archaeology.

For the current book, I backed off early from the idea of doing "bad experiences on digs." The sexism on the dig in this book is a plot point, but it isn't defining the dig or Penny's experience (which is largely positive...next book, maybe, I can get her the Job from Hell).

So The Early Fox is more-or-less NAGPRA and associated issues. I was barely able to touch on indigenous archaeology/community archaeology, the mound builder myth, and so on but at that, it did better than the Paris book in having something to do with archaeology.


Over previous books, I've already delved into archaeological looting, nationalist revisionism and irredentism. And pseudo-archaeology, but there are too many easy pickings there. I could easily fill a series with just running around after the Kensington Runestone or Bosnian pyramids. The fake artifact trade is interesting but I've got a book saved for that. And repatriation issues show up again with the boat episode (assuming I get around to writing either of those).


And somehow that thinking led to something that was an actual excuse other than "this is the random small town I ended up in next" for Penny's involvement. Brought in to help the artifacts look more authentic (for extremely suspect values of "authentic") in some kind of weird high-tech VR sort of thing. Less gaming company (why would they need a fancy tech center?) more experimental direct-mind-connection stuff.

Not really directions I wanted to take the series, but then having gunfights with ISIL over some Buddhist statuary isn't where I want to go, either.

Thursday, September 11, 2025

Experimenting with ACE-step


Yeah, I've gone to the dark side.

I said I was messing around with AI music creation via ACE-step. That one is a bit more towards a diffusion model, so (at the moment) less controllable than Suno, but potentially more flexible. They boast about the number of languages it can handle...but at the same time, admit that they really haven't trained it on opera, or (as my own small experiments seemed to show) much of anything at all outside of pop music.

(They also sidestep the sound quality, with the user base developing magic incantations that might be improving things if you squint, or might just be random "sometimes it doesn't sound quite so bad" luck that has nothing to do with their "and set cfg to 1.14" stuff.)

The one that I actually wanted to talk about was the attempt to do a villain song/"I want" song. The result is less Broadway, more Billie Eilish. Among other things, ACE doesn't know how to belt. Or like female singers in a range other than light soprano. Its understanding of "chorus" v. "verse" is "sing louder!" It also only takes lyrics, like anything else, as suggestions.

But on the fourth or fifth run it came out with this weird melisma that's almost Sondheim or something. Pity it couldn't carry through. I could improvise a better chorus (it is an absolute natural to sequence this one, half-stepping higher and higher). Plus, at least one previous run was smart enough to do the intro on solo piano. I like the "is that supposed to be an oboe?" but it should enter later.

Like I said. Not really trained on showtunes. That's why it surprised me that this run, instead of repeating the chorus over and over, drifting further and further from the melody as it goes (and not even a Truck Driver Gear Shift!), went for an extended instrumental break. Cool. On stage, this would be a dance, or perhaps an aerial dance.

The less said about the Wan2.1 animations, the better. At some point up the ladder of increasingly large and power-hungry tools you can script better, and start using LoRAs to keep the animator from going completely off-model. At this level it is just as well I've got a 3060 Ti with 12GB that is practically a potato when it comes to video -- because Wan2.1 also loses the plot after 60-90 frames of video. So no Bryce-length render times. It also has some memory leaks, so even if I could automate a batch, the second one would fail.

But on the other hand, this gave a nice chance to demonstrate using img2img mode for composing a Stable Diffusion image. I did those roughs in Paint3D, using my knowledge of how SD "sees" an image. The airship took a bit of inpainting, as the AI really wanted to make either a blimp or a clipper, not try to get both in the same machine. For the others I used simple convergence: rendered a half-dozen versions, picked one, then re-rendered on that one to zero it in.
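(For anyone who wants the same comp-then-converge loop outside of the Automatic1111 canvas, it looks roughly like this with the diffusers library. The checkpoint path, prompt, and strength numbers are stand-ins, not a recipe; the point is the two passes -- a loose one off the rough comp, then a tighter one on the keeper.)

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path/to/your-sd15-checkpoint", torch_dtype=torch.float16
).to("cuda")

comp = Image.open("paint3d_rough.png").convert("RGB").resize((640, 480))
prompt = "steampunk airship over a harbor town"  # stand-in prompt

# pass one: half a dozen loose takes off the rough comp
takes = pipe(prompt, image=comp, strength=0.65, num_images_per_prompt=6).images

# pick the keeper by eye, then re-render on it at lower strength
# so it zeroes in on detail without wandering from the composition
keeper = takes[2]
final = pipe(prompt, image=keeper, strength=0.35).images[0]
final.save("converged.png")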

At least with SDXL you can prompt a little. ACE-step and WAN2.1 go even more nuts about stray words -- my first attempt at the airship had "flying" in the prompt and the AI added wings to it. For the ACE renders, my prompts were as short as four words long.

The other lesson here is that the AI will willingly change lots of things, but it is a real stickler about proportions. What it saw in the comp, it will keep. A big-headed guy with a short-necked guitar is going to stay that way through the generations, and it is really difficult to fix once you are in the clean-up render phase.


Wednesday, September 10, 2025

Fox and goose

Spent a chunk of the weekend wrestling with mostly that class of problem. Now I'm outlined to the end. I know; I was outlined before. Now I have more detail.


For me, it isn't figuring out why there's a bomb on the train. Or how the heroes figure out that there is a bomb on the train. It is all about controlling when they find out. That's the part that annoys me and feels like it takes a lot more time than it should.

That's because the why is rooted in the scene in which that clue is dropped, and the ordering of the scenes is at the mercy of geography, time, and all the other plot threads that led to them finding the briefcase full of blasting caps. Like, the briefcase was dropped by the Crooked Man during the chase scene, but that means he was heading away from the station, so how did he end up back on the train?

Multi-value problems. 

Weirdly, this wraps all the way back to one of the basics. Having a bat-computer moment, "I just realized there's a bomb on the train!" is telling. You want the moment that information arrives to be dramatic, and that is practically synonymous with showing it.

Which is to say, it is the central business of a scene. Which makes the scenes inseparable from the clues, which means you have to figure out how to do them in the one order that doesn't end up with a fat chicken and an even fatter fox.


My concentration wasn't up for that for most of the last few days so I made progress on the new cover/series rebrand. I'd roughed out a logo but I didn't quite have the patience for dragging vectors around. And the font I used ("Adventure") isn't licensed for commercial use.

Took a chance on a guy on Fiverr who did not quite share enough English with me to make communication fluid, but seemed familiar with the style I wanted. Nobody in this business makes sketches anymore. The worst are the cover designers. They give you a finished cover and, besides the limited number of revisions they allow, you feel bad about asking them to cut into a finished artwork.

That is because of the pipeline, which is heavily asset-based, and I understand. There's nothing wrong about having a cover that looks like other covers in the same genre. It makes it easier for the reader to find the book, and it gives the reader a certain trust that they are buying the kind of book they wanted to be buying.

It can lead to asset over-use, though.


My Fiverrr guy was showing me full-color graphics. So, fine, maybe he liked sketching with paint tools. The order was explicitly for vector graphics, though. Was he going to turn this more cartoony paint into proper vectors at the last step?

Turned out, that was the finished art. The vectors were him throwing the blobby paint sketch into auto-convert, making one of those impossible-to-edit vector "bit clouds."

So I'm doing the vectors after all. Sigh. Well, half the reason I hired an outsider is because I'm at best a craftsman and I wanted the imagination of an artist. And I liked his ideas. But that's left me even less eager to hire other people.

Because you are up against efficiency. They don't want to do your ideas; they want to do the idea that can be done with stock assets, quick-and-dirty "convert to vector," and, more and more, AI. And on the way to talking you into picking that instead (via making a really glossy final version they hope you will be attracted to enough to go with despite reservations), they talk you up into a Facebook banner and an animated logo and an AI image of a hand holding your book.

I already dropped a hundred on a cover I don't want to use. I may just have to go back to doing my own.

Monday, September 8, 2025

In a cavern in a canyon...

...excavating for bitcoins?

I'm not really where I want to be as a writer to tackle the New Orleans story. But more to the point, I've got this weird attraction towards the idea of sending Bill Bixby///Penny to Colorado or Montana or something, find a low-population bit of near-wilderness and put a massive super-advanced tech center for her to get in trouble in.

So, like, that new Google AI center, only located in an old coal mine. Or something else suitably techno-thriller epically over-scaled and real-world commercially implausible.



I can find a remote mountain or a salt mine or something else sufficiently crazy. I can invent a biotech firm or a nanotech lab or some other appropriately dangerous and dystopian technology.

What I'm having trouble with is tying it to the conceits of the Athena Fox series. Which are that there is a subject of history somewhere in the mix (preferably a narrow or easy-to-define band in time and/or space). That there is an archaeological issue to raise, like archaeological ethics or the problem with field schools. And that at some point it goes Tomb Raider, in that there is some perfectly sensible reason why Penny finds herself crawling down booby-trapped tunnels or fighting with ninja.

For the Bill Bixby stories, she doesn't need a good excuse to be there. There's no dig involved. She got off the Greyhound somewhere in middle America. Or got a phone call.


Yeah, like I'm in a good place to be plotting a new book. I'm having shiny new "I'd rather be writing anything but this" syndrome. Actually, I feel okay about The Early Fox. It went different than I expected, but it looks like I'm hitting the beats I hoped to hit.

I'm enjoying writing it, it is just a lot of work and I'm having trouble coming up with the sheer concentrated brainpower to deal with the stuff that's outlined.

The Atlas-F silo won't be bad, I can stumble through the next Freeman conversations, and the long chase down the Jornada del Muerto has complexities but it isn't anything I haven't done before.

No, but going to the rez is challenging. Not looking forward to tackling that. Worse, I decided I need to send her to the War Zone (to contact an urban explorer who can tell her about Lon's errand at the Atlas-F silo outside of Roswell). Which almost certainly includes cops, and that means the deputy sheriff at Alamogordo is firmly pencilled in now. 

And I still have to go back and do the nuke museum scene. The museum is fine. The "Los Alamos Wife" bit is going to take a little more. Probably not watching a full season of Manhattan. Fortunately, the museum has free online videos that are somewhat more...historical.




Saturday, September 6, 2025

I, for one, welcome our new...

...clipping service?

Yes, AI is a fancy plagiarism machine. But that doesn't make it entirely useless as a sort of clipping service; it goes out there and finds things that are like what you seem to be describing, shamelessly rips them off and shows you the resulting pastiche.

Which you then get independently inspired by and do your own, only vaguely similar, thing in response.

Since I went ahead and put ComfyUI -- it of the many nodes -- on my gaming machine (it's the one with the fast graphics card), I tried out text-to-music.

I am not impressed. Besides all the usual artist-type comments about how it doesn't understand anything about rhythm or harmony, and the front end was written by someone who doesn't know a fugue from a flugelhorn (and may have never even heard of the latter), the technical side of me was appalled.

The sound quality sucks rocks. It is damned near unlistenable.

Still, a couple of runs with the prompt "steampunk sea chanty" were amusing to listen to and tempted me for a long moment to go break out the more traditional tools.

This particular implementation (not that ACE is getting top marks in anyone's text-to-music books) totally sucks rocks. And not in a good way.


And this is technically interesting from two totally different directions. One is that getting a glossy appearance is one of the things AI is really good at (too good; it gets called out for that too-perfect "AI look" all the time). So why is ACE so bad? And is this endemic to using diffusion models for music creation?

For that matter, are there competing technologies and if so, what are they? I mean, we've been doing computationally generated music since before there were computers. With the robust ecosystem of software synthesis and sound manipulation, attacking the problem Band in a Box style (which was a pattern-fill sort of MIDI generator) seems more fruitful.
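(To be concrete about what I mean by pattern-fill -- a toy sketch with the mido library, not what Band in a Box actually does under the hood. Pick a rhythm, fill it with chord tones, repeat:)

from mido import Message, MidiFile, MidiTrack

# a two-bar I-V vamp: eighth-note arpeggios over the chord tones
CHORDS = {"C": [60, 64, 67], "G": [55, 59, 62]}  # MIDI note numbers

mid = MidiFile()
track = MidiTrack()
mid.tracks.append(track)
eighth = mid.ticks_per_beat // 2  # delta ticks for an eighth note

for chord in ["C", "G"]:                # one bar of each
    tones = CHORDS[chord]
    for i in range(8):                  # eight eighth-notes per bar
        note = tones[i % len(tones)]    # cycle through the chord tones
        track.append(Message("note_on", note=note, velocity=80, time=0))
        track.append(Message("note_off", note=note, velocity=64, time=eighth))

mid.save("vamp.mid")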

The other is that this seems so very...useless. Drawing is hard. It takes a lot of time to learn and even if you are good and using tools that are fast (some of which highly leverage computing in various ways) it still takes time to do.

Punching a keyboard controller is so much less effort in comparison. The barrier to music creation at home, now that there are things like GarageBand being given away with the operating system, seems very low to me.

But then, isn't that what AI is basically making a name for itself with? Taking tasks you didn't really need to automate, and automating them anyhow?

***

In more productive news, I made a full mock-up of my "map with red line" cover concept. I like it.

Since I'm not that interested in wrestling vectors for days, and I started with a non-commercial font in the first place, I reached out to Fiverrrrr to see if anyone would do the running series title graphic for me. Next step is taking it to my cover artists and asking for the "clean up my cover" service. They might be willing to do the font work, but they also might cheapen out, and it is a major design element for me. The rest of it is a lot of tweaking font fills and kerning (whatever the hell kerning is. I joke. I do know what kerning is). And that sort of thing, they can do just fine.

I'll see. Right now Fiverr is being Fiverr. People missing the point as they frantically run their AI translation software in the background, hoping that being able to pretend to share a language is all they need to pretend to share background and aesthetics as well...and relentlessly hoping to upsell me into an entire cover-and-book-edit while they are at it.

***

ComfyUI's implementation of the WAN2.1 also ran, but illustrated a very basic problem with the current paradigm of AI. No, the belief of legions of middle managers to the contrary, plain language is not the appropriate tool to communicate technical needs.

We do have a vocabulary of shots and cinematography, but that isn't how shots, at the level of a film, are constructed and communicated. There are storyboards, for one. And, yes, there is something like that moving in.

Look, better analogy. I've worked around a bunch of choreographers. They will often demonstrate moves, and work through them and count them out while teaching them. They don't just say, "It's just a jump to the left, then a step to the right."

So shoving 250 words into a text prompt is not going to create the combination of actor move and camera move that the eye of an artist desires. It can only make a shot similar to the shots it has seen other people do, that might look good enough to convince someone without those skills that it is communicating the directorial intent.