Saturday, September 6, 2025

I, for one, welcome our new...

...clipping service?

Yes, AI is a fancy plagiarism machine. But that doesn't make it entirely useless as a sort of clipping service; it goes out there and finds things that are like what you seem to be describing, shamelessly rips them off and shows you the resulting pastiche.

Which you then get independently inspired by and do your own, only vaguely similar, thing in response.

Since I went ahead and put ComfyUI -- it of the many nodes -- on my gaming machine (it's the one with the fast graphics card), I tried out text-to-music.

I am not impressed. Besides all the usual artist-type comments about how it doesn't understand anything about rhythm or harmony, and the front end was written by someone who doesn't know a fugue from a flugelhorn (and may have never even heard of the latter), the technical side of me was appalled.

The sound quality sucks rocks. It is damned near unlistenable.

Still, a couple of runs with the prompt "steampunk sea chanty" were amusing to listen to and tempted me for a long moment to go break out the more traditional tools.

This particular implementation (not that ACE is getting top marks in anyone's text-to-music books) totally sucks rocks. And not in a good way.


And this is technically interesting from two totally different directions. One is that a glossy appearance is one of the things AI is really good at (too good; it gets called out for that too-perfect "AI look" all the time). So why is ACE so bad? And is this endemic to using diffusion models for music creation?

For that matter, are there competing technologies, and if so, what are they? I mean, we've been doing computationally generated music since before there were computers. Given the robust ecosystem of software synthesis and sound manipulation, attacking the problem Band-in-a-Box style (a pattern-fill sort of MIDI generator) seems more fruitful.

The other is that this seems so very...useless. Drawing is hard. It takes a lot of time to learn, and even if you are good and are using fast tools (some of which leverage computing heavily in various ways), it still takes time to do.

Punching a keyboard controller is so much less effort in comparison. Now that things like GarageBand are given away with the operating system, the barrier to making music at home seems very low to me.

But then, isn't that what AI is basically making a name for itself with? Taking tasks you didn't really need to automate, and automating them anyhow?

***

In more productive news, I made a full mock-up of my "map with red line" cover concept. I like it.

Since I'm not that interested in wrestling vectors for days, and I started with a non-commercial font in the first place, I reached out to Fiverrrrr to see if anyone would do the running series title graphic for me. Next step is taking it to my cover artists and asking for the "clean up my cover" service. They might be willing to do the font work, but they also might cheap out, and it is a major design element for me. The rest of it is a lot of tweaking font fills and kerning (whatever the hell kerning is. I joke. I do know what kerning is). And that sort of thing, they can do just fine.

I'll see. Right now Fiverr is being Fiverr. People missing the point as they frantically run their AI translation software in the background, hoping that being able to pretend to share a language is all they need to pretend to share background and aesthetics as well...and relentlessly hoping to upsell me into an entire cover-and-book-edit while they are at it.

***

ComfyUI's implementation of Wan2.1 also ran, but it illustrated a very basic problem with the current paradigm of AI. No, the belief of legions of middle managers to the contrary, plain language is not the appropriate tool for communicating technical needs.

We do have a vocabulary of shots and cinematography, but that isn't how shots, at the level of a film, are constructed and communicated. There are storyboards, for one. And, yes, there is something like that moving in.

Look, better analogy. I've worked around a bunch of choreographers. They will often demonstrate moves, and work through them and count them out while teaching them. They don't just say, "It's just a jump to the left, then a step to the right."

So shoving 250 words into a text prompt is not going to create the combination of actor move and camera move that the eye of an artist desires. It can only make a shot similar to the shots it has seen other people do, that might look good enough to convince someone without those skills that it is communicating the directorial intent.