Monday, May 6, 2024

Unstable Concentration

I finally got AUTOMATIC1111 running again on the new SSD/clean OS install. And that may not have been a good thing.

Somehow, something stuck an older version of Python on there, and I just could not get PATH, or whatever the AUTOMATIC1111 package was using, to find the proper 3.10 version (the only one that is fully compatible with current AUTOMATIC1111). The GUI also seems to have changed in places since the last time I installed it, but mostly for the better (I can't figure out where it hid the "rebuild faces" checkbox, but that didn't always work well anyway).
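For anyone stuck in the same place, here is a rough sketch of how I'd check what PATH is actually resolving to. It is only a diagnostic, not anything AUTOMATIC1111 ships: the helper names (`is_compatible`, `find_interpreters`) and the list of command names to try are my own assumptions, and the "3.10 only" rule is just the requirement as described above.

```python
import shutil
import sys

# Assumption: the web UI wants Python 3.10.x, as described in the post.
REQUIRED = (3, 10)


def is_compatible(version_info, required=REQUIRED):
    """Return True if the (major, minor) part of a version matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


def find_interpreters(names=("python", "python3", "python3.10")):
    """Map each command name to the executable PATH resolves it to, if any.

    The command names are illustrative guesses; adjust them for your OS.
    """
    found = {}
    for name in names:
        path = shutil.which(name)  # returns None when the name is not on PATH
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    print(f"This script is running under {sys.executable} ({current})")
    print("Matches required version:", is_compatible(sys.version_info))
    for name, path in find_interpreters().items():
        print(f"{name} -> {path}")
```

If the interpreter you expect is not the first thing listed, that at least tells you the problem is PATH ordering rather than a broken install.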

A stroll through Civitai to see what was new was...informative. As with all things that have been around longer than six months, Stable Diffusion has fractured into competing standards. There are multiple base models now, with varying degrees of compatibility. And as always, all the cool LoRAs only work with the very latest checkpoint, which is generally some weird niche thing you wouldn't want otherwise.

So there's that. The other thing is sort of a blow to the idea that, with the millions of source images fractured into mathematical descriptions of pixel relationships, SD isn't really "copying" things. Well, the vast majority of the LoRAs (think of them as tiny checkpoint-like files that are over-trained -- often using new prompts which weren't in the original training data -- to narrowly focus their response to prompts) are aimed at replicating specific media characters and specific actual people.

A relatively small number of the offerings are along the lines of artistic styles (there are two out there that try to capture the look of the illustrations in the Voynich Manuscript), general aesthetics (diesel punk), or concepts that the AI otherwise finds very hard to do (like a worm's-eye camera).

And, yes, there are more and more tools to try to fix the things AI still gets very wrong. 

This does make it less accessible -- besides the wrestling-with-Python thing -- because with more and more complex options and more interactions to keep track of, the learning curve keeps getting steeper. And with the rapid changes, it takes more attention and more time from the would-be user.

Commercialization is sneaking in, although at least at Civitai it is mostly people posting only the "lite" LoRA and linking the full one on their Patreon. And from the other side, too: for all the effort and time it takes, AI art is significantly less production time and certainly less studying time than more traditional art, meaning it is less valued (and that's before you get to the flaws of AI: the weird errors and the Uncanny Valley of it). So people are working at finding venues where they can dump tons and tons of AI art for whatever small return they can get out of it.

And also. The small reference pools and the over-training of the typical LoRA mean they have even more of the prompt side effect. For example, if you ask for Steve Jobs, the Apple logo will usually show up, because most of the training material was taken from presentations or advertisements. LoRAs also show strange emergent behavior. Sometimes you can identify the exact image that is weighing heavily on these supposedly random algorithms. Other times you get weirdness; one LoRA creator admitted that, for some reason, when used at higher values the Diesel Punk LoRA will tend to add a turtle somewhere.

Sounds like the attempts at level creation by AI. Asked for a police station, the AI level designer made sure to put donuts on all the tables.

But anyhow. I got engrossed. I lost track of time and forgot to eat or drink, and with that and a nasty bug (possibly stomach flu) I got dehydrated so bad I had to go to the ER.

So not, really, a very productive day.
