There's something weirdly detached about AI art. This goes beyond the various "tells," which range from obvious to edge cases. It is this sense of there being no there there. A soullessness that is of a different quality than corporate art or mass-produced art or been-through-too-many-committee-meetings art. That's the backdrop that makes those glimpses of a real soul in there so striking.
The first level is seeing human signs that are divorced from their original context. In Saberhagen's Berserker series, the titular machines do not have vocoders or any other synthesized speech. Instead they talk using scraps recorded from the various humans who have been their prisoners. From word to word the age, accent, gender, and emotional content of the voice change. And you get glimpses of those original personalities: the fearful, the resigned, the aloof intellectual, the child in terror.
It freaks me out as someone who has tried to learn the visual arts, and who has studied the styles and careers of various artists. Much of what the AI was trained on is effectively anonymous: the output of commercial artists on contract, of the various cogs in a studio system, or just of artists you haven't personally learned to recognize.
But you still see those things: those brush strokes, those choices in line, the way a mouth is drawn. Things that in context are part of a complete style and approach, that come out of a philosophy (quite possibly an ever-shifting one, as an artist approaches different projects with different intents, and their career changes over time as well).
But you don't see the same thing across the image. One line is drawn one way. Another line, right by it, is drawn in some different way. Not completely different (the AI is more selective than that) and not usually pure (it always blends a little). So you probably won't see an Arthur Adams mouth or a Rob Liefeld foot, not perfectly preserved in amber. But you will have a blurry glimpse, a funhouse mirror version, enough to know without the slightest doubt that this particular artist was in the training material.
And this lack of a clear overall purpose is visible in something else as well. Something that is most marked when some semblance of it is present.
And that is composition. On the largest scale, the scale of the total image, the AI will probably produce a harmonious composition. Most of the material it was trained on had one, and enough of it followed similar rules (or, more precisely, drew on a small set of compositional approaches that themselves have a small set of fairly well-defined rules).
And some of this is because the currently dominant approach to generative AI for visual art, diffusion, works by progressive approximation, from the broadest outlines in. So you could say the AI starts with blocking, with a massing study, and that part is probably decent. Where it falls down is that there is no guiding purpose making sure everything else unfolds according to that plan.
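To make that concrete, here's a minimal sketch of that broadest-outlines-in behavior, in Python with the Hugging Face diffusers library (the checkpoint name and prompt are just stand-ins I've picked for illustration). Starve the sampler of denoising steps and you get, more or less, the massing study; give it more steps and it commits to progressively finer structure:

```python
# A rough sketch, assuming the diffusers library and a Stable Diffusion
# 1.5 checkpoint. Each denoising step refines the previous one, so low
# step counts stop at the "blocking" stage.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for steps in (5, 15, 50):
    # 5 steps: blurry masses. 15: recognizable forms. 50: full render.
    image = pipe("a knight in a forest clearing",
                 num_inference_steps=steps).images[0]
    image.save(f"knight_{steps:02d}_steps.png")
```

Compare the three outputs side by side and you can watch the composition get locked in long before any of the surface detail exists.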
That's why AI is currently so shit at lighting, by the way. And yes, the AI boosters will point out that it is "getting better" and will continue to get better. But it is doing so by brute force and band-aids. When the Hildebrandts painted, they had a clear vision of sunlight (usually) coming from one direction, and of the reflection of what was often green earth coming from another. This was in their heads with every shadow, every shading of every limb or rock or tree.
And it was in the planning. When an artist like Jack Kirby penciled, he knew where there would be blocks of solid black, and he planned and placed them as part of the composition. They didn't just fall where you'd expect a shadow to fall.
The AI can capture this statistically. It places the shadows where they usually fall. It shades because the training examples shade. And it gets it right most of the time only because most artists do these things in similar ways, and with enough samples it will probably not be led into producing an outlier.
But that's all aside from where I was going with this. There are AI images which go beyond simplistic harmony in their composition and actually move on to telling a story within it. Where there is a focus and eye-leading, where the pose isn't just a collection of average limb positions but actually communicates intent. Where a character is engaged in a purposeful action that can be read.
And the reason is that there was an original image. I recently watched a video about a young artist (who has learned better since) who did quite nice drawings, then fed them to an AI to make them look more polished. His style was perfect for this, by the way. AI gets confused by linework and isn't going to save it anyhow, but blocks of color tell it the ideas you want it to flesh out.
There are other routes to this soul transplant. One is by over-prompting the AI to copy a single original artwork you've been inspired by. You might ask it for an astronaut on the moon and with any luck at all you will get one of the three top images from Apollo 11 that appear over and over across media. With a few screwy details, because it is still mashing together multiple sources and not all of them came from the photograph in question or even from the real space program. Or you might outright name the artwork and/or artist, and surprise surprise, you can absolutely get the Mona Lisa back.
But there is also image-to-image. And that is, in essence, using AI as a Photoshop filter on an existing artwork. Or photograph, or whatever.
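To show how literal that "filter" framing is, here's a minimal img2img sketch, again assuming diffusers and a Stable Diffusion 1.5 checkpoint (the input file name is hypothetical). The composition, the pose, the story all come from the source image; the model just repaints it:

```python
# A minimal img2img sketch, assuming diffusers. `strength` sets how much
# noise is added to the source before denoising: low values keep the
# original composition nearly intact, high values mostly discard it.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

source = load_image("rough_color_blocking.png")  # hypothetical input
result = pipe(
    prompt="polished digital painting",
    image=source,
    strength=0.4,        # keep most of the source's structure
    guidance_scale=7.5,
).images[0]
result.save("polished.png")
```

That strength knob is the whole game: at 0.3 or 0.4 you are unmistakably reworking the source image, not generating.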
In fact, there are entire models that are absolutely and unapologetically based on this.
In ComfyUI, one workflow -- right there on the front panel of the application -- is to take an image of a person (which could be the actual original photograph of a real person) and a screen-grab from a movie or a TikTok, and put the choreography of the one onto the other. This isn't an edge case or an abuse of the model; this is how it is intended to be used. ComfyUI supports plenty of models that are text-based, and even in the above workflow the final result can be shaped by a more typically generative process. But this is absolutely making a paste-up of two real (stolen) things. It is necessarily so; the toolchain doesn't work if you don't provide those two pre-existing things for it to mash together.
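I can't paste a ComfyUI node graph into prose, but here's a rough equivalent of that pose-transplant idea, sketched with diffusers and a ControlNet (the OpenPose checkpoint and the controlnet_aux package are my choices for illustration; ComfyUI wires the same kind of pieces together as nodes). Note the point of the sketch: both inputs are pre-existing images, and the text prompt is almost an afterthought:

```python
# A rough pose-transfer sketch, assuming diffusers, controlnet_aux, and
# the lllyasviel/sd-controlnet-openpose checkpoint. A pose skeleton is
# extracted from one real image and imposed on the generated one.
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

pose_source = load_image("dance_frame.png")  # hypothetical screen-grab
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
skeleton = detector(pose_source)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# In the full workflow the person's likeness comes from a second
# conditioning input (e.g. an IP-Adapter); omitted to keep this short.
image = pipe("a person dancing", image=skeleton).images[0]
image.save("pose_transplanted.png")
```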
What is sad here, though, is the glimpses of a potential partner. There are parts of making art where the ratio of expression to effort is just too low. Inking is wonderful stuff. Erasing the pencils, not so much. And few people want to grind their own pigments.
Somewhere in there is the ability to partner, to have a dialog, to use technology to draw the in-betweens or fill in the blocks. Some of it we have; some of it has been with us for a while. There's a reason why Photoshop survives the shitty business practices of Adobe.
Just, this thing we currently have that we call AI is being applied to the wrong parts of the process, and too often for the wrong reasons. We could have a tool that would let us work as artists and automate the parts of the process that are least human. Instead we have a tool we are using to bypass the parts of the process that are most expressive of artistic intent, to do the things that should be human. (And then, in the end, it fucks up the small details we might actually have turned to it for.)
Also... have you looked at the price of RAM lately?