Above: A Lensa interpretation of a self-portrait.
BitDepth#1387 for January 02, 2023
In the last six months, the art world and photographers have been thrown into the spin cycle of artificial intelligence (AI) technology with the advent of multiple computer clusters that convert text and image inputs into output that looks very much like art.
The technologies used here emerged from a reversal of image recognition research.
Scientists who had given serious thought to what makes images recognizable to computers flipped the script and began to consider what might come out of computers asked to create art.
There are two main ways this is done in a growing number of image generation portals.
In some, you enter a text prompt: a description, a kind of keyword shorthand for what you want the image to contain or be influenced by.
This can include locations, art styles, famous people, and other descriptions that collectively affect what those computers synthesize.
In others, you upload an image and the software freestyles it.
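To make the first approach concrete, here is an illustrative sketch of how a text prompt is typically assembled from the kinds of components these portals accept. The function name, keyword list, and comma-joined syntax are my own assumptions for illustration, not any portal's actual grammar.

```python
# Illustrative only: assembling a prompt from a subject plus the
# optional modifiers (location, art style, influences) described above.
def build_prompt(subject, *, location=None, style=None, influences=()):
    """Join a subject with optional modifiers into one prompt string."""
    parts = [subject]
    if location:
        parts.append(f"in {location}")
    if style:
        parts.append(f"{style} style")
    parts.extend(influences)
    return ", ".join(parts)

print(build_prompt("portrait of a photographer",
                   location="Port of Spain",
                   style="oil painting",
                   influences=("dramatic lighting", "fedora")))
# → portrait of a photographer, in Port of Spain, oil painting style, dramatic lighting, fedora
```

Each added phrase nudges what the cluster synthesizes, which is why small wording changes can produce wildly different images.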
I opened half a dozen of these tools, many of which require upfront payment, and ended up trying the popular MidJourney portal and Lensa app.
Lensa, from the company that created the briefly popular Prisma art-filter app a few years ago, is an all-purpose AI-powered image enhancement tool.
For a fee, it also uploads a batch of your selfies to an AI server to create imaginative avatars.
Three hundred photos and $7 later, my collection is a tantalizing gallery of all the ways AI can get human facial features and limbs wrong.
For some reason, Lensa is quite convinced that I should wear open-collared suits and a variety of hats, from baseball caps to snazzy-looking fedoras.
Lensa’s output struck me as almost completely appalling, but I’m a picky photographer.
MidJourney is accessible via bots running on a Discord channel. When you sign up, you get 25 free opportunities to create something digitally amazing.
As with Lensa, I threw various photos of myself into the software with a series of prompts and the results were just as unusable but more interesting.
MidJourney seemed to have no analogue for my caramel skin tone and frequently defaulted to Caucasian renditions. When I added prompts like “African,” it immediately jumped to a sort of generic chocolate shade.
Every other rendition of my submitted portrait added hair, and when I added the prompt “bald,” it switched from aging me or adding hair to producing an unrecognizably youthful and angelic version of my image.
These MidJourney generations were made from self-portraits I have on file. The results are highly dependent on the prompts used.
MidJourney processes the submitted image according to the text, and from my experiments there is no question that well-crafted prompt text is critical to AI art.
I asked two local designers and artists about the work they have created with AI. Ian Reid, Art Director at Reid Designs, has been refining his prompting.
“You no longer have to pay a mas designer to come up with costume ideas,” Reid explained. “I’ll do that for you by being the prompt engineer. Right now I’m learning everything there is to know about prompting so I can get what’s asked for.”
“I’m interested in rendering images that customers request but that we can’t find on [stock image agency] Shutterstock. ‘Young Indian woman from Trinidad using a phone,’ for example. My goal is to see if I can fit into a niche where I can offer a service to customers who don’t find a solution anywhere else.”
But where do these new images come from? Image AI computing clusters are trained by analyzing a sample of images that are fed to them.
The most powerful AI tools have been trained by scraping publicly available images across the internet and assimilating billions of them.
This is where the mystery of image-generating AI becomes even more complex.
This vast database of images is sorted and analyzed using deep learning, which processes information using algorithms inspired by brain function. It’s the most sophisticated form of AI, but things get even more confusing from there.
All of this information is then categorized into what mathematicians call the latent space, which organizes and references the processed data in far more than the three spatial dimensions humans perceive.
It is from this latent space that these AI engines produce their output when given a prompt.
This latent computational space is either an approximation or an extension of the free associations the creative mind makes based on sensory input, but it proceeds unhindered by human limitations or experience.
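A rough way to picture the idea, and emphatically not how any production model is built: images, flattened into long lists of numbers, are mapped into a smaller coordinate system where distance stands in for similarity. The toy “encoder” below is a single random projection; the sizes, seed, and function names are all illustrative assumptions, since real systems learn this mapping with deep networks trained on billions of images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two tiny 8x8 "images" flattened into 64-number vectors.
img_a = rng.random(64)
img_b = img_a + rng.normal(scale=0.01, size=64)  # a near-copy of img_a
img_c = rng.random(64)                           # an unrelated image

# A toy "encoder": one random linear projection into a
# 16-dimensional latent space.
encoder = rng.normal(size=(16, 64)) / np.sqrt(64)

def encode(image):
    """Map a flattened image into the latent space."""
    return encoder @ image

za, zb, zc = encode(img_a), encode(img_b), encode(img_c)

# In latent space, similar images land close together, so the
# near-copy sits nearer to the original than the unrelated image.
print(np.linalg.norm(za - zb) < np.linalg.norm(za - zc))  # → True
```

Generating an image then amounts to picking a point in that space, from a prompt rather than a photograph, and decoding it back out, which is where the uncanny free associations come from.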
This is why extra thumbs inexplicably appear on hands, alongside other Dalí-like incongruities. Salvador Dalí himself might have been the best person to interpret the latent space for us.
The technology is clearly still in its infancy, and its users are continually training these clusters through their acceptances and rejections.
The systems will only become more accurate in their synthesis of art styles, pictorial components, and fusions of elements.
“I’ll never use it for my personal art,” said creative director Anthony Burnley (read the full interviews with Reid and Burnley here).
“Because it’s not my art. [But] definitely for my commercial work. I see it as a kind of clip art on steroids. Or stock art or stock photography; they all come from a similar direction and try to solve similar problems. The challenge is who makes the best use of it.”