Is DALL-E a Genius? – Notes

It’s hardly news to anyone that we’re in the midst of the first image-generation craze in human history. Over the past year, OpenAI, the San Francisco-based artificial intelligence company founded by Elon Musk and Sam Altman, has released two generations of its latest AI: DALL-E, a portmanteau of “Dalí” and “WALL-E”. DALL-E builds on the company’s previous success, GPT-3, and generates images from a text prompt. Immediately after its release on January 5, 2021, the internet was flooded with generated images. DALL-E gives its users the strange ability to create a photorealistic representation of anything that comes to mind. From the sillier “teddy bears doing AI research on the moon in the 1980s” to the slightly more serious “M.C. Escher-style Möbius strip fractal”, any sentence can suddenly be transformed into an image. Some of these images are very beautiful, and it seems that DALL-E’s pieces have real aesthetic value. This raises many questions: have we built a machine that can imagine something? If so, does it imagine the way we do? And perhaps most disturbingly, at least for anyone with a slightly dystopian attitude towards AI, is the age of human art now over? Have we created the first genius program? Perhaps this really is an electronic Dalí, as the name suggests. On its website, OpenAI says its “hope is that DALL-E 2 will empower people to express themselves creatively.” Can self-expression be outsourced to a machine? Can an automatically generated image have anything to do with self-expression? It certainly can take a genius, or an analyst, to lead someone else to self-expression. Think of Socrates, the midwife of philosophical positions.

Can one say of DALL-E that it imagines? If we stay with the “image” in “imagine” and understand imagination as an image-making faculty, as in the German Einbildungskraft, then certainly yes. This is precisely where DALL-E excels. DALL-E can visualize many more details than we can, and the inventions of its imagination are far more impressive. As ridiculous as it is to say, DALL-E imagines better than we do. Still, there’s a key difference in the way it forms its images. While it’s better in a sense, it’s far more important that it imagines differently than we do. DALL-E’s neural-network imagination is qualitatively different from ours. François Chollet, a prominent AI researcher at Google and developer of Keras, a widely used deep learning toolkit, recently tweeted about these differences, writing: “For the most part, humans are unable to reproduce the visual likeness of anything. But they know what the parts are (2 wheels + 2 pedals + handlebars + saddle). A [deep learning] model, on the other hand, is excellent at reproducing local visual likeness (textures) but lacks an understanding of the parts and their organization.”

This difference is fascinating: while we imagine in terms of conceptual schemas, a model like DALL-E seems to work by texture (hence the photorealism). This points to a fundamental difference between how we understand concepts and how a model does (if it does at all). For us, a concept becomes applicable to experience via a schema; this is a central tenet of Kant’s schematism, which allows the imagination to mediate between intellectual concepts and intuitions. For us, “the concept ‘dog’ signifies a rule according to which my imagination can trace the shape of a four-footed animal in general.” DALL-E correlates “dog” not with a rule for constructing dogs, but with the usual color patterns of faces, legs, skin, and fur. We could put it briefly: DALL-E imagines with content where we imagine with form. It imagines “better” because for DALL-E the quality of the image is what is essential, while for us the image only needs to satisfy certain formal characteristics. Here lies the familiar peculiarity of these models: they are entirely detail-oriented and have no rough sense of the whole. Saying that a dog has four legs requires a rough sketch of what kind of animal a dog is, whereas a model attends only to the structure of the information it has encoded. For us, a concept is abstract; for a model it is a fusion of concrete things.

Is this focus on content what prevents DALL-E from being a genius artist? If I told you about a person who can create beautiful and vivid images of anything you ask, and who can even perfectly imitate the style of other artists, the word “genius” would not be far from your tongue. Still, it seems a totally inappropriate description of DALL-E. In fact, the only place I’ve seen it attributed to DALL-E is in the phrase “comic genius.” Why is that? What does a genius artist do that is so alien to DALL-E? The genius artist sets an aesthetic standard and opens up new avenues of sensibility, showing in the work itself an example of the logic of this new kind of sensibility. The genius in this sense is an instructive figure whose primary means of instruction is the work of art. DALL-E may teach us many things about the generative abilities of machines, but it is intuitively obvious that it cannot teach us anything about our own sensibilities. Looking at the images generated by DALL-E, one does not get the impression that there is still much to learn about beauty. One marvels at the flawless technical work. This touches on something common to our expectations of AI: that it be an alien and independent producer of meaning. On that front, it’s almost certainly doomed to disappoint. Where the lack of meaning we experience prompts us to create meaning, that lack has a positive, productive presence; the meaninglessness in a deep learning model remains entirely negative: the model can do nothing with it.

So, if not an artistic genius, why do some feel that DALL-E is a “comic genius”? DALL-E opens up some interesting and dangerous avenues for human endeavor, mainly concerning the fate of visual artists in our society. But we should focus on what it is now, on actually existing AI, so to speak. Currently, DALL-E is best characterized as a meme machine. Most content created with DALL-E is what one might call a “high-effort meme”: something stupid that requires a lot of work. (There’s the famous example of “The Bee Movie, but every time someone says bee, it speeds up,” and its many, many variations.) As Wikipedia puts it: “Most coverage of DALL-E focuses on a small subset of ‘surreal’ or ‘quirky’ outputs.” That’s no coincidence. When we are presented with something that converts our wildest thoughts into images, the first instinct is to ask for something absurd. (Of course, this is only true because OpenAI has banned pornographic prompts. After porn, gaming is the most popular use of a technology.) So we ask for an avocado on a lawn chair, Hegel on a bagel, or two baked potatoes at a business meeting. These intentionally silly prompts have an interesting effect when fed into a powerful, cutting-edge piece of technology. There’s an inherent incongruity about it. The material is absolutely ludicrous, while the machine that processes it is as serious as it gets. The highest is connected to the lowest, which, as Herbert Spencer points out in his “Physiology of Laughter” (1860), is generally pretty funny. As it stands now, DALL-E is an invitation for us to play, and it’s fun to play with something very serious. DALL-E could be a comic genius because, in the manner of a good comic actor, it can turn plain old nonsense into comedy gold. In the end, both are about performance.