DALL·E, a portmanteau of the names of the artist Salvador Dalí and Pixar’s WALL·E, can take any text and create an image from it. The system uses a neural network that’s been trained on billions of pictures and text examples. It’s one of a growing number of AI projects that can mimic, but not replicate, human beings’ creative output. “Because natural language is constantly evolving, and very dependent on contextual nuance, teaching a machine to understand language well enough to draw a picture is a very significant achievement,” Tamara Schwartz, professor of cybersecurity at York College of Pennsylvania, said in an email interview. “Imagine a police sketch artist, that’s a rare talent, having the ability to create a picture based on a witness description.”
Using Big Data to Produce Images
DALL-E was created by the AI research company OpenAI. The system was trained on vast amounts of text and image data gathered from the internet, and its natural language model learned to produce images from written descriptions. DALL-E works similarly to the recently released GPT-3, a language model created by OpenAI that can be prompted to generate original text passages. GPT-3 was trained on roughly half a trillion words of internet text and can produce surprisingly humanlike writing.
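OpenAI has described the first version of DALL-E as a transformer that treats a caption and an image as one continuous stream of tokens, predicting the stream one token at a time. The toy Python sketch below illustrates only that data flow; the tiny vocabulary and the arithmetic "predictor" are made up for illustration and stand in for a real tokenizer and a real neural network.

```python
# Toy illustration of DALL-E-style text-to-image token flow.
# Nothing here is OpenAI's actual code: the vocabulary and the
# "prediction" rule are fabricated stand-ins.

def tokenize_text(caption):
    """Map each known word in the caption to a made-up token id."""
    vocab = {"an": 0, "armchair": 1, "shaped": 2, "like": 3, "avocado": 4}
    return [vocab[w] for w in caption.lower().split() if w in vocab]

def generate_image_tokens(text_tokens, n_image_tokens=4):
    """Stand-in for the autoregressive model: emit image-token ids
    one at a time, each depending on the whole sequence so far."""
    sequence = list(text_tokens)
    for _ in range(n_image_tokens):
        next_token = (sum(sequence) * 31) % 8192  # fake "prediction"
        sequence.append(next_token)
    # In a real system a decoder would turn these ids into pixels.
    return sequence[len(text_tokens):]

tokens = generate_image_tokens(tokenize_text("an armchair shaped like an avocado"))
print(tokens)
```

The point of the sketch is that the caption and the image live in one sequence, so "drawing" becomes the same kind of next-token prediction that language models like GPT-3 already perform.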
Michael Yurushkin, founder and CTO of BroutonLab, a data science company, said in an email interview that DALL-E is “one of humanity’s few successful jabs at emulating our creativity and imagination.” He added, “It’s easier to realize how AI predicts something by going through relevant data, but understanding how it’s able to generate drawings out of things it never ‘heard’ about before is more difficult.”
Schwartz is careful to note that the AI is not creating information, but rather taking language data and transforming it into images.
“The initial creativity comes from the human who constructed the task,” Schwartz said. “There is some ‘creativity’ on the part of the AI, because it experiments with various combinations of data and then selects from a number of potential outputs. However, a human is examining the outputs and teaching the AI how to select from the many combinations.”
Robot Detective Work?
A machine can experiment with these combinations of data and objects much faster than a human artist. Schwartz noted that DALL-E could one day partner with a detective trying to reconstruct a crime scene through a sketch based on eyewitness testimony.

“As witnesses provide their statements, the computer could take that spoken, natural language information and create a drawing of the scene, or many drawings of the scene,” she said. “These visualizations could then be integrated to create a more precise image of lost evidence. This visualization could be enriched by integrating previous imagery of the location prior to the crime.”

Several other AI-driven programs can produce art. For example, Ai-Da uses a robotic arm system and facial recognition technology paired with artificial intelligence to create art. The system can analyze an image put in front of the machine, which feeds into an algorithm that produces the robot’s arm movements.

However, human artists shouldn’t worry that robotic overlords will replace them, argued Ahmed Elgammal, the director of the Art and Artificial Intelligence Lab at Rutgers University, in The New York Times last year. “While the definition of art is ever-evolving, at its core, it is a form of communication among humans,” he wrote. “Without a human artist behind the machine, AI can do little more than play with form, whether that means manipulating pixels on a screen or notes on a musical ledger. These activities can be engaging and perceptually intriguing, but they lack meaning without interaction between artist and audience.”

After taking a look at DALL-E’s work, I understand Elgammal’s point that the AI-created images aren’t art. On the other hand, they are better than any art I could create. So, really, what’s the difference?