Last Updated on July 26, 2022 by Chandrakant Isi
They say, “AI is likely to be the last human invention”. It is not hard to imagine the rationale behind it. While human intelligence is on a steady decline, Artificial Intelligence is becoming more sophisticated. It is replacing workers in warehouses, driving cars, baking pizza, and even diagnosing patients. And if all this wasn’t enough, machine learning is now eyeing the last bastion of human potential — art. Driving my point home, the lead image of this article has been created by an AI.
California-based start-up Open AI has trained an advanced neural network dubbed Dall.E, which is capable of generating photorealistic images and pieces of art from natural language prompts. If you instruct an artist to create a “High-quality photograph of a robot serving vegan food in a futuristic restaurant”, he or she will require at least 10 minutes just to comprehend the image composition and setting. The latest version of Dall.E, on the other hand, can deliver results in just 10 seconds.
What’s Dall.E?
The company behind Dall.E, Open AI, is at the forefront of machine learning innovation. It has several high-profile backers including Elon Musk and Peter Thiel. Its biggest breakthrough has been the third iteration of the Generative Pre-trained Transformer (GPT-3). It is a Natural Language Processing (NLP) model with a whopping 175 billion parameters, trained to generate text. Researchers have used GPT-3 to create articles, dialogue, poetry, and even computer code with minimal user input. After threatening the jobs of writers, poets, and developers, the AI researchers then put graphic designers on notice by creating an image generation tool, Dall.E 2.
This AI tool’s name is inspired by the titular robot character from Disney’s Wall.E and legendary Spanish painter Salvador Dalí. Open AI claims that Dall.E 2 doesn’t simply modify existing images but creates them from scratch. To predict what the user is asking of the system, Dall.E 2 relies on the enormous dataset of 650 million images and captions it was trained on. What makes Dall.E 2 so special is its ability to perform well at zero-shot learning. For those not in the know, zero-shot learning is an AI model’s ability to identify or classify things beyond its training data using association. For instance, if you train an AI to identify apples, oranges, and guavas as fruits, it will classify peaches as fruits even though they were not part of the training data.
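For the curious, the fruit example can be sketched in a few lines of Python. This is only a toy illustration of classifying by association, not how Dall.E 2 works internally. The trait lists, the extra "vehicle" category, and the function names are all made up for this sketch.

```python
# Toy illustration of zero-shot classification by association.
# All traits and names below are hypothetical, chosen only for this sketch.
TRAINING_EXAMPLES = {
    "apple": ({"edible", "sweet", "has seeds", "grows on trees"}, "fruit"),
    "orange": ({"edible", "sweet", "has seeds", "citrus"}, "fruit"),
    "guava": ({"edible", "sweet", "has seeds", "tropical"}, "fruit"),
    "car": ({"has wheels", "has engine"}, "vehicle"),
    "bus": ({"has wheels", "has engine", "carries passengers"}, "vehicle"),
}


def learn_category_traits(examples):
    """Collect every trait seen for each category in the training examples."""
    traits = {}
    for item_traits, label in examples.values():
        traits.setdefault(label, set()).update(item_traits)
    return traits


def classify(item_traits, category_traits):
    """Pick the category that shares the most traits with the new item."""
    return max(
        category_traits,
        key=lambda label: len(category_traits[label] & item_traits),
    )


category_traits = learn_category_traits(TRAINING_EXAMPLES)

# "peach" never appears in the training examples, only its traits are given.
peach = {"edible", "sweet", "has seeds", "fuzzy skin"}
print(classify(peach, category_traits))
```

The peach shares three traits with the fruits it has never been grouped with and none with the vehicles, so it lands in the fruit category.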
Dall.E 2’s Strengths and Weaknesses
Although in beta stage, Dall.E 2 has been delivering some stunning results. The images produced by this AI can give stock image services a run for their money. Similarly, it is also going to make many artists uneasy with its turnaround time of a mere 10 seconds. It understands image styles including photorealism, renaissance painting, art deco, impressionism, abstract, surrealism, digital art, and 3D render. You can also define vibes such as steampunk, post-apocalyptic, gothic, fantasy, biopunk, and soviet design. Moreover, you can choose lighting conditions like sunlight, overcast, studio lighting, low-key lighting, golden hour, and twilight, to name a few.
Image rendered with sunlight
Image rendered with neon lights
While Dall.E 2 understands natural language, you have to be precise with the prompts for best results. If you ask the AI to create an “image of a modern living room with LED lights and a large fish tank”, it sometimes mounts the aquarium on the ceiling. You then have to correct it by adding the line “next to a TV”.
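If you like to keep your prompts tidy, the pieces discussed so far (subject, placement, style, vibe, and lighting) can be stitched together with a small helper. The sketch below is plain Python string handling and does not talk to Dall.E 2 at all. The function name `build_prompt` and the comma-separated format are my own assumptions, not an official Open AI recommendation.

```python
def build_prompt(subject, placement=None, style=None, vibe=None, lighting=None):
    """Join the non-empty parts into a single comma-separated prompt.

    Hypothetical helper for illustration; it only formats text.
    """
    # A spatial hint such as "next to a TV" reads best right after the subject.
    head = f"{subject} {placement}" if placement else subject
    parts = [head]
    for extra in (style, vibe, lighting):
        if extra:
            parts.append(extra)
    return ", ".join(parts)


prompt = build_prompt(
    "image of a modern living room with LED lights and a large fish tank",
    placement="next to a TV",
    style="photorealism",
    lighting="golden hour",
)
print(prompt)
```

You would then paste the printed line into Dall.E 2 and tweak one part at a time until the aquarium comes down from the ceiling.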
Then, there are some deliberate limitations to block the creation of “harmful content”. As per the Open AI content policy, any content suggesting harassment, violence, self-harm, nudity, obscene gestures, depiction of drugs, vandalism, or deception is barred. The AI also blurs and warps celebrity faces to prevent misuse. While such measures are essential to curb harmful content, the current iteration of Dall.E is quite heavy-handed with these restrictions. For instance, the word “gun” is banned without context, and as a result, you can’t even mention the harmless movie Top Gun in Dall.E 2 prompts.
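To see why a context-free word ban trips over Top Gun, consider the naive filter sketched below. Open AI has not published how its filter works, so this is purely a guess at the simplest possible approach. The blocklist and the function name `is_blocked` are hypothetical.

```python
import re

# Hypothetical blocklist; the real list and matching logic are not public.
BLOCKED_WORDS = {"gun"}


def is_blocked(prompt):
    """Return True if any whole word in the prompt is on the blocklist."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return any(word in BLOCKED_WORDS for word in words)


print(is_blocked("A still from the movie Top Gun"))
print(is_blocked("A cat driving a classic Mini"))
```

A filter like this only sees individual words, so it cannot tell a movie title from a weapon. Telling them apart would need some understanding of the surrounding context.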
Now that we have the basics down, here are a few results I managed to get from Dall.E 2:
Prompt: High quality photograph of an astronaut at a crowded Indian train station
Prompt: Photo of a cat wearing racing goggles driving a light green classic Mini cooper with motion blur
Prompt: Photorealistic penthouse touching clouds at golden hour
Prompt: A still of a Minion in Apocalypse Now (1979)
Prompt: 70mm close up photo of Nero with the great fire of Rome in background, high-resolution, cinematic lighting
Prompt: High quality photo of an aircraft carrier ship stranded in a small pond in Mumbai city
Based on these results, you can tell that Dall.E 2 can create images and artwork for which the human mind would struggle to find a reference point. On the other hand, it still lacks finesse in certain scenarios and may even fail to understand the instructions. Overall, with its current capabilities, Dall.E 2 is more likely to complement artists than replace them.
[Open AI offers limited credits. Kudos to Dnyanesh for sharing his credits to wrap this article]