It is no secret that text-to-image diffusion models like DALL-E 2, Midjourney and Stable Diffusion can generate amazing art, and that they can capture the style of existing human artists exceptionally well. But how well would they recreate an artist's style if the prompt only described that style, without ever naming the artist?
More specifically, can these models recreate some of the greatest works of art when told only the content of the painting and the style of the artist? If so, could this mean the end of human works of art? Could we simply replace the content of fancy art galleries with cheaper AI-generated recreations? Let's find out!
To begin with, we need descriptions of the artists and their respective works of art, and what better way to get them than to ask another AI: ChatGPT!
For each artwork, let's first get a description of the artist. I can do this by asking ChatGPT the following:
I ask for the style in keywords because that makes it easier to drop into a prompt for the diffusion models.
Next, let's get a description of the artwork from ChatGPT:
Nice, now we have our raw input! I then do a bit of prompt engineering to remove all mentions of the original artwork and artist (in this case The Starry Night and Vincent van Gogh) and condense the remaining information into a more diffusion-friendly format.
The final prompt for Starry Night ended up as:
An oil painting depicting a nighttime scene of a small village with a large cypress tree in the foreground and a rolling hill behind it. The sky is filled with swirling stars and a bright crescent moon, creating a sense of movement and energy. Post-Impressionist, bold colors, thick brushstrokes, emotional, expressive, impasto, landscapes, sunflowers
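The clean-up step above was done by hand, but it can be roughly sketched in Python. This is only a minimal illustration under my own assumptions: the function name build_prompt, its parameters and the banned_terms list are hypothetical and not part of any tool used here, and simply deleting a name from a sentence can leave awkward grammar, so the result still needs a human read-through.

```python
import re


def build_prompt(description, style_keywords, banned_terms):
    """Combine an artwork description and style keywords into one prompt.

    Hypothetical helper: removes any banned term (artist or artwork name)
    from the description and drops style keywords that contain one.
    """
    # Remove longer terms first so "Vincent van Gogh" goes before "Van Gogh".
    terms = sorted(banned_terms, key=len, reverse=True)

    cleaned = description
    for term in terms:
        cleaned = re.sub(re.escape(term), "", cleaned, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed terms.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()

    def mentions_banned(keyword):
        lowered = keyword.lower()
        return any(term.lower() in lowered for term in terms)

    kept = [kw.strip() for kw in style_keywords
            if kw.strip() and not mentions_banned(kw)]

    # Description first, then the comma-separated style keywords.
    return f"{cleaned} {', '.join(kept)}" if kept else cleaned


description = (
    "An oil painting depicting a nighttime scene of a small village with a "
    "large cypress tree in the foreground and a rolling hill behind it. "
    "The sky is filled with swirling stars and a bright crescent moon, "
    "creating a sense of movement and energy."
)
style_keywords = [
    "Post-Impressionist", "bold colors", "thick brushstrokes", "emotional",
    "expressive", "impasto", "landscapes", "sunflowers",
]
banned_terms = ["Starry Night", "Vincent van Gogh", "Van Gogh"]

print(build_prompt(description, style_keywords, banned_terms))
```

With these inputs the sketch reproduces the Starry Night prompt shown above, since neither the description nor the keywords mention the artist or the title.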
So with that in place, let's dive into some great works of art and their (maybe) great AI-generated counterparts! For each artwork I will display the prompt used as well as the results from DALL-E 2, Midjourney and Stable Diffusion.
The Starry Night – Vincent van Gogh
An oil painting depicting a nighttime scene of a small village with a large cypress tree in the foreground and a rolling hill behind it. The sky is filled with swirling stars and a bright crescent moon, creating a sense of movement and energy. Post-Impressionist, bold colors, thick brushstrokes, emotional, expressive, impasto, landscapes, sunflowers
Interestingly, they all seem to focus a lot on the cypress tree, although only Midjourney really captures the shape of the tree in the original artwork. Style-wise, DALL-E 2 is definitely the closest. It also did a good job of capturing the same style of crescent moon as in the original. Stable Diffusion completely misses the mark on this one. DALL-E 2 wins this one.
Mona Lisa – Leonardo da Vinci
A 16th-century oil painting of a seated woman facing the viewer with an enigmatic smile in a relaxed pose wearing a fine dress, surrounded by a winding road, water, and a bridge in the background, Renaissance, realism, naturalism, detail, perspective, lifelike poses, three-dimensional, light, shadow, human experience, symbolism, emotions, master of techniques, fresco, oil, tempera, influential
For this one they all went for more or less the same Renaissance-y style and a similar composition. Midjourney did a good job of capturing the enigmatic smile the Mona Lisa is famous for, but Stable Diffusion came close to capturing her clothing and posture. I will give this one to Stable Diffusion, even though her torso appears to be on the wrong way round.
Liberty Leading the People – Eugène Delacroix
A 19th century painting of a bare-breasted woman holding the tricolor flag of France, leading a charge of soldiers and civilians over the bodies of the fallen, French Revolution, Romanticism, bold colors, dynamic compositions, emotional expressiveness, historical and literary themes, fascination with exotic cultures, influence from Rubens and Caravaggio, lyrical brushwork, glorification of nature, liberty, equality, and fraternity
For this one Midjourney does a fantastic job of capturing the fierceness and chaotic style of the original artwork (even though it complained about the “bare-breasted” part). The colour palette is also not far off. DALL-E 2’s image is too innocent. It could just as well have been a woman celebrating France winning Eurovision or the World Cup. I had a tough time getting something useful from Stable Diffusion for this prompt but ended up with this tricolor-clad lady. Midjourney wins this one.
The Creation of Adam – Michelangelo
A 16th century fresco painting of God is depicted as an elderly bearded man who extends his finger to touch the outstretched finger of Adam, surrounded by a throng of figures from the Bible, prophets, sibyls, angels, putti, renaissance, mastery of anatomy, dramatic compositions, classical, intricate details, naturalism, dynamic poses, marble sculptures, frescoes, Pieta, emotional intensity
Note to self: diffusion models do not like people's fingers touching. All of the models struggled to get the content right for this piece. They can create people with pointing hands, but getting the hands to touch is a whole different story. With more advanced prompt engineering I could probably end up with a version where the fingers touch like in the original. DALL-E 2 gets closest, with two hands whose fingers are “almost” touching. Midjourney manages to match the style of the original piece well, even though Adam looks rather old and as if he is scolding God. I will let Midjourney win this one.
The Persistence of Memory – Salvador Dalí
A 20th century painting of a barren, dream-like landscape with several soft, melting pocket watches draped over various objects, including a tree branch, a bottle, and a board, Surrealism, dream-like imagery, psychoanalytic themes, optical illusions, challenging reality, irrationality, fantasy, symbolist elements, technical mastery, unconventional symbolism, melting clocks, wide-ranging influences, meticulous details
This must be an exciting task for DALL-E 2, as it gets to generate an image in the style of the person it is partially named after! All of the diffusion models manage to hit that surrealistic vibe, but in quite distinct ways. The tree, which in the original has just a single branch, is much larger in all of the generated images, though it is still withered like in the original. All of the images display clocks, but sadly none of them are melting. DALL-E 2's and Midjourney's images are closest to the original motif. I will give this one to Midjourney, as its style and content are the most Dalí-esque in my opinion (sorry, DALL-E 2).
A tour of the Art Gallery of “fine” AI Art
Let's put these creations to the ultimate test by opening up the Art Gallery of “fine” AI Art!
Welcome to the vernissage. The Gallery has many fine pieces on show:
That concludes our tour of the Art Gallery of “fine” AI Art. The Art Gallery where nothing is made by real people!