AI is used for many things across many fields. Among the use cases, some have quickly won over the general public: text-generating AIs (ChatGPT, Bard...) and image-generating AIs. This interest can probably be attributed to their ease of access, their "sandbox" appeal (who hasn't sent silly prompts to ChatGPT just to see what it would answer?), and most importantly, their usefulness to a wide range of people, from professionals to curious individuals. If you've read the title, you'll have figured out that this article is about the second category: AIs that generate images.
Before giving you a (non-exhaustive) list of the main players in the field, we first need to explain what the average person can do with them. The primary feature of these AIs is, of course, generating an image on command. Where it gets interesting is that there are many ways to influence the generated image.
Text-to-image is the simplest and most common use case: you provide text to the AI (a "prompt"), which it uses to generate an image. A quick example:
Image generated with the text "A studio photo of a rainbow coloured cat"
Although some AIs are starting to better interpret other languages, English is the way to go. It's also worth keeping in mind that the AI doesn't "understand" what it generates. It can therefore be helpful to guide it by describing a scene in more detail if the first attempts are unsuccessful. For the same reason, it's recommended to guide the AI regarding the style to give the image. Words like "photorealistic" help achieve a more "photographic" look, or "painting of..." for a painting, etc.
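The subject-plus-style advice above can be sketched as a tiny helper. This is purely illustrative: the function name and style list are invented for this example and don't belong to any tool.

```python
# Hypothetical helper illustrating how style keywords refine a prompt.
def build_prompt(subject: str, style_words: list[str]) -> str:
    """Join a subject with the style modifiers suggested above."""
    return ", ".join([subject] + style_words)

prompt = build_prompt(
    "a rainbow coloured cat",
    ["studio photo", "photorealistic", "soft lighting"],
)
print(prompt)
# → a rainbow coloured cat, studio photo, photorealistic, soft lighting
```

The same subject with "oil painting, impressionist" instead of the photo keywords would steer the result toward a painting, as described above.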
Image-to-image works on a very similar principle, but here the idea is to provide a "starting" image for the AI to reinterpret or modify. This can be a photo, a painting, or a simple drawing. Example:
The image on the left was generated with the image on the right and the text "painting of an angel, gold hair, wearing laurels, wings, bathed in divine light, head halo, christian art, goddess, art nouveau, tarot card, rococo"
Of course, these generations will require a few attempts, tweaking the prompt and parameters before you're satisfied.
Inpainting aims to modify an existing image, but not entirely: you start by "painting" over the part of the original image you want to rework, then provide a prompt defining what the final image should look like once the painted area is regenerated:
The man on the left was generated from the man on the right, whose mouth and top of the head were painted over, with the prompt "a happy red-bearded guy, wearing a hat with flowers in it"
Here, only the mouth and the top of the head were modified. This allows you to touch up existing images, but also to correct errors in an image generated from simple text. You start with text-based generation, then when the image is satisfactory, you modify the areas that need improvement via inpainting.
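Conceptually, the "painting" step just produces a mask telling the model which pixels to regenerate; in Stable Diffusion's inpainting pipeline, for instance, white mask pixels are repainted and black ones are kept. Here's a minimal sketch of that idea; the helper name and rectangle coordinates are made up for illustration.

```python
# Build a binary inpainting mask: 255 = regenerate this pixel, 0 = keep it.
def make_rect_mask(width, height, box):
    """box = (left, top, right, bottom): the area 'painted over' by the user."""
    left, top, right, bottom = box
    return [
        [255 if (left <= x < right and top <= y < bottom) else 0
         for x in range(width)]
        for y in range(height)
    ]

# e.g. mask out a hypothetical "mouth" region of a 512x512 portrait
mask = make_rect_mask(512, 512, (200, 320, 312, 380))
print(sum(v == 255 for row in mask for v in row))  # pixels to regenerate
# → 6720
```

In a real tool the mask is drawn with a brush rather than a rectangle, but the principle is the same: the model only touches the white region and blends it with the untouched pixels around it.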
A direct extension of inpainting, outpainting allows you to ask the AI to imagine what lies beyond the frame of the original image.
In the center, you can recognize Vermeer's original painting, Girl with a Pearl Earring. The AI then generated the room around her.
Since the goal isn't to present each of these AIs' (too) many incredible capabilities one by one, there are also more advanced processes that won't be shown in detail here.
DALL-E was the first AI both capable enough to create usable images and accessible enough to go viral. While being first doesn't necessarily mean being the best, DALL-E 2 remains a major player in the field (and the upcoming DALL-E 3 is extremely promising).
Its main strength is an interface that's ridiculously simple to use: type what you want to see, click "Generate", and within seconds you'll have four AI-generated variations to choose from.
Unfortunately, trying DALL-E 2 isn't free, but at $15 for 115 credits, that works out to about $0.13 per prompt, or roughly $0.033 per image variation (each prompt yields four).
To try it: https://labs.openai.com
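For reference, OpenAI also exposes image generation through an API. The sketch below shows roughly what a request looks like with the official `openai` Python client; the API call needs an `OPENAI_API_KEY` and is left unexecuted here, and the pricing helper simply restates the credit math above.

```python
# Credit math from the pricing above: $15 buys 115 credits,
# and one credit = one prompt = four image variations.
def cost_per_image(price_usd=15.0, credits=115, images_per_credit=4):
    return price_usd / credits / images_per_credit

print(round(cost_per_image(), 4))  # → 0.0326

# Rough API sketch (requires an API key; not run here):
def generate_variations(prompt: str):
    from openai import OpenAI  # pip install openai
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.images.generate(model="dall-e-2", prompt=prompt, n=4)
```

The web interface at labs.openai.com does exactly this behind the scenes, which is why each generation returns four candidates.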
Midjourney is the first AI to have won an art competition (without the judges knowing, of course). It's currently my favorite in terms of image quality, scene consistency, and more.
Unfortunately, Midjourney suffers from a major drawback: the beta version is only accessible via Discord. Once you've joined Midjourney's Discord server or invited the Midjourney bot to a server you control, you can send a message in the chat channel in the form /imagine followed by your prompt. Four variations of your message are then generated, which you can subsequently download, upscale, re-edit, etc.
By default, every image generated is displayed publicly on the Midjourney Discord. This gives the whole thing an interesting community aspect, but it also means that anyone interested can see what you create. While this isn't necessarily a problem for artists, it can be an obstacle if you're looking to use Midjourney for professional purposes.
Midjourney's free trials are currently suspended due to the overwhelming number of people trying to use it, but they are sometimes reinstated for a few days. If you miss a free trial window, the basic plan starts at $10/month and includes 3.3 hours of GPU time per month, or about 200 images. You also have the option to purchase additional GPU time, and you can use your images for commercial purposes.
To try it: https://docs.midjourney.com/docs/quick-start
Unlike DALL-E 2 and Midjourney, Stable Diffusion is open source. This means that anyone with the required technical skills can download it and run it locally on their own computer.
This also means it's possible to train and fine-tune the model for specific purposes. For this reason, almost all services that use AI to generate artistic portraits, historical portraits, architectural renderings, etc., use Stable Diffusion. If you have a few hours to spare and a good computer, it's the best way to integrate your style into your creations.
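As a sketch of what "running it locally" looks like, Stable Diffusion can be loaded through Hugging Face's `diffusers` library. The model id below is the widely used v1.5 checkpoint; the generation call needs a GPU and a multi-gigabyte download, so it's left unexecuted here. The small helper reflects the fact that Stable Diffusion expects image sides divisible by 8.

```python
# Stable Diffusion's VAE downsamples by a factor of 8, so image sides
# must be multiples of 8; this helper snaps a size to the nearest valid one.
def snap_to_multiple_of_8(n: int) -> int:
    return max(8, round(n / 8) * 8)

print(snap_to_multiple_of_8(513))  # → 512

# Rough local-generation sketch (needs a GPU and a large model download):
def generate(prompt: str):
    import torch
    from diffusers import StableDiffusionPipeline  # pip install diffusers
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt, height=512, width=512).images[0]
```

Fine-tuning works the same way: you start from a checkpoint like this one and continue training it on your own images to bake in a style.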
If you'd like to learn more, a recording of the talk I gave on the subject is available here.
To test online: https://clipdrop.co/stable-diffusion
The field of image generation, like other uses of AI, is evolving extremely fast. Technological leaps are rapid and impressive, and their capabilities grow with each new version.
I encourage you to try things out for yourself, learn, fail, try again, and above all, follow the news around these models!
An architecture and development consultant at Reboot, Théo supports clients on a wide range of technical challenges. An active contributor within the team, he takes part in producing content and running internal events around new technologies.