AI is used for many things across many fields. Among the use cases, some have quickly won over the general public: text-generating AIs (ChatGPT, Bard...) and image-generating AIs. This interest can probably be attributed to their ease of access, their "sandbox" appeal (who hasn't sent silly prompts to ChatGPT just to see what it would answer?), and most importantly, their usefulness to a wide range of people, from professionals to curious individuals. If you've read the title, you'll have figured out that this article is about the second category: AIs that generate images.
Before giving you a (non-exhaustive) list of the main players in the field, we first need to explain what the average person can do with them. The primary feature of these AIs is, of course, generating an image on command. Where it gets interesting is that there are many ways to influence the generated image.
Text-to-image is the simplest and most common use case: you provide text to the AI (a "prompt"), which it uses to generate an image. A quick example:
Image generated with the text "A studio photo of a rainbow coloured cat"
Although some AIs are starting to better interpret other languages, English is the way to go. It's also worth keeping in mind that the AI doesn't "understand" what it generates. It can therefore be helpful to guide it by describing a scene in more detail if the first attempts are unsuccessful. For the same reason, it's recommended to guide the AI regarding the style to give the image. Words like "photorealistic" help achieve a more "photographic" look, or "painting of..." for a painting, etc.
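The subject-plus-style advice above can be sketched as a tiny helper. This is purely illustrative: the function name and style list are invented for this example and don't belong to any tool.

```python
# Hypothetical helper illustrating how style keywords refine a prompt.
def build_prompt(subject: str, style_words: list[str]) -> str:
    """Join a subject with the style modifiers suggested above."""
    return ", ".join([subject] + style_words)

prompt = build_prompt(
    "a rainbow coloured cat",
    ["studio photo", "photorealistic", "soft lighting"],
)
print(prompt)
# → a rainbow coloured cat, studio photo, photorealistic, soft lighting
```

The same subject with "oil painting, impressionist" instead of the photo keywords would steer the result toward a painting, as described above.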
Image-to-image works on a very similar principle, but here the idea is to provide a "starting" image for the AI to reinterpret or modify. This can be a photo, a painting, or a simple drawing. Example:
The image on the left was generated with the image on the right and the text "painting of an angel, gold hair, wearing laurels, wings, bathed in divine light, head halo, christian art, goddess, art nouveau, tarot card, rococo"
Of course, these generations will require a few attempts, tweaking the prompt and parameters before you're satisfied.
Inpainting aims to modify an existing image, but not entirely: you start by "painting" over the part of the original image you want to rework, then provide a prompt defining what the final image should look like once the painted area is regenerated:
The man on the left was generated from the man on the right, whose mouth and top of the head were painted over, with the prompt "a happy red-bearded guy, wearing a hat with flowers in it"
Here, only the mouth and the top of the head were modified. This allows you to touch up existing images, but also to correct errors in an image generated from simple text. You start with text-based generation, then when the image is satisfactory, you modify the areas that need improvement via inpainting.
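Conceptually, the "painting" step just produces a mask telling the model which pixels to regenerate; in Stable Diffusion's inpainting pipeline, for instance, white mask pixels are repainted and black ones are kept. Here's a minimal sketch of that idea; the helper name and rectangle coordinates are made up for illustration.

```python
# Build a binary inpainting mask: 255 = regenerate this pixel, 0 = keep it.
def make_rect_mask(width, height, box):
    """box = (left, top, right, bottom): the area 'painted over' by the user."""
    left, top, right, bottom = box
    return [
        [255 if (left <= x < right and top <= y < bottom) else 0
         for x in range(width)]
        for y in range(height)
    ]

# e.g. mask out a hypothetical "mouth" region of a 512x512 portrait
mask = make_rect_mask(512, 512, (200, 320, 312, 380))
print(sum(v == 255 for row in mask for v in row))  # pixels to regenerate
# → 6720
```

In a real tool the mask is drawn with a brush rather than a rectangle, but the principle is the same: the model only touches the white region and blends it with the untouched pixels around it.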
A direct extension of inpainting, outpainting allows you to ask the AI to imagine what lies beyond the frame of the original image.
In the center, you can recognize Vermeer's original painting, Girl with a Pearl Earring. The AI then generated the room around her.
Since the goal isn't to present each of these AIs' (too) many incredible capabilities one by one, there are also more advanced processes that won't be shown in detail here.
DALL-E was the first AI both capable enough to create usable images and accessible enough to go viral. While being first doesn't necessarily mean being the best, DALL-E 2 remains a major player in the field (and the upcoming DALL-E 3 is extremely promising).
Its main strength is an interface that's ridiculously simple to use: type what you want to see, click "Generate", and within seconds you'll have four AI-generated variations to choose from.
Unfortunately, trying DALL-E 2 isn't free, but at $15 for 115 credits, that works out to about $0.13 per prompt, or roughly $0.033 per image variation (each prompt yields four).
To try it: https://labs.openai.com
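For reference, OpenAI also exposes image generation through an API. The sketch below shows roughly what a request looks like with the official `openai` Python client; the API call needs an `OPENAI_API_KEY` and is left unexecuted here, and the pricing helper simply restates the credit math above.

```python
# Credit math from the pricing above: $15 buys 115 credits,
# and one credit = one prompt = four image variations.
def cost_per_image(price_usd=15.0, credits=115, images_per_credit=4):
    return price_usd / credits / images_per_credit

print(round(cost_per_image(), 4))  # → 0.0326

# Rough API sketch (requires an API key; not run here):
def generate_variations(prompt: str):
    from openai import OpenAI  # pip install openai
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.images.generate(model="dall-e-2", prompt=prompt, n=4)
```

The web interface at labs.openai.com does exactly this behind the scenes, which is why each generation returns four candidates.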
Midjourney is the first AI to have won an art competition (without the judges knowing, of course). It's currently my favorite in terms of image quality, scene consistency, and more.
Unfortunately, Midjourney suffers from a major drawback: the beta version is only accessible via Discord. Once you've joined Midjourney's Discord server or invited the Midjourney bot to a server you control, you can send a message in the chat channel in the form /imagine followed by your prompt. Four variations of your message are then generated, which you can subsequently download, upscale, re-edit, etc.
By default, every image generated is displayed publicly on the Midjourney Discord. This gives the whole thing an interesting community aspect, but it also means that anyone interested can see what you create. While this isn't necessarily a problem for artists, it can be an obstacle if you're looking to use Midjourney for professional purposes.
Midjourney's free trials are currently suspended due to the overwhelming number of people trying to use it, but they are sometimes reinstated for a few days. If you miss a free trial window, the basic plan starts at $10/month and includes 3.3 hours of GPU time per month, or about 200 images. You also have the option to purchase additional GPU time, and you can use your images for commercial purposes.
To try it: https://docs.midjourney.com/docs/quick-start
Unlike DALL-E 2 and Midjourney, Stable Diffusion is open source. This means that anyone with the required technical skills can download it and run it locally on their own computer.
This also means it's possible to train and fine-tune the model for specific purposes. For this reason, almost all services that use AI to generate artistic portraits, historical portraits, architectural renderings, etc., use Stable Diffusion. If you have a few hours to spare and a good computer, it's the best way to integrate your style into your creations.
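As a sketch of what "running it locally" looks like, Stable Diffusion can be loaded through Hugging Face's `diffusers` library. The model id below is the widely used v1.5 checkpoint; the generation call needs a GPU and a multi-gigabyte download, so it's left unexecuted here. The small helper reflects the fact that Stable Diffusion expects image sides divisible by 8.

```python
# Stable Diffusion's VAE downsamples by a factor of 8, so image sides
# must be multiples of 8; this helper snaps a size to the nearest valid one.
def snap_to_multiple_of_8(n: int) -> int:
    return max(8, round(n / 8) * 8)

print(snap_to_multiple_of_8(513))  # → 512

# Rough local-generation sketch (needs a GPU and a large model download):
def generate(prompt: str):
    import torch
    from diffusers import StableDiffusionPipeline  # pip install diffusers
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt, height=512, width=512).images[0]
```

Fine-tuning works the same way: you start from a checkpoint like this one and continue training it on your own images to bake in a style.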
If you'd like to learn more, a recording of the talk I gave on the subject is available here.
To test online: https://clipdrop.co/stable-diffusion
The field of image generation, like other uses of AI, is evolving extremely fast. Technological leaps are rapid and impressive, and their capabilities grow with each new version.
I encourage you to try things out for yourself, learn, fail, try again, and above all, follow the news around these models!
An architecture and development consultant at Reboot, Théo supports clients on a wide range of technical challenges. An active contributor within the team, he takes part in producing content and running internal events around new technologies.