C-LARA

An AI collaborates with humans to build a language learning app.


Making picture books in C-LARA (part 1)

We now have a first version of the “coherent image set” functionality installed on the C-LARA server. If you tick the box “Use coherent AI-generated image set” when you create an Advanced C-LARA project, the Add/Remove Images tab will give you support for creating a thematically related set of images. You first provide a prompt to create an image which will define the visual style for the text’s illustrations. C-LARA creates the image, then analyses it to obtain a detailed description of the style. It then adds this to the prompts used for all the other images.
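To make the flow concrete, here is a minimal Python sketch of the idea, not the actual C-LARA code. The function `build_coherent_prompts` and the two callables `generate_image` and `describe_style` are hypothetical names introduced for illustration; in a real system they would wrap calls to an image generation model and an image understanding model. Stub versions are used here so the example runs on its own.

```python
from typing import Callable, List


def build_coherent_prompts(
    style_prompt: str,
    page_prompts: List[str],
    generate_image: Callable[[str], bytes],
    describe_style: Callable[[bytes], str],
) -> List[str]:
    """Create a style image, describe it, and append the description to every page prompt."""
    style_image = generate_image(style_prompt)       # step 1: create the style-defining image
    style_description = describe_style(style_image)  # step 2: analyse it to get a style description
    # Step 3: add the same style description to the prompt for every other image.
    return [
        f"{prompt}\n\nRender the image in this style: {style_description}"
        for prompt in page_prompts
    ]


# Stand-ins for the real model calls (assumptions, for illustration only).
def fake_generate_image(prompt: str) -> bytes:
    return f"<image for: {prompt}>".encode("utf-8")


def fake_describe_style(image: bytes) -> str:
    return "manga-inspired, clean line art, soft colours"


prompts = build_coherent_prompts(
    "A manga-inspired illustration of a fairy-tale kingdom",
    ["A princess wandering in a forest", "A gentle dragon with emerald green scales"],
    fake_generate_image,
    fake_describe_style,
)
for p in prompts:
    print(p)
    print("---")
```

Passing the model calls in as parameters keeps the sketch independent of any particular API.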

You can see the first text generated using this method here; it’s another nice-people-get-together-to-save-the-world effort by ChatGPT-4. Both the story and the prompts used to create the images were generated by the AI. The result evidently isn’t very good. The style, which is required to be manga-inspired, was set by this initial image:

If we look at a couple of images from the actual text, say the ones from pages 4 and 5 below:

Page 4: One day, while wandering deeper into the forest than ever before, she discovered a dragon who had lost his way.

Page 5: The dragon, named Fern, was not like the fearsome dragons of old tales. He was gentle and kind, with emerald green scales that shimmered in the sunlight.

we see that the AI isn’t doing such a bad job of keeping the style uniform. But it’s immediately apparent that the style isn’t enough. We also need to keep the appearances of the characters uniform: Princess Elara and Fern the dragon look completely different in the two images.

So we need a way to do that too, and there’s an obvious approach: as we did with the style, we ask GPT-4o to look at one image and create a detailed description, then we include that description in the prompts used to create the other images. I’m going to talk with the AI now about how we might implement this scheme. Hopefully we’ll have a new version in a couple of days which is substantially better.

Interested to hear comments and suggestions!



3 responses to “Making picture books in C-LARA (part 1)”

  1. Exciting progress!


  2. I am talking with the AI now, and we already have a promising scheme for supporting continuity in content as well as style. The idea is straightforward: instead of just creating a sequence of image generation requests, the AI creates an interleaved sequence of requests for both image generation and image understanding. The results of image understanding requests are stored for inclusion in later image generation requests.

    So for example, after the request where it first generates an image of the princess, it immediately follows up with an image understanding request something like

    “Look at this image, which depicts a princess standing in front of a castle, and provide a description of the princess. This description will be used to generate other images, so make it as detailed as possible.”

    and then stores the output under the name “Elara-description”. Then the next image generation request will be something like

    “An image of Princess Elara exploring the forest, surrounded by tall trees, colorful flowers, and playful animals like rabbits and birds. Princess Elara will be as described here: {Elara-description}.”

    where {Elara-description} means that the text stored under “Elara-description” is substituted in.

    The new C-LARA code should be easy to write: the question is whether the AI can in practice generate the sequence of generation and understanding requests. For this simple story, it immediately came up with a plausible answer.

    It’ll be really interesting to see how this idea works in practice.
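As a rough illustration of the interleaved scheme described in this comment, the Python sketch below runs a mixed list of generation and understanding requests and substitutes stored descriptions into later prompts. It is not the C-LARA implementation: the class names `GenerateImage` and `UnderstandImage`, the function `run_requests`, the page numbers, and the two stub model functions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class GenerateImage:
    page: int
    prompt: str  # may contain placeholders such as {Elara-description}


@dataclass
class UnderstandImage:
    page: int      # which previously generated image to look at
    prompt: str
    store_as: str  # name under which the resulting description is stored


Request = Union[GenerateImage, UnderstandImage]


def substitute(prompt: str, stored: Dict[str, str]) -> str:
    """Replace each {name} placeholder with the text stored under that name."""
    for name, text in stored.items():
        prompt = prompt.replace("{" + name + "}", text)
    return prompt


def run_requests(
    requests: List[Request],
    generate_image: Callable[[str], bytes],
    understand_image: Callable[[bytes, str], str],
) -> Tuple[Dict[int, str], Dict[str, str]]:
    images: Dict[int, bytes] = {}
    stored: Dict[str, str] = {}
    final_prompts: Dict[int, str] = {}
    for req in requests:
        if isinstance(req, GenerateImage):
            full_prompt = substitute(req.prompt, stored)
            final_prompts[req.page] = full_prompt
            images[req.page] = generate_image(full_prompt)
        else:
            # Store the description for inclusion in later generation requests.
            stored[req.store_as] = understand_image(images[req.page], req.prompt)
    return final_prompts, stored


# Stand-ins for the real model calls (assumptions, for illustration only).
def fake_generate(prompt: str) -> bytes:
    return prompt.encode("utf-8")


def fake_understand(image: bytes, prompt: str) -> str:
    return "PLACEHOLDER DESCRIPTION OF THE PRINCESS"


requests: List[Request] = [
    GenerateImage(1, "An image of Princess Elara standing in front of a castle."),
    UnderstandImage(
        1,
        "Look at this image, which depicts a princess standing in front of a castle, "
        "and provide a description of the princess. This description will be used to "
        "generate other images, so make it as detailed as possible.",
        "Elara-description",
    ),
    GenerateImage(
        2,
        "An image of Princess Elara exploring the forest. "
        "Princess Elara will be as described here: {Elara-description}.",
    ),
]

final_prompts, stored = run_requests(requests, fake_generate, fake_understand)
print(final_prompts[2])
```

Plain string replacement is used for the placeholders because names like “Elara-description” contain a hyphen. Requests must be ordered so that each understanding request follows the generation request for the image it inspects.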


  3. […] Progress on images/picture books. […]

