C-LARA

An AI collaborates with humans to build a language learning app.


Making picture books in C-LARA (part 3)

I made another educational C-LARA picture book this morning. People complained about how tasteless the supermodel story was, so I went for a toned-down version called “Secret Agent Samantha Loves Set Expressions”. The result, which you can see here, is basically a failure; we ran into problems with both the images and the multi-words. However, it’s an interesting failure that suggests fruitful directions to explore.

Images

I started off by creating a style image using this prompt:

A comic-book style image of Secret Agent Samantha on a mission. Samantha is glamorous, deadly and determined. She is portrayed in an ironic over-the-top way that to some extent subverts the clichés of this genre.

I got this image:

from which GPT-4o extracted this description:

The image employs a style reminiscent of classic comic book art from the mid-20th century. Key characteristics include:

  1. Pop Art Influence: The image features bold, clean lines and a limited but contrasting color palette dominated by black, white, and shades of blue and beige. This gives the image a distinctive, graphic quality.
  2. Halftone Patterns: There are evident halftone dots used to create shading and gradients, which is hallmark of traditional comic book printing.
  3. Retro Aesthetic: The overall style has a vintage feel, with design elements such as the woman’s hairstyle and makeup reflecting mid-century fashion. The clothing and accessories, like pearl necklaces, also contribute to this retro appearance.
  4. Framing and Composition: The layout is divided into separate panels, typical of comic book storytelling. Each panel contains different scenes or actions, enhancing the narrative flow.
  5. Typography: The text uses a bold, sans-serif font that mimics classic comic book dialogue boxes. Additionally, there are intentional typographical errors which contribute to the aesthetic and possibly the narrative.
  6. Dramatic Elements: Thematically, the image conveys a sense of drama and suspense, with scenes showing action such as airplanes in combat and intriguing elements like secret agents. These are consistent with the spy or noir genre often depicted in older comics.

To summarize, the image is characterized by its retro pop art style, bold lines, halftone shading, vintage fashion elements, comic book layout, and dramatic, action-oriented themes.

Here’s where I think I made my first mistake: I should have looked carefully at both the image and the description, decided they weren’t adequate, and tried again. In fact, I barely glanced at them and carried on. GPT-4 created image requests from the story. The first image it generated, for the title page, used this AI-generated prompt:

A cover image featuring Secret Agent Samantha, confident and poised, standing in a dynamic pose with various spy gadgets around her. The background is filled with expressive visuals such as coded messages and hidden compartments, setting the tone for a thrilling adventure.

and produced this result:

This was immediately followed by an AI-generated image understanding request:

Look at this image of Secret Agent Samantha and provide a detailed description of her appearance, including her outfit, facial expression, and the spy gadgets around her.

which gave this description:

In the image of Secret Agent Samantha, she is depicted as a classic, sophisticated spy. Here are the details of her appearance and the items around her:

Appearance:

  • Outfit: Samantha is wearing a sleek, tailored trench coat that exudes a sense of elegance and professionalism. The coat is buttoned up and belted at the waist, giving her a polished look. Around her neck, she has a necklace that adds a touch of femininity.
  • Facial Expression: Her expression is confident and composed, with a slight smile that conveys a calm and controlled demeanor. Her arched eyebrows and focused eyes suggest she is always alert and ready for action.
  • Hair: Samantha has voluminous, wavy hair that is neatly styled and falls past her shoulders, enhancing her glamorous appearance.

Spy Gadgets Around Her:

  • Background Gadgets: Surrounding Samantha are various spy gadgets and equipment depicted in a collage-like manner, emphasizing her role as a secret agent.
  • Clock and Map: A clock and a map of the world indicate time and global operations.
  • Surveillance Equipment: There is what appears to be a reel-to-reel tape recorder, possibly used for recording conversations or decoding messages.
  • Communication Devices: Various control panels and dials hint at communication and control systems.
  • Weapons and Tools: A schematic-like image of weapons and other tools likely used in her missions.
  • Travel Elements: Image of an aircraft and a suitcase, suggesting travel and mobility.

Overall, Samantha is portrayed as a stylish and formidable secret agent, surrounded by the essential tools of her trade, ready to take on any mission with grace and efficiency.

Again, I didn’t even look at what the AI had produced and just told it to continue. But when I saw what it had produced, it was clear that several things had gone wrong. For example, the image on page 2, made from the prompt

An image of Secret Agent Samantha in a dimly lit office with various mission files scattered around. She is seen slipping on a pair of leather gloves, preparing for her mission. Samantha will be as described here: {Samantha-description}.

looked like this:

while the image on page 3, made from the prompt

An action-packed image of Samantha sprinting through the early morning streets, eyes focused and determined, in pursuit of a shadowy figure. The background captures an urban setting with the sunrise in the horizon. Samantha will be as described here: {Samantha-description}.

looked like this:

An obvious problem is that Samantha is Black on page 2 and White on page 3, and indeed there is nothing in her description that might suggest what colour she is. Perhaps this omission is a result of training the AI to be sensitive about race issues, which is certainly the right thing to do in some contexts, but here we need to know. The style is not consistent either: on page 2 we have a photorealistic style, and on page 3 a comic-book style. Here, I’m a little surprised that the style description didn’t force DALL-E-3 towards the comic-book style, but maybe it wasn’t explicit enough. Finally, we don’t want the multi-panel background on page 3; it would surely be better to have a single panel showing Samantha sprinting through the street. But this is what we’ve got from the style image.

In summary, if I’d paid attention and tuned the style image and description a bit at the beginning, I think I’d have got a much better result. A very interesting question for later is whether the AI can itself do such tuning if we ask it to look critically at the results. But let’s not get ahead of ourselves.

Multi-Word Expressions

I found that we also had problems with the MWEs. Since the theme of the text is set expressions, there are many complex MWEs. For example, here’s the text on page 4:

The first thing on my agenda was to blend in. Easier said than done when you’re in a disguise! I had to be all ears and not let the cat out of the bag.

The Chain of Thought MWE annotation step did a good job here and flagged “blend in”, “easier said than done”, “all ears” and “let the cat out of the bag” as MWEs. This was correctly transferred to the lemma tagging. But the glossing went less well. For example, all the words in “let the cat out of the bag” are glossed in French as “dévoiler”, which is not a very helpful gloss. This pattern is common throughout the text: glossing is consistently worse than lemma tagging.

Here, the solution which suggests itself is to change the annotation pipeline so that the output from the lemma tagging step is used as the input to the glossing step. If the glossing step sees the component words in an MWE (here, “let”, “the”, “cat”, “out”, “of”, “the” and “bag”) already tagged with the MWE lemma (here, “let the cat out of the bag”), I’m expecting it to be rather easier for the AI to assign a good gloss. This is a fairly easy change to make.

Next steps

I will make the changes suggested above and see what happens when I create a second version of the story. More about that in a day or two!



One response to “Making picture books in C-LARA (part 3)”

  1. […] addressed the issues described in the previous post and have just put up a revised version of Secret Agent Samantha. It’s much better than the […]

    Like

Leave a comment