C-LARA

An AI collaborates with humans to build a language learning app.


Making picture books in C-LARA (part 4)

I addressed the issues described in the previous post and have just put up a revised version of Secret Agent Samantha. It’s much better than the first take, though there’s still considerable room left for improvement.

Images

I cloned the project, deleted the images in the new version, and regenerated both the style image and the request sequence. The style image now looks like this:

The initial image of Samantha on the cover is created by this AI-generated request, which I left unedited:

An image that serves as the book cover titled ‘Secret Agent Samantha Loves Set Expressions’. The cover features a bold and confident secret agent named Samantha, dressed in sleek, tactical clothing with a determined look on her face. The background should convey a sense of intrigue and intensity, suitable for a spy thriller.

and looks like this:

The description of Samantha is generated by the following request, which I edited to mention age, ethnicity and other specific characteristics:

Look at the cover image, which depicts a secret agent named Samantha, and provide a detailed description of her appearance. Include in particular apparent age, apparent ethnicity, face, hair, build and outfit. This description will be used when generating other images, so make it as detailed as possible to maximize continuity.

We now get the following AI-generated description, which I left unedited:

The cover image showcases Samantha, a secret agent, with an appearance that is both commanding and graceful. To maximize continuity for generating other images of her character, the following detailed description is provided:

Apparent Age:

  • Samantha appears to be in her early 30s, suggesting she is experienced yet youthful and full of vitality.

Apparent Ethnicity:

  • She has a medium skin tone and features that could be interpreted as ethnically ambiguous, potentially reflecting a diverse background.

Face:

  • Samantha’s face is striking and symmetrical with high cheekbones and a sharp jawline.
  • She has pronounced, expressive dark eyebrows.
  • Her eyes are large and intense, likely depicted with dark eyeliner to emphasize their sharpness.
  • She sports a bold lipstick, in a dark red or plum shade, underlining her confidence and sophistication.

Hair:

  • Her hair is a lustrous black, sleek, and straight, likely falling to her shoulders or slightly beyond.
  • It has a neat, slightly voluminous style, indicating it’s well-maintained even in high-pressure situations.

Build:

  • Samantha has a fit and athletic build, suggesting she is highly capable in physical confrontations and athletic pursuits.
  • Her posture is upright and commanding, exuding confidence and readiness.

Outfit:

  • She wears a form-fitting black tactical suit, made from a material that provides both flexibility and protection.
  • The suit features multiple utility pouches and holsters strategically placed for easy access to gear and weapons.
  • She’s equipped with a combat belt, adding to the utilitarian aesthetic.
  • There is visible armor plating on her shoulders, and possibly at other critical points to offer extra protection.
  • Samantha is shown holding a firearm, a sleek black pistol, indicating her proficiency with weapons.
  • Accessories such as thigh holsters, and small gadgets, possibly including communication devices or other spy gear, are also part of her ensemble.

This detailed description should help ensure consistency when generating further images of Samantha, emphasizing her role as a highly competent and professional secret agent.

Continuity is still far from perfect, but much better. DALL-E-3 has accurately followed instructions about age, ethnicity and build, and Samantha’s face looks roughly the same in the different images. There is still, however, a mix of comic-book and photorealistic styles:

I am guessing that we can get better results if we edit the style description more carefully, though this remains to be seen.
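
To make the idea concrete, here is a minimal sketch of how a style description and a character description might be combined with a page's text to form each image request. This is purely illustrative: the function name `build_image_request`, its parameters and the wording of the request are assumptions made for this example, not C-LARA's actual code or prompts.

```python
def build_image_request(style_description, character_description, page_text):
    """Combine the shared descriptions with one page's text.

    Hypothetical helper: the same style and character descriptions are
    repeated in every request, so that each image is generated from
    identical information about the look of the book and of Samantha.
    """
    parts = [
        "Create an illustration for one page of a picture book.",
        "Style to use for all images:\n" + style_description.strip(),
        "Appearance of the main character:\n" + character_description.strip(),
        "Text of the page to illustrate:\n" + page_text.strip(),
    ]
    return "\n\n".join(parts)


# Example with made-up, heavily shortened descriptions.
request = build_image_request(
    style_description="Bold comic-book style with strong outlines.",
    character_description="Early 30s, shoulder-length black hair, black tactical suit.",
    page_text="Samantha keeps an eye on the suspect.",
)
print(request)
```

If the style description is vague, each request leaves the model free to pick a different rendering style, which may be one reason for the mix of styles seen above.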

Multi-Word Expressions

I implemented the idea I suggested in the previous post, so that glosses can be created using the lemma-tagged version as input. (This functionality is not yet installed on the server, as I am still testing it.) Again, the result seems clearly better than the previous version, though there are still obvious mistakes. We can probably improve things by further tuning the annotation prompt and including a post-processing check to make sure that all components of an MWE have received the same gloss.
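
As a rough illustration, a post-processing check of this kind could look something like the sketch below. The token format (dictionaries with `word`, `gloss` and `mwe_id` keys) and the function name are hypothetical and chosen only for this example; they are not C-LARA's actual internal representation.

```python
from collections import defaultdict


def find_inconsistent_mwe_glosses(tokens):
    """Return {mwe_id: set of glosses} for each MWE whose words disagree.

    Each token is assumed to be a dict with a "gloss" key and an "mwe_id"
    key, where "mwe_id" is None for words that are not part of any MWE.
    """
    glosses_by_mwe = defaultdict(set)
    for token in tokens:
        mwe_id = token.get("mwe_id")
        if mwe_id is not None:
            glosses_by_mwe[mwe_id].add(token["gloss"])
    # Keep only the MWEs whose components received more than one gloss.
    return {
        mwe_id: glosses
        for mwe_id, glosses in glosses_by_mwe.items()
        if len(glosses) > 1
    }


# Made-up example: English text with French glosses.
tokens = [
    {"word": "Samantha", "gloss": "Samantha", "mwe_id": None},
    {"word": "gave", "gloss": "a abandonné", "mwe_id": 1},
    {"word": "up", "gloss": "a abandonné", "mwe_id": 1},
    {"word": "and", "gloss": "et", "mwe_id": None},
    {"word": "took", "gloss": "a décollé", "mwe_id": 2},
    {"word": "off", "gloss": "de", "mwe_id": 2},
]

problems = find_inconsistent_mwe_glosses(tokens)
print(problems)
```

Here the components of the first MWE share a gloss, so only the second one is reported. Flagged MWEs could then be sent back for re-annotation or corrected by hand.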

I also did a very simple example where I tried glossing in Ukrainian. My impression is that the performance gain we get from using the new gloss annotation method is larger with Ukrainian than with French, but so far this is just a guess. When I’ve got things stable, I will install the new functionality on the server and run some more substantial examples with Ukrainian glossing.

All in all, this seems to be developing well. I’m starting to feel quite optimistic.



3 responses to “Making picture books in C-LARA (part 4)”

  1. Cathy is disgusted by these stereotypically glamorous bimbos and suggests “Sumo-Wrestler Tanya Loves Polite Forms of Address”. Coming next.

  2. I created “Sumo-Wrestler Tanya Loves Polite Forms of Address”

  3. … you’ll find it here. But the images are not very convincing. I think DALL-E-3 is finding it hard to draw something as unusual as a female sumo-wrestler.
