C-LARA

An AI collaborates with humans to build a language learning app.


Weekly summary, Oct 24-30 2024

There has been more progress on the new image generation functionality. We submitted an abstract to the 2025 edition of the Conference On Oceanic Languages.

Improved image generation

I have implemented the idea for improved checking of images that I outlined in last week’s update. When the AI is asked to check whether an image agrees with the DALL-E-3 prompt used to generate it, we now have the option of using the following strategy:

  • Invoke gpt-4o to look at the image and determine which of the known elements occur in it.
  • Invoke gpt-4o to look at the image and create a description of each element found, plus the style.
  • Invoke gpt-4o to look at the prompt and the list of elements present, and create a list of pairs of elements whose relationships may be important for determining the correctness of the image.
  • For each pair found in the preceding step, invoke gpt-4o to look at the image and produce a description of the relationship between the two elements.
  • Take the combined output of the preceding stages and compare it against the prompt to estimate how well they agree.
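The five stages above can be sketched roughly as follows. This is only an illustrative outline, not the actual C-LARA code: the function `check_image`, the `ask_model` callable, the stage tags such as `FIND_ELEMENTS`, the reply formats and the 0 to 100 score are all assumptions made for the example. The model call is passed in as a parameter so the flow can be tried out with a canned stand-in (`fake_model`) instead of a real gpt-4o request.

```python
import re
from typing import Callable, Optional

# Hypothetical interface: takes an instruction and an optional image path,
# returns the model's text reply. A real version might wrap a gpt-4o call.
AskModel = Callable[[str, Optional[str]], str]


def check_image(image_path: str, prompt: str, known_elements: list[str],
                ask_model: AskModel) -> dict:
    """Run the five checking stages and return their intermediate results."""
    # Stage 1: find which known elements occur in the image.
    reply = ask_model(
        "FIND_ELEMENTS: which of these appear in the image? "
        "Reply with one element per line.\n" + "\n".join(known_elements),
        image_path,
    )
    listed = {line.strip() for line in reply.splitlines()}
    found = [e for e in known_elements if e in listed]

    # Stage 2: describe each element found, plus the overall style.
    descriptions = ask_model(
        "DESCRIBE_ELEMENTS: describe each of these elements and the style.\n"
        + "\n".join(found),
        image_path,
    )

    # Stage 3: from the prompt and element list (no image), choose the pairs
    # whose relationship matters for judging the image.
    reply = ask_model(
        "CHOOSE_PAIRS: list pairs of elements whose relationship matters, "
        "one pair per line as 'first | second'.\n"
        f"Prompt: {prompt}\nElements:\n" + "\n".join(found),
        None,
    )
    pairs = []
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 2 and parts[0] in found and parts[1] in found:
            pairs.append((parts[0], parts[1]))

    # Stage 4: describe the relationship for each pair, looking at the image.
    relationships = {
        pair: ask_model(
            f"DESCRIBE_RELATIONSHIP: how are '{pair[0]}' and '{pair[1]}' related?",
            image_path,
        )
        for pair in pairs
    }

    # Stage 5: compare the combined description against the prompt.
    combined = descriptions + "\n" + "\n".join(
        f"{a} / {b}: {text}" for (a, b), text in relationships.items()
    )
    reply = ask_model(
        "SCORE_AGREEMENT: rate from 0 to 100 how well the description matches "
        f"the prompt.\nPrompt: {prompt}\nDescription:\n{combined}",
        None,
    )
    match = re.search(r"\d+", reply)
    score = int(match.group()) if match else None
    return {"found": found, "pairs": pairs,
            "relationships": relationships, "score": score}


def fake_model(instruction: str, image_path: Optional[str]) -> str:
    """Stand-in for a real model call, returning canned replies per stage."""
    if instruction.startswith("FIND_ELEMENTS"):
        return "Maître corbeau\nle fromage"
    if instruction.startswith("DESCRIBE_ELEMENTS"):
        return "A crow on a branch; a wedge of cheese; storybook style."
    if instruction.startswith("CHOOSE_PAIRS"):
        return "Maître corbeau | le fromage\nMaître corbeau | Maître renard"
    if instruction.startswith("DESCRIBE_RELATIONSHIP"):
        return "The cheese is falling from the crow's open beak."
    return "Score: 85"
```

Calling `check_image("page.png", "The crow drops the cheese", ["Maître corbeau", "Maître renard", "le fromage"], fake_model)` runs all five stages against the canned replies. Pairs that mention an element not found in stage 1 are dropped, so in this stub only the crow and cheese pair is described in stage 4.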

Another piece of new functionality I’ve added is the option of giving the AI advice for writing the prompts used to create the elements and the page images.
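As a rough illustration of how user advice might be folded into the request for a prompt, here is a minimal sketch. The function name `build_prompt_request` and the wording of the request are hypothetical and are not taken from the C-LARA codebase.

```python
def build_prompt_request(page_text: str, advice: str = "") -> str:
    """Build the instruction asking the AI to write a DALL-E-3 prompt.

    If the user has supplied advice, it is appended so that the AI can
    take it into account when writing the prompt.
    """
    request = (
        "Write a DALL-E-3 prompt for an image illustrating this text:\n"
        + page_text
    )
    if advice.strip():
        request += "\n\nFollow this advice from the user:\n" + advice.strip()
    return request


# Example: advice of the kind described for the Crow and the Fox.
request = build_prompt_request(
    "Le corbeau ouvre un large bec, laisse tomber sa proie.",
    "The cheese should be a small wedge that the Crow can hold in its beak.",
)
```

With no advice given, the request is just the basic instruction plus the page text, so the option can be left unused without changing behaviour.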

The new scheme seems to be a clear improvement. Here’s yet another example from the Crow and the Fox, where it gets everything right. I had previously advised the AI that the cheese should be a small wedge that the Crow can hold in its beak, rather than a whole wheel, and we’ve reached the bit where the Crow opens its beak and drops the cheese. The image produced is this:

and we get this description:

**Elements found in image**
The following elements were found in the image: ['Maître corbeau', 'Maître renard', "l'arbre", 'le fromage']

The image seems to capture the key elements from the text "Le Corbeau et le Renard" by Jean de La Fontaine. Here's a detailed description of each element and the overall style:

### Elements:

1. **Maître corbeau (The Raven)**:
- Perched on a branch of a large, gnarled tree.
- The raven is shown with its beak open, as if making a sound or dropping something.
- The plumage is detailed, in shades of black, reflecting some natural texture.

2. **Maître renard (The Fox)**:
- Positioned on the ground looking up at the raven.
- The fox is portrayed with reddish-brown fur, a white belly, and a bushy tail.

3. **L'arbre (The Tree)**:
- A prominent, old tree with twisted branches, occupying a central position in the image.
- The background includes additional trees, suggesting a forest setting.

4. **Le fromage (The Cheese)**:
- Illustrated as a large wedge of Swiss cheese with several holes.
- It appears to be falling from the beak of the raven.

### Overall Style:

- **Artistic Approach**:
- The artwork has a classic, almost storybook quality, with attention to detail in the depiction of the animals and the tree.
- The color palette uses more muted, natural tones suitable for a woodland scene.

- **Typography**:
- Text appears near the raven’s beak, suggesting the action of it speaking or the cheese falling.
- The text style is integrated into the illustration, matching the overall aesthetic.

- **Mood and Tone**:
- The scene conveys a sense of narrative action, capturing a pivotal moment from the fable.
- The illustration resembles traditional illustrations for fables or children’s tales, emphasizing storytelling through visual elements.

This description focuses on what is visually present rather than expected from the text.

**Descriptions of relationships between some pairs of elements**

- Relationship between 'Maître corbeau' and 'l'arbre':
In the image, Maître corbeau is positioned on a sturdy branch of the tree, slightly elevated above the ground. The tree has a gnarled, rough texture with leaves visible around the branches. The corbeau is depicted with its beak open, suggesting it was singing or speaking. A piece of cheese is illustrated in mid-air near the corbeau, indicating that it has just dropped the cheese it held in its beak. The tree’s branches create a natural perch for the corbeau, supporting its weight and forming a backdrop for the scene. The interaction between Maître corbeau and the tree highlights the bird's reliance on the tree for a vantage point.

- Relationship between 'Maître corbeau' and 'le fromage':
In the image, Maître corbeau (the crow) is perched on a branch of a tree. There is a piece of cheese shown floating in front of it, rather than being held in its beak. The crow has its beak open, as if making a sound or singing. This suggests a relationship where the cheese is separate from the crow, indicating the moment when the cheese might have been released from its grasp. The scene includes Maître renard (the fox) sitting on the ground, observing the crow and the cheese.

- Relationship between 'Maître renard' and 'le fromage':
In the image, Maître renard is depicted sitting on the ground, looking up attentively towards the cheese, which appears to be mid-air, having fallen from the grasp of Maître corbeau. The fox is positioned beneath the tree branch where the crow sits. This suggests that the fox’s gaze and attention are directed towards the cheese, indicating its role as the eager observer ready to seize the opportunity. The relationship between Maître renard and the cheese is one of anticipation and opportunism.

The AI has done very well here. But often, we still see mistakes: it goes too far in the direction of relying on the text when interpreting the image, or it gets a correct description but fails to draw the right inferences when comparing it against the prompt. I will carry on tuning the process, but it’s possible that we need to wait for the next model to get satisfactory performance. Hopefully multimodal o1 will be out before too long, and it may well be good enough.

Conference submission

We submitted an extended abstract to the Conference On Oceanic Languages, describing the C-LARA content we have developed for Kanak languages. We listed the AI as first author, and the organisers didn’t complain. We will know if it’s been accepted a month from now.

Next Zoom call

The next call will be at:

Thu Oct 24 2024, 20:00 Adelaide (= 08:30 Iceland = 09:30 Ireland/Faroe Islands = 10:30 Europe = 11:30 Israel = 12:00 Iran = 16:30 China = 18:30 Melbourne = 19:30 New Caledonia)


