This week, I’ve been concentrating on rewriting the image generation functionality. Support for non-AI languages now appears to be working quite well. The French Embassy have been helpful about the funding for the New Caledonian projects.
Improved image generation
The two central functionalities in C-LARA are text annotation and image generation. Following improvement in MWE annotation and the release of OpenAI’s new o1 models, I’m starting to think we’re almost there on text annotation. Image generation is however still far from satisfactory. If you’re creating a picture book where the images need to be coherent, it’s often quite laborious to get a good result: you need to edit the prompts and regenerate the images many times.
Following discussions with o1-preview, we are doing a major rewrite of image generation to address these issues. The idea is to involve the AI in the process of reviewing the images, generating multiple options and letting it evaluate them. This will use the following basic pattern:
- o1-mini or gpt-4o expands the initial text into several possible versions of a detailed specification.
- Each specification is passed to DALL-E-3 several times, to create multiple versions of the image.
- Each generated image is passed to gpt-4o to create a description.
- gpt-4o compares the description of the image with the specification used to create it, and assesses how well they match. We will also make it possible for human users to make the selection manually.
- The best-matching image is used.
Variants of this pattern will be used for all stages of image generation: creating the overall style, creating descriptions of recurring elements (characters, objects, locations), and creating the images themselves. Generation of multiple options is performed in parallel to keep things efficient.
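To make the pattern concrete, here is a minimal Python sketch of the generate-describe-compare loop. It is not the actual C-LARA code: the function `select_best_image`, the `Candidate` class and the four callables (`expand`, `generate`, `describe`, `score_match`) are hypothetical names standing in for the calls to o1-mini/gpt-4o and DALL-E-3, and the use of a thread pool for the parallel step is an assumption, chosen because the model calls are mostly waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    specification: str  # detailed specification used to create the image
    image: str          # placeholder for an image path or URL
    description: str    # description produced from the generated image
    score: float        # how well the description matches the specification


def select_best_image(
    text: str,
    expand: Callable[[str, int], List[str]],      # hypothetical: text -> n specifications
    generate: Callable[[str], str],               # hypothetical: specification -> image
    describe: Callable[[str], str],               # hypothetical: image -> description
    score_match: Callable[[str, str], float],     # hypothetical: (spec, description) -> score
    n_specs: int = 3,
    n_images_per_spec: int = 2,
    max_workers: int = 8,
) -> Candidate:
    """Create several candidate images and return the best-matching one."""
    # Step 1: expand the initial text into several detailed specifications.
    specs = expand(text, n_specs)

    def evaluate(spec: str) -> Candidate:
        # Steps 2-4 for one candidate: generate, describe, compare.
        image = generate(spec)
        description = describe(image)
        return Candidate(spec, image, description, score_match(spec, description))

    # Each specification is used several times, giving multiple images per spec.
    jobs = [spec for spec in specs for _ in range(n_images_per_spec)]

    # Candidates are evaluated in parallel; pool.map preserves the job order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        candidates = list(pool.map(evaluate, jobs))

    # Step 5: the best-matching candidate is used.
    return max(candidates, key=lambda c: c.score)
```

Passing the model calls in as parameters means the same function could in principle be reused for styles, recurring elements and page images, and a manual selection step could replace the final `max` by showing the scored candidates to the user instead.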
As you can see, this is a nontrivial piece of implementation, and it’ll probably be a couple of weeks before we have a full version working end-to-end. We have parts of it in place, and so far it looks good. I’ll post updates as the work progresses.
Support for non-AI languages
Sophie says the new support for non-AI languages is now working correctly for her. More testing would of course be desirable.
New Caledonian projects
We had some funding left over from the previous New Caledonian project; given the current situation, it’s not feasible to use it for travel, which was the initial intention. Christèle contacted the French Embassy to ask if we could use it for OpenAI credits in the new project, and we were very pleased when they immediately agreed.
Next Zoom call
The next call will be at:
Thu Oct 3 2024, 18:00 Adelaide (= 08:30 Iceland = 09:30 Ireland/Faroe Islands = 10:30 Europe = 11:30 Israel = 12:00 Iran = 16:30 China = 18:30 Melbourne = 19:30 New Caledonia)