The new image generation functionality is now working in standalone mode. We submitted an extended abstract to the ComputEL-8 meeting.
Improved image generation
The new image generation functionality is now working as standalone code. The basic idea is to create multiple text specifications, then multiple images from each specification, then use the AI to find the specifications and images where the fit is best. In more detail, processing is as follows:
Set parameters
The user specifies the parameters: the number of specifications to produce and the number of images to produce from each specification, and the AI models to use for each operation.
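As an illustration only, the parameters might be bundled roughly as follows. This is a minimal sketch, not the actual C-LARA code: the class name, field names, and default values are all hypothetical, and apart from DALL-E-3 (mentioned below) the model names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical container for the user-specified parameters."""

    n_specifications: int = 3            # specifications to produce
    n_images_per_specification: int = 3  # images to produce from each one
    specification_model: str = "gpt-4o"  # placeholder model name (assumption)
    image_model: str = "dall-e-3"        # image generation model
    analysis_model: str = "gpt-4o"       # placeholder model name (assumption)

    def __post_init__(self) -> None:
        # Both counts must be at least 1 for the selection step to make sense.
        if self.n_specifications < 1 or self.n_images_per_specification < 1:
            raise ValueError("both counts must be at least 1")

    @property
    def total_images(self) -> int:
        """Number of images generated in one selection round."""
        return self.n_specifications * self.n_images_per_specification
```

The `total_images` property makes the cost visible: each selection round generates and analyses one image per specification per repetition, so the number of API calls grows with the product of the two counts.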
Generate a detailed specification of the style
- The AI is instructed to expand a brief user-supplied description of the style into a detailed specification. This operation is carried out several times, producing multiple versions of the specification.
- Each specification is passed to DALL-E-3 several times, to create multiple versions of a resulting image.
- Each generated image is passed to image analysis, to produce a detailed description.
- The AI is instructed to compare the specification and the description, and rate the degree of fit on a five-point scale.
- The specification with the highest average degree of fit is chosen.
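The selection loop described in these steps can be sketched as below. This is a hedged outline rather than the real implementation: the function name and its parameters are hypothetical, and the four AI operations are passed in as plain callables so that the sketch does not assume any particular API.

```python
from statistics import mean
from typing import Callable, Tuple


def choose_best_specification(
    brief: str,
    expand: Callable[[str], str],          # brief -> detailed specification
    generate_image: Callable[[str], str],  # specification -> image (e.g. a path)
    describe_image: Callable[[str], str],  # image -> detailed description
    rate_fit: Callable[[str, str], int],   # (specification, description) -> 1..5
    n_specs: int = 3,
    n_images: int = 3,
) -> Tuple[str, float]:
    """Return the specification with the highest average fit, and that average."""
    candidates = []
    for _ in range(n_specs):
        spec = expand(brief)
        scores = []
        for _ in range(n_images):
            image = generate_image(spec)
            description = describe_image(image)
            scores.append(rate_fit(spec, description))
        candidates.append((mean(scores), spec))
    # On a tie, max() keeps the earliest candidate.
    best_score, best_spec = max(candidates, key=lambda c: c[0])
    return best_spec, best_score
```

Averaging over several images per specification is what makes the choice robust: a specification is preferred only if it reliably produces images that match it, not because of one lucky result.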
Generate detailed specifications for the recurring elements
- The AI is instructed to create a list of recurring elements (people, animals, objects, locations, etc) from the translated text.
- The AI is instructed to create multiple detailed specifications of each element, incorporating the style specification from the previous stage.
- The best specification is chosen using the same method as for the style.
Generate images for each page of the text
- For each page, the AI is instructed to create a list of relevant recurring elements and relevant previous pages.
- Taking as input the translation of the page text, the specification of the style, the relevant previous pages, and the relevant recurring elements, the AI is instructed to create multiple detailed specifications for the page.
- Multiple images are generated and automatically assessed for each specification, and the one with the best fit is selected.
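To make the page-level step more concrete, here is a rough sketch of how the inputs might be assembled into a prompt and how the best image might be picked. Everything in it is an assumption for illustration: the names `build_page_prompt`, `PageCandidate`, and `select_page_image` are hypothetical, and the prompt wording is invented, not the text the system actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List


def build_page_prompt(
    page_text: str,
    style_spec: str,
    element_specs: Dict[str, str],        # element name -> specification
    previous_page_specs: Dict[int, str],  # page number -> specification
) -> str:
    """Combine the four inputs into one request for a page specification."""
    sections = [
        "Style specification:\n" + style_spec,
        "Page text (translation):\n" + page_text,
    ]
    if element_specs:
        lines = [f"- {name}: {spec}" for name, spec in element_specs.items()]
        sections.append("Relevant recurring elements:\n" + "\n".join(lines))
    if previous_page_specs:
        lines = [f"- Page {n}: {s}" for n, s in sorted(previous_page_specs.items())]
        sections.append("Relevant previous pages:\n" + "\n".join(lines))
    sections.append("Write a detailed image specification for this page.")
    return "\n\n".join(sections)


@dataclass
class PageCandidate:
    specification: str
    image_path: str
    fit: int  # rating on the five-point scale


def select_page_image(candidates: List[PageCandidate]) -> PageCandidate:
    """Pick the single image with the best fit across all specifications."""
    if not candidates:
        raise ValueError("no candidate images to choose from")
    return max(candidates, key=lambda c: c.fit)
```

Unlike the style stage, where whole specifications are compared by their average score, the selection here is over individual images, so one strong image is enough to win.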
At the end, a summary is compiled showing the prompts, the images and the evaluations.
Initial tests suggest that this improves on the earlier scheme, but we’re not there yet. In particular, we need to refine the way the style description is used and evaluated. We also need to integrate the new code into C-LARA.
Submission to ComputEL-8
We submitted an extended abstract to ComputEL-8 on the “Building Tools Together” track. This outlined the content of the second project we are about to start with the New Caledonia group, where image generation is a major theme.
Next Zoom call
The next call will be at:
Thu Oct 17 2024, 19:00 Adelaide (= 08:30 Iceland = 09:30 Ireland/Faroe Islands = 10:30 Europe = 11:30 Israel = 12:00 Iran = 16:30 China = 19:30 Melbourne = 19:30 New Caledonia)