This week, I have been looking at four separate issues: layout/presentation, integration of TTS engines, images, and multi-word expressions.
Layout/presentation
Branislav, Cathy and I had a productive session where we fixed many small problems and made C-LARA look substantially nicer. This is described in more detail in an earlier post.
Integrating TTS engines
I’ve added functionality in Simple C-LARA so that the user can now specify a preferred TTS engine: this new control is shown in conjunction with the step where you create the multimodal text. I’ve also integrated the Eleven Labs TTS engine, so far just with a Romanian voice – Claudia said the Google one wasn’t so great. It’s trivial to add more voices if people have suggestions.
Images
Christèle, Ivana and I met up on Tuesday and discussed how to improve C-LARA’s ability to create images with DALL-E-3. Two things stand out: we need to be able to create a stylistically consistent set of images, and we need to be able to revise an image in accordance with new instructions.
I have been discussing this with the AI, and it looks like we can fairly easily modify the existing image editing screen to accommodate the new functionality. The result will look something like this:
- When creating a project, you will have the added option of specifying that you will have a set of stylistically consistent images.
- If you have checked this option, the first line on the image editing screen will be to create the style image.
- The image editing screen will also be slightly different in that it will replace the file upload control on each line with a prompt input box. There will also be a checkbox specifying that you want to (re-)generate the image in question, using the instructions in the prompt.
- When regenerating, C-LARA will first use GPT-4V to look at the current image, allowing it to understand requests from the user for specific modifications.
Christèle, Ivana and I are meeting again next Tuesday at 12.30 Slovakia = 20.00 Adelaide. I am hoping to have a basic implementation of the new scheme working by then.
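As a rough illustration of the planned image editing screen described above, here is a minimal Python sketch of how the lines on the screen could be turned into a list of generation jobs. Every name in it (ImageLine, ImageProject, plan_image_jobs and the job fields) is hypothetical and not taken from the C-LARA code, and it only does the planning: it makes no calls to DALL-E-3 or GPT-4V.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImageLine:
    """One line of the image editing screen (hypothetical structure)."""
    label: str                           # e.g. "style" or "page 1"
    prompt: str = ""                     # text from the prompt input box
    regenerate: bool = False             # the "(re-)generate" checkbox
    current_image: Optional[str] = None  # path of the existing image, if any


@dataclass
class ImageProject:
    """Image settings for a project (hypothetical structure)."""
    consistent_style: bool = False       # option set when creating the project
    style_line: ImageLine = field(default_factory=lambda: ImageLine("style"))
    page_lines: List[ImageLine] = field(default_factory=list)


def screen_lines(project: ImageProject) -> List[ImageLine]:
    """Return the lines in screen order: the style image comes first if enabled."""
    if project.consistent_style:
        return [project.style_line] + project.page_lines
    return list(project.page_lines)


def plan_image_jobs(project: ImageProject) -> List[dict]:
    """List the jobs for every line whose regenerate box is checked."""
    jobs = []
    for line in screen_lines(project):
        if not line.regenerate:
            continue
        jobs.append({
            "label": line.label,
            "prompt": line.prompt,
            # Look at the current image first, so that a request for a
            # specific modification can be interpreted against it.
            "inspect_current": line.current_image is not None,
            # Page images follow the style image; the style image itself does not.
            "use_style": project.consistent_style and line.label != "style",
        })
    return jobs
```

For example, a project with the consistent-style option checked, a style line marked for generation and one existing page image marked for regeneration would yield two jobs, with the style job first and the page job flagged to inspect the current image before regenerating.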
Multi-Word Expressions
I have been discussing Multi-Word Expressions (MWEs) with Francis Bond, who makes a strong case that we need a separate annotation level for MWEs. In a follow-up discussion, the AI and I arrived at what we think is a fairly simple way to extend the current annotation code and add an initial version of this functionality.
If we can make progress on this idea, it could have a large effect on C-LARA’s usability. All our evaluations, e.g. the one we did for ALTA 2023, suggest that poor performance on MWEs is one of the most serious issues in the current version of the platform.
Next Zoom call
The next call will be at:
Thu May 2 2024, 18.00 Adelaide (= 08.30 Iceland = 09.30 Ireland/Faroe Islands = 10.30 Europe = 11.30 Israel = 12.00 Iran = 16.30 China = 18.30 Melbourne = 19.30 New Caledonia)