We have made good progress both on adapting the reinforcement learning/Chain of Thought idea to C-LARA and on supporting non-AI languages. A version of the Palgrave Encyclopaedia article has been published as a ResearchGate preprint.
Priority list
Reinforcement learning and Chain of Thought for MWEs. We have a first version of the core framework in place. The top-level function takes as input (a) a list of pairs, each combining a piece of text with an annotation showing the multi-word expressions (MWEs) it contains, (b) a pool of existing examples of correct Chain of Thought (CoT) MWE analyses, and (c) a specification of the similarity metric to use, either POS-tag-based or embedding-based. For each pair, we pick the items from the pool that are closest to the text and use them as the few-shot examples when we request a new CoT analysis. We then compare the MWEs produced with the ones in the annotation.
This gives us the functionality for the basic loop, where we use the existing pool of examples to generate new ones, which we then add to the pool.
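To make the shape of the loop concrete, here is a minimal Python sketch. It is not the C-LARA code: all the names (`CoTExample`, `run_basic_loop`, `request_analysis` and so on) are made up for illustration, the call to the AI is passed in as a plain function, and a simple word-overlap score stands in for the POS-tag-based and embedding-based metrics. It also assumes that a new analysis is only added to the pool when its MWEs exactly match the annotation, which is one plausible policy rather than something fixed above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Set, Tuple


@dataclass
class CoTExample:
    """One pool entry (hypothetical structure)."""
    text: str
    analysis: str    # the Chain of Thought trace
    mwes: Set[str]   # the MWEs the analysis arrives at


def word_overlap_similarity(a: str, b: str) -> float:
    """Stand-in metric: Jaccard overlap of lowercased words.

    The real metrics are POS-tag-based or embedding-based; this is only
    here so that the sketch runs without extra dependencies.
    """
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def closest_examples(
    text: str,
    pool: Sequence[CoTExample],
    similarity: Callable[[str, str], float],
    k: int = 3,
) -> List[CoTExample]:
    """Return the k pool entries whose text is most similar to `text`."""
    ranked = sorted(pool, key=lambda ex: similarity(text, ex.text), reverse=True)
    return ranked[:k]


def run_basic_loop(
    annotated: Iterable[Tuple[str, Set[str]]],
    pool: Sequence[CoTExample],
    similarity: Callable[[str, str], float],
    request_analysis: Callable[[str, List[CoTExample]], CoTExample],
    k: int = 3,
) -> Tuple[List[CoTExample], int]:
    """Run one pass over the annotated pairs.

    Returns the (possibly extended) pool and the number of new analyses
    whose MWEs matched the annotation.
    """
    pool = list(pool)  # work on a copy so the caller's pool is untouched
    accepted = 0
    for text, gold_mwes in annotated:
        few_shot = closest_examples(text, pool, similarity, k)
        candidate = request_analysis(text, few_shot)  # e.g. a call to the AI
        # Assumption: keep the analysis only if it reproduces the annotation.
        if candidate.mwes == set(gold_mwes):
            pool.append(candidate)
            accepted += 1
    return pool, accepted
```

Because the pool grows inside the loop, analyses accepted early in a pass are already available as few-shot examples for later texts in the same pass.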
Support for non-AI languages. I have implemented a first version of the new functionality for non-AI languages that Sophie and I agreed on. We have the following:
- In the Edit Images screen, each page shows the different text versions for that page under the image, in editable form.
- When the user tries to save changes to the text versions, C-LARA checks for syntax errors like missing hashtags. If it finds any, it calls the AI to correct them. It also checks for misalignments with the segmented version, which is taken as primary, and if necessary corrects those too.
- In ‘coherent images’ mode, the different steps in creating the images are presented in the order in which they need to be carried out, so that the user cannot inadvertently skip a step. So first the user has to create the style image, then the description variables, and then the image request sequence.
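The ordering constraint in the last item can be sketched in a few lines of Python. This is only an illustration of the idea, not the C-LARA implementation: the step identifiers and function names are invented, and it assumes that each step simply requires all earlier steps to be finished.

```python
from typing import List, Optional, Set

# Hypothetical identifiers for the three steps, in the required order.
STEPS: List[str] = [
    "style_image",
    "description_variables",
    "image_request_sequence",
]


def can_start(step: str, completed: Set[str]) -> bool:
    """A step may start only when every earlier step has been completed."""
    if step not in STEPS:
        raise ValueError(f"Unknown step: {step}")
    earlier_steps = STEPS[: STEPS.index(step)]
    return all(prev in completed for prev in earlier_steps)


def next_step(completed: Set[str]) -> Optional[str]:
    """Return the first step not yet completed, or None if all are done."""
    for step in STEPS:
        if step not in completed:
            return step
    return None
```

A screen built along these lines would only offer the user the step returned by `next_step`, so there is no way to reach the image request sequence before the style image and description variables exist.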
Sophie and I are meeting up tomorrow for another review session. My guess is that we aren’t quite there yet, but I think we’re close. When we’re finished, it should be a great deal easier to use C-LARA for Indigenous languages.
Encyclopaedia article
The ResearchGate version of what was the Palgrave Encyclopaedia article has now been published. As of Wednesday Sep 11, it has had 36 reads.
Next Zoom call
The next call will be at:
Thu Sep 12 2024, 18:00 Adelaide (= 08:30 Iceland = 09:30 Ireland/Faroe Islands = 10:30 Europe = 11:30 Israel = 12:00 Iran = 16:30 China = 18:30 Melbourne = 19:30 New Caledonia)