C-LARA

An AI collaborates with humans to build a language learning app.


Weekly summary, Feb 15-21 2024

We’re planning to release our second C-LARA progress report soon, and I have spent most of this week working on things relevant to that. In more detail:

General

We have a lot more text: at the beginning of the week there were about 15 pages in the Overleaf document, and now there are over 50. Even though plenty of things are still missing and the text is in many cases only preliminary, we are making good progress.

Two examples: “Merci Proust” and “Paul und Emma”

I created two examples, using multiple screenshots, to illustrate the use of simple C-LARA and full C-LARA respectively. The simple C-LARA example, a poem in French about Proust, was quick to construct. There is no point in using full C-LARA unless you’re doing something at least moderately substantial, so “Paul und Emma” is a small picture-book in German. I posted about it here. Each example is in an appendix.

Code and functionality appendices

I also added appendices listing the top-level functionalities and the main code files. The code appendix is not quite finished yet.

Integrating the ALTA and ComputEL-7 papers

I added a preliminary integration of the ALTA paper as the section “GPT-4 as a software component”. Note that the big table now has a new set of columns for filling in the results of the repeated experiments.

I added a preliminary integration of part of the ComputEL-7 paper as the section “Phonetic texts”.

Integration with Basm

Claudia has written a section about integration with Basm. Thank you Claudia!

ChatGPT-4 as a software engineer

I have downloaded transcripts of the six long conversation threads I’ve had with ChatGPT-4 where the topic has been the C-LARA collaboration. This is a total of about 4.3K turns or 760K words. Chat and I started by writing a script to convert them into a structured JSON form, which was straightforward.
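The conversion step is roughly of the kind sketched below. This is an illustrative reconstruction, not the actual script: the turn-marker format (`User:` / `ChatGPT:`) and the JSON field names are hypothetical.

```python
import json
import re

def transcript_to_json(text):
    """Split a plain-text transcript into a list of turn records.

    Assumes (hypothetically) that each turn begins with a line starting
    'User:' or 'ChatGPT:'; continuation lines belong to the current turn.
    The real transcript format may well differ.
    """
    turns = []
    current = None
    for line in text.splitlines():
        m = re.match(r"^(User|ChatGPT):\s*(.*)$", line)
        if m:
            if current:
                turns.append(current)
            current = {"speaker": m.group(1), "text": m.group(2)}
        elif current:
            # Continuation of the current turn
            current["text"] += "\n" + line
    if current:
        turns.append(current)
    return turns

sample = "User: Hello\nChatGPT: Hi there!\nSecond line."
print(json.dumps(transcript_to_json(sample), indent=2))
```

Once the turns are in structured form like this, counting turns and words per speaker is trivial.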

We’ve now moved on to the second and more ambitious phase, where we’re trying to use GPT-4 to annotate the turns. The central goal here is to get quantitative data on how work has been divided between the human and AI participants. The AI/human collaboration is clearly one of the things that people find most interesting about C-LARA, but in the first report we only had anecdotal results.

The basic infrastructure is working as of earlier this evening, and it’s possible to annotate a sequence of turns with summary and topic information. Based on the 50 turns annotated during the preliminary experiments, it takes about 10-20 seconds and 2-4 cents per turn. The summaries look good, but the topic tags are not yet right: we need to find a way to explain to GPT-4 that topics should, as far as possible, be reused from earlier turns, so that we can extract sequences of turns belonging to the same topic. There are plenty of things to try, and the AI is full of ideas. If we can make this work, the result should be interesting in all sorts of ways!
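One post-processing option for the topic-consistency problem is to fuzzy-match each newly proposed tag against tags already seen, and fold near-duplicates together. The sketch below is one possible approach under stated assumptions; the similarity measure and threshold are illustrative choices, not what the project actually uses.

```python
from difflib import SequenceMatcher

def normalise_topic(tag, known_topics, threshold=0.8):
    """Map a newly proposed topic tag onto an earlier one when they are
    similar enough, so runs of turns on the same topic share one tag.

    Mutates known_topics: a genuinely new tag is added to the list.
    The 0.8 threshold and string-similarity matching are hypothetical.
    """
    best, best_score = None, 0.0
    for known in known_topics:
        score = SequenceMatcher(None, tag.lower(), known.lower()).ratio()
        if score > best_score:
            best, best_score = known, score
    if best is not None and best_score >= threshold:
        return best  # reuse the earlier tag
    known_topics.append(tag)
    return tag

topics = ["JSON conversion script", "annotation prompts"]
print(normalise_topic("Json conversion scripts", topics))
```

With a normalisation step like this applied turn by turn, consecutive turns that GPT-4 tags with slight variants of the same topic end up under a single label, which is what extracting same-topic turn sequences requires.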

Next Zoom call

Thu Feb 22, 2024, 20:00 Adelaide (= 09:30 Iceland = 09:30 Ireland/Faroe Islands = 10:30 Europe = 11:30 Israel = 13:00 Iran = 17:30 China = 20:30 Melbourne/New Caledonia)


