Rina and I were chatting on Zoom about what we should be doing next in C-LARA. We have so many interesting things going on, but development is currently rather haphazard and impulse-driven – a more systematic planning process would help.
Here are some thoughts arising from our talk, arranged under three headings: 1) short-term priorities, 2) medium-term priorities, and 3) how should we be discussing things?
Short-term priorities
The initial C-LARA progress report has already attracted over two thousand reads, and we should soon post another one. We’ve been saying that mid/late Feb would be a good target date. We need to decide what else we will do before writing it. Here’s a suggested list:
- “Simple C-LARA”. This is coming along very well, and I should be installing an initial version on the UniSA server tomorrow morning. I think it will make it much easier for non-experts to create C-LARA content.
- Social network. We have half a dozen or so useful-looking extensions currently under discussion. They are probably easy to implement, since ChatGPT-4 is very good at this kind of thing.
- Better TTS. Easy to add. ElevenLabs looks like the one to use.
- GPT-4V. Integrating this is again easy, and would make it possible to create a C-LARA project starting with an image rather than a text prompt.
- Basm. The collaboration with Claudia’s Basm project is going well. It might help to add another integration endpoint or two, and it’s probably straightforward.
- “Reading histories”. It would be very nice to be able to link together multiple C-LARA texts into a single virtual document. We had a version of this in LARA, but it never worked well. We can do better in C-LARA. This is the least trivial of the ideas in this section, but I think it has a lot of potential.
Medium-term priorities
Rina and I also discussed the large number of more challenging subprojects that are currently on the back burner, and which we can turn to once we’ve released the second progress report. It would be very good to prioritise these and put together a sensible timeline. We may of course want to add other items; this certainly isn’t meant to be a definitive list.
- Importing LARA content into C-LARA. We have several person-years of good LARA content in the SourceForge repository, which we would like to make available through C-LARA. It’s not completely trivial to write an import function (I’m guessing 1-2 weeks of work), but the result would be a lot more good C-LARA content.
- Systematic archiving of C-LARA content. Conversely, we’re not currently archiving the material we’re developing in C-LARA. We should do that: it would already be rather tragic if we lost this stuff.
- Documentation/refactoring of codebase. The C-LARA codebase is now about 19K lines, and, as Rina said, only I understand it properly. Thanks to the AI’s good influence, it’s reasonably clean and well-structured, but these things can generally be improved a good deal if you put in a week or two of systematic tidying up.
- More social network extensions. After we’ve put in the current list of social network extensions, it would be surprising if more requests didn’t come up.
- Gamification/flashcards. This is again something that ChatGPT-4 will probably do well, and as we’ve said it’s something that should integrate well with the social network.
- Integration of Melbourne U student projects. I have not yet had time to do this. The “recording” and “image annotation” projects should be reasonably easy to do, and are also the most useful ones.
- More versions of Simple C-LARA. I am pretty sure we’ll want to do this. Likely candidates are a “Picturebook” version and a “Text from images” version.
- Improve phonetic texts. In particular, can we get GPT-4 to improve its own prompts for the phonetic lexicon entry construction task? This would be both useful and theoretically interesting.
- Organise user studies. Hopefully this will soon be happening. Once it does, it will no doubt throw up a bunch of new issues as we see how users react to the system.
- Journal papers. We should soon have material for at least one journal paper, maybe two or three.
- Organise C-LARA workshop at EUROCALL. Do we want to do this? I had not properly realised it would be in Slovakia this year.
How should we be discussing things?
We need to discuss in a systematic way. Options:
- Continue with Zoom meetings. But maybe Thursday isn’t the best day? I know it clashes with some people’s schedules.
- Email. This is always a possibility. Email discussions are often unstructured and chaotic, but much better than nothing.
- Discuss on C-LARA. This is an idea I’ve been wondering about for a while. Can we have an AI-moderated discussion on the C-LARA platform itself, probably inside one of the groups we’re soon planning to add? I’m not really sure yet how it would work, but if it did it would be amazing. And it doesn’t feel impossible.