We more or less took a week off to visit Queensland, so here is a summary of the last two weeks:
ALTA
Alex presented our paper at ALTA 2023, and it sounded like it went well! You can read it here.
Phonetic texts
I’ve mostly been focussing on this, and there’s been a lot of progress. I need to test more – there are some annoying divergences between my laptop and the UniSA server – but I think the following are now all working in both environments:
- We can enable phonetic text capabilities in a new language by uploading resource files (plain phonetic lexicon and aligned phonetic lexicon).
- For languages with regular letter-to-sound rules, we can alternatively define a phonetic orthography, where a set of letter groups is given together with their phonetic values.
- When resources are available, we can automatically convert a plain text into a phonetic text.
- In languages which GPT-4 knows enough about, it can guess missing phonetic lexicon entries reasonably well.
- Phonetic texts can be rendered into HTML.
- We can upload audio files corresponding to the phonetic letter groups and have them incorporated in the HTML in the usual way.
- Phonetic texts interact correctly with images.
- If we have both normal and phonetic versions of the text in the same project, links are added in the HTML to allow switching between the two versions.
You can see the joint normal/phonetic version of Jabberwocky here. Note that there are some inconsistencies! Though of course this is quite a challenging example.
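To make the phonetic orthography idea concrete, here is a minimal sketch of how a word might be split into letter groups with phonetic values. The orthography entries, the function name `segment`, and the greedy longest-match strategy are all assumptions made for illustration; the actual resource format and matching method may differ.

```python
# Hypothetical phonetic orthography: letter groups mapped to phonetic values.
# These entries are invented for illustration, not taken from a real resource file.
ORTHOGRAPHY = {
    "sh": "ʃ",
    "ch": "tʃ",
    "a": "a",
    "i": "i",
    "t": "t",
    "s": "s",
    "h": "h",
}


def segment(word, orthography):
    """Split a word into (letter_group, phonetic_value) pairs.

    Uses greedy longest match (an assumption): at each position, take the
    longest letter group that is listed in the orthography.
    """
    max_len = max(len(group) for group in orthography)
    result = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            group = word[i:i + size]
            if group in orthography:
                result.append((group, orthography[group]))
                i += size
                break
        else:
            # No listed group matches: keep the character, with no phonetic value.
            result.append((word[i], None))
            i += 1
    return result
```

With this toy orthography, `segment("chita", ORTHOGRAPHY)` treats "ch" as a single letter group rather than as "c" followed by "h", which is the point of listing multi-letter groups.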
I am currently working on the following:
- Support for “mostly-regular” languages, where it makes sense to combine the regular phonetic orthography with a list of exceptions. In particular, this is what we need for Drehu, the language Pauline is working on. It is a merge of the two schemes: when a word is in the exception list, we use the phonetic-lexicon/alignment method, and when it isn’t, we use the phonetic orthography method. Most of the necessary functionality is now in place.
- A screen for reviewing and correcting guessed alignments and phonetic entries. This is needed in order to bootstrap the alignment method for a new language. Once the aligner has a few hundred corrected alignments, it becomes quite accurate. I think this should be fairly easy.
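The merged scheme for mostly-regular languages can be sketched roughly as follows. This is only an illustration of the dispatch logic described above: the names `to_phonetic`, `EXCEPTIONS`, and `toy_orthography` are hypothetical, the exception entry is invented (it is not real Drehu data), and the rule-based step is a trivial stand-in for the real phonetic orthography method.

```python
# Hypothetical exception list: words whose aligned phonetic entries are
# looked up directly. The entry below is invented for illustration.
EXCEPTIONS = {
    "kata": [("ka", "ga"), ("ta", "da")],
}


def toy_orthography(word):
    """Stand-in for the regular orthography method (an assumption):
    each letter is its own group and is pronounced as written."""
    return [(ch, ch) for ch in word]


def to_phonetic(word, exceptions, orthography_fn):
    """Return (aligned_groups, method_used) for a word.

    Words in the exception list use the lexicon/alignment entry;
    all other words fall back to the regular orthography.
    """
    key = word.lower()
    if key in exceptions:
        return exceptions[key], "lexicon"
    return orthography_fn(key), "orthography"
```

Returning the method alongside the result is a design choice made for this sketch: it would let a review screen show which words came from the exception list and which were produced by rule.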
Efficiency improvements
While working on the phonetic texts, I found that we had a major performance bottleneck in the rendering code – we were doing the database calls in a stupid way. Fixing this has made rendering much faster.
Upcoming conference submissions
We have been talking about submitting an abstract to INTED (due tomorrow, but it will not take long to write if we decide to do it). We have also been considering the idea of a paper for ComputEL-7, due Dec 15.
Next Zoom call
Note to Southern Hemisphere people: two hours later than winter time, i.e. same time as last week.
Thu Dec 7, 2023, 20:00 Adelaide (= 09:30 Iceland = 09:30 Ireland/Faroe Islands = 10:30 Europe = 11:30 Israel = 13:00 Iran = 17:30 China = 20:30 Melbourne/New Caledonia)