Ocsai is faster! And model updates
Ocsai is faster!
Ocsai now scores asynchronously, which significantly speeds up scoring in most cases and should greatly reduce timeouts when scoring large files. You should see roughly a tenfold speed increase, from about 1.8 to 19 responses per second.
Try it here: https://openscoring.du.edu/scoringllm. This change accompanies a bigger behind-the-scenes refactor in anticipation of some upcoming features, so please let me know if you run into any peculiarities.
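If you're scoring from code rather than the web form, client-side concurrency pairs nicely with the new async backend. Here's a minimal sketch using httpx and asyncio; note that the endpoint path and parameter names are hypothetical stand-ins, not the documented API, so adapt them to the actual interface:

```python
# Minimal concurrent-scoring sketch. The endpoint path and parameter
# names below are hypothetical - adapt them to the actual Ocsai API.
import asyncio
import httpx

API_URL = "https://openscoring.du.edu/llm"  # hypothetical path

async def score_one(client: httpx.AsyncClient, sem: asyncio.Semaphore,
                    prompt: str, response: str) -> dict:
    # The semaphore caps in-flight requests so a big file doesn't
    # flood the server (or trip rate limits).
    async with sem:
        r = await client.get(API_URL, params={
            "model": "ocsai-chatgpt2",  # or "ocsai-davinci3"
            "prompt": prompt,           # hypothetical parameter name
            "response": response,       # hypothetical parameter name
        })
        r.raise_for_status()
        return r.json()

async def score_all(pairs: list[tuple[str, str]]) -> list[dict]:
    sem = asyncio.Semaphore(10)  # ~10 concurrent requests
    async with httpx.AsyncClient(timeout=60) as client:
        return await asyncio.gather(
            *(score_one(client, sem, p, resp) for p, resp in pairs)
        )

if __name__ == "__main__":
    pairs = [("brick", "use it as a doorstop"),
             ("brick", "grind it into pigment")]
    print(asyncio.run(score_all(pairs)))
```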
On a technical note, I was surprised by how much Python has improved in its handling of asynchronous code and type hinting. During my past year of sabbatical I've been increasingly defaulting to TypeScript because of those features, and thought OCS would eventually have to go that direction, but it seems Python is catching up to our needs.
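As a taste of what I mean, here's a toy example (not OCS code): typed coroutines plus structured concurrency via asyncio.TaskGroup, added in Python 3.11, read a lot like their TypeScript counterparts.

```python
# Toy illustration of modern async Python with type hints.
import asyncio

async def fake_score(response: str) -> float:
    await asyncio.sleep(0.1)     # stand-in for a network call
    return float(len(response))  # toy "score"

async def main(responses: list[str]) -> list[float]:
    # TaskGroup (Python 3.11+) runs tasks concurrently and cancels
    # the remaining tasks if any one of them fails.
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(fake_score(r)) for r in responses]
    return [t.result() for t in tasks]

if __name__ == "__main__":
    print(asyncio.run(main(["a pot", "a hat", "a tiny boat"])))
```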
Additional Models
The models on Open Creativity Scoring have been updated slightly to align with what I use in practice. There are two new models, ocsai-chatgpt2 and ocsai-davinci3, which differ from their antecedents in two ways.
First, they're trained on more data. It is still the data from Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models, but the training now includes the 15% of the data that had been withheld for testing in the paper. Withheld testing data is necessary for evaluation in research, but in practice you likely want the best possible model, trained on the most data.
Second, the models support a few more tasks: in addition to uses, they support consequences, instances, and complete the sentence. These were reported on in Measuring original thinking in elementary school: Development and validation of a computational psychometric approach (Acar et al., 2024).
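For anyone scripting against the API, switching tasks or models would just be a parameter change - for example, extending the hypothetical client sketch from above:

```python
# Hypothetical request parameters - illustrative names, not the documented API.
params = {
    "model": "ocsai-davinci3",  # or "ocsai-chatgpt2"
    "task": "consequences",     # also: "uses", "instances", "complete the sentence"
    "prompt": "What would happen if humans no longer needed sleep?",
    "response": "schools would run night shifts",
}
```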
Both models are trained on English-language responses. Next week, I’ll share evaluations of Ocsai 1.5, which is both multi-lingual and multi-test.
More Data
On the topic of model training, I posted the exact training data from the Beyond Semantic Distance paper. Since the training is all on public data, we had previously provided a script that pulls the data from its various locations, wanting to make it clearer that the data comes from a variety of other studies. It really is easier to just have the final file available than to run a script, though. Just remember, if re-using it, to cite the various datasets within!
Share your Papers
I've been seeing a good deal of interesting work in creativity measurement recently, including a slew of automated scoring studies in non-English contexts. Goecke et al. used XLM-RoBERTa for scoring scientific creativity in German, while Yang et al. evaluated newer semantic embedding models for scoring in Chinese. In Polish, Zielińska et al. found that OCS - the pre-1.5, English-focused models - worked well with a cleverly direct approach: pre-translating responses to English. Raw responses in Polish also worked.
It's harder to keep on top of work after the collapse of Twitter. If you have anything new, I'd love to include it in this newsletter. Send me an email (peter.organisciak@du.edu).
-- Peter