The Creativity Byte

August 22, 2024

Ocsai 1.6: An Easy Swap-in Update

Brief note today.

I updated the Ocsai 1.5 and best Ocsai 1 models to new versions, named ocsai-1.6 and ocsai1-4o. There are no technical differences from the earlier models, just a change of base model to GPT-4o-mini. The improvements are across the board, particularly for Arabic (+0.18), Chinese (+0.11), Spanish (+0.17), and Hebrew (a +0.30 jump!).

The changes

Alternate/Unusual Uses:

lang   Ocsai 1.5   Ocsai 1.6   diff
ara    0.273       0.450       +0.177
chi    0.543       0.654       +0.111
dut    0.726       0.797       +0.071
eng    0.736       0.764       +0.028
fre    0.722       0.779       +0.057
ger    0.754       0.814       +0.060
heb    0.463       0.764       +0.301
ita    0.602       0.683       +0.081
pol    0.672       0.735       +0.063
rus    0.614       0.723       +0.109
spa    0.603       0.771       +0.168

Other tasks, English:

task           Ocsai 1.5   Ocsai 1.6   diff
completion     0.860       0.889       +0.029
consequences   0.560       0.691       +0.131
instances      0.917       0.939       +0.022
metaphors      0.704       0.750       +0.046
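For anyone who wants to check the diff columns or rank the gains, here is a minimal Python sketch. The numbers are copied from the two tables above; the variable and function names (`aut_scores`, `task_scores`, `gains`) are just illustrative and not part of Ocsai or OCS.

```python
# Scores copied from the tables in this post: (Ocsai 1.5, Ocsai 1.6).
# The names below are illustrative only, not part of any Ocsai API.
aut_scores = {
    "ara": (0.273, 0.450), "chi": (0.543, 0.654), "dut": (0.726, 0.797),
    "eng": (0.736, 0.764), "fre": (0.722, 0.779), "ger": (0.754, 0.814),
    "heb": (0.463, 0.764), "ita": (0.602, 0.683), "pol": (0.672, 0.735),
    "rus": (0.614, 0.723), "spa": (0.603, 0.771),
}
task_scores = {
    "completion": (0.860, 0.889), "consequences": (0.560, 0.691),
    "instances": (0.917, 0.939), "metaphors": (0.704, 0.750),
}


def gains(scores):
    """Return {label: new - old}, rounded to three decimals."""
    return {label: round(new - old, 3) for label, (old, new) in scores.items()}


aut_gains = gains(aut_scores)
task_gains = gains(task_scores)

# Rank languages from largest to smallest improvement.
ranked = sorted(aut_gains.items(), key=lambda kv: kv[1], reverse=True)
for lang, gain in ranked:
    print(f"{lang}: {gain:+.3f}")
```

The largest jumps in this ranking are the ones called out at the top of the post: Hebrew, Arabic, Spanish, and Chinese.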

And for the English-focused model in the original Ocsai 1 format, ocsai1-4o, the new performance is r=0.812, versus r=0.781 and r=0.777 for its immediate predecessors (ocsai-davinci2 and ocsai-chatgpt). These benchmarks come from a model trained with some data withheld for testing; the version available on OCS doesn't withhold any data, so it should perform slightly better, though with no held-out data left there is no way to measure by how much.
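For readers less familiar with this kind of benchmark, here is a rough sketch of how an r value can be computed from held-out items. It assumes r is a Pearson correlation between human ratings and model scores, which is the usual reading but is not spelled out in this post. The function `pearson_r` and the toy numbers are made up for illustration and are not Ocsai data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up illustration values, NOT Ocsai data: human judge ratings and
# model scores for five held-out responses the model never trained on.
human_ratings = [1.0, 2.5, 3.0, 4.0, 4.5]
model_scores = [1.4, 2.0, 3.3, 3.6, 4.8]

r = pearson_r(human_ratings, model_scores)
print(f"r = {r:.3f}")
```

This is also why the fully trained model can't be benchmarked the same way: once every rated response has gone into training, there are no unseen items left to correlate against.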

I understand the frequent model updates may cause whiplash. The intent with OCS is to provide the best possible tools for automated scoring, which is why the models are updated so often. To lend some clarity, models in the web interface now carry either a 'Recommended' tag or a 'Deprecated' tag. Deprecated means there's a new, better model, but the old one is still available for replicability.

The updates are at https://openscoring.du.edu/scoringllm.
