Ocsai 1.6: An Easy Swap-in Update
Brief note today.
I updated the Ocsai 1.5 model and the best Ocsai 1 model to new versions, named ocsai-1.6 and ocsai1-4o. There are no technical differences from the earlier models, just a change of base model to GPT-4o-mini. The improvements are across the board, particularly for Arabic (+18 points), Chinese (+11), Spanish (+17), and Hebrew (a +30 point jump!).
The changes:
Alternate/Unusual Uses:
language | Ocsai 1.5 | Ocsai 1.6 | Change |
---|---|---|---|
ara | 0.273 | 0.450 | +0.177 |
chi | 0.543 | 0.654 | +0.111 |
dut | 0.726 | 0.797 | +0.071 |
eng | 0.736 | 0.764 | +0.028 |
fre | 0.722 | 0.779 | +0.057 |
ger | 0.754 | 0.814 | +0.060 |
heb | 0.463 | 0.764 | +0.301 |
ita | 0.602 | 0.683 | +0.081 |
pol | 0.672 | 0.735 | +0.063 |
rus | 0.614 | 0.723 | +0.109 |
spa | 0.603 | 0.771 | +0.168 |
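The "points" quoted in the summary are hundredths of the score, so Hebrew's +0.301 is the +30 jump. The minimal Python sketch below recomputes the Change column from the table and ranks the gains. It assumes the table values are correlation coefficients (r). AUT_SCORES, change_in_r, change_in_points and ranked_gains are hypothetical names for illustration and are not part of OCS.

```python
# Hypothetical helper: recompute the "Change" column and the point gains
# quoted in the summary. Values are copied from the Alternate/Unusual Uses table.
# Assumption: each value is a correlation (r); entries are (Ocsai 1.5, Ocsai 1.6).
AUT_SCORES = {
    "ara": (0.273, 0.450),
    "chi": (0.543, 0.654),
    "dut": (0.726, 0.797),
    "eng": (0.736, 0.764),
    "fre": (0.722, 0.779),
    "ger": (0.754, 0.814),
    "heb": (0.463, 0.764),
    "ita": (0.602, 0.683),
    "pol": (0.672, 0.735),
    "rus": (0.614, 0.723),
    "spa": (0.603, 0.771),
}


def change_in_r(old: float, new: float) -> float:
    """Difference in r, rounded to three decimals like the table."""
    return round(new - old, 3)


def change_in_points(old: float, new: float) -> int:
    """Express the change in points, where one point is 0.01 of r."""
    return round((new - old) * 100)


def ranked_gains(scores: dict[str, tuple[float, float]]) -> list[tuple[str, int]]:
    """Return (language, points) pairs, largest gain first."""
    gains = [(lang, change_in_points(old, new)) for lang, (old, new) in scores.items()]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for lang, points in ranked_gains(AUT_SCORES):
        old, new = AUT_SCORES[lang]
        print(f"{lang}: {change_in_r(old, new):+.3f} r ({points:+d} points)")
```

Sorting is stable, so languages with equal point gains keep the table's order.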
Other tasks, English:
type | Ocsai 1.5 | Ocsai 1.6 | Change |
---|---|---|---|
completion | 0.860 | 0.889 | +0.029 |
consequences | 0.560 | 0.691 | +0.131 |
instances | 0.917 | 0.939 | +0.022 |
metaphors | 0.704 | 0.750 | +0.046 |
And for the English-focused model in the original Ocsai 1 format, ocsai1-4o, the new performance is r=0.812, versus r=0.781 and r=0.777 for its immediate predecessors (ocsai-davinci2 and ocsai-chatgpt). These benchmarks come from a version of the model trained with test data withheld; the version available on OCS doesn't withhold any data, so it should perform slightly better (though by an amount we can't measure!).
I understand the frequent model updates may cause whiplash. The intent with OCS is to provide the best possible tools for automated scoring, which is why the models are updated so often. To add some clarity, models in the web interface are now labeled with a 'Recommended' tag or a 'Deprecated' tag (the latter means there's a newer, better model, but the old one is still available for replicability).
The updates are at https://openscoring.du.edu/scoringllm.