Releases: callstackincubator/evals

2 new categories and 4 new models

17 Apr 14:22

We've added 2 new categories (lists, react-native-apis) and 4 new models (claude-opus-4.7, composer-2-fast, gemma-4-31B-it, minimax-m2.7).

Composer-2 benchmark & full generated data

24 Mar 20:30

We've added the Composer 2 model from Cursor to the benchmark and published all model benchmark artifacts.