diff --git a/README.md b/README.md
index 3989f5c9..5384fa4c 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,16 @@ Here is post-training result which **over 50% SFT data** comes from GraphGen and
 | Math | AIME24 | **20.6** | 16.7 |
 | | AIME25 | **22.7** | 7.2 |
+### RLVR
+We applied reinforcement learning directly to the Qwen2.5-7B base model without any prior SFT. Here are the results.
+| Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
+|:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
+| Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |
+| Law | LawBench | **55.2** | 54.76 |
+| Medicine | MedQA | **87.1** | 80.7 |
+| General | BBH | **55.3** | 49.6 |
+More details can be found at [`examples/generate/generate_masked_fill_in_blank_qa`](./examples/generate/generate_masked_fill_in_blank_qa).
 ## ⚙️ Support List