Finally, we annotate the data with crucial information such as answer types and subfields, yielding a dataset that is clean, accurate, and detailed. OlympiadBench includes open-ended questions and proof problems. For the open-ended questions, we standardize the answer format and develop an ...
Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8\% on the MATH dataset), indicating their inadequacy...
The downloaded dataset contains two folders, `data` and `images`. The `data` folder contains the categorized data. For example, `OE_MM_physics_en_COMP.json`, `TP_TO_maths_zh_CEE.json`.

* OE: Open-ended questions
* TP: Theorem proof problems
* MM: Multimodal
* TO: Text-only
* physics: Physics problems
* maths...
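
The naming scheme can also be unpacked programmatically. The following is a minimal sketch, assuming every filename consists of exactly five underscore-separated fields as in the two examples. The function name `parse_dataset_filename` and the dictionary keys are hypothetical, chosen here for illustration, and only the codes spelled out in the legend are expanded; any other code is returned unchanged.

```python
from pathlib import Path

# Expansions taken from the legend; codes not listed there are left as-is.
ANSWER_TYPES = {"OE": "open-ended", "TP": "theorem proof"}
MODALITIES = {"MM": "multimodal", "TO": "text-only"}


def parse_dataset_filename(filename):
    """Split a name such as 'OE_MM_physics_en_COMP.json' into its fields.

    The key names below are illustrative labels, not part of the dataset.
    """
    parts = Path(filename).stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields: {filename!r}")
    answer_type, modality, subject, language, source = parts
    return {
        "answer_type": ANSWER_TYPES.get(answer_type, answer_type),
        "modality": MODALITIES.get(modality, modality),
        "subject": subject,
        "language": language,  # raw code, e.g. 'en' or 'zh'
        "source": source,      # raw code, e.g. 'COMP' or 'CEE'
    }


if __name__ == "__main__":
    for name in ("OE_MM_physics_en_COMP.json", "TP_TO_maths_zh_CEE.json"):
        print(name, "->", parse_dataset_filename(name))
```

A helper like this makes it straightforward to select a subset of the files in `data`, for instance only the text-only theorem proof problems, by filtering on the parsed fields instead of matching filename substrings.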