ZebraLogic is a comprehensive AI evaluation framework that uses logic grid puzzles grounded in constraint satisfaction problems (CSPs) to test the logical reasoning performance of large language models (LLMs). It finds that model reasoning ability degrades sharply as puzzle complexity increases, and that scaling up model size or applying methods such as Best-of-N sampling yields little improvement. This suggests current LLM architectures face inherent limits in scaling logical reasoning, and that more effective reasoning frameworks and structured logical modeling methods merit further research.
(TL;DR) A recent study using the ZebraLogic logic-grid-puzzle evaluation framework finds that current large language models (LLMs) exhibit a pronounced "curse of complexity" on complex logical reasoning tasks: as problem complexity grows, model accuracy drops sharply. This fundamental limitation persists even when model size or inference-time compute is increased. The study notes in particular that models like o1, which generate large numbers of "hidden reasoning tokens," perform better, but all models...
LOGIC: LLM-originated guidance for internal cognitive improvement of small language models in stance detection. Woojin Lee, Jaewook Lee, Harksoo Kim. PeerJ Computer Science. doi:10.7717/peerj-cs.2585
Installing and using Logic-RL: 1. Create a conda environment: conda create -n logic python=3.9. 2. Install PyTorch: pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121 (note: this requires CUDA 12.1 support). 3. Install the remaining dependencies: pip3 install vllm==0.6.3 ray flash-attn --no-build-isolation ...
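The install steps from that snippet can be collected into a single shell sketch (versions and flags as stated there; the `conda activate` line is an assumed step, since the snippet creates the environment but is truncated before any activation command):

```shell
# Sketch of the Logic-RL environment setup described above.
# Assumes conda and CUDA 12.1 drivers are already installed.
conda create -n logic python=3.9 -y
conda activate logic   # assumed step, not shown in the truncated snippet

# PyTorch 2.4.0 built against CUDA 12.1, from the official wheel index
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121

# Remaining dependencies; --no-build-isolation lets flash-attn build
# against the already-installed torch instead of a fresh build env
pip3 install vllm==0.6.3 ray flash-attn --no-build-isolation
```

Pinning both torch and vllm versions matters here: flash-attn and vllm wheels are compiled against specific torch/CUDA combinations, so changing one pin usually means changing the others.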
LLMs / DeepSeek R1: a detailed guide to TinyZero (overview, features, installation, usage, and example applications) and Logic-RL (overview, installation, usage, and example applications). Contents: Overview of TinyZero. The TinyZero project is a clean, minimal, and accessible reproduction of DeepSeek R1 Zero on countdown and multiplication tasks. Built on veRL, it uses reinforcement learning to let a 3B base large language model autonomously develop...
Fully Unstructured Approaches: Embedding Business Logic in System Prompts Unstructured approaches, like those using raw large language models (LLMs), are the wild child of the AI world. They're incredibly flexible and can handle a wide range of queries. But here's the catch - they're about ...
Large Language Model API interface. Contribute to mutablelogic/go-llm development by creating an account on GitHub.
[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation - logic-star-ai/swt-bench
Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make ...
Paper tables with annotated results for Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning