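Latest Agent Research Digest (2026-05-15)
This report was generated automatically from papers.cool/arxiv/cs.AI
Selection criterion: papers related to AI Agent systems
Generated at: 2026/5/15 17:30:05
📊 Today's Overview
- Total papers: 25
- Agent-related: 14 (one paper carries two direction labels, so the per-direction counts below sum to 15)
Distribution by Direction
| Direction | Papers |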
|---|---|
| planning | 5 |
| other | 4 |
| memory | 1 |
| tool | 1 |
| safety | 1 |
| multi_agent | 2 |
| evaluation | 1 |
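The table counts labels rather than papers. As a quick check, the sketch below tallies the direction labels exactly as they appear in this report's paper list; the variable names are illustrative and not part of the generating pipeline.

```python
from collections import Counter

# arXiv ID -> direction labels, copied from this report's paper list.
labels = {
    "2605.15177": ["planning"],
    "2605.15100": ["planning"],
    "2605.15041": ["planning", "tool"],  # the only paper with two labels
    "2605.14857": ["planning"],
    "2605.14754": ["planning"],
    "2605.15132": ["other"],
    "2605.15040": ["other"],
    "2605.14968": ["other"],
    "2605.14771": ["other"],
    "2605.15109": ["memory"],
    "2605.14912": ["safety"],
    "2605.14892": ["multi_agent"],
    "2605.14758": ["multi_agent"],
    "2605.14865": ["evaluation"],
}

# Count every label once per paper that carries it.
counts = Counter(d for directions in labels.values() for d in directions)
total_papers = len(labels)            # distinct papers
total_labels = sum(counts.values())   # table rows summed

print(total_papers, total_labels, dict(counts))
```

The one-paper gap between the two totals comes entirely from arXiv 2605.15041, which is listed under both planning and tool.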
1️⃣ Today's Agent-Related Papers
PLANNING (5 papers)
1. OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation
- arXiv ID: 2605.15177
- Direction: planning
- Key terms: opendeepthink, bradley, terry, llm, codeforces, reasoning, candidates, hle, parallel, grandmaster
2. Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling
- arXiv ID: 2605.15100
- Direction: planning
- Key terms: reasoning, quality, hallucinations, budget, consistency, consensus, pruning, ddc, dual, adaptive
3. Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
- arXiv ID: 2605.15041
- Direction: planning, tool
- Key terms: execution, reasoning, cast, tool, case, use, knowledge, structural, historical, toolbench
4. A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions
- arXiv ID: 2605.14857
- Direction: planning
- Key terms: digit, tariff, six, notes, agentic, top, workflow, hscodecomp, qwen3, digits
5. XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition
- arXiv ID: 2605.14754
- Direction: planning
- Key terms: xdomainbench, reasoning, scientific, composition, interactive, collapse, knowledge, ai4s, diagnosing, difficulty
OTHER (4 papers)
1. APWA: A Distributed Architecture for Parallelizable Agentic Workflows
- arXiv ID: 2605.15132
- Direction: other
- Key terms: apwa, parallelizable, agentic, workflows, agent, tasks, parallel, breadth, architecture, systems
2. Orchard: An Open-Source Agentic Modeling Framework
- arXiv ID: 2605.15040
- Direction: other
- Key terms: orchard, agentic, open, swe, sft, source, env, qwen3, modeling, claw
3. GraphFlow: An Architecture for Formally Verifiable Visual Workflows Enabling Reliable Agentic AI Automation
- arXiv ID: 2605.14968
- Direction: other
- Key terms: graphflow, agentic, workflow, durable, admission, verified, workflows, automation, audit, contracts
4. MediaClaw: Multimodal Intelligent-Agent Platform Technical Report
- arXiv ID: 2605.14771
- Direction: other
- Key terms: mediaclaw, multimodal, aigc, reusable, capability, workflow, agent, platform, production, report
MEMORY (1 paper)
1. Why Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAG
- arXiv ID: 2605.15109
- Direction: memory
- Key terms: graphrag, uncited, agentic, citations, traversal, provenance, answers, graph, cited, citation
TOOL (1 paper)
1. Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use
- arXiv ID: 2605.15041
- Direction: planning, tool
- Key terms: execution, reasoning, cast, tool, case, use, knowledge, structural, historical, toolbench
SAFETY (1 paper)
1. From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement
- arXiv ID: 2605.14912
- Direction: safety
- Key terms: pluralistic, pluralism, sycophantic, repair, principled, disagreement, alignment, prs, rlhf, governance
MULTI_AGENT (2 papers)
1. Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
- arXiv ID: 2605.14892
- Direction: multi_agent
- Key terms: agent, collaboration, self, agents, stage, coordination, intelligence, attribution, multi, llm
2. Probabilistic Verification of Recurrent Neural Networks for Single and Multi-Agent Reinforcement Learning
- arXiv ID: 2605.14758
- Direction: multi_agent
- Key terms: rnn, agent, probabilistic, verification, recurrent, hidden
EVALUATION (1 paper)
1. Holistic Evaluation and Failure Diagnosis of AI Agents
- arXiv ID: 2605.14865
- Direction: evaluation
- Key terms: evaluation, failure, span, holistic, diagnosis, localization, agents, level, traces, framework
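2️⃣ Research Trend Analysis
Today's Hot Directions
Based on today's 14 Agent-related papers:
- planning: 5 papers 🔥 hot
- other: 4 papers 🔥 hot
- multi_agent: 2 papers 📈 growing
Shifts in Technical Paradigm
- Tool Calling → Tool Learning: from simple tool invocation to autonomous tool learning
Emerging Architecture Patterns
- Agent Workflow: workflow-orchestration architecture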
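3️⃣ Key Insights
- Memory is becoming infrastructure: a growing number of systems treat memory as a standard capability rather than an optional feature
- Planning is shifting from rules to learning: traditional symbolic planning is increasingly being replaced by learned neural approaches
- Multi-Agent collaboration is standardizing: consensus is forming around multi-agent communication protocols and coordination mechanisms
- Tool Use is entering its hard phase: the work is moving from simple API calls to orchestrating complex toolchains
- Safety is moving from afterthought to up-front concern: safety is being designed into the system architecture rather than patched in afterwards
4️⃣ Technical Evolution Path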
1 | Prompt Engineering |
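Current Hot Paths
- RAG → Memory System → World Model: memory architectures keep deepening
- Tool Calling → Tool Learning → Tool Autonomy: tool use becomes increasingly autonomous
- ReAct → Planning System → Goal Reasoning: reasoning capability grows stronger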
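5️⃣ Relation to Open-Source Agent Projects
Comparison with Mainstream Projects
| Open-source project | Related directions | Validated by today's papers |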
|---|---|---|
| LangChain | tool, planning | ✅ |
| LlamaIndex | memory, rag | ✅ |
| AutoGPT | planning, autonomous | ✅ |
| CrewAI | multi-agent | ✅ |
| Mem0 | memory | ✅ |
| OpenDevin | tool, planning | ✅ |
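Design Validation and Evolution
Validated designs:
- The need for a Memory System continues to be validated
- Tool Use as a core Agent capability is now the consensus
- Multi-Agent architectures perform better on complex tasks
Designs that need to evolve:
- Simple RAG is being replaced by Memory Systems
- Monolithic single-Agent architectures are limited in complex scenarios
- Static Tool Definitions need to evolve toward dynamic learning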
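6️⃣ Architecture-Level Conclusions
- Memory First: new Agent projects should design the Memory System up front rather than bolt it on later
- Tool Abstraction: the tool abstraction layer should support dynamic discovery and learning rather than hard-coded tools
- Multi-Agent Ready: even if the system is single-Agent today, the architecture should leave room for multi-Agent extension
- Safety by Design: safety mechanisms should be considered at the architecture design stage rather than patched in afterwards
- Evaluation Driven: set up continuous evaluation rather than relying on manual testing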
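The Tool Abstraction point above can be illustrated with a small registry in which tools are registered at runtime and looked up by description instead of being hard-coded into the agent. This is a minimal sketch under stated assumptions: `Tool`, `ToolRegistry`, `discover`, and the sample `add` tool are all hypothetical names, and the substring match stands in for whatever discovery mechanism a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical tool record: a name, a description, and a callable."""
    name: str
    description: str
    fn: Callable[..., object]


class ToolRegistry:
    """Hypothetical registry; tools are added at runtime, not hard-coded."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, description: str):
        # Decorator that stores the function under the given name.
        def decorator(fn: Callable[..., object]) -> Callable[..., object]:
            self._tools[name] = Tool(name, description, fn)
            return fn
        return decorator

    def discover(self, query: str) -> List[str]:
        # Naive discovery: case-insensitive substring match on descriptions.
        q = query.lower()
        return sorted(
            name for name, tool in self._tools.items()
            if q in tool.description.lower()
        )

    def call(self, name: str, **kwargs: object) -> object:
        return self._tools[name].fn(**kwargs)


registry = ToolRegistry()


@registry.register("add", "Add two integers")
def add(a: int, b: int) -> int:
    return a + b


found = registry.discover("integers")
result = registry.call("add", a=2, b=3)
print(found, result)
```

Because the agent only depends on `discover` and `call`, new tools can be added without touching the agent loop.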
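7️⃣ Recommended Next Steps
Memory Schema Design
- Adopt a layered memory architecture: Working Memory → Episodic → Long-term
- Design a unified Memory Interface that supports multiple backends (vector, graph, relational)
- Implement a Memory Compression mechanism to prevent unbounded growth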
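The three Memory Schema recommendations above can be sketched together: a bounded working memory, an episodic buffer fed by evictions, and a long-term store behind a common interface. Everything here is an assumption made for illustration: `MemoryBackend`, `ListBackend`, `LayeredMemory`, and the size limits are hypothetical, and the "compression" simply joins episodes into one record where a real system would summarize them.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, List, Optional


class MemoryBackend(ABC):
    """Hypothetical unified interface; a vector, graph, or relational
    store could each implement it."""

    @abstractmethod
    def write(self, item: str) -> None: ...

    @abstractmethod
    def read(self, query: str, k: int = 3) -> List[str]: ...


class ListBackend(MemoryBackend):
    """In-memory stand-in backend using substring matching."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def write(self, item: str) -> None:
        self.items.append(item)

    def read(self, query: str, k: int = 3) -> List[str]:
        q = query.lower()
        hits = [item for item in self.items if q in item.lower()]
        return hits[-k:]  # most recent k matches


class LayeredMemory:
    """Working Memory -> Episodic -> Long-term, with naive compression."""

    def __init__(self, working_size: int = 3, episodic_limit: int = 4,
                 long_term: Optional[MemoryBackend] = None) -> None:
        self.working: Deque[str] = deque(maxlen=working_size)
        self.episodic: List[str] = []
        self.episodic_limit = episodic_limit
        self.long_term = long_term if long_term is not None else ListBackend()

    def observe(self, item: str) -> None:
        # When working memory is full, the oldest item moves to episodic.
        if len(self.working) == self.working.maxlen:
            self.episodic.append(self.working[0])
        self.working.append(item)
        if len(self.episodic) > self.episodic_limit:
            self._compress()

    def _compress(self) -> None:
        # Placeholder compression: merge all episodes into one record.
        self.long_term.write(" | ".join(self.episodic))
        self.episodic.clear()


mem = LayeredMemory(working_size=2, episodic_limit=2)
for i in range(6):
    mem.observe(f"event {i}")

print(list(mem.working), mem.episodic, mem.long_term.read("event 1"))
```

Keeping the long-term store behind `MemoryBackend` means the layering logic does not change when the backend is swapped.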
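Retrieval Policy Upgrade
- Upgrade from plain similarity search to hybrid retrieval (keyword + vector + knowledge graph)
- Implement context-aware dynamic retrieval strategies
- Consider adding Reranking to improve relevance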
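One common way to combine several retrievers is reciprocal rank fusion (RRF), which merges ranked lists without needing comparable scores. The report does not name a fusion method, so RRF is a swapped-in choice here; the document IDs and the three input rankings are made-up examples, and `k = 60` is the conventional default rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of three retrievers for the same query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
graph_hits = ["doc_b", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits, graph_hits])
print(fused)
```

A dedicated reranking model could then be applied to the top of the fused list; that step is omitted here.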
Agent Orchestration Adjustments
- Design a standardized Agent communication protocol
- Implement dynamic task allocation
- Consider introducing an Orchestrator role
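The three orchestration suggestions above can be sketched as a shared message type, a skill-based allocation rule, and an orchestrator that routes tasks. This is an illustrative sketch only: `Message`, `Agent`, `Orchestrator`, the skill names, and the least-loaded allocation rule are all assumptions, not a protocol taken from any of the listed papers or projects.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Message:
    """Hypothetical standardized envelope exchanged between agents."""
    sender: str
    recipient: str
    task: str
    payload: str


@dataclass
class Agent:
    name: str
    skills: Set[str]
    handler: Callable[[Message], str]
    load: int = 0  # number of tasks assigned so far


class Orchestrator:
    """Routes each task to the least-loaded agent that has the skill."""

    def __init__(self, agents: List[Agent]) -> None:
        self.agents = list(agents)

    def dispatch(self, task: str, payload: str) -> Tuple[str, str]:
        candidates = [a for a in self.agents if task in a.skills]
        if not candidates:
            raise LookupError(f"no agent can handle task {task!r}")
        # min() keeps the first candidate on ties, so order is stable.
        agent = min(candidates, key=lambda a: a.load)
        agent.load += 1
        message = Message("orchestrator", agent.name, task, payload)
        return agent.name, agent.handler(message)


agents = [
    Agent("searcher", {"search"}, lambda m: f"results for {m.payload}"),
    Agent("generalist", {"search", "summarize"},
          lambda m: f"handled {m.task}: {m.payload}"),
]
orchestrator = Orchestrator(agents)

first = orchestrator.dispatch("search", "agent memory")
second = orchestrator.dispatch("search", "graph rag")
third = orchestrator.dispatch("summarize", "long text")
print(first, second, third)
```

The second search goes to `generalist` because `searcher` already holds one task, which is the dynamic allocation behavior in its simplest form.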
📚 Appendix
Full Paper List
- OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation - planning
- APWA: A Distributed Architecture for Parallelizable Agentic Workflows - other
- Why Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAG - memory
- Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling - planning
- Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use - planning, tool
- Orchard: An Open-Source Agentic Modeling Framework - other
- GraphFlow: An Architecture for Formally Verifiable Visual Workflows Enabling Reliable Agentic AI Automation - other
- From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement - safety
- Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems - multi_agent
- Holistic Evaluation and Failure Diagnosis of AI Agents - evaluation
- A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions - planning
- MediaClaw: Multimodal Intelligent-Agent Platform Technical Report - other
- Probabilistic Verification of Recurrent Neural Networks for Single and Multi-Agent Reinforcement Learning - multi_agent
- XDomainBench: Diagnosing Reasoning Collapse in High-Dimensional Scientific Knowledge Composition - planning
This report was generated automatically by OpenClaw
Intended for Agent architects as decision support