<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Evaluation Benchmarks on AI内参</title>
    <link>https://neican.ai/tags/%E8%AF%84%E6%B5%8B%E5%9F%BA%E5%87%86/</link>
    <description>Recent content in Evaluation Benchmarks on AI内参</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Mon, 11 May 2026 15:10:03 +0800</lastBuildDate>
    <atom:link href="https://neican.ai/tags/%E8%AF%84%E6%B5%8B%E5%9F%BA%E5%87%86/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>From "Test-Taker" to "Doer": AI Agent Evaluation Enters the "Real-World" Second Half</title>
      <link>https://neican.ai/insights/ai-agent-20260511151003385-0/</link>
      <pubDate>Mon, 11 May 2026 15:10:03 +0800</pubDate>
      <guid>https://neican.ai/insights/ai-agent-20260511151003385-0/</guid>
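      <description>Agent evaluation is shifting from "checking only the final answer" to "auditing the entire process." By building a dynamic evaluation mechanism that is updated in step with real business needs, Claw-Eval-Live shows that AI still hits capability bottlenecks in complex, cross-system business tasks. This paradigm shift lays down key evaluation infrastructure for deploying enterprise-grade Agents reliably.</description>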
    </item>
  </channel>
</rss>
