<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Model Performance on AI内参</title>
    <link>https://www.neican.ai/tags/%E6%A8%A1%E5%9E%8B%E6%80%A7%E8%83%BD/</link>
    <description>Recent content in Model Performance on AI内参</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Fri, 12 Jun 2026 13:10:10 +0800</lastBuildDate>
    <atom:link href="https://www.neican.ai/tags/%E6%A8%A1%E5%9E%8B%E6%80%A7%E8%83%BD/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>The "Real Work" Moment for Agents: When Benchmarks Are No Longer Just an Arena for Academic Games</title>
      <link>https://www.neican.ai/insights/article-20260612131010740-0/</link>
      <pubDate>Fri, 12 Jun 2026 13:10:10 +0800</pubDate>
      <guid>https://www.neican.ai/insights/article-20260612131010740-0/</guid>
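      <description>The Agent's Last Exam (ALE) marks a shift in AI evaluation from static knowledge tests to hands-on trials in real industrial workflows. The test exposes logical flaws and weak execution in today's top models on long-horizon tasks, suggesting that the focus of future AI development will move from scaling up to interaction efficiency and system-level integration.</description>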
    </item>
  </channel>
</rss>
