<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Benchmark Evaluation on AI内参</title>
    <link>https://www.neican.ai/tags/%E5%9F%BA%E5%87%86%E8%AF%84%E6%B5%8B/</link>
    <description>Recent content in Benchmark Evaluation on AI内参</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 27 May 2026 19:40:07 +0800</lastBuildDate>
    <atom:link href="https://www.neican.ai/tags/%E5%9F%BA%E5%87%86%E8%AF%84%E6%B5%8B/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>The "Uncertainty Principle" of Coding Benchmarks: AI at the Crossroads Between Leaderboard Chasing and Real-World Engineering</title>
      <link>https://www.neican.ai/insights/article-20260527194007328-1/</link>
      <pubDate>Wed, 27 May 2026 19:40:07 +0800</pubDate>
      <guid>https://www.neican.ai/insights/article-20260527194007328-1/</guid>
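      <description>The DeepSWE benchmark reveals severe data contamination and misjudged results in mainstream AI coding evaluations, signaling that the industry's evaluation standards are shifting from pure leaderboard chasing toward assessing autonomous reasoning in real-world engineering. This shift is forcing model developers to rethink their technical roadmaps, and it points to a deeper evolution of AI software engineering toward greater autonomy and trustworthiness.</description>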
    </item>
  </channel>
</rss>
