
{"id":11105,"date":"2025-05-22T10:01:12","date_gmt":"2025-05-22T10:01:12","guid":{"rendered":"https:\/\/novelis.io\/?post_type=research-lab&#038;p=11105"},"modified":"2025-07-07T09:41:35","modified_gmt":"2025-07-07T09:41:35","slug":"dans-les-coulisses-du-cadre-agent-as-a-judge","status":"publish","type":"research-lab","link":"https:\/\/novelis.io\/fr\/research-lab\/dans-les-coulisses-du-cadre-agent-as-a-judge\/","title":{"rendered":"Behind the scenes of the \u201cAgent-as-a-Judge\u201d framework"},"content":{"rendered":"\n<p>As AI shifts from static models to agentic systems, evaluation is becoming one of the field\u2019s biggest challenges. Traditional methods either look only at final outputs or rely on human evaluation, which is slow and expensive. Even automated approaches such as <em>LLM-as-a-Judge<\/em>, useful as they are, cannot assess step-by-step reasoning or iterative planning, yet these capabilities are central to modern agents such as AI code generators. To address this gap, researchers from Meta AI and KAUST propose a new approach: <strong>Agent-as-a-Judge<\/strong>, a modular, agentic evaluator designed to assess agentic systems holistically, judging not only <em>what they produce<\/em> but also <em>how<\/em> they produce it.<\/p>\n\n\n\n<p><strong>Why traditional evaluations fall short<\/strong><\/p>\n\n\n\n<p>Today\u2019s AI agents reason over multiple steps, interact with tools, adapt dynamically, and carry out complex, long-horizon tasks. Evaluating them as simple black boxes misses the point. 
Final outputs do not reveal whether the process behind them was sound, human evaluation does not scale well, and LLM-based judges cannot fully capture modular reasoning or context-dependent decisions.<\/p>\n\n\n\n<p><strong>Introducing Agent-as-a-Judge<\/strong><\/p>\n\n\n\n<p>This new framework performs structured evaluation by drawing on agentic capabilities of its own. It relies on several specialized modules:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ask<\/strong>: asks questions about unclear or missing requirements.<\/li>\n\n\n\n<li><strong>Read<\/strong>: analyzes the agent\u2019s outputs and intermediate files.<\/li>\n\n\n\n<li><strong>Locate<\/strong>: finds the relevant sections of code or documentation.<\/li>\n\n\n\n<li><strong>Retrieve<\/strong>: pulls in context from related sources.<\/li>\n\n\n\n<li><strong>Graph<\/strong>: maps the logical and structural relationships within the task.<\/li>\n<\/ul>\n\n\n\n<p>Think of it as a code reviewer with reasoning capabilities, one that assesses not only <em>what was done<\/em> but also <em>how it was done<\/em>.<\/p>\n\n\n\n<p><strong>DevAI: a benchmark closer to real-world conditions<\/strong><\/p>\n\n\n\n<p>To test the framework, the team built <strong>DevAI<\/strong>, a benchmark of 55 realistic AI development tasks annotated with 365 hierarchical user requirements that serve as evaluation criteria, ranging from technical details to overall functional logic. Unlike existing benchmarks, these tasks reflect the complex and sometimes messy goals encountered in production.<\/p>\n\n\n\n<p><strong>Results: Agent-as-a-Judge vs. 
Human-as-a-Judge and LLM-as-a-Judge<\/strong><\/p>\n\n\n\n<p>Three AI agents (MetaGPT, GPT-Pilot, OpenHands) were evaluated by human experts, by LLM-as-a-Judge, and by the new Agent-as-a-Judge framework.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Human evaluation remains the gold standard, but it is slow and expensive.<\/li>\n\n\n\n<li>LLM-as-a-Judge reaches moderate accuracy (around 70%) while saving time and money.<\/li>\n\n\n\n<li>Agent-as-a-Judge achieves over <strong>95%<\/strong> agreement with human judgments, while cutting evaluation cost by <strong>97.64%<\/strong> and evaluation time by <strong>97.72%<\/strong> compared with human evaluation.<\/li>\n<\/ul>\n\n\n\n<p><strong>Why it matters<\/strong><\/p>\n\n\n\n<p>This system could pave the way for a self-improvement loop: agents that evaluate other agents in order to generate better data and train more robust systems. 
This <strong>\u201cagentic flywheel\u201d<\/strong> points to a future in which agents could critique themselves, correct their own mistakes, and improve without human intervention.<\/p>\n\n\n\n<p><strong>Agent-as-a-Judge<\/strong> does more than improve evaluation: it could reshape how we understand, supervise, and ensure the reliability of AI agents\u2019 behavior.<\/p>\n\n\n\n<p>Further reading:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/arxiv.org\/abs\/2410.10934\" target=\"_blank\" rel=\"noopener\">https:\/\/arxiv.org\/abs\/2410.10934<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/cognition.ai\/blog\/evaluating-coding-agents\" target=\"_blank\" rel=\"noopener\">https:\/\/cognition.ai\/blog\/evaluating-coding-agents<\/a><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1200\" height=\"1200\" src=\"https:\/\/novelis.io\/wp-content\/uploads\/2025\/05\/Agent-as-a-Judge.png\" alt=\"\" class=\"wp-image-11099\" style=\"width:662px;height:auto\" srcset=\"https:\/\/novelis.io\/wp-content\/uploads\/2025\/05\/Agent-as-a-Judge.png 1200w, https:\/\/novelis.io\/wp-content\/uploads\/2025\/05\/Agent-as-a-Judge-600x600.png 600w, https:\/\/novelis.io\/wp-content\/uploads\/2025\/05\/Agent-as-a-Judge-250x250.png 250w, https:\/\/novelis.io\/wp-content\/uploads\/2025\/05\/Agent-as-a-Judge-768x768.png 768w, https:\/\/novelis.io\/wp-content\/uploads\/2025\/05\/Agent-as-a-Judge-30x30.png 30w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" 
\/><\/figure>\n","protected":false},"featured_media":11103,"template":"","categories":[510],"custom_tag":[87],"class_list":["post-11105","research-lab","type-research-lab","status-publish","has-post-thumbnail","hentry","category-lab-news-2","custom_tag-ia"],"acf":{"externel_link":"","summary":"","filter_opacity":"70","subtitle":"","reading_time":"","authors":"","document_to_download":{"upload_a_file":false,"download_without_form":false,"file":false,"url":""},"show_recent_block_on_the_bottom_of_the_page":false},"_links":{"self":[{"href":"https:\/\/novelis.io\/fr\/wp-json\/wp\/v2\/research-lab\/11105"}],"collection":[{"href":"https:\/\/novelis.io\/fr\/wp-json\/wp\/v2\/research-lab"}],"about":[{"href":"https:\/\/novelis.io\/fr\/wp-json\/wp\/v2\/types\/research-lab"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/novelis.io\/fr\/wp-json\/wp\/v2\/media\/11103"}],"wp:attachment":[{"href":"https:\/\/novelis.io\/fr\/wp-json\/wp\/v2\/media?parent=11105"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/novelis.io\/fr\/wp-json\/wp\/v2\/categories?post=11105"},{"taxonomy":"custom_tag","embeddable":true,"href":"https:\/\/novelis.io\/fr\/wp-json\/wp\/v2\/custom_tag?post=11105"}],"curies":[{"name":"wp","href":"https:\/
\/api.w.org\/{rel}","templated":true}]}}