
{"id":10603,"date":"2024-07-04T12:53:36","date_gmt":"2024-07-04T12:53:36","guid":{"rendered":"https:\/\/novelis.io\/?post_type=scientific-pub&#038;p=10603"},"modified":"2025-07-07T13:09:45","modified_gmt":"2025-07-07T13:09:45","slug":"optimisation-des-agents-dinterface-utilisateur-graphique-pour-lancrage-des-instructions-visuelles-utilisant-des-systemes-dintelligence-artificielle-multimodale","status":"publish","type":"scientific-pub","link":"https:\/\/novelis.io\/fr\/scientific-pub\/optimisation-des-agents-dinterface-utilisateur-graphique-pour-lancrage-des-instructions-visuelles-utilisant-des-systemes-dintelligence-artificielle-multimodale\/","title":{"rendered":"Graphical user interface agents optimization for visual instruction grounding using multi-modal artificial intelligence systems"},"content":{"rendered":"\n<p>Read the first version of our scientific paper \u201cGraphical user interface agents optimization for visual instruction grounding using multi-modal artificial intelligence systems\u201d, published on <a href=\"https:\/\/arxiv.org\/abs\/2407.01558\" target=\"_blank\" rel=\"noopener\">arXiv<\/a> and submitted to the journal <strong>Engineering Applications of Artificial Intelligence<\/strong>. The paper, written in English, is already publicly available.<\/p>\n\n\n\n<p>Thanks to <a href=\"https:\/\/novelis.io\/fr\/laboratoire-rd\/\">the Novelis research team<\/a> for their know-how and expertise.<\/p>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/arxiv.org\/abs\/2407.01558\" target=\"_blank\" rel=\"noopener\">View on 
arXiv<\/a><\/div>\n<\/div>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">About<\/h2>\n\n\n\n<p>Most instance perception and image understanding solutions focus mainly on natural images. However, applications to synthetic images, and more specifically to images of Graphical User Interfaces (GUIs), remain limited. This hinders the development of autonomous computer-vision-powered Artificial Intelligence (AI) agents. In this work, we present Search Instruction Coordinates (SIC), a multi-modal solution for object identification in a GUI. More precisely, given a natural-language instruction and a screenshot of a GUI, SIC locates the coordinates of the on-screen component on which the instruction would be executed. To this end, we develop two methods. The first is a three-part architecture that combines a Large Language Model (LLM) with an object detection model. The second uses a multi-modal foundation model.<\/p>\n\n\n\n<p><strong>arXiv is an open archive of electronic preprints of scientific articles in various technical fields, such as physics, mathematics, computer science and many more, freely accessible online.<\/strong><\/p>\n","protected":false},"featured_media":10597,"template":"","categories":[24],"custom_tag":[87,455,460],"class_list":["post-10603","scientific-pub","type-scientific-pub","status-publish","has-post-thumbnail","hentry","category-publication-scientifique","custom_tag-ia","custom_tag-llm","custom_tag-llm-fr"],"acf":{"externel_link":"","summary":"","filter_opacity":"70","subtitle":"","reading_time":"","authors":"","document_to_download":{"upload_a_file":true,"download_without_form":true,"file":{"ID":10600,"id":10600,"title":"Graphical user interface agents optimization for visual instruction grounding using multi-modal artificial 
intelligence systems","filename":"Graphical-user-interface-agents-optimization-for-visual-instruction-grounding-using-multi-modal-artificial-intelligence-systems-1.pdf","filesize":581109,"url":"https:\/\/novelis.io\/wp-content\/uploads\/2024\/07\/Graphical-user-interface-agents-optimization-for-visual-instruction-grounding-using-multi-modal-artificial-intelligence-systems-1.pdf","link":"https:\/\/novelis.io\/fr\/?attachment_id=10600","alt":"","author":"10","description":"","caption":"","name":"graphical-user-interface-agents-optimization-for-visual-instruction-grounding-using-multi-modal-artificial-intelligence-systems","status":"inherit","uploaded_to":10599,"date":"2024-07-04 12:47:18","modified":"2024-07-04 12:47:18","menu_order":0,"mime_type":"application\/pdf","type":"application","subtype":"pdf","icon":"https:\/\/novelis.io\/wp-includes\/images\/media\/document.png"},"url":""},"show_recent_block_on_the_bottom_of_the_page":false}}