Découvrez la première version de notre publication scientifique « Optimisation des agents d’interface utilisateur graphique pour l’ancrage des instructions visuelles utilisant des systèmes d’Intelligence Artificielle multimodale » publiée dans arxiv et soumise à la revue Engineering Applications of Artificial Intelligence. Cet article, rédigé en anglais, est déjà disponible au public.
Merci à l’équipe de recherche de Novelis pour leur savoir-faire et leur expertise.
A propos
Most instance perception and image understanding solutions focus mainly on natural images. However, applications for synthetic images, and more specifically, images of Graphical User Interfaces (GUI) remain limited. This hinders the development of autonomous computer-vision-powered Artificial Intelligence (AI) agents. In this work, we present Search Instruction Coordinates or SIC, a multi-modal solution for object identification in a GUI. More precisely, given a natural language instruction and a screenshot of a GUI, SIC locates the coordinates of the component on the screen where the instruction would be executed. To this end, we develop two methods. The first method is a three-part architecture that relies on a combination of a Large Language Model (LLM) and an object detection model. The second approach uses a multi-modal foundation model.
arXiv est une archive ouverte de prépublications électroniques d’articles scientifiques dans différents domaines techniques, tels que la physique, les mathématiques, l’informatique et bien plus encore, gratuitement accessible par Internet.
Découvrez la première version de notre publication scientifique « Évaluation comparative des modèles de langage open-source pour une réponse efficace aux questions dans les applications industrielles » publiée dans arxiv et soumise à la revue Engineering Applications of Artificial Intelligence. Cet article, rédigé en anglais, est déjà disponible au public.
Merci à l’équipe de recherche de Novelis pour leur savoir-faire et leur expertise.
A propos
In the rapidly evolving landscape of Natural Language Processing (NLP),Large Language Models (LLMs) have demonstrated remarkable capabilitiesin tasks such as question answering (QA). However, the accessibility andpracticality of utilizing these models for industrial applications pose signif-icant challenges, particularly concerning cost-effectiveness, inference speed,and resource efficiency. This paper presents a comprehensive benchmarkingstudy comparing open-source LLMs with their non-open-source counterpartson the task of question answering. Our objective is to identify open-source al-ternatives capable of delivering comparable performance to proprietary mod-els while being lightweight in terms of resource requirements and suitable forCentral Processing Unit (CPU)-based inference. Through rigorous evalua-tion across various metrics including accuracy, inference speed, and resourceconsumption, we aim to provide insights into selecting efficient LLMs forreal-world applications. Our findings shed light on viable open-source al-ternatives that offer acceptable performance and efficiency, addressing thepressing need for accessible and efficient NLP solutions in industry settings.
arXiv est une archive ouverte de prépublications électroniques d’articles scientifiques dans différents domaines techniques, tels que la physique, les mathématiques, l’informatique et bien plus encore, gratuitement accessible par Internet.
Découvrez la première version de notre publication scientifique « Modèles de langage profonds low-cost : Enquête et évaluation des performances sur la génération de code Python » publié dans arxiv et soumis au journal Engineering Applications of Artificial Intelligence. Cet article rédigé en anglais est déjà disponible au public.
Merci à l’équipe de recherche de Novelis – notamment Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Merieme Bouhandi, Walid Dahhane, El Hassane Ettifouri – pour son savoir-faire et son expertise.
A propos
« Large Language Models (LLMs) have become the go-to solution for many Natural Language Processing (NLP) tasks due to their ability to tackle various problems and produce high-quality results. Specifically, they are increasingly used to automatically generate code, easing the burden on developers by handling repetitive tasks. However, this improvement in quality has led to high computational and memory demands, making LLMs inaccessible to users with limited resources. In this paper, we focus on Central Processing Unit (CPU)-compatible models and conduct a thorough semi-manual evaluation of their strengths and weaknesses in generating Python code. We enhance their performance by introducing a Chain-of-Thought prompt that guides the model in problem-solving. Additionally, we propose a dataset of 60 programming problems with varying difficulty levels for evaluation purposes. Our assessment also includes testing these models on two state-of-the-art datasets: HumanEval and EvalPlus. We commit to sharing our dataset and experimental results publicly to ensure transparency. »
arXiv est une archive ouverte de prépublications électroniques d’articles scientifiques dans différents domaines techniques, tels que la physique, les mathématiques, l’informatique et bien plus encore, gratuitement accessible par Internet.
Découvrez notre publication scientifique « GPT-3.5, GPT-4, or BARD? Evaluating LLMs reasoning ability in zero-shot learning and performance boosting through prompts » publiée dans Elsevier et repris dans ScienceDirect. Cet article est en anglais.
Merci à l’équipe de recherche de Novelis – notamment Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, El Mehdi Chouham, El Hassane Ettifouri, Walid Dahhane – pour son savoir-faire et son expertise.
A propos
“Large Language Models (LLMs) have exhibited remarkable performance on various Natural Language Processing (NLP) tasks. However, there is a current hot debate regarding their reasoning capacity. In this paper, we examine the performance of GPT-3.5, GPT-4, and BARD models, by performing a thorough technical evaluation on different reasoning tasks across eleven distinct datasets. Our paper provides empirical evidence showcasing the superior performance of ChatGPT-4 in comparison to both ChatGPT-3.5 and BARD in zero-shot setting throughout almost all evaluated tasks. While the superiority of GPT-4 compared to GPT-3.5 might be explained by its larger size and NLP efficiency, this was not evident for BARD. We also demonstrate that the three models show limited proficiency in Inductive, Mathematical, and Multi-hop Reasoning Tasks. To bolster our findings, we present a detailed and comprehensive analysis of the results from these three models. Furthermore, we propose a set of engineered prompts that enhances the zero-shot setting performance of all three models.”
Elsevier est une entreprise d’analyse de données qui aide les institutions, les professionnels de santé et des sciences à améliorer leurs performances pour le bien-être de l’humanité.
ScienceDirect est la première source mondiale de recherche scientifique, technique et médicale.