How do you determine which stocks to buy, sell, or hold? This is a complex question that requires considering multiple factors: geopolitical events, market trends, company-specific news, and macroeconomic conditions. For individuals or small to medium businesses, taking all these factors into account can be overwhelming. Even large corporations with dedicated financial analysts face challenges due to organizational silos or lack of communication.
Inspired by the success of GPT-4’s reasoning abilities, researchers from Alpha Tensor Technologies Ltd., the University of Piraeus, and Innov-Acts have developed MarketSenseAI, a GPT-4-based framework designed to assist with stock-related decisions—whether to buy, sell, or hold. MarketSenseAI provides not only predictive capabilities and a signal evaluation mechanism but also explains the rationale behind its recommendations.
The platform is highly customizable to suit an individual’s or company’s risk tolerance, investment plans, and other preferences. Its core modules include:
Progressive News Summary – Summarizes recent developments in the company or sector, alongside past news reports.
Macroeconomic Summary – Examines the macroeconomic factors influencing the current market environment.
Stock Price Dynamics – Analyzes the stock’s price movements and trends.
Signal Generation – Integrates the information from all the modules to deliver a comprehensive investment recommendation for a specific stock, along with a detailed rationale.
This framework serves as a valuable assistant in the decision-making process, empowering investors to make more informed choices. Integrating AI into investment decisions offers several key advantages: it introduces less bias compared to human analysts, efficiently processes large volumes of unstructured data, and identifies patterns, outliers, and discrepancies that traditional analysis might overlook.
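To illustrate how a modular pipeline of this kind can be wired together, here is a minimal, hypothetical sketch: each module produces a text summary, and a final prompt asks the LLM for a buy/sell/hold signal with a rationale. The module functions, the prompt wording, and the `call_gpt4` helper are illustrative assumptions, not the authors’ implementation.

```python
# Hypothetical sketch of a MarketSenseAI-style pipeline: each module produces a
# text summary, and a final prompt asks the LLM for a buy/sell/hold signal.
# None of these helpers reflect the actual implementation.

def call_gpt4(prompt: str) -> str:
    """Placeholder for a GPT-4 API call (assumption, not a real client)."""
    raise NotImplementedError("Plug in your LLM provider here")

def news_summary(ticker: str) -> str:
    return call_gpt4(f"Summarize recent and past news for {ticker}.")

def macro_summary() -> str:
    return call_gpt4("Summarize the current macroeconomic environment.")

def price_dynamics(ticker: str) -> str:
    return call_gpt4(f"Describe recent price trends and volatility for {ticker}.")

def signal(ticker: str, risk_profile: str) -> str:
    """Signal generation: combine the module outputs into one recommendation prompt."""
    context = "\n\n".join([news_summary(ticker), macro_summary(), price_dynamics(ticker)])
    prompt = (
        f"Given the analysis below and a {risk_profile} risk profile, "
        f"recommend BUY, SELL, or HOLD for {ticker} and explain why.\n\n{context}"
    )
    return call_gpt4(prompt)
```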
Despite the impressive capabilities of LLMs, they can sometimes confidently generate inaccurate information. This phenomenon, known as “hallucination”, is a key challenge in generative AI, and it is even more pronounced for numerical and statistical facts. Indeed, statistical data introduces unique challenges:
First, user queries pertaining to statistical information involve a variety of logical, arithmetic, or comparison operations with varying degrees of complexity.
Second, public statistical data exists in diverse formats and schemas, frequently necessitating significant contextual background for accurate interpretation. This creates particular difficulties for RAG-based systems.
DataGemma: An Innovative Solution
Researchers at Google present DataGemma, LLMs that interface with the knowledge of Data Commons — a vast unified repository of public statistical data — to tackle the challenges mentioned above. Two different approaches are employed: Retrieval Interleaved Generation (RIG) and Retrieval Augmented Generation (RAG). The team fine-tunes Google’s open-source Gemma and Gemma-2 models into versions tailored for both RIG and RAG.
Key Features of DataGemma
1. Data Commons is one of the largest unified repositories of public statistical data. It contains more than 240 billion data points across hundreds of thousands of statistical variables. The data is sourced from trusted organizations like the World Health Organization (WHO), the United Nations (UN), Centers for Disease Control and Prevention (CDC) and Census Bureaus.
2. RIG (Retrieval-Interleaved Generation) improves the capabilities of Gemma 2 by actively querying reliable sources and using information in Data Commons for fact-checking. When DataGemma is asked to generate a response, the model first identifies instances of statistical data and then retrieves the corresponding values from Data Commons (a minimal sketch follows this list). Although the RIG methodology itself is well established, the novelty lies in its use within the DataGemma framework.
3. RAG (Retrieval-Augmented Generation) allows language models to access relevant external information in addition to the training data, providing them with richer context and enabling more detailed, accurate responses. DataGemma implements this by utilizing Gemini 1.5 Pro’s extended context window. Before generating a response, DataGemma retrieves relevant information from Data Commons, reducing the likelihood of hallucinations and improving response accuracy.
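To make the RIG flow from point 2 concrete, here is a minimal sketch of the interleaving idea: the model drafts an answer with marked statistical claims, and each claim is then replaced by a value retrieved from Data Commons. The marker format, the `generate_draft` stub, and the `query_data_commons` helper are illustrative assumptions, not the published DataGemma interface.

```python
import re

def generate_draft(question: str) -> str:
    """Assumed: a fine-tuned model emits statistics wrapped in [DC(query)=guess] markers,
    e.g. "The population of California is [DC(population of California)=39 million]."."""
    raise NotImplementedError("Plug in the fine-tuned model here")

def query_data_commons(nl_query: str) -> str:
    """Assumed helper that resolves a natural-language query against Data Commons."""
    raise NotImplementedError("Plug in a Data Commons lookup here")

def rig_answer(question: str) -> str:
    """Replace every model-guessed statistic in the draft with a retrieved value."""
    draft = generate_draft(question)

    def _substitute(match: re.Match) -> str:
        retrieved = query_data_commons(match.group("query"))
        return retrieved if retrieved else match.group("guess")  # fall back to the model's guess

    return re.sub(r"\[DC\((?P<query>[^)]*)\)=(?P<guess>[^\]]*)\]", _substitute, draft)
```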
Promising results
The initial results from using RIG and RAG are promising, though still in the early stages. The researchers report significant improvements in the language models’ ability to handle numerical data, indicating that users are likely to encounter fewer hallucinations when applying the models for research, decision-making, or general inquiries.
Discover the first version of our scientific publication “Graphical user interface agents optimization for visual instruction grounding using multi-modal artificial intelligence systems”, published on arXiv and submitted to the Engineering Applications of Artificial Intelligence journal. The article is already available to the public.
Most instance perception and image understanding solutions focus mainly on natural images. However, applications for synthetic images, and more specifically, images of Graphical User Interfaces (GUI) remain limited. This hinders the development of autonomous computer-vision-powered Artificial Intelligence (AI) agents. In this work, we present Search Instruction Coordinates or SIC, a multi-modal solution for object identification in a GUI. More precisely, given a natural language instruction and a screenshot of a GUI, SIC locates the coordinates of the component on the screen where the instruction would be executed. To this end, we develop two methods. The first method is a three-part architecture that relies on a combination of a Large Language Model (LLM) and an object detection model. The second approach uses a multi-modal foundation model.
Discover the first version of our scientific publication “Benchmarking Open-Source Language Models for Efficient Question Answering in Industrial Applications”, published on arXiv and submitted to the Engineering Applications of Artificial Intelligence journal. The article is already available to the public.
In the rapidly evolving landscape of Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated remarkable capabilities in tasks such as question answering (QA). However, the accessibility and practicality of utilizing these models for industrial applications pose significant challenges, particularly concerning cost-effectiveness, inference speed, and resource efficiency. This paper presents a comprehensive benchmarking study comparing open-source LLMs with their non-open-source counterparts on the task of question answering. Our objective is to identify open-source alternatives capable of delivering comparable performance to proprietary models while being lightweight in terms of resource requirements and suitable for Central Processing Unit (CPU)-based inference. Through rigorous evaluation across various metrics including accuracy, inference speed, and resource consumption, we aim to provide insights into selecting efficient LLMs for real-world applications. Our findings shed light on viable open-source alternatives that offer acceptable performance and efficiency, addressing the pressing need for accessible and efficient NLP solutions in industry settings.
Discover the first version of our scientific publication “Low-cost deep language models: Survey and performance evaluation on Python code generation”, published on arXiv and submitted to the Engineering Applications of Artificial Intelligence journal. The article is already available to the public.
Thanks to the Novelis research team – including Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Merieme Bouhandi, Walid Dahhane, El Hassane Ettifouri – for their know-how and expertise.
“Large Language Models (LLMs) have become the go-to solution for many Natural Language Processing (NLP) tasks due to their ability to tackle various problems and produce high-quality results. Specifically, they are increasingly used to automatically generate code, easing the burden on developers by handling repetitive tasks. However, this improvement in quality has led to high computational and memory demands, making LLMs inaccessible to users with limited resources. In this paper, we focus on Central Processing Unit (CPU)-compatible models and conduct a thorough semi-manual evaluation of their strengths and weaknesses in generating Python code. We enhance their performance by introducing a Chain-of-Thought prompt that guides the model in problem-solving. Additionally, we propose a dataset of 60 programming problems with varying difficulty levels for evaluation purposes. Our assessment also includes testing these models on two state-of-the-art datasets: HumanEval and EvalPlus. We commit to sharing our dataset and experimental results publicly to ensure transparency.”
Discover how AI can be applied to make efficient use of time series forecasting data.
CHRONOS – Foundation Model for Time Series Forecasting
Time series forecasting is crucial for decision-making in various areas, such as retail, energy, finance, healthcare, and climate science. Let’s talk about how AI can be leveraged to effectively harness such crucial data. The emergence of deep learning techniques has challenged traditional statistical models that dominated time series forecasting. These techniques have mainly been made possible by the availability of extensive time series data. However, despite the impressive performance of deep learning models, there is still a need for a general-purpose “foundation” forecasting model in the field.
Recent efforts have explored using large language models (LLMs) with zero-shot learning capabilities for time series forecasting. These approaches prompt pretrained LLMs directly or fine-tune them for time series tasks. However, they all require task-specific adjustments or computationally expensive models.
With Chronos, presented in the new paper “Chronos: Learning the Language of Time Series”, the team at Amazon takes a novel approach by treating time series as a language and tokenizing them into discrete bins. This allows off-the-shelf language models to be trained on the “language of time series” without altering the traditional language model architecture.
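To give a feel for this tokenization step, here is a minimal sketch of scaling a series and binning it into discrete token ids. The bin count, the fixed value range, and the mean-scaling variant are illustrative assumptions rather than the exact Chronos recipe.

```python
import numpy as np

def tokenize_series(values: np.ndarray, n_bins: int = 4094) -> tuple[np.ndarray, float]:
    """Map a real-valued series to discrete token ids via mean scaling + uniform binning."""
    scale = float(np.mean(np.abs(values))) or 1.0   # mean scaling (assumed variant)
    edges = np.linspace(-15.0, 15.0, n_bins - 1)    # fixed clipping range is an assumption
    return np.digitize(values / scale, edges), scale

def detokenize(tokens: np.ndarray, scale: float, n_bins: int = 4094) -> np.ndarray:
    """Map token ids back to representative bin values and undo the scaling."""
    edges = np.linspace(-15.0, 15.0, n_bins - 1)
    centers = np.concatenate([edges[:1], (edges[:-1] + edges[1:]) / 2, edges[-1:]])
    return centers[tokens] * scale

series = np.array([10.0, 12.0, 11.0, 13.0, 15.0])
tokens, scale = tokenize_series(series)            # discrete ids a language model can ingest
print(tokens, detokenize(tokens, scale))           # round-trip recovers approximate values
```

Once the series is expressed as token ids, an off-the-shelf language model can be trained on it exactly as it would be on text.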
Pretrained Chronos models, ranging from 20M to 710M parameters, are based on the T5 family and trained on a diverse dataset collection. Additionally, data augmentation strategies address the scarcity of publicly available high-quality time series datasets. Chronos achieves state-of-the-art in-domain and zero-shot forecasting performance, outperforming traditional models and task-specific deep learning approaches.
Why is this essential? As a language model operating over a fixed vocabulary, Chronos integrates with future advancements in LLMs, positioning it as an ideal candidate for further development as a generalist time series model.
Multivariate Time Series – A Transformer-Based Framework for Multivariate Time Series Representation Learning
Multivariate time series (MTS) data is common in various fields, including science, medicine, finance, engineering, and industrial applications. It tracks multiple variables simultaneously over time. Despite the abundance of MTS data, labeled data for training models remains scarce. Today’s post presents a transformer-based framework for unsupervised representation learning of multivariate time series by providing an overview of a research paper titled “A Transformer-Based Framework for Multivariate Time Series Representation Learning,” authored by a team from IBM and Brown University. Pre-trained models generated from this framework can be applied to various downstream tasks, such as regression, classification, forecasting, and missing value imputation.
The method works as follows: the main idea of the proposed approach is to use a transformer encoder, adapted from the traditional transformer to process sequences of feature vectors that represent multivariate time series instead of sequences of discrete word indices. Positional encodings are incorporated so that the model captures the sequential nature of time series data. During unsupervised pre-training, the model is trained to predict masked values as part of an autoregressive denoising task in which part of the input is hidden.
Namely, they mask a proportion of each variable’s sequence independently of the other variables. Using a linear layer on top of the final vector representations, the model tries to predict the full, uncorrupted input vectors. This unsupervised pre-training approach leverages the same labeled data samples and, in some cases, yields performance improvements even compared to fully supervised methods. Like any transformer architecture, the pre-trained model can be used for regression and classification tasks by adding output layers.
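A compact PyTorch sketch of this masked-value pretraining idea is shown below. The layer sizes, masking ratio, and simple projection layers are illustrative choices, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class MTSEncoder(nn.Module):
    """Transformer encoder over sequences of feature vectors (one vector per time step)."""
    def __init__(self, n_vars: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.input_proj = nn.Linear(n_vars, d_model)
        self.pos_emb = nn.Parameter(torch.randn(1, 512, d_model) * 0.02)  # learnable positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.output_proj = nn.Linear(d_model, n_vars)  # predicts the uncorrupted input vectors

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, time, n_vars)
        h = self.input_proj(x) + self.pos_emb[:, : x.size(1)]
        return self.output_proj(self.encoder(h))

# Unsupervised pretraining step: hide ~15% of values per variable, reconstruct them.
batch = torch.randn(8, 100, 5)                        # 8 series, 100 steps, 5 variables
mask = torch.rand_like(batch) < 0.15                  # independent mask per variable
model = MTSEncoder(n_vars=5)
pred = model(batch.masked_fill(mask, 0.0))            # zero out masked values at the input
loss = ((pred - batch)[mask] ** 2).mean()             # loss computed only on masked positions
loss.backward()
```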
The paper introduces an interesting approach to using transformer-based models for effective representation learning in multivariate time series data. When evaluated on various benchmark datasets, it shows improvements over existing methods and outperforms them in multivariate time series regression and classification. The framework demonstrates superior performance even with limited training samples while maintaining computational efficiency.
Have you ever had a lengthy conversation with a chatbot (such as ChatGPT), only to realize that it has lost track of previous discussions or is no longer fluent? Or perhaps you’ve hit the input limit when using a language model provider’s API. The main challenge with large language models (LLMs) is the context length limitation, which prevents us from having prolonged interactions with them and utilizing their full potential.
Researchers from the Massachusetts Institute of Technology, Meta AI, and Carnegie Mellon University have released a paper titled “Efficient Streaming Language Models With Attention Sinks”. The paper introduces a new technique for increasing the input lengths of LLMs without any loss in efficiency or performance degradation, all without model retraining.
The StreamingLLM framework stores the initial four tokens (called “sinks”) in the KV cache as an “attention sink” in already pre-trained models such as LLaMA, Mistral, and Falcon. These crucial tokens effectively address the performance challenges associated with conventional “window attention” in LLMs, allowing models to extend their capabilities beyond their original input length and cache size limits. Using the StreamingLLM framework can help reduce both the perplexity (which measures how well a model predicts the next word based on context) and the computational complexity of the model.
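The cache-eviction idea can be illustrated with a few lines of simplified Python. The cache structure and sizes here are assumptions for illustration; the real implementation operates on per-layer key/value tensors.

```python
N_SINKS = 4          # the initial "attention sink" tokens are always kept
WINDOW = 1020        # recent tokens kept alongside the sinks (illustrative size)

def evict(kv_cache: list) -> list:
    """Keep the first N_SINKS entries plus the most recent WINDOW entries."""
    if len(kv_cache) <= N_SINKS + WINDOW:
        return kv_cache
    return kv_cache[:N_SINKS] + kv_cache[-WINDOW:]

# Simulated decoding loop: the cache never grows beyond N_SINKS + WINDOW entries,
# so memory and per-step attention cost stay bounded no matter how long the stream is.
cache = []
for step in range(5_000):
    cache.append(("key", "value", step))   # stand-in for this step's KV pair
    cache = evict(cache)
print(len(cache))                           # 1024
```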
Why is this important? This technique expands current LLMs to model sequences of over 4 million tokens without retraining while minimizing latency and memory footprint compared to previous methods.
RLHF: adapt AI models with human input
Unlocking the Power of Reinforcement Learning from Human Feedback for Natural Language Processing
Reinforcement Learning from Human Feedback (RLHF) is a significant breakthrough in Natural Language Processing (NLP). It allows machine learning models to be refined using human intuition, leading to more contextually aware AI systems. RLHF is a machine learning method that adapts AI models (here, LLMs) using human input. The process involves creating a “reward model” based on human feedback, which is then used to optimize the behavior of an AI agent through reinforcement learning algorithms. Simply put, RLHF helps machines learn and improve by using the insights of human evaluators. For instance, an AI model can be trained to generate compelling summaries or engage in meaningful conversations using RLHF. The technique collects human feedback, often in the form of rankings or preferences, to create a reward model. This model helps the AI agent distinguish between good and bad outcomes, and the agent subsequently undergoes fine-tuning to align its behavior with the preferences identified in the human feedback. The result is more accurate, nuanced, and contextually appropriate responses.
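A toy sketch of the reward-modelling step at the heart of RLHF is given below. The tiny bag-of-embeddings reward model and the random data are placeholders; real systems score full transformer representations and then optimize the policy with an RL algorithm such as PPO.

```python
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Scores a (prompt, response) token sequence; higher = preferred by humans."""
    def __init__(self, vocab_size: int = 1000, d: int = 32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        self.head = nn.Linear(d, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.emb(token_ids).mean(dim=1)).squeeze(-1)

reward = TinyRewardModel()
opt = torch.optim.Adam(reward.parameters(), lr=1e-3)

# One training step on a human preference pair: "chosen" was ranked above "rejected".
chosen = torch.randint(0, 1000, (4, 20))    # placeholder token ids
rejected = torch.randint(0, 1000, (4, 20))
loss = -torch.nn.functional.logsigmoid(reward(chosen) - reward(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
# The trained reward model is then used to fine-tune the LLM's behaviour with RL.
```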
OpenAI’s ChatGPT is a prime example of RLHF’s implementation in natural language processing applications.
Why is this essential? A clear understanding of RLHF is crucial to understanding the evolution of NLP and LLMs and how they offer coherent, engaging, and easy-to-understand responses. RLHF helps AI models align with human values, producing answers that match our preferences.
RAG: combine LLMs with external databases
The Surprisingly Simple Efficiency of Retrieval Augmented Generation (RAG)
Artificial intelligence is evolving rapidly, with large language models (LLMs) like GPT-4, Mistral, Llama, and Zephyr setting new standards. Although these models have improved interactions between humans and machines, they are still limited by their existing knowledge. In September 2020, Meta AI introduced an AI framework called Retrieval Augmented Generation (RAG), which resolves some of the issues previously encountered by LMs and LLMs. RAG is designed to enhance the quality of responses generated by LLMs by incorporating external sources of knowledge, supplementing the model’s internal representation of information with accurate and up-to-date facts. In short, RAG is an AI system that combines LLMs with external databases to provide accurate and up-to-date answers to queries.
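At its core, the RAG loop is simple enough to fit in a few lines. The sketch below uses naive word-overlap retrieval and a placeholder `ask_llm` call purely for illustration; production systems use dense embeddings and a vector store.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query (stand-in for vector search)."""
    q = set(query.lower().split())
    scored = sorted(documents, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def ask_llm(prompt: str) -> str:
    """Placeholder for any LLM call (assumption, not a specific provider's API)."""
    raise NotImplementedError("Plug in your LLM provider here")

def rag_answer(query: str, documents: list[str]) -> str:
    """Retrieve relevant passages, then let the LLM answer using only that context."""
    context = "\n".join(retrieve(query, documents))
    return ask_llm(
        f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    )
```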
RAG has undergone continual refinement and integration with diverse language models, including the state-of-the-art GPT-4 and Llama 2.
Why is this essential? Reliance on potentially outdated data and a predisposition to generate inaccurate or misleading information are common issues faced by LLMs. However, RAG effectively addresses these problems by ensuring factual accuracy and consistency. It significantly mitigates the risks associated with data integrity breaches and dissemination of erroneous information. Moreover, RAG has displayed prowess across diverse benchmarks such as Natural Questions, WebQuestions, and CuratedTrec. This exemplifies its robustness and reliability. By integrating RAG, the need for frequent model retraining is reduced. This, in turn, reduces the computational and financial resources required to maintain LLMs.
CoT: design the best prompts to produce the best results
Chain-of-Thought: Can large language models reason?
This month, we’ve been diving into the fascinating world of language modeling and generative AI. Today, we’ll be discussing how to make better use of these LLMs. Ever heard of prompt engineering? This is the field of research dedicated to designing better prompts so that the large language model (LLM) you’re using returns the very best results. We’ll be introducing one such prompt engineering technique: Chain-of-Thought (CoT).
CoT prompting is a simple method that closely resembles the way humans go about solving complex problems. If a problem seems a little too long or a little too complex, we tend to break it down into smaller sub-problems that we can more easily reason about. Well, it turns out this method works quite well when replicated within (really) large language models (like GPT, BARD, PaLM, etc.). Give the model a couple of examples of similar problems, explain in plain language how you’d work through them, and that’s all! This works great for arithmetic problems, commonsense reasoning, and symbolic reasoning (that is, good old-fashioned AI like rule-based problem solving).
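For example, a few-shot chain-of-thought prompt might look like the following. The example problems and the `ask_llm` helper are illustrative; any LLM client can be substituted.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for any LLM call (assumption, not a specific provider's API)."""
    raise NotImplementedError("Plug in your LLM provider here")

# A few-shot chain-of-thought prompt: each example spells out the intermediate steps,
# encouraging the model to reason step by step before giving its final answer.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: A cafeteria had 23 apples. They used 20 for lunch and bought 6 more. How many apples do they have?
A: They had 23 apples, used 20, leaving 23 - 20 = 3. They bought 6 more, so 3 + 6 = 9. The answer is 9.

Q: {question}
A:"""

def cot_ask(question: str) -> str:
    return ask_llm(COT_PROMPT.format(question=question))
```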
Why is this essential? Applying CoT prompting has the potential to produce better results when handling arithmetic, commonsense, or rule-based problems when using your LLM of choice. It also helps to figure out where your LLM might be going wrong when trying to solve a problem (though the why of this question remains unknown). Try it out yourself! Now does this prove that our LLMs can really reason? That remains the million-dollar question.
Discover language modeling technologies, and LLMs in particular. In two informative articles, our team of experts walks you through the existing technologies.
LLM (large language model): a type of artificial intelligence program that can recognize and generate text.
Language Modelling and Generative AI
This month’s focus is on language modeling, an innovative AI technology that has emerged in the field of artificial intelligence, transforming industries, communication, and information retrieval. Using machine learning methods, language modeling creates language models (LMs) to help computers understand human language, and it powers virtual assistants and applications like ChatGPT. Let’s take a closer look at how it works.
For computers to understand written language, LMs transform it into numerical representations. Current LMs analyze large text datasets and, using statistical and probabilistic techniques, use the likelihood of a word appearing in a sentence to create the words’ vector representations. LMs are trained through pretraining tasks. Such a task could involve predicting a word based on its context (i.e., its preceding or following words). In the sentences “X is a small feline” and “The X ate the mouse”, the model would have to figure out that X refers to the word “cat”.
Once these representations are created, they can be used for different tasks and applications. One of these applications is language generation. The procedure for generating language using a language model is the following: 1) given the context, generate a probability distribution for the next token over all the tokens in the vocabulary; 2) pick the token with the highest probability; 3) add this token to the sequence, and repeat. During training, a loss function compares the model’s predictions against the correct responses and the model is updated accordingly.
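The generation loop described in steps 1–3 can be written in a few lines. The `next_token_distribution` function below stands in for any trained language model and is an assumption of this sketch.

```python
import numpy as np

def next_token_distribution(sequence: list, vocab_size: int) -> np.ndarray:
    """Stand-in for a trained LM: returns a probability distribution over the vocabulary."""
    logits = np.random.randn(vocab_size)          # a real model computes these from the context
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prompt_ids: list, n_steps: int, vocab_size: int = 50_000) -> list:
    sequence = list(prompt_ids)
    for _ in range(n_steps):
        probs = next_token_distribution(sequence, vocab_size)  # step 1: distribution over tokens
        token = int(np.argmax(probs))                           # step 2: pick the most likely token
        sequence.append(token)                                  # step 3: append and repeat
    return sequence

print(generate([1, 2, 3], n_steps=5))
```

In practice, sampling strategies other than always picking the most likely token (e.g. temperature sampling) are often used, but the loop itself stays the same.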
Why is this essential? All generative AI models, like ChatGPT, use these methods as the core foundation for their language generation abilities.
New LLMs are being released every other day. Some of the most well-known models are the proprietary GPT (3.5 and 4) models, while others, such as LLaMA and Falcon, are open-source. Recently, Mistral released a new model made in France, showing promising results.
Optimization of large models: improve model efficiency, accuracy and speed
Unlocking LLM Potential: Optimizing Techniques for Seamless Corporate Deployment
Large Language Models (LLMs) have millions or billions of parameters. Consequently, deploying them for corporate use is challenging, given companies’ limited resources.
Therefore, researchers have been striving to get smaller models to deliver performance comparable to, or competitive with, that of their larger counterparts. Let’s take a look at these methods and how they can be used to optimize the deployment of LLMs in a corporate setting.
The first method is called distillation. In distillation, we have two models: the student and the teacher. The student model is trained to replicate the statistical behavior of the teacher model, either focusing on the final predictions or on the hidden layers of the model. The second approach, called quantization, involves reducing the precision or bit-width of numerical values, optimizing computational efficiency and memory usage. Lastly, pruning entails the removal of unnecessary or less critical connections, weights, or neurons to reduce the model’s size and computational requirements. A closely related parameter-efficient technique is LoRA (Low-Rank Adaptation), which adapts a large model by training small low-rank update matrices rather than all of its weights, and which is widely used to obtain efficient and compact large language models.
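To make one of these techniques concrete, here is a minimal sketch of post-training weight quantization to 8-bit integers. It uses a simplified symmetric scheme for illustration, not any specific library’s implementation.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights to int8 plus a scale factor (symmetric quantization)."""
    scale = float(np.abs(weights).max()) / 127.0 or 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max reconstruction error:", np.abs(w - dequantize(q, scale)).max())
# Stored as int8, the weights take 4x less memory than float32 at a small accuracy cost.
```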
Why is this essential? Leveraging smaller models to achieve comparable or superior performance compared to their larger counterparts offers a promising solution for companies striving to develop cutting-edge technology with limited resources.
Discover our scientific publication “GPT-3.5, GPT-4, or BARD? Evaluating LLMs reasoning ability in zero-shot learning and performance boosting through prompts”, published by Elsevier and available on ScienceDirect.
Thanks to the Novelis research team – notably Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, El Mehdi Chouham, El Hassane Ettifouri, Walid Dahhane – for their know-how and expertise.
Large Language Models (LLMs) have exhibited remarkable performance on various Natural Language Processing (NLP) tasks. However, there is a current hot debate regarding their reasoning capacity. In this paper, we examine the performance of GPT-3.5, GPT-4, and BARD models, by performing a thorough technical evaluation on different reasoning tasks across eleven distinct datasets. Our paper provides empirical evidence showcasing the superior performance of ChatGPT-4 in comparison to both ChatGPT-3.5 and BARD in zero-shot setting throughout almost all evaluated tasks. While the superiority of GPT-4 compared to GPT-3.5 might be explained by its larger size and NLP efficiency, this was not evident for BARD. We also demonstrate that the three models show limited proficiency in Inductive, Mathematical, and Multi-hop Reasoning Tasks. To bolster our findings, we present a detailed and comprehensive analysis of the results from these three models. Furthermore, we propose a set of engineered prompts that enhances the zero-shot setting performance of all three models.
Elsevier is a data analytics company that helps institutions, health and science professionals improve their performance for the benefit of humanity.
ScienceDirect is the world’s leading source for scientific, technical and medical research.