R&D Engineer in Artificial Intelligence and NLP M/F

[Webinar] Facilitate and accelerate GDPR compliance with data anonymization

Technological advances (connected objects, the development of 5G) have made the exchange of massive amounts of data in our society more fluid. Today, this data represents real wealth: quantities of information that can be used to analyze political climates, predict crises, and improve services, products, and processes, for example. This massification and circulation of data raises the question of privacy violations due to the exposure of personal data.

According to the latest report by the American giant McAfee, a survey of cloud service users shows that:

  • 91% of respondents do not encrypt inactive data,  
  • 87% do not delete data immediately after closing an account. 

While GDPR currently requires companies to do everything possible to secure the personal data they process, on pain of heavy fines, anonymization is not a general obligation. But this technique, coupled with AI and automation, is increasingly seen as one of the most effective means of compliance.

Anonymization allows companies to continue processing personal data while respecting the rights and freedoms of individuals, significantly reducing their exposure to potential attacks. It also strengthens system security and reduces the risk of data theft since, once anonymized, the data has no value to an attacker.

Where previously GDPR could be a constraint around data, it now becomes an opportunity to better protect oneself.

Former GDPR lawyer & certified DPO – CEO & Co-founder of Dipeeo, Raphaël Buchard, will give us the keys to staying GDPR compliant.

Our technical and business experts at Novelis – Sanoussi Alassan, Data Scientist and Raphaël Brunel, Data Analyst – will talk about the technical solution we propose: data anonymization coupled with AI for structured data processing and automation for unstructured data processing.  

On the agenda of this webinar:  

  • GDPR & compliance
  • Presentation of use cases 
  • Knowing the different anonymization methods and equipping yourself with a professional solution  
  • Demonstration

Our best 2022 content: practices and feedback on intelligent process automation

Client testimonials, white papers, articles, webinars… Throughout the year, Novelis teams have created a lot of content to share with you the best practices and feedback on intelligent process automation. In this article, you’ll find our most popular content for 2022 to kick off 2023 and identify the levers that will boost your operational efficiency!

BLOG – White papers, articles, interview…

Anonymization of sensitive data by the combined approach of NLP and neural models: “Data exploitation is more than ever a major issue within any type of organization […] Pseudonymization/anonymization thus appears to be an indispensable technique for protecting personal data and promoting compliance with regulations.”

How can Process Intelligence tools be a springboard to your operational efficiency objective?: “The lessons learned from a Process Intelligence solution allow organizations to base their strategy for improving the operational efficiency of processes on an in-depth analysis of historical data and not only on qualitative interviews.”

[WHITE PAPER] How automation can help you overcome customer relationship challenges: “Consumer expectations have changed and customer experience has become a major differentiator, especially since its quality is increasingly measurable and comparable. […] Novelis offers you to discover the benefits in its white paper “How automation can help you overcome customer relationship challenges” divided in three parts…”

[USE CASES] RPA: Tasks with high automation potential in finance: “The digital revolution is changing the face of the financial sector, regardless of the business line: treasury, management control, accounting, finance management, etc. Transforming to innovate is becoming an obligation for these players, who must be ever faster, more reliable and more efficient in the execution of processes.”

[INTERVIEW] How APICIL Épargne decided to launch a major project to modernize, containerize and urbanize its information system: “In order to become the French leader in life insurance, APICIL Épargne decided to launch a major project to modernize, containerize and urbanize its information system. It is in this context that Novelis has been supporting APICIL Épargne for 4 years in their digital transformation on strategic subjects.”

Novelis wins Blue Prism 2022 Best AI & Cloud Innovation Solution Award with SmartRoby: “During the Partner Forum 2022 organized by Blue Prism on May 24th, Novelis has been awarded for its Automation as a Service solution SmartRoby, recognized as the best Solution of the Year in the AI & Cloud Innovation – EMEA & Global category by the leading RPA vendor.”

[USE CASES] RPA: tasks with high automation potential in insurance and for mutual: “Insurance and mutual insurance companies are facing new issues and challenges every day. RPA provides an answer to these challenges, making it a truly essential solution for these insurance and mutual organizations, which have a wide range of processes with high automation potential.”

REPLAYS – Rediscover our webinars

[Event] Novelis and NICE partners of the CX Paris All Verticals event: This CX Paris All Verticals edition will focus on the experience economy and will highlight the different levels of maturity in customer experience within different industries: banking, insurance, retail, BtoB, public services, luxury, automotive, energy…

[Webinar] Cybersecurity: how to gain efficiency through automation?: Novelis invites you to discover how automation can become an essential operational efficiency lever for your cyber teams.

[Webinar] Customer success story CMB Monaco – Compliance and automation: the winning duo: Come and discover how to accelerate and make your compliance strategy more reliable with automation through the experience of our client CMB Monaco.

[Webinar] RPA: a solution to the challenges of the insurance industry: In this session, learn about RPA and insurance industry use cases as well as the key success factors of an automation program.

[Webinar] Accelerate your Process Automation by 30% with Process Intelligence: Discover a unified solution that combines process intelligence with automation dedicated to process exploration, optimization and monitored execution of automated processes.

Yolov7: Artificial Intelligence for real-time object detection in an image

In this article we will discover the Yolov7 model, an object detection algorithm. We will first study its use and characteristics through a public dataset, then see how to train the model ourselves on that dataset. Finally, we will train Yolov7 to identify custom objects from our own data.

What is Yolo? Why Yolov7?

Yolo is an algorithm for detecting objects in an image. The goal of object detection is to automatically identify, using a neural network, the presence, position and class of humanly identifiable objects in an image. The value of such algorithms therefore lies in their detection, recognition and localization capabilities, which have many practical applications in the image domain. Yolo’s strength lies in its ability to perform these tasks in real time, which makes it particularly useful with video streams of tens of images per second.

YOLO is actually an acronym for “You Only Look Once”. Indeed, unlike many detection algorithms, Yolo is a neural network that evaluates the position and class of identified objects from a single end-to-end network that detects classes using a fully connected layer. Yolo therefore only needs to “see” an image once to detect the objects present, where some algorithms only detect regions of interest, before re-evaluating these to identify the classes present. Before mentioning the other versions of Yolo, it seems important here to explain the different metrics used to compare the accuracy and efficiency of a model.

Intersection over Union: IoU

Intersection over Union (IoU) is a metric for measuring the accuracy of object localization. As its name indicates, it is calculated as the ratio between the area of intersection of the detected and actual objects and the area of their union (see equation 1). Writing A_detected and A_actual for the respective areas of the object as detected by YOLO and as actually located in the image, we can then write:

IoU = area(A_detected ∩ A_actual) / area(A_detected ∪ A_actual)    (1)

Note that an IoU of 0 indicates that the 2 areas are completely distinct, while an IoU of 1 indicates that the 2 objects are perfectly superimposed. In general, an IoU > 0.5 is considered a valid localization criterion.
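As a minimal sketch in plain Python (boxes given as (x1, y1, x2, y2) corner coordinates), the IoU can be computed as:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of the two areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # → ~0.143 (1/7)
```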

(mean) Average Precision: mAP

Average Precision is a classification accuracy metric, based on the ratio of correct predictions to total predictions; mean Average Precision (mAP) averages this score over all classes. The goal is therefore to get as close as possible to a 100% mAP score (no error when determining the class of an object).
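As a toy sketch of this simplified definition (the full COCO mAP additionally averages precision over recall levels and IoU thresholds, which is beyond this illustration):

```python
def simple_map(predictions, truths):
    """Per-class fraction of correct class predictions, averaged over
    classes: the simplified reading of mAP used in this article."""
    per_class = []
    for c in set(truths):
        idx = [i for i, t in enumerate(truths) if t == c]
        correct = sum(1 for i in idx if predictions[i] == c)
        per_class.append(correct / len(idx))
    return sum(per_class) / len(per_class)

# 3 detections: the 'cat' and one 'dog' are right, one 'dog' was called 'cat'
print(simple_map(["cat", "dog", "cat"], ["cat", "dog", "dog"]))  # → 0.75
```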

Coming back to our earlier point, Yolo is an architecture, not the property of a particular developer, which explains why the versions of Yolo come from different contributors. The version number (Yolov7 to date: January 2023) is incremented each time a model clearly exceeds the previous one, and thus the state of the art, on the previously mentioned metrics (especially the mAP and its associated execution time). Each new YolovX model is therefore an improvement demonstrated in an associated research paper published alongside it.

How does Yolo work?

Yolo works by segmenting the image that it analyzes. It will first grid the space, then perform 2 operations: localization and classification.

Figure 1: Architecture of the Yolo model, operating a grid from successive convolutions
Figure 2: Gridded image

First, Yolo identifies all the objects present using bounding boxes, associating each with a degree of confidence (here represented by the thickness of the box).

Figure 3: Location of objects

Then, the algorithm assigns a class to each box according to the object that it believes it has detected from the probability map.

Figure 4: Class probability map
Figure 5: Object detection

Finally, Yolo removes all unnecessary boxes using the NMS method.

NMS: Non-Maxima Suppression

The NMS method traverses the boxes in decreasing order of confidence and removes the boxes that overlap them too much, as measured by the IoU. For this, we follow 4 steps, starting from the complete list of detected boxes:

  1. Remove all boxes with a low confidence index.
  2. Identify the box with the highest confidence index.
  3. Delete all boxes with too large an IoU relative to that reference box (i.e. all boxes too similar to it).
  4. Set the reference box aside and repeat steps 2 and 3 with the remaining boxes (i.e. take the box with the 2nd highest confidence index, then the 3rd, etc.) until every box in the original list has been processed.
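A minimal sketch of these 4 steps in plain Python (boxes as (x1, y1, x2, y2) corners; the two thresholds are hypothetical defaults):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Non-Maxima Suppression following the 4 steps above;
    returns the indices of the kept boxes."""
    # 1. Drop low-confidence boxes, then sort by descending confidence
    kept = [i for i, s in enumerate(scores) if s >= score_thresh]
    kept.sort(key=lambda i: scores[i], reverse=True)
    result = []
    while kept:
        # 2. Take the highest-confidence remaining box as reference
        ref = kept.pop(0)
        result.append(ref)
        # 3. Delete boxes that overlap the reference too much
        kept = [i for i in kept if iou(boxes[ref], boxes[i]) < iou_thresh]
        # 4. The loop repeats with the next-best remaining box
    return result

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the second box overlaps the first and is removed
```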

We then obtain the following result:

Figure 6: Post-NMS output image showing the objects detected by Yolo

How to use Yolov7 with the COCO dataset?

Now that we have seen the Yolo model in detail, we will study its use with an image database: the COCO dataset. The Microsoft COCO dataset (for Common Objects in COntext), more commonly called MS COCO, is a set of images representing common objects in a common context. Unlike the usual databases used for object detection and recognition, MS COCO does not present isolated objects or scenes: the goal when creating this dataset was to have images close to real life, in order to build a more robust training base for everyday image streams.

Figure 7: Examples of isolated objects
Figure 8: Examples of isolated scenes
Figure 9: Classical scenes of everyday life.

Thus, by training our Yolov7 model on the MS COCO dataset, it is possible to obtain a recognition algorithm covering nearly a hundred classes and categorizing the majority of objects, people and elements of everyday life. MS COCO is today the main reference for measuring the accuracy and efficiency of a model. To get an idea, below are the results of the different versions of Yolo.

Figure 10: Average Precision (AP) versus analysis time per image

On the x-axis is the time given to the network to evaluate an image: the lower this time, the larger the flow of images we can afford to send to our algorithm, at the cost of accuracy. On the y-axis is the average precision of the models as a function of the allowed time, as defined previously.

We then notice 3 important points:

  1. Regardless of the time given to the network, Yolov7 outperforms the other Yolo models in terms of detection accuracy on the MS COCO dataset. This explains its presence as a reference in the current state of the art of real-time image-based object detection.
  2. Increasing the inference time per image brings little or no benefit beyond 30 ms/image. This implies that the model is best suited to uses requiring fast image processing, such as a video stream (> 25 fps).
  3. Regardless of the model concerned, none exceeds 57% detection accuracy. This implies that these models are still far from being usable reliably in a public setting.

To obtain the above results yourself, just follow the instructions on the GitHub page of the Yolov7 model pre-trained on the MS COCO dataset: https://github.com/WongKinYiu/yolov7.

First, follow the section:

  • Installation.

Then the section:

  • Testing.

How to train Yolov7?

Now that we have seen how to test Yolov7 with a dataset on which it is trained, we are going to look at how to train Yolov7 with our own dataset. We will first start a training run with already prepared data, here the MS COCO dataset. Again, the Yolov7 GitHub page has a dedicated section for this purpose:

  • Training.

It is broken down into 2 simple steps:

  1. Download the already annotated MS COCO dataset.
  2. Launch the “train.py” script included in the repository, pointing it at the dataset previously downloaded.

The training will then run for 300 epochs, to conform to the MS COCO configuration. It should be noted that in reality this operation has more of an instructive purpose, since Yolov7 is already trained on the MS COCO dataset and thus already provides an adequate model.

Prepare your own training data

Now that we have seen what Yolov7 is, and how to test and train it, we just have to provide it with our own image base to train it on our use case. We will therefore follow 4 steps to create our own dataset, directly usable to train Yolov7:

  1. Choose our image database.
  2. Optional: label all our images.
  3. Prepare the launch (here using Google Colab).
  4. Train (and optionally split the training).

To illustrate the sequence of these operations, we will take a case similar to the Novelis work used on AIDA: the detection of elements drawn on a sheet of paper.

Figure 11: Starting image: a handwritten color drawing on a sheet

To start, we need a sufficient quantity of similar images, either from our own collection or from a pre-existing database (for example by taking the dataset of our choice from this link). For our part, we will use the Quick Draw dataset. Once our database is formed, we annotate our images. Many software tools exist for this, most of them allowing you to draw boxes or polygons and label them with a class. In our case, our database is already labeled; otherwise we would have to create a class for each element to be detected, then identify by hand, on each image, the exact areas where these classes appear. Once our dataset is labeled, we can open a session on Google Colab and start a new Python notebook, which we will call “MyYolov7Project.ipynb” for example.

First step: copy your dataset into your Drive. In our case, we have already added a folder “Yolov7_Dataset” to our Drive. Here is the tree structure of the folder:

Figure 12: Tree structure of the “Yolov7_Dataset” folder

For each folder, there is an images subfolder containing the images, and a labels subfolder containing the associated labels generated previously. In our case, we use 20,000 images in total: 15,000 for training, 4,000 for validation and 1,000 for testing.

The data.yaml file contains the paths to the train, validation and test image folders, followed by the characteristics of the classes (their number and their names).
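As an illustration, a minimal data.yaml might look like this; the Drive paths and the class names shown are hypothetical and depend on your own tree structure:

```yaml
# data.yaml (sketch; adapt the paths to your Drive tree)
train: /content/drive/MyDrive/Yolov7_Dataset/train/images
val: /content/drive/MyDrive/Yolov7_Dataset/val/images
test: /content/drive/MyDrive/Yolov7_Dataset/test/images

# characteristics of the classes
nc: 345  # number of classes
names: ['aircraft carrier', 'airplane', 'alarm clock']  # one entry per class
```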

We will not show the 345 classes in detail, but they should all be present in your file. We can now start our script “MyYolov7Project.ipynb” on Colab. First step: link our Drive to Colab in order to save our results (be careful: the trained network data is voluminous).

Once our Drive is linked, we can now clone Yolov7 from the official Git:

Placing ourselves in the cloned folder, we install the prerequisites:

We will also need the sys and torch libraries.

We can then run the training script for our network:
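The training cell can be sketched as follows; the flags mirror those documented on the Yolov7 GitHub page, while the data.yaml path and the run name are hypothetical (running this requires a GPU and the prepared dataset):

```shell
# Launch training on our custom dataset (run inside the cloned yolov7 folder)
python train.py --workers 8 --device 0 --batch-size 16 \
    --data /content/drive/MyDrive/Yolov7_Dataset/data.yaml \
    --img 640 640 --cfg cfg/training/yolov7.yaml \
    --weights yolov7_training.pt \
    --name yolov7-custom --hyp data/hyp.scratch.custom.yaml
```

Detection is launched similarly once training is done, e.g. `python detect.py --weights runs/train/yolov7-custom/weights/best.pt --conf 0.25 --source <your image>` (paths hypothetical).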

Note that the batch size can be modified according to the capacities of your GPU (with the free version of Colab, 16 is the maximum possible). Don’t forget to modify the path to the “data.yaml” file according to the tree structure of your Drive. At the end of the training, we get a file with the training metrics and a model trained on our database. By launching the detection script (detect.py), we can obtain the detection result on our starting image:

Figure 13: Starting image annotated by Yolov7

As we can see, some elements were not detected (the river, the grass in the foreground) and some were mislabeled (the two mountains perceived as volcanoes, probably because of the sun passing behind them). Our model can therefore be further improved, either by refining our database or by adjusting the training parameters.

Optional: Split network training (when using the free version of Google Colab)

Although our use case remains simple, with the free version of Google Colab the training of our network can take several days to complete. However, the restrictions of Google Colab (free version) prevent a program from running for more than 12 hours. To keep your progress, simply restart the training after a session stops, passing the last recorded weights as the weights parameter:

Here is an example launched after the 8th run (replace the folder “yolov78” with that of your last completed training). You can find all your training runs in the associated folder in the Yolov7 tree.
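Following this approach, the resume cell simply points the weights flag at the last checkpoint saved by the previous session (folder name and paths hypothetical, to be adapted to your Drive tree):

```shell
# Resume from the last checkpoint of the 8th run ("yolov78": adjust to your latest run)
python train.py --workers 8 --device 0 --batch-size 16 \
    --data /content/drive/MyDrive/Yolov7_Dataset/data.yaml \
    --img 640 640 --cfg cfg/training/yolov7.yaml \
    --weights runs/train/yolov78/weights/last.pt \
    --name yolov7-custom --hyp data/hyp.scratch.custom.yaml
```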

Figure 14: Training tree. Here we are at the 12th launch

The training then resumes from the last epoch used, and allows you to progress without losing the time previously spent on your network.

Decompartmentalise feedback processing with AI

Combining OCR, RPA and AI to solve a complex process

Novelis at the Ecole Polytechnique Féminine (EPF) for Research Day

Research Day at EPF: organized for 20 years, this day is dedicated to research and innovation.

On the occasion of the EPF Research Day, Novelis will be present at the school to host a round table on innovation in digital technology. Following this presentation, students will be able to meet our team at our stand and learn more about the work of Novelis’ internal R&D laboratory by talking directly with members of the research and recruitment teams.

At Novelis, we aim to use new technologies to meet our clients’ business needs and thus offer them adapted solutions to support their digital transformation.
This is reflected in our R&D Lab, in which we invest more than 25% of our revenue. Our doctoral researchers work daily on fundamental and experimental research around AI (machine learning, image processing and NLP), with the objective of exceeding the state of the art in AI and NLP.

We are very proud to invest in scientific research to help build our future, so we are delighted to be able to share the results of our work with the students of the EPF engineering school.

Novelis wins Blue Prism 2022 Best AI & Cloud Innovation Solution Award with SmartRoby

During the Partner Forum 2022 organized by Blue Prism on May 24th, Novelis was awarded for its Automation as a Service solution SmartRoby, recognized as the best Solution of the Year in the AI & Cloud Innovation – EMEA & Global category by the leading RPA vendor.

In 2021, Novelis already received the Business Solution of the Year award with SmartRoby. This year, for the second time in a row, we received the regional and global award from Blue Prism, recognizing our positive impact on a client’s business through the innovative use of Artificial Intelligence and Intelligent Automation in the Cloud.

We designed SmartRoby, an Automation as a Service solution, to give mid-sized organizations easier access to cutting-edge RPA and Intelligent Automation technologies. Complementary to the Blue Prism offering, our solution is available on AWS, DX, OVH and on-premises, entirely self-service and connected to Blue Prism. It offers a business-oriented interface, with a pricing model based on the actual consumption of digital workers, to drive and control all the automated processes of an organization. This allows organizations to implement an automation solution in a matter of weeks and at a lower cost.

It also offers advantages that go further than traditional automation, such as access to AI and NLP algorithms, and gives business teams more autonomy to manage a set of features without depending on IT teams. The business is thus able to easily follow the impact of automation within the organization thanks to quantified reports on time saved and ROI. Putting the focus on the business side, and being able to measure the impact, makes it much easier to go further.

A year ago, when we released our SmartRoby solution, we already wanted to make automation accessible to all organizations regardless of their size. We still believe that as a digital player it is our duty to give all companies access to solutions like SmartRoby, which digitally transform the way organizations operate.

Winning the Partner Excellence Awards 2022 in the “AI & Cloud Innovation” category is a true recognition of SmartRoby’s innovative character and rewards the investments made on this platform. This award also highlights the strong partnership we have with Blue Prism, a leader in robotic process automation (RPA). The synergies are huge, we are talking about a major global digital player with whom we are building bridges between BPM and automation.

“From day one, we understood the power of the solutions offered by Blue Prism. SmartRoby completes this offer with an out-of-the-box solution regardless of the type or size of the organization: our goal is to democratize access to an amazing technology like Blue Prism. In a few words, SmartRoby is the platform that gives access to automation easily and quickly.” says Mehdi Nafe, CEO & Co-Founder of Novelis.

Doctor in AI/ML/NLP – M/F

Lab R&D – Permanent Contract – Paris – PhD

Artificial intelligence and medical science: what disruptions for tomorrow?

Artificial intelligence (AI) is a multidisciplinary field of computer science that aims to reach conclusions without any direct human intervention.

To do this, it relies on:

  • Several algorithms;
  • Deep and machine learning;
  • Heuristics;
  • Matching models;
  • Cognitive computing.

AI was primarily designed to provide solutions to complex problems that humans are not 100% capable of solving.

How has AI made its way into medicine today and how will it revolutionise the world tomorrow?

Artificial Intelligence and medicine: a promising start

AI is now being used in a number of areas ranging from agriculture to automotive, medicine and healthcare services. It took several years before AI and e-health knowledge engineering could be proven in medicine and biology. Some AI-based projects related to healthcare have even attracted more investment than those in other sectors of the global economy.

40% of pharmaceutical and life sciences companies say they have already deployed AI technologies and are satisfied with the results.

Quite encouraging.

But what is the reality? Hospitals and clinics often have vast amounts of medical data at their disposal. But how do we process all the data intelligently without the risk of missing out on information that is crucial to the quality of care?

Confronted with too much information, staff suffer an overload that leads to misinformation, and this can lead to dysfunctional decision-making throughout the organisation.

AI will thus be able to intervene in the process by modelling and analysing data to predict diseases and find cures, notably thanks to: innovative treatment materials; the estimation of life expectancy; the speed of diagnoses; and finally the rapid understanding of correlations between certain factors and our health conditions.

From the data and with the help of Deep & Machine Learning, AI-based computing power can predict significant trends.

How is AI used in medicine today?

AI-based solutions in the medical field are growing rapidly and are above all very diverse. Here are some classic examples:

  • Automatic appointment scheduling;
  • Registration in medical centres;
  • Digitisation of medical contracts and records;
  • Automatic vaccine reminders for children and pregnant women;
  • Algorithms for personalising drug doses;
  • Or improved genomic editing.

Let’s focus on the 4 main applications of AI in medicine today, which have been very successful in recent years:

  1. Automatic diagnosis of diseases: With the growing progress of deep learning, diagnosing a disease is now easier and faster. Powerful models can now detect complex diseases such as cancers or ophthalmic pathologies with the same precision and accuracy as medical professionals.
  2. Rapid production of medicines: The pharmaceutical and drug distribution industry remains one of the most expensive economic sectors, both for states and for citizens. But with the advent of AI in the analytical processes of drug manufacturing, data processing is becoming more efficient, saving hundreds of millions in investment and years of work.
  3. Personalizing treatment: The personalization of treatments is a very complex statistical task that AI manages to automate, helping to better understand and anticipate patients’ reactions to a given treatment. By analysing all these characteristics, the algorithm is able to predict the treatment best suited to each patient according to their pathology.
  4. Improving gene editing: With AI, work on RNA (ribonucleic acid) has accelerated, allowing impressive genomic editing by introducing genetic material into cells. Genome editing opens up unprecedented possibilities for treating certain diseases.

What future for Artificial Intelligence and medicine?

The healthcare sector is evolving as AI and Machine Learning gain popularity. Studies show that spending on AI in the medical sector is expected to grow at an annual rate of 48% between 2017 and 2023.

Several predictions can be made about the impact that AI will have on healthcare in the near future. Here are some of them:

  • Integrating the mind with the machine: AI-supported brain-machine interfaces (BMIs) may soon be able to enhance motor function in some patients. Controlling the body by thought would be a significant advance in the world of AI and health.
  • Better radiology tools: AI-enhanced radiology tools will provide sufficient accuracy to replace tissue samples in the near future.
  • Electronic Health Records 4.0: EHRs make it possible to compile all of a patient’s data (social, clinical, psychological…) to determine the risk of disease and find a possible treatment as soon as possible. However, it is sometimes difficult for a doctor to analyse all the data efficiently and draw up an assessment quickly. AI comes into play here by automating the completion of EHRs and helping to reliably predict disease risk by identifying hidden connections between data sets.

According to BMC Medical Informatics and Decision Making, an AI was able to analyse the clinical notes of 55,516 EHRs comprising 150,990 notes and identified 3,138 prostate cancer patients in just 8 seconds. 8 seconds! Imagine how long it would take a human to achieve the same result?

  • Reducing the risk of antibiotic resistance: EHR data could also be used to identify and anticipate infection patterns and warn patients at risk, even before they develop symptoms.
  • More accurate analysis of pathology images: Since AI is able to scan images to the nearest pixel, researchers may be able to identify details and nuances invisible to the human eye.
  • Further use of immunotherapy in cancer treatment: AI will be able to analyse a complex set of data about a patient’s unique genetic make-up. The ultimate goal would then be to target the ideal therapy to eradicate the disease as quickly as possible.
  • A mobile lab for quick and easy diagnosis: Ideal for people who are isolated or unable to travel to hospitals or clinics. How does it work? Simply enter the patient’s symptoms on the connected mobile device. The “mini lab” will analyse the data and announce the samples that need to be taken (saliva, urine, blood, etc.). The analyses will then be sent to a health professional who will be able to make a complete diagnosis quickly and remotely.

The next few years will be crucial for Artificial Intelligence and medicine. If technical advances continue at this rate, the growth of innovative technologies in medicine will be phenomenal!