
Monday, August 30, 2021

Building Custom Language Translation Model using Azure Translator Services

This video explains what a custom translator is and why it is needed. It also shows how to create a completely customized language translation model with personalized training data, and how to deploy the model to multiple Azure regions. A C# console application was created to validate the model.

Thursday, July 22, 2021

Which Azure AI Service to Select and Why?

This video talks about which Azure Artificial Intelligence service to select and for which purpose, along with a few example scenarios.

Thursday, July 15, 2021

Creating And Training Custom ML Model to Read Sales Receipts Using AI-Powered Azure Form Recognizer

In a previous article, we saw how to use a prebuilt model to read data from a sales receipt. In this article, we will learn to create our own ML model, train it, and then extract information from a sales receipt. Here, a custom model means a model that is completely tailored to a specific need or use case.

Steps involved

To perform this end-to-end workflow, there are five major steps.

Step 1 - Create Training Dataset

To train a model, we need at least five documents of the same type. If we plan to analyze receipts, we need at least five sample sales receipts; if we plan to extract data from business cards, we need at least five sample business cards, and so on. These documents can be printed or handwritten.
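This five-document minimum is easy to check programmatically before uploading anything. Here is a minimal sketch; the folder layout and accepted file extensions are illustrative assumptions, and the helper name is my own:

```python
import os

MIN_DOCS = 5  # Form Recognizer custom models need at least 5 training documents


def has_enough_samples(folder, extensions=(".jpg", ".jpeg", ".png", ".pdf", ".tiff")):
    """Return True if the folder holds enough training documents of one type."""
    count = sum(1 for f in os.listdir(folder) if f.lower().endswith(extensions))
    return count >= MIN_DOCS
```

For example, `has_enough_samples("./receipts")` would confirm the receipts folder is ready before you upload it to Azure Storage.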

Step 2 - Upload Training Dataset

Once the training documents are collected, we need to upload them to Azure Storage. To perform this step, you should have a Storage Account created on the Azure portal; images can then be uploaded to a container using the below steps,

[Screenshot: creating a container in the Azure Storage account]

The above screenshot shows how to create a container named receipts. Once the container is created successfully, documents can be uploaded to it by clicking on the Upload button as shown below,

[Screenshot: Upload button in the container view]

The below screenshot depicts the five images uploaded to the container.

[Screenshot: five receipt images uploaded to the container]

Once we have collected the training data, we need to decide whether we are going with supervised or unsupervised learning. In the case of supervised learning, we have to label our training data, which means that along with the sample documents, we need additional files holding the OCR output and label information.
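For supervised learning, each training document ends up paired with companion files carrying the OCR output and the labels. As a rough sketch, you could verify that pairing over a list of blob names; the naming pattern (image name plus `.ocr.json` / `.labels.json`) matches what the labeling tool produces, but the helper itself is hypothetical:

```python
def missing_companions(blob_names):
    """Given blob names in the container, return images lacking
    their .ocr.json or .labels.json companion files."""
    names = set(blob_names)
    images = sorted(n for n in names if n.lower().endswith((".jpg", ".jpeg", ".png", ".pdf")))
    incomplete = []
    for img in images:
        # Both companion files must exist for a fully labeled document
        if img + ".ocr.json" not in names or img + ".labels.json" not in names:
            incomplete.append(img)
    return incomplete
```

Running this against the container's blob listing before training flags any document that still needs OCR or labeling.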

Step 3 - Running the OCR and Labelling the Training Dataset

For data labeling and training, I'm using the Form Recognizer Sample Labeling Tool (FOTT), which is available online.

Once the web page is open, click on New Project in the center of the screen, and it will open up a new page as shown below,

[Screenshot: FOTT New Project page]

Adding a New Connection

Clicking on the Add Connection button will open up a new page, wherein we need to provide a SAS URI. To obtain the SAS URI, open the same Azure Storage resource and generate a SAS as shown below,

[Screenshot: generating a SAS URI for the storage container]

Getting Form Recognizer Service URI and Key

To get the URI and Key, we need to open up the Azure Form Recognizer resource and copy the required fields as shown below,

[Screenshot: Form Recognizer endpoint URI and key]

Once the project is saved successfully, you will notice that all the blob objects are loaded on the left-hand side as shown below,

[Screenshot: blob documents loaded in the labeling tool]

Running the OCR

Next, we need to run the OCR for all five documents. Doing this will mark the identified text areas with yellow rectangles, and the respective coordinates will be saved in a new file whose name ends with .ocr.json. These marks can be adjusted and corrected if required. Once this process is completed, you will notice that the container is updated with new files as shown below,

[Screenshot: container with newly added .ocr.json files]

Constructing the Tag List

After running the OCR, we next need to construct the tag list, which can be done by clicking the button on the right as shown below,

[Screenshot: button for adding tags]

This will allow us to add all the required tags as shown below,

[Screenshot: tag list with the required tags]

Labeling the Dataset

When it comes to labeling, we have to perform this for all the training documents. For each document, select the text on the receipt and then click on the corresponding tag on the right side. On doing so, the value gets added to the respective tag. On completion, it would look something like this,

[Screenshot: fully labeled receipt]

Before moving ahead, we need to verify that labeling is done for all the documents, and this can be done by looking at our container. If everything went well, you will notice that new files ending with .labels.json have been added, as shown below,

[Screenshot: container with .labels.json files]

Step 4 - Training the Model

To train the model, we need to click on the Train button shown on the left side, as shown below,

[Screenshot: Train button in the labeling tool]

On completion of the training process, the complete summary will be shown as below,

[Screenshot: training summary]

On the bottom right, you can see the Average accuracy, which indicates how well our model performed on the given training set. If this figure is not satisfactory, we can add more documents to the training dataset and revisit the labeling step.
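As a rough illustration of where that figure comes from, the training response reports a per-field accuracy, and the average is simply their mean. A sketch against a hand-made sample shaped like the v2.0 trainResult section (the field names and values here are illustrative, not real training output):

```python
# Hand-made sample mimicking the v2.0 "trainResult" section of the response
train_result = {
    "fields": [
        {"fieldName": "Merchant", "accuracy": 1.0},
        {"fieldName": "TransactionDate", "accuracy": 0.8},
        {"fieldName": "Total", "accuracy": 0.9},
    ]
}


def average_accuracy(train_result):
    """Mean of the per-field accuracy scores."""
    scores = [f["accuracy"] for f in train_result["fields"]]
    return sum(scores) / len(scores)


print(average_accuracy(train_result))
```

Fields that score poorly pull the average down, which is exactly why adding more samples and re-labeling weak fields improves the figure.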

Step 5 - Testing the Model

This is a very important step, wherein we need to test our model and see how it performs on test data. In this step, we need to write a few lines of Python code, which will use our endpoint, key, and model ID to perform the testing. Here is the code:

import json
import time
from requests import get, post

# Form Recognizer endpoint, key, and the ID of the custom model we trained
endpoint = "FORMRECOGNIZER_ENDPOINT"
key = "FORMRECOGNIZER_KEY"
model_id = "MODEL_ID"
post_url = endpoint + "/formrecognizer/v2.0/custom/models/%s/analyze" % model_id
input_image = "IMAGE_TO_TEST"
headers = {
    'Content-Type': 'image/jpeg',
    'Ocp-Apim-Subscription-Key': key,
}

# Submit the image for analysis; a 202 means the request was accepted
with open(input_image, "rb") as f:
    try:
        response = post(url = post_url, data = f.read(), headers = headers)
        if response.status_code == 202:
            print("POST operation successful")
        else:
            print("POST operation failed:\n%s" % json.dumps(response.json()))
            quit()
        get_url = response.headers["operation-location"]
    except Exception as ex:
        print("Exception details: %s" % str(ex))
        quit()

# The analysis runs asynchronously, so poll the operation until it completes
while True:
    response = get(url = get_url, headers = {"Ocp-Apim-Subscription-Key": key})
    json_response = response.json()
    if response.status_code != 200:
        print("GET operation failed:\n%s" % json.dumps(json_response))
        quit()
    status = json_response["status"]
    if status == "succeeded":
        print("Operation successful: %s" % json.dumps(json_response))
        break
    if status == "failed":
        print("Analysis failed:\n%s" % json.dumps(json_response))
        break
    time.sleep(1)  # still running; wait before polling again

On executing the above code, you will see JSON output with a confidence score for each extracted field.
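To pull only the tagged values and their confidence scores out of that JSON, you can walk analyzeResult.documentResults[].fields. A minimal parsing sketch against a hand-made sample response (shape per the v2.0 API; the merchant and total values are purely illustrative):

```python
# Hand-made sample mimicking the v2.0 analyze response for a custom model
sample_response = {
    "status": "succeeded",
    "analyzeResult": {
        "documentResults": [
            {
                "fields": {
                    "Merchant": {"text": "Contoso", "confidence": 0.99},
                    "Total": {"text": "1203.39", "confidence": 0.92},
                }
            }
        ]
    }
}


def extract_fields(response):
    """Flatten the labeled fields into {tag: (text, confidence)}."""
    out = {}
    for doc in response["analyzeResult"]["documentResults"]:
        for tag, field in doc["fields"].items():
            out[tag] = (field["text"], field["confidence"])
    return out


print(extract_fields(sample_response))
```

A low confidence on a field is usually a hint to revisit the labeling for that tag and retrain.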

Summary

In this article, we have seen how to analyze a sales receipt with a customized ML model. To see all the steps in detail, I recommend watching the complete demonstration on my channel.

Tuesday, July 6, 2021

Extract Text from Sales Receipt using Pre-Built Model - Azure Form Recognizer

Nowadays, when almost everything is moving to online and virtual modes, a very common problem any organization faces is the processing of receipts that were scanned and submitted electronically for reimbursement purposes.

For any claim or reimbursement to be cleared, it must first reach the proper accounts department, depending on the organization and the sector. One way to perform this activity is through manual intervention: a person or a team must go through all those digitally scanned receipts manually and filter them based on the department or any other validation and eligibility criteria they may have.

The situation becomes worse when the volume of such scanned receipts is high. So, to get rid of this manual effort, many organizations have already opted for an AI-based solution, and many more are in the process of doing so.

Definitely, one can go for OCR (short for Optical Character Recognition) technologies to extract the data, but here the problem is not only about data extraction; it is also about data interpretation. There could be a case wherein the user uploads a wrong document altogether, which is not a receipt at all. So, the solution should be robust enough to filter out these scenarios.

How can AI-based solutions be achieved?

Like many other Azure services, here we can utilize a service named Form Recognizer, which provides intelligent processing capabilities and allows us to automate the processing of forms and receipts. Basically, it is a combination of OCR and predictive models, and it falls under the umbrella of Azure Cognitive Services.

Here, OCR handles the text extraction, and the models help us filter out the useful information, such as invoice date, address, amount, description, name, or any other relevant field the business demands.

Which models are supported by Form Recognizer?

Form Recognizer supports two types of models: Pre-built and Custom models.

  • Prebuilt models – provided out of the box and already trained on basic sales data in the US sales format.
  • Custom models – can be tailored to our own data and business needs.

So, in this article, I’ll be focusing on the pre-built models and will cover custom model integration as part of another article.

How to get started with Form Recognizer?

The very first thing we need to do is log in to the Azure portal at portal.azure.com and create an Azure resource. There are two ways to create the resource:

  • Using Azure Form Recognizer
  • Using Azure Cognitive Services

If you are planning to use other services under Cognitive Services, then an existing or new Cognitive Services resource can be used. But if you only need the Form Recognizer service, a dedicated Form Recognizer resource works as well.

Implementation Details

For development, I'm using Python as the language and Visual Studio Code with Jupyter Notebook support. Here is the core implementation:

from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

key = "KEY_TO_BE_REPLACED"
endpoint = "ENDPOINT_TO_BE_REPLACED"

client = FormRecognizerClient(endpoint = endpoint, credential = AzureKeyCredential(key))
image = "IMAGE_FILE_PATH"

# Analyze the receipt with the prebuilt receipt model
with open(image, "rb") as fd:
    poller = client.begin_recognize_receipts(receipt = fd)
    result = poller.result()

receipt = result[0]
print('Address: ', receipt.fields.get("MerchantAddress").value)
print('Contact Number: ', receipt.fields.get("MerchantPhoneNumber").value)
print('Receipt Date: ', str(receipt.fields.get("TransactionDate").value))
print('Tax Paid: ', receipt.fields.get("Tax").value)
print('Total Amount Paid: ', receipt.fields.get("Total").value)

# Print every line item found on the receipt
for name, field in receipt.fields.items():
    if name == "Items":
        for line_item in field.value:
            for item_name, item in line_item.value.items():
                print(item_name, ': ', item.value)

Sample Input and Output

I've taken the below receipt as an input,

[Image: sample sales receipt]
and for this, the above code generated the below output:

[Screenshot: extracted receipt data]
Summary

This article covers the high-level steps of using a pre-built ML model to read information from a sales receipt, with the assumption that the reader already knows how to use Python, VS Code, and Jupyter Notebook, along with how to import Python modules. If you are new to any of these, I recommend watching my below video explaining this article from start to end.

Thursday, July 1, 2021

Getting Started with Reading Text from an Image using Azure Cognitive Services

In this article, we will learn how to read or extract text from an image, irrespective of whether the text is handwritten or printed.

In order to read the text, two things come into the picture. The first is Computer Vision and the second is NLP, which is short for Natural Language Processing. Computer Vision helps us read the text, and NLP is then used to make sense of the identified text. In this article, I'll focus specifically on the text extraction part.

How Computer Vision Performs Text Extraction

To execute this text extraction task, Computer Vision provides us with two APIs:

  • OCR API
  • Read API

The OCR API works with many languages and is very well suited for relatively small amounts of text, but if an image contains a lot of text (a text-dominated image), then the Read API is your option.

The OCR API provides information in the form of Regions, Lines, and Words. A region in the given image is an area that contains text. So, the output hierarchy is: Region, Lines of text in each region, and then Words in each line.
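To rebuild plain text from that hierarchy, you simply walk regions, then lines, then words. A sketch against a hand-made sample shaped like the OCR API response (the words themselves are illustrative):

```python
# Hand-made sample mimicking the OCR API's Regions >> Lines >> Words shape
ocr_response = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hello"}, {"text": "world"}]},
                {"words": [{"text": "Azure"}, {"text": "OCR"}]},
            ]
        }
    ]
}


def regions_to_text(response):
    """Join words per line and lines per region into one string."""
    lines = []
    for region in response["regions"]:
        for line in region["lines"]:
            lines.append(" ".join(w["text"] for w in line["words"]))
    return "\n".join(lines)


print(regions_to_text(ocr_response))
```

In the real response each region, line, and word also carries a boundingBox, which is useful when you need to know where on the image the text was found.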

The Read API works very well with images that are heavily loaded with text. The best example of a text-dominated image is any scanned or printed document. Here, the output hierarchy is in the form of Pages, Lines, and Words. As this API deals with a high number of lines and words, it works asynchronously and hence does not block our application while the whole document is read, whereas the OCR API works synchronously.

Here is a table depicting when to use which API:

| OCR API | Read API |
| --- | --- |
| Good for relatively small text | Good for text-dominated images, e.g. scanned docs |
| Output hierarchy: Regions >> Lines >> Words | Output hierarchy: Pages >> Lines >> Words |
| Works in a synchronous manner | Works in an asynchronous manner |
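The asynchronous pattern behind the Read API boils down to: submit the image, receive an operation URL, and poll it until the status is "succeeded" or "failed". A sketch of just the polling logic, with a stub standing in for the real status call (no actual endpoint involved; the helper name is my own):

```python
import time


def poll_until_done(get_status, interval=0.01, max_attempts=50):
    """Poll an asynchronous operation until it finishes; return the final status."""
    for _ in range(max_attempts):
        status = get_status()
        if status in ("succeeded", "failed"):
            return status
        time.sleep(interval)  # "notStarted" or "running": wait and retry
    raise TimeoutError("operation did not finish in time")


# Stub simulating the status transitions of a Read operation
statuses = iter(["notStarted", "running", "running", "succeeded"])
print(poll_until_done(lambda: next(statuses)))  # prints "succeeded"
```

With the real service, `get_status` would issue a GET to the Operation-Location URL returned by the submit call and read the status field from the JSON.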

Do watch my attached video for the demo and code walkthrough:

Monday, June 21, 2021

Understanding Recall and Precision in Simplest Way

This video talks about how to evaluate any Machine Learning classification model and what metrics are available to do so. It contains simple-to-follow examples along with calculations and a brief overview of the confusion matrix.
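For reference, precision and recall come straight from the confusion-matrix counts: precision = TP / (TP + FP) and recall = TP / (TP + FN). A quick worked sketch with illustrative counts:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of everything predicted positive, how much was right
    recall = tp / (tp + fn)     # of everything actually positive, how much was found
    return precision, recall


# e.g. 8 true positives, 2 false positives, 2 false negatives
p, r = precision_recall(tp=8, fp=2, fn=2)
print(p, r)  # 0.8 0.8
```

High precision with low recall means the model is cautious but misses positives; the reverse means it over-predicts the positive class.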

Monday, June 14, 2021

Building End-to-End Custom Image Classifier using Azure Custom Vision

This video covers the complete flow of image classification, starting from image collection, image tagging/labelling, training the classification model, evaluating the model, and testing the predictions using the Prediction API. This complete flow relies on the Azure Custom Vision service.

Detecting Vehicles in an Image using Azure Custom Vision Service

This video explains how to get started with object detection using the Azure Custom Vision service. It shows how one can detect cars and buses in any valid image, along with some best practices for training the model. You will also learn about some of the limitations of the detection API.

Monday, May 31, 2021

Getting Started with Image Analysis using Azure Cognitive Services

This is the very first video of this series and talks about what image analysis is, what it can do, and how to use Azure Cognitive Services to analyze any image, followed by C# code to achieve the image analysis results.