Tuesday, September 14, 2021

Extracting Sensitive Information from Document using Azure Text Analytics

This video explains the basics of NER (Named Entity Recognition) for PII (Personally Identifiable Information) and how it can be used to redact sensitive/confidential information before passing it to the next stage. It also includes a code walkthrough and the creation of an Azure Text Analytics instance.
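
As a local illustration of what this redaction amounts to, here is a minimal sketch that masks detected entity spans with asterisks; the text and the spans below are hypothetical stand-ins for what the service would detect:

```python
# Hypothetical input and PII spans; in practice the Azure Text Analytics
# service detects these for you (and can return the redacted text directly).
text = "Call John Doe at 555-0100."
entities = [(5, 13), (17, 25)]  # (start, end) character spans of detected PII

redacted = list(text)
for start, end in entities:
    redacted[start:end] = "*" * (end - start)

print("".join(redacted))  # → Call ******** at ********.
```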

Tuesday, September 7, 2021

Monday, August 30, 2021

Building Custom Language Translation Model using Azure Translator Services

This video talks about what a custom translator is and why it is needed. It also explains how to create a completely customized language translation model with personalized training data, and how to deploy the model in multiple Azure regions. For validating the model, a C# console application was created.

Friday, August 27, 2021

Translate Document from One Language To Another - Azure Cognitive Services

In this article, I’m going to write about another interesting Azure-based service named Translator, which falls under the umbrella of Azure Cognitive Services. This service helps us translate documents from one language to another while retaining the formatting and structure of the source document. So, if any text in the source document is in italics, the newly translated document will also have that text in italics.

Key Features of Translator Service

Let’s have a look at a few of the key features of the Translator service:

  • Auto-detection of the language of the source document
  • Translates large files
  • Translates multiple files in one go
  • Preserves formatting of the source document
  • Supports custom translations
  • Supports custom glossaries
  • Supported document types – pdf, csv, html/htm, doc/docx, msg, rtf, txt, etc.
  • Implementation can be done using C#/Python as SDKs are available. Supports REST API too.
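
As a rough sketch of what the batch REST call consumes, the request body can be assembled like this (the SAS URLs are placeholders, not real endpoints):

```python
import json

# Placeholder SAS URLs for the source and target containers.
source_sas_url = "https://<account>.blob.core.windows.net/inputdocs?<sas>"
target_sas_url = "https://<account>.blob.core.windows.net/translateddocs?<sas>"

payload = {
    "inputs": [{
        "source": {"sourceUrl": source_sas_url, "storageSource": "AzureBlob"},
        "targets": [{
            "targetUrl": target_sas_url,
            "storageSource": "AzureBlob",
            "language": "fr",  # target language code, French here
        }],
    }]
}

body = json.dumps(payload)
print(body)
```
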

How to Translate

To perform the entire translation process, here are the major steps one needs to take care of:

Step 1

The first step is to log in to the Azure portal and create an instance of the Translator service.

Clicking on Create will open up a new page; furnish all the details and click on the Review + Create button. Doing this will create an instance of the Translator service.

Step 2

Grabbing the key and the endpoint of the Translator service:

Step 3

Create an instance of the Azure Storage service, as we need to create two containers.

  • The first container named inputdocs - holds source documents, which need to be translated
  • The second container named translateddocs - holds target documents, which are the translated documents

Once the containers are created, you can see them listed under your storage account.

Step 4

Upload all the documents which need to be translated, under inputdocs container.

Step 5

Next, generate the SAS tokens for both the source and target containers. Note that the source container must have at least Read and List permissions enabled, whereas the target container must have Write and List permissions enabled while generating the SAS. Below are the steps to generate a SAS token for the source container:
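
The sourceUrl and targetUrl used later are simply each container's URL with its SAS token appended as the query string; here is a minimal sketch, assuming a hypothetical storage account name and made-up tokens:

```python
# Hypothetical storage account name and truncated SAS tokens, for illustration only.
account = "mystorageaccount"

def container_sas_url(container: str, sas_token: str) -> str:
    # A SAS token is a query string; append it to the container URL.
    return f"https://{account}.blob.core.windows.net/{container}?{sas_token.lstrip('?')}"

# sp=rl → Read + List on the source; sp=wl → Write + List on the target.
source_url = container_sas_url("inputdocs", "?sv=2020-08-04&sp=rl&sig=PLACEHOLDER")
target_url = container_sas_url("translateddocs", "?sv=2020-08-04&sp=wl&sig=PLACEHOLDER")

print(source_url)
```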

Similar steps need to be performed for the target container too.

Step 6

Now comes the C# code, which utilizes all the information from the above steps:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class Program {
    static readonly string route = "/batches";
    static readonly string endpoint = "<TRANSLATOR_SERVICE_ENDPOINT>/translator/text/batch/v1.0";
    static readonly string key = "<TRANSLATOR_SERVICE_KEY>";
    static readonly string json =
        "{\"inputs\": [{\"source\": {\"sourceUrl\": \"<SOURCE_SAS_TOKEN>\", \"storageSource\": \"AzureBlob\"}, " +
        "\"targets\": [{\"targetUrl\": \"<TARGET_SAS_TOKEN>\", \"storageSource\": \"AzureBlob\", \"language\": \"fr\"}]}]}";

    static async Task Main(string[] args) {
        using HttpClient client = new HttpClient();
        using HttpRequestMessage request = new HttpRequestMessage();

        // Build the batch translation request.
        StringContent data = new StringContent(json, Encoding.UTF8, "application/json");
        request.Method = HttpMethod.Post;
        request.RequestUri = new Uri(endpoint + route);
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        request.Content = data;

        HttpResponseMessage response = await client.SendAsync(request);
        string result = await response.Content.ReadAsStringAsync();

        if (response.IsSuccessStatusCode)
            Console.WriteLine($"Operation successful with status code: {response.StatusCode}");
        else
            Console.WriteLine($"Error occurred. Status code: {response.StatusCode}");
    }
}

Step 7 - Sample input(English) and output document(French)

On executing the above C# code, you will notice that the translated files get added to the translateddocs container.

Takeaway

In this article, we have learned how to translate any document placed in Azure Blob storage to other languages. Here is the list of all the supported languages as of today. I've also recorded this entire flow on my channel, in case you want to have a look.

Hope you enjoyed learning about Azure Translator Service.

Monday, August 9, 2021

Setting up Anaconda on Windows 10 machine

This video will guide you through a step-by-step procedure to set up Anaconda on Windows 10 machine, along with a Python sample code.

Friday, August 6, 2021

Using Customer Reviews To Know Product's Performance In Market - Azure Sentiment Analysis

Today I'll be covering one of the useful functions of Azure Text Analytics: Sentiment Analysis. Azure Text Analytics is a cloud-based offering from Microsoft that provides Natural Language Processing over raw text.

Use Case Described

In this article, I will explain how to use customer-provided product reviews to understand market insights and decide on future product manufacturing. Here is the pictorial representation of this use case.

Here are the high-level steps to achieve this entire flow:

Step 1

This entire process starts with the data collection part and for this, I'm using a CSV file with customer-provided reviews. Here is the gist of it:

Step 2

Once data is collected, we need to import it, and for that, I'm using a Jupyter Notebook inside Visual Studio Code. Here is the Python code to read and extract data from the CSV file:

import csv
feedbacks = []
counter = 0
with open('Feedback.csv', mode='r', encoding='utf8') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        counter+=1
        if (counter <= 9):
            feedbacks.append(row['reviews.title'] + '.')
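
To see what ends up in feedbacks, here is the same extraction run against a tiny in-memory stand-in for Feedback.csv (the rows are hypothetical; the only assumption is a reviews.title column):

```python
import csv
import io

# Hypothetical in-memory stand-in for Feedback.csv.
sample = "reviews.title\nGreat battery life\nToo noisy\n"

feedbacks = []
reader = csv.DictReader(io.StringIO(sample))
for counter, row in enumerate(reader, start=1):
    if counter <= 9:  # same cap of nine reviews as the code above
        feedbacks.append(row['reviews.title'] + '.')

print(feedbacks)  # → ['Great battery life.', 'Too noisy.']
```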

Step 3

Next, we need to create a Text Analytics resource in Azure to get a key and an endpoint. This can be done by logging on to the Azure portal and searching for Text Analytics to create a new instance.


Step 4

Grab the key and the endpoint of the newly created Text Analytics resource and store them in variables:

key = "TEXT_ANALYTICS_KEY"
endPoint = "TEXT_ANALYTICS_ENDPOINT"

Step 5

Next is to install the required Python module. In VS Code, open a new terminal and install the below module using pip:

pip install azure-ai-textanalytics

Step 6

Import the modules and create client objects as shown below:

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(endpoint=endPoint, credential=AzureKeyCredential(key))
response = client.analyze_sentiment(documents=feedbacks)
# Lightweight object holding positive/negative/neutral counters
review = type('', (), {'positive':0, 'negative':0, 'neutral':0})()
for idx, sentence in enumerate(response):
    print("Sentence {}: {}".format(idx+1, sentence.sentiment))
    if(sentence.sentiment == "positive"):
        review.positive = review.positive + 1
    elif (sentence.sentiment == "negative"):
        review.negative = review.negative + 1
    else:
        review.neutral = review.neutral + 1
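
The tallying loop above can also be written with collections.Counter. Here is a local sketch using stand-in objects in place of the service response (each real result document exposes a sentiment attribute, as used above):

```python
from collections import Counter
from types import SimpleNamespace

# Stand-ins for the documents returned by analyze_sentiment.
fake_response = [SimpleNamespace(sentiment=s)
                 for s in ["positive", "positive", "negative", "neutral"]]

counts = Counter(doc.sentiment for doc in fake_response)
print(counts["positive"], counts["negative"], counts["neutral"])  # → 2 1 1
```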

At this point, if you run the code, you will get the results of the sentiment analysis.

Step 7

Now, it's time to plot the analysis results. This can be done using Matplotlib. If VS Code is not detecting it, you can install it using pip (pip install matplotlib).

Here is the code to plot the results:

import matplotlib.pyplot as plot

figure = plot.figure()
ax = figure.add_axes([0, 0, 1, 1])
x_values = ['Positive', 'Negative', 'Neutral']
y_values = [review.positive, review.negative, review.neutral]
ax.bar(x_values, y_values)
plot.show()  # not required inside Jupyter, where the figure renders inline

Step 8

If everything went well so far, then on executing the application, you will see a bar chart of the positive, negative, and neutral review counts.

Conclusion and Takeaway

Looking at the above chart, the manufacturer can decide whether to ramp up or slow down production, and can better understand the customers' pain points.

Hope you enjoyed reading this article. There may be a few steps that I didn't explain here, so if you get stuck at any point, I recommend watching my video demonstrating the end-to-end flow on my channel.

Wednesday, July 28, 2021

Thursday, July 22, 2021

Which Azure AI Service to Select and Why?

This video talks about which Azure Artificial Intelligence service to select and for which purpose, along with a few example scenarios.

Thursday, July 15, 2021

Creating And Training Custom ML Model to Read Sales Receipts Using AI-Powered Azure Form Recognizer

In a previous article, we saw how one can utilize a prebuilt model to read data from a sales receipt. In this article, we will learn to create our own ML model, train it, and then extract information from a sales receipt. Here, a custom model means a model that is completely tailored to meet a specific need or use case.

Steps involved

To perform this end-to-end workflow, there are five major steps.

Step 1 - Create Training Dataset

For training a model, we need at least five documents of the same type. If we are planning to analyze receipts, we need at least five sample sales receipts; if we are planning to extract data from business cards, we need at least five sample business cards, and so on. These documents can be either printed or handwritten.

Step 2 - Upload Training Dataset

Once the training documents are collected, we need to upload them to Azure Storage. To perform this step, one should have a Storage Account created on the Azure portal; images can then be uploaded to a container as described below.

Create a container named receipts. Once the container is created successfully, documents can be uploaded to the newly created container by clicking on the Upload button.

Once uploaded, the five images will be listed in the container.

Once we have collected the training data, we need to decide whether we are going with supervised or unsupervised learning. In the case of supervised learning, we have to label our training data, which means that along with the sample training data, we should have additional files holding the OCR and label information.

Step 3 - Running the OCR and Labelling the Training Dataset

For data labeling and training, I’m using the Form Recognizer Sample Labeling Tool, which is available online on the FOTT website.

Once the web page is opened, one needs to click on New Project in the center of the screen, which will open up a new page.

Adding a New Connection

Clicking on the Add Connection button will open up a new page, wherein we need to provide the SAS URI. To obtain the SAS URI, we need to open the same Azure Storage resource and generate the SAS.

Getting Form Recognizer Service URI and Key

To get the URI and Key, we need to open up the Azure Form Recognizer resource and copy the required fields as shown below,

Once the project is saved successfully, you will notice that all the blob objects are loaded on the left-hand side as shown below,

Running the OCR

Next, we need to run the OCR for all 5 documents. Doing this will mark the identified text areas with yellow rectangles, and the respective coordinates will be saved in a new file whose name ends with .ocr.json. These marks can be adjusted if required. Once this process completes, you will notice that the container is updated with new files.

Constructing the Tag List

After running the OCR, next, we need to construct the tag list and this can be done by clicking the button on the right as shown below,

This will allow us to add all the required tags as shown below,

Labeling the Dataset

When it comes to labeling, we have to perform this for all the training documents. For each one, select the text on the receipt and then click on the corresponding tag on the right side. On doing so, the values get added to the respective tag. On completion, each document will have values assigned to its tags.

Before moving ahead, we need to verify that labeling is done for all the documents, and this can be done by looking at our container. If everything went well, you will notice that new files ending with .labels.json got added.

Step 4 - Training the Model

To train the model, we need to click on the Train button shown on the left side.

On completion of the training process, the complete summary will be shown as below,

On the bottom right, you can see the Average accuracy, which tells how our model behaved on the given training set. If this figure is not satisfactory, we can add more documents to the training dataset and revisit the labeling step.

Step 5 - Testing the Model

This is a very important step, wherein we need to test our model and see how it performs on test data. In this step, we need to write a few lines of Python code, which will use our Form Recognizer endpoint, key, and model ID to perform the testing. Here is the code:

import json
import time
from requests import get, post

endpoint = "FORMRECOGNIZER_ENDPOINT"
key = "FORMRECOGNIZER_KEY"
model_id = "MODEL_ID"
post_at = endpoint + "/formrecognizer/v2.0/custom/models/%s/analyze" % model_id
input_image = "IMAGE_TO_TEST"
headers = {
    'Content-Type': 'image/jpeg',
    'Ocp-Apim-Subscription-Key': key,
}

# Submit the image for analysis; a 202 response carries the operation URL.
with open(input_image, "rb") as f:
    try:
        response = post(url=post_at, data=f.read(), headers=headers)
        if response.status_code == 202:
            print("POST operation successful")
        else:
            print("POST operation failed:\n%s" % json.dumps(response.json()))
            quit()
        get_url = response.headers["operation-location"]
    except Exception as ex:
        print("Exception details: %s" % str(ex))
        quit()

# Poll the operation until the analysis succeeds or fails.
while True:
    response = get(url=get_url, headers={"Ocp-Apim-Subscription-Key": key})
    json_response = response.json()
    if response.status_code != 200:
        print("GET operation failed:\n%s" % json.dumps(json_response))
        quit()
    status = json_response["status"]
    if status == "succeeded":
        print("Operation successful: %s" % json.dumps(json_response))
        break
    if status == "failed":
        print("Analysis failed:\n%s" % json.dumps(json_response))
        break
    time.sleep(5)  # still running; wait before polling again

On executing the above code, you will see JSON output with confidence scores.
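
For reference, here is a hedged sketch of pulling each tagged field and its confidence out of that output; the JSON below is a hypothetical, heavily trimmed example of the response shape, not a real service response:

```python
import json

# Hypothetical, trimmed response; real payloads carry many more properties
# (readResults, pageResults, and so on).
raw = '''{
  "status": "succeeded",
  "analyzeResult": {
    "documentResults": [{
      "fields": {
        "Merchant": {"text": "Contoso", "confidence": 0.97},
        "Total":    {"text": "$15.00",  "confidence": 0.92}
      }
    }]
  }
}'''

result = json.loads(raw)
for doc in result["analyzeResult"]["documentResults"]:
    for name, field in doc["fields"].items():
        print(f"{name}: {field['text']} (confidence {field['confidence']})")
```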

Summary

In this article, we have seen how to analyze a sales receipt with a customized ML model. To see all the steps in detail, I recommend watching the complete demonstration on my channel.