
Tuesday, September 14, 2021

Extracting Sensitive Information from Document using Azure Text Analytics

This video explains the basics of NER (Named Entity Recognition) - PII (Personally Identifiable Information) and how it can be used to redact sensitive/confidential information before passing it to the next stage. It also includes a code walkthrough and the creation of an Azure Text Analytics instance.
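
For reference, here is a minimal Python sketch of this PII detection and redaction flow using the azure-ai-textanalytics package; the endpoint, key, and sample text are placeholders and are not taken from the video:

# Minimal sketch: detect PII entities and get back a redacted version of the text.
# <TEXT_ANALYTICS_ENDPOINT> and <TEXT_ANALYTICS_KEY> are placeholders.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(endpoint="<TEXT_ANALYTICS_ENDPOINT>",
                             credential=AzureKeyCredential("<TEXT_ANALYTICS_KEY>"))

documents = ["My name is John Doe and my SSN is 859-98-0987."]

for doc in client.recognize_pii_entities(documents):
    if not doc.is_error:
        print(doc.redacted_text)          # PII entities are masked out here
        for entity in doc.entities:
            print(entity.text, entity.category, entity.confidence_score)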

Tuesday, September 7, 2021

Monday, August 30, 2021

Building Custom Language Translation Model using Azure Translator Services

This video talks about what the Custom Translator is, along with why it is needed. It also explains how to create a completely customized language translation model with personalized training data and how to deploy the model in multiple Azure regions. For validating the model, a C# console application was created.
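
As a rough illustration (not the exact console application from the video, and shown in Python for brevity), a deployed Custom Translator model is typically consumed by passing its category ID to the regular Translator Text endpoint; the key, region, and category ID below are placeholders:

# Sketch: calling the Translator Text v3 API against a Custom Translator model.
# The subscription key, region, and category ID are placeholders.
import json
import requests

endpoint = "https://api.cognitive.microsofttranslator.com/translate"
params = {
    "api-version": "3.0",
    "from": "en",
    "to": "fr",
    "category": "<CUSTOM_TRANSLATOR_CATEGORY_ID>"  # routes the request to the custom model
}
headers = {
    "Ocp-Apim-Subscription-Key": "<TRANSLATOR_KEY>",
    "Ocp-Apim-Subscription-Region": "<TRANSLATOR_REGION>",
    "Content-Type": "application/json"
}
body = [{"Text": "The turbine assembly requires a torque wrench."}]

response = requests.post(endpoint, params=params, headers=headers, json=body)
print(json.dumps(response.json(), indent=2))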

Friday, August 27, 2021

Translate Document from One Language To Another - Azure Cognitive Services

In this article, I’m going to write about another interesting Azure-based service named Translator, which falls under the umbrella of Azure Cognitive Services. This service helps us translate documents from one language to another while retaining the formatting and structure of the source document. So, let’s say some text in the source document is in italics; the newly translated document will also have that text in italics.

Key Features of Translator Service

Let’s have a look at a few of the key features of the Translator service:

  • Auto-detection of the language of the source document
  • Translates large files
  • Translates multiple files in one shot
  • Preserves formatting of the source document
  • Supports custom translations
  • Supports custom glossaries
  • Supported document types – pdf, csv, html/htm, doc/docx, msg, rtf, txt, etc.
  • Can be implemented using the C# or Python SDKs; a REST API is also supported.

How to Translate

To perform this entire translation process, here are the major steps one needs to follow:

Step 1

The first step is to log in to the Azure portal and create an instance of the Translator service.

Clicking on Create will open a new page; furnish all the details and click on the Review + Create button. Doing this will create an instance of the Translator service.

Step 2

Grab the key and the endpoint of the Translator service:

Step 3

Create an instance of the Azure Storage service, as we need to create two containers.

  • The first container, named inputdocs, holds the source documents that need to be translated
  • The second container, named translateddocs, holds the target documents, i.e., the translated output
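
If you'd rather script this step than click through the portal, here is a minimal sketch using the azure-storage-blob package; the connection string is a placeholder for your storage account's connection string:

# Sketch: creating the two containers with azure-storage-blob.
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<STORAGE_CONNECTION_STRING>")

for name in ("inputdocs", "translateddocs"):
    blob_service.create_container(name)  # raises ResourceExistsError if it already exists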

Once the containers are created, you can see them listed under your storage account as shown below:

Step 4

Upload all the documents that need to be translated to the inputdocs container.
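
This upload can also be scripted; here is a small sketch using the same azure-storage-blob client (the local folder name is just an example):

# Sketch: uploading the source documents to the inputdocs container.
import os
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<STORAGE_CONNECTION_STRING>")
container = blob_service.get_container_client("inputdocs")

for file_name in os.listdir("docs_to_translate"):  # hypothetical local folder of source documents
    with open(os.path.join("docs_to_translate", file_name), "rb") as data:
        container.upload_blob(name=file_name, data=data, overwrite=True)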

Step 5

Next is to generate the SAS tokens for both the source and target containers. Note that the source container must have at least Read and List permissions enabled, whereas the target container must have Write and List permissions enabled while generating the SAS. Below are the steps to generate the SAS token for the source container:

Similar steps need to be performed for the target container too.
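
If you prefer to generate the SAS tokens from code rather than the portal, here is a rough sketch with azure-storage-blob; the account name and key are placeholders, and the permissions mirror the requirements described above:

# Sketch: generating container-level SAS tokens with the required permissions.
from datetime import datetime, timedelta
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

account_name = "<STORAGE_ACCOUNT_NAME>"
account_key = "<STORAGE_ACCOUNT_KEY>"
expiry = datetime.utcnow() + timedelta(hours=2)

# Source container: at least Read and List
source_sas = generate_container_sas(
    account_name, "inputdocs", account_key=account_key,
    permission=ContainerSasPermissions(read=True, list=True), expiry=expiry)

# Target container: Write and List
target_sas = generate_container_sas(
    account_name, "translateddocs", account_key=account_key,
    permission=ContainerSasPermissions(write=True, list=True), expiry=expiry)

# These full container URLs (URL + SAS) are what go into sourceUrl/targetUrl in the batch request.
source_url = f"https://{account_name}.blob.core.windows.net/inputdocs?{source_sas}"
target_url = f"https://{account_name}.blob.core.windows.net/translateddocs?{target_sas}"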

Step 6

Now comes the C# code, which utilizes all the information from the above steps:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class Program {
    static readonly string route = "/batches";
    static readonly string endpoint = "<TRANSLATOR_SERVICE_ENDPOINT>/translator/text/batch/v1.0";
    static readonly string key = "<TRANSLATOR_SERVICE_KEY>";

    // <SOURCE_SAS_TOKEN> and <TARGET_SAS_TOKEN> are the full container URLs including the SAS tokens.
    // The request asks for everything in the source container to be translated to French (fr).
    static readonly string json =
        "{\"inputs\": [{\"source\": {\"sourceUrl\": \"<SOURCE_SAS_TOKEN>\", \"storageSource\": \"AzureBlob\"}, " +
        "\"targets\": [{\"targetUrl\": \"<TARGET_SAS_TOKEN>\", \"storageSource\": \"AzureBlob\", \"language\": \"fr\"}]}]}";

    static async Task Main(string[] args) {
        using HttpClient client = new HttpClient();
        using HttpRequestMessage request = new HttpRequestMessage();

        StringContent data = new StringContent(json, Encoding.UTF8, "application/json");
        request.Method = HttpMethod.Post;
        request.RequestUri = new Uri(endpoint + route);
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        request.Content = data;

        HttpResponseMessage response = await client.SendAsync(request);
        string result = await response.Content.ReadAsStringAsync();

        if (response.IsSuccessStatusCode) {
            Console.WriteLine($"Operation successful with status code: {response.StatusCode}");
        } else {
            Console.WriteLine($"Error occurred. Status code: {response.StatusCode}, details: {result}");
        }
    }
}

Step 7 - Sample input (English) and output (French) documents

On executing the above C# code, you will notice that the translated files get added to the translateddocs container.
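
The batch request is asynchronous, so the translated files may take a short while to show up. If you want to poll the job from code instead of refreshing the portal, a rough sketch (in Python for brevity) could look like the following; it assumes you captured the Operation-Location response header that the batches endpoint returns for an accepted job:

# Sketch: polling a Document Translation batch job until it reaches a terminal state.
# The operation URL is a placeholder taken from the Operation-Location response header.
import time
import requests

operation_url = "<OPERATION_LOCATION_FROM_RESPONSE_HEADER>"
headers = {"Ocp-Apim-Subscription-Key": "<TRANSLATOR_SERVICE_KEY>"}

while True:
    job = requests.get(operation_url, headers=headers).json()
    print("Job status:", job.get("status"))
    if job.get("status") in ("Succeeded", "Failed", "ValidationFailed", "Cancelled"):
        break
    time.sleep(10)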

Takeaway

In this article, we have learned how to translate any document placed in Azure Blob storage into other languages. Here is the list of all the supported languages as of today. I've also recorded this entire flow on my channel, in case you want to have a look.

Hope you enjoyed learning about Azure Translator Service.

Friday, August 6, 2021

Using Customer Reviews To Know Product's Performance In Market - Azure Sentiment Analysis

Today I'll be mentioning one of the useful functions of Azure Text Analytics - Sentiment Analysis. Azure Text Analytics is a cloud-based offering from Microsoft that provides Natural Language Processing over raw text.

Use Case Described

In this article, I will explain how to use customer-provided product reviews to understand market insights and how one can take a call on manufacturing the products in the future. Here is a pictorial representation of this use case.

Here are the high-level steps of how we can achieve this entire flow:

Step 1

This entire process starts with the data collection part, and for this, I'm using a CSV file of customer-provided reviews. Here is the gist of it:




Step 2

Once the data is collected, we need to import it, and for that, I'm using a Jupyter Notebook inside Visual Studio Code. Here is the Python code to read and extract data from the CSV file:

import csv

feedbacks = []
counter = 0

# Read the review titles from the CSV and keep the first 9 for this demo
with open('Feedback.csv', mode='r', encoding='utf8') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        counter += 1
        if counter <= 9:
            feedbacks.append(row['reviews.title'] + '.')

Step 3

Next, we need to create a Text Analytics resource in Azure to get a key and an endpoint. This can be done by logging on to the Azure portal and searching for Text Analytics to create a new instance.

Step 4

Grab the key and the endpoint of the newly created Text Analytics instance and assign them in the notebook:

key = "TEXT_ANALYTICS_KEY"
endPoint = "TEXT_ANALYTICS_ENDPOINT"

Step 5

Next is to install the required Python module. In VS Code, open a new terminal and install the below module using pip:

pip install azure-ai-textanalytics

Step 6

Import the modules and create client objects as shown below:

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(endpoint=endPoint, credential=AzureKeyCredential(key))
response = client.analyze_sentiment(documents=feedbacks)
review = type('', (), {'positive':0, 'negative':0, 'neutral':0})()
for idx, sentence in enumerate(response):
    print("Sentence {}: {}".format(idx+1, sentence.sentiment))
    if(sentence.sentiment == "positive"):
        review.positive = review.positive + 1
    elif (sentence.sentiment == "negative"):
        review.negative = review.negative + 1
    else:
        review.neutral = review.neutral + 1

At this point, if you run the code, you will get the results of the sentiment analysis.
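
If you also want to see how strongly the service leaned towards each label, every result object carries per-class confidence scores. Continuing with the response variable from Step 6:

# Optional: inspect the per-class confidence scores behind each sentiment label
for idx, sentence in enumerate(response):
    scores = sentence.confidence_scores
    print("Review {}: {} (positive={}, neutral={}, negative={})".format(
        idx + 1, sentence.sentiment, scores.positive, scores.neutral, scores.negative))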

Step 7

Now, it's time to plot the analysis results. This can be done using Matplotlib. If VS Code is not detecting it, then you can install it using pip (pip install matplotlib).

Here is the code to plot the results:

import matplotlib.pyplot as plot
figure = plot.figure()
ax = figure.add_axes([0,0,1,1])
x_values = ['Positive', 'Negative', 'Neutral']
y_values = [review.positive, review.negative, review.neutral]
ax.bar(x_values, y_values)

Step 8

If everything went well so far, then on executing the application, you will see output similar to what is shown below:

Conclusion and Takeaway

Looking at the above chart, the manufacturer can take a call and decide whether to increase production or slow it down, and can understand the customers' pain points.

Hope you enjoyed reading this article. There may be a few steps that I didn't explain here, so in case you get stuck at any point while reading this, I would recommend watching my video demonstrating the end-to-end flow on my channel.

Thursday, July 1, 2021

Getting Started with Reading Text from an Image using Azure Cognitive Services

In this article, we will learn how we can read or extract text from an image, irrespective of whether it is handwritten or printed.

In order to read the text, two things come into the picture. The first one is Computer Vision and the second one is NLP, which is short for Natural Language Processing. Computer Vision helps us read the text, and then NLP is used to make sense of that identified text. In this article, I'll focus specifically on the text extraction part.

How Computer Vision Performs Text Extraction

To execute this text extraction task, Computer Vision provides us with two APIs:

  • OCR API
  • Read API

The OCR API works with many languages and is very well suited for relatively small amounts of text, but if an image contains a large amount of text, i.e., a text-dominated image, then the Read API is your option.

The OCR API provides information in the form of Regions, Lines, and Words. A region in the given image is an area that contains text. So, the output hierarchy would be: Regions, Lines of text in each region, and then Words in each line.

The Read API works very well with an image that is heavily loaded with text. The best example of a text-dominated image is any scanned or printed document. Here the output hierarchy is in the form of Pages, Lines, and Words. As this API deals with a large number of lines and words, it works asynchronously and hence does not block our application until the whole document is read, whereas the OCR API works in a synchronous fashion.

Here is a table depicting when to use what:

OCR API                                     | Read API
Good for relatively small text              | Good for text-dominated images, i.e., scanned docs
Output hierarchy: Regions >> Lines >> Words | Output hierarchy: Pages >> Lines >> Words
Works in a synchronous manner               | Works in an asynchronous manner
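
To make the Read API flow concrete, here is a small Python sketch based on the azure-cognitiveservices-vision-computervision package; the key, endpoint, and image URL are placeholders. The asynchronous behaviour described above shows up as the submit-then-poll pattern:

# Sketch: extracting text from an image with the Computer Vision Read API.
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient("<COMPUTER_VISION_ENDPOINT>",
                              CognitiveServicesCredentials("<COMPUTER_VISION_KEY>"))

# Submit the image; the Read API is asynchronous, so we get an operation to poll.
read_response = client.read("<IMAGE_URL>", raw=True)
operation_id = read_response.headers["Operation-Location"].split("/")[-1]

while True:
    result = client.get_read_result(operation_id)
    if result.status not in (OperationStatusCodes.not_started, OperationStatusCodes.running):
        break
    time.sleep(1)

# Output hierarchy: Pages >> Lines >> Words
if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)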

Do watch my attached video for the demo and code walkthrough.