This video explains the basics of NER (Named Entity Recognition) and PII (Personally Identifiable Information) detection, and how they can be used to redact sensitive/confidential information before passing it to the next stage. It also includes a code walkthrough and the creation of an Azure Text Analytics instance.
Tuesday, September 14, 2021
Tuesday, September 7, 2021
This video explains how to make a POST call to the PII API to extract redacted text using Postman.
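To show what that POST call gives you back, here is a minimal Python sketch that pulls the redacted text and detected entity categories out of a PII response. The sample payload below is illustrative (hypothetical values), not captured from a live call:

```python
import json

# Illustrative sample of a Text Analytics PII response (hypothetical values)
sample_response = """
{
  "documents": [
    {
      "id": "1",
      "redactedText": "My name is ******** and my email is ********************.",
      "entities": [
        {"text": "John Doe", "category": "Person",
         "offset": 11, "length": 8, "confidenceScore": 0.9},
        {"text": "john.doe@example.com", "category": "Email",
         "offset": 36, "length": 20, "confidenceScore": 0.8}
      ]
    }
  ]
}
"""

def extract_redactions(payload: str):
    """Return (redactedText, entity categories) for each document in the response."""
    body = json.loads(payload)
    return [
        (doc["redactedText"], [e["category"] for e in doc["entities"]])
        for doc in body["documents"]
    ]

for redacted, categories in extract_redactions(sample_response):
    print(redacted)
    print(categories)
```

The redacted text can then be handed safely to the next stage of the pipeline.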
Monday, August 30, 2021
This video explains what Custom Translator is and why it is needed. It also shows how to create a fully customized language-translation model with personalized training data and how to deploy the model across multiple Azure regions. A C# console application was created to validate the model.
Friday, August 27, 2021
In this article, I’m going to write about another interesting Azure-based service named Translator, which falls under the umbrella of Azure Cognitive Services. This service helps us translate documents from one language to another while retaining the formatting and structure of the source document. So, if any text in the source document is in italics, the newly translated document will also have that text in italics.
Let’s have a look at a few key features of the Translator service:
- Auto-detection of the language of the source document
- Translates large files
- Translates multiple files in one shot
- Preserves formatting of the source document
- Supports custom translations
- Supports custom glossaries
- Supported document types – pdf, csv, html/htm, doc/docx, msg, rtf, txt, etc.
- Implementation can be done using C# or Python, as SDKs are available for both. A REST API is supported too.
To perform this entire translation process, here are the major steps one needs to take care of:
The first step is to log in to the Azure portal and create an instance of the Translator service.
Clicking on Create will open a new page; furnish all the details and click the Review + Create button. Doing this will create an instance of the Translator service.
Grab the key and the endpoint of the Translator service:
Create an instance of the Azure Storage service, as we need to create two containers:
- The first container, named inputdocs, holds the source documents that need to be translated
- The second container, named translateddocs, holds the target documents, i.e. the translated documents
Once the containers are created, you can see them listed under your storage account as shown below:
Upload all the documents that need to be translated to the inputdocs container.
Next, generate SAS tokens for both the source and target containers. Note that the source container must have at least Read and List permissions enabled, whereas the target container must have Write and List permissions enabled when generating the SAS. Below are the steps to generate a SAS token for the source container:
Similar steps need to be performed for the target container.
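Once both SAS URLs are in hand, they slot into the batch request body that the Document Translation endpoint expects. A minimal sketch in Python (the storage account name and the SAS tokens below are placeholders, not real values):

```python
import json

def build_batch_request(source_sas_url: str, target_sas_url: str,
                        target_language: str) -> str:
    """Assemble the JSON body for a Document Translation batch submission."""
    body = {
        "inputs": [
            {
                "source": {"sourceUrl": source_sas_url},
                "targets": [
                    {"targetUrl": target_sas_url, "language": target_language}
                ],
            }
        ]
    }
    return json.dumps(body, indent=2)

# Placeholder container URLs with placeholder SAS tokens
payload = build_batch_request(
    "https://mystorage.blob.core.windows.net/inputdocs?<source-SAS>",
    "https://mystorage.blob.core.windows.net/translateddocs?<target-SAS>",
    "fr",
)
print(payload)
```

The same body is what the C# code in the next step posts to the service.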
Now comes the C# code, which uses all the information from the steps above:
```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class Program
{
    private static readonly string route = "/batches";
    private static readonly string endpoint =
        "https://<your-translator-resource>.cognitiveservices.azure.com/translator/text/batch/v1.0";
    private static readonly string key = "<your-translator-key>";

    // Request body holding the source and target container SAS URLs
    private static readonly string json =
        "{\"inputs\": [{\"source\": {\"sourceUrl\": \"<source-container-SAS-URL>\"}, "
        + "\"targets\": [{\"targetUrl\": \"<target-container-SAS-URL>\", \"language\": \"fr\"}]}]}";

    static async Task Main(string[] args)
    {
        using (HttpClient client = new HttpClient())
        using (HttpRequestMessage request = new HttpRequestMessage())
        {
            StringContent data = new StringContent(json, Encoding.UTF8, "application/json");

            request.Method = HttpMethod.Post;
            request.RequestUri = new Uri(endpoint + route);
            request.Headers.Add("Ocp-Apim-Subscription-Key", key);
            request.Content = data;

            HttpResponseMessage response = await client.SendAsync(request);
            string result = response.Content.ReadAsStringAsync().Result;

            if (response.IsSuccessStatusCode)
                Console.WriteLine($"Status code: {response.StatusCode}");
            else
                Console.WriteLine($"Error: {result}");
        }
    }
}
```
Step 7 - Sample input (English) and output (French) documents
On executing the above C# code, you will notice that the translated files were added to the translateddocs container.
In this article, we have learned how to translate any document placed in Azure Blob storage into other languages. Here is the list of all the supported languages as of today. I've also recorded this entire flow on my channel, in case you want to have a look.
Hope you enjoyed learning about Azure Translator Service.
Friday, August 6, 2021
Today I'll be covering one of the useful functions of Azure Text Analytics: Sentiment Analysis. Azure Text Analytics is a cloud-based offering from Microsoft that provides Natural Language Processing over raw text.
Use Case Described
In this article, I will explain how to use customer-provided product reviews to understand market insights and decide which products to manufacture in the future. Here is a pictorial representation of this use case.
Here are the high-level steps of how we can achieve this entire flow:
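To make the flow concrete, here is a minimal Python sketch that aggregates reviews based on the shape of a typical Sentiment Analysis response. The sample payload is illustrative (hypothetical values), not captured from a live call:

```python
import json

# Illustrative sample of a Sentiment Analysis response (hypothetical values)
sample_response = """
{
  "documents": [
    {"id": "1", "sentiment": "positive",
     "confidenceScores": {"positive": 0.95, "neutral": 0.03, "negative": 0.02}},
    {"id": "2", "sentiment": "negative",
     "confidenceScores": {"positive": 0.05, "neutral": 0.10, "negative": 0.85}}
  ]
}
"""

def summarize_sentiment(payload: str) -> dict:
    """Count how many reviews fall into each sentiment bucket."""
    counts = {"positive": 0, "neutral": 0, "negative": 0, "mixed": 0}
    for doc in json.loads(payload)["documents"]:
        counts[doc["sentiment"]] += 1
    return counts

print(summarize_sentiment(sample_response))
```

Tallies like these are what feed the manufacturing decision at the end of the flow.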
Conclusion and Takeaway
Thursday, July 1, 2021
In this article, we will learn how to read or extract text from an image, irrespective of whether it is handwritten or printed.
In order to read the text, two things come into the picture. The first is Computer Vision and the second is NLP, which is short for Natural Language Processing. Computer Vision helps us read the text, and NLP is then used to make sense of the identified text. In this article, I’ll focus specifically on the text-extraction part.
How Computer Vision Performs Text Extraction
To execute this text extraction task, Computer Vision provides us with two APIs:
- OCR API
- Read API
The OCR API works with many languages and is well suited for relatively small amounts of text, but if an image contains a lot of text (a text-dominated image), then the Read API is your option.
The OCR API provides information in the form of Regions, Lines, and Words. A region is an area of the image that contains text. So the output hierarchy is: Regions, then Lines of text in each region, then Words in each line.
The Read API works very well with images that are heavily loaded with text. The best example of a text-dominated image is a scanned or printed document. Here the output hierarchy is Pages, Lines, and Words. As this API deals with a large number of lines and words, it works asynchronously and hence does not block our application until the whole document is read, whereas the OCR API works synchronously.
Here is a table depicting when to use what:
| OCR API | Read API |
| --- | --- |
| Good for relatively small text | Good for text-dominated images, i.e. scanned docs |
| Output hierarchy: Regions >> Lines >> Words | Output hierarchy: Pages >> Lines >> Words |
| Works in a synchronous manner | Works in an asynchronous manner |
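Since the Read API returns its result as Pages >> Lines >> Words, flattening that hierarchy back into plain text is a small exercise. Here is a Python sketch driven by an illustrative sample of the JSON a completed Read operation returns (the text values are hypothetical):

```python
import json

# Illustrative sample of a completed Read operation result (hypothetical values)
sample_read_result = """
{
  "status": "succeeded",
  "analyzeResult": {
    "readResults": [
      {"page": 1,
       "lines": [
         {"text": "Hello world", "words": [{"text": "Hello"}, {"text": "world"}]},
         {"text": "Second line", "words": [{"text": "Second"}, {"text": "line"}]}
       ]}
    ]
  }
}
"""

def extract_lines(payload: str):
    """Walk the Pages >> Lines hierarchy and return every line of text in order."""
    result = json.loads(payload)
    lines = []
    for page in result["analyzeResult"]["readResults"]:
        for line in page["lines"]:
            lines.append(line["text"])
    return lines

print("\n".join(extract_lines(sample_read_result)))
```

In a real flow you would poll the operation until its status is "succeeded" before parsing the result, since the API works asynchronously.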
Do watch my attached video for the demo and code walkthrough: