In an earlier article, we saw how to use a prebuilt model to read data from a sales receipt. In this article, we will learn to create our own ML model, train it, and then use it to extract information from a sales receipt. Here, a custom model means a model that is tailored entirely to a specific need or use case.
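A minimal sketch of the create, train, and extract flow, assuming the 3.1.x `azure-ai-formrecognizer` Python SDK (`FormTrainingClient.begin_training` and `FormRecognizerClient.begin_recognize_custom_forms`). The helper names `make_clients` and `train_and_recognize`, and the SAS URL pointing at a blob container of training documents, are illustrative assumptions. Newer SDK versions (3.2+) use different client classes.

```python
from typing import Any, Dict, Tuple


def make_clients(endpoint: str, key: str) -> Tuple[Any, Any]:
    """Create real clients; requires `pip install azure-ai-formrecognizer` (3.1.x)."""
    from azure.ai.formrecognizer import FormRecognizerClient, FormTrainingClient
    from azure.core.credentials import AzureKeyCredential

    credential = AzureKeyCredential(key)
    return FormTrainingClient(endpoint, credential), FormRecognizerClient(endpoint, credential)


def train_and_recognize(training_client: Any, form_client: Any,
                        training_files_url: str, receipt_path: str) -> Dict[str, Any]:
    """Hypothetical helper: train a custom model, then extract fields from one receipt."""
    # Train on the documents in a blob container (SAS URL). use_training_labels=False
    # trains without labels; set it to True if the documents were labeled.
    model = training_client.begin_training(
        training_files_url, use_training_labels=False
    ).result()  # CustomFormModel

    # Send the receipt to the newly trained model; the poller tracks the job.
    with open(receipt_path, "rb") as receipt:
        poller = form_client.begin_recognize_custom_forms(
            model_id=model.model_id, form=receipt
        )
    forms = poller.result()  # list of RecognizedForm

    # Flatten the first recognized form into {field name: extracted value}.
    return {name: field.value for name, field in forms[0].fields.items()}
```

Passing the clients in (rather than creating them inside the function) keeps the logic testable without an Azure account; in real use you would call `training_client, form_client = make_clients(endpoint, key)` first.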
Thursday, July 15, 2021
Tuesday, July 6, 2021
Now that almost everything is moving online, a very common problem organizations face is processing receipts that were scanned and submitted electronically for reimbursement.
For any claim or reimbursement to be cleared, the receipt must first reach the proper accounts department, which depends on the organization and the sector. One way to do this is manually: a person or a team goes through every scanned receipt and sorts it by department or by whatever other validation and eligibility criteria the organization has.
The situation becomes even harder when the volume of scanned receipts is high. To get rid of this manual effort, many organizations have already adopted an AI-based solution, and many more are in the process of doing so.
One could certainly use OCR (Optical Character Recognition) technology to extract the data, but the problem is not only data extraction; it is also data interpretation. For example, a user might upload the wrong document altogether, one that is not a receipt at all, so the solution must be robust enough to filter out such cases.
How can AI-based solutions be achieved?
Here we can use an Azure service named Form Recognizer, which provides intelligent processing capabilities and allows us to automate the processing of forms and receipts. It combines OCR with predictive models and falls under the umbrella of Azure Cognitive Services.
OCR handles the text extraction, and the models help us pick out the useful information, such as invoice date, address, amount, description, name, or any other field the business needs.
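To make the "pick out the useful information" step concrete, and to show one way of rejecting documents that are not receipts, here is a small plain-Python sketch. It assumes the extracted fields have already been collected into a dict mapping field name to (value, confidence). The required field names follow the prebuilt receipt model's naming (`MerchantName`, `TransactionDate`, `Total`); the helper name `extract_receipt_info` and the 0.5 confidence threshold are hypothetical and should be tuned on real data.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Assumed business-critical fields (prebuilt receipt model naming).
REQUIRED_FIELDS = ("MerchantName", "TransactionDate", "Total")


def extract_receipt_info(
    fields: Dict[str, Tuple[Any, float]],
    wanted: Sequence[str] = REQUIRED_FIELDS,
    min_confidence: float = 0.5,  # hypothetical threshold
) -> Optional[Dict[str, Any]]:
    """Keep only the fields the business needs.

    Returns None when any wanted field is missing or below the confidence
    threshold, which we treat as "this document is probably not a receipt".
    """
    info = {}
    for name in wanted:
        value, confidence = fields.get(name, (None, 0.0))
        if value is None or confidence < min_confidence:
            return None  # reject: looks like the wrong kind of document
        info[name] = value
    return info
```

Fields outside `wanted` (for example `Tax`) are simply dropped, and a document that yields none of the key receipt fields is flagged instead of being routed to an accounts department.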
Which models does Form Recognizer support?
Form Recognizer supports two types of models: prebuilt and custom.
- Prebuilt models – provided out of the box and already trained on basic sales data in the US receipt format.
- Custom models – tailored to our own needs by training them on our own data.
In this article, I’ll focus on the prebuilt models and cover custom model integration in another article.
How to get started with Form Recognizer?
The first thing we need to do is log in to the Azure portal at portal.azure.com and create an Azure resource. There are two ways to create it:
- Using Azure Form Recognizer
- Using Azure Cognitive Services
If you plan to use other Cognitive Services as well, you can use a new or existing Cognitive Services resource. If you only need Form Recognizer, a dedicated Form Recognizer resource works just as well.
For development, I'm using Python in Visual Studio Code with a Jupyter Notebook. Here is the core implementation:
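A minimal sketch of calling the prebuilt receipt model, assuming the 3.1.x `azure-ai-formrecognizer` Python SDK, whose `FormRecognizerClient.begin_recognize_receipts` returns a poller for a long-running operation. The helper names `analyze_receipt` and `make_client` are hypothetical, and this is an illustration rather than the exact code used in this article.

```python
from typing import Any, Dict, List, Tuple


def make_client(endpoint: str, key: str) -> Any:
    """Build a real client; requires `pip install azure-ai-formrecognizer` (3.1.x)."""
    from azure.ai.formrecognizer import FormRecognizerClient
    from azure.core.credentials import AzureKeyCredential

    return FormRecognizerClient(endpoint, AzureKeyCredential(key))


def analyze_receipt(client: Any, receipt_path: str) -> List[Dict[str, Tuple[Any, float]]]:
    """Run the prebuilt receipt model on a local image.

    Returns one {field name: (value, confidence)} dict per receipt found.
    """
    with open(receipt_path, "rb") as receipt:
        # Long-running operation: the SDK returns a poller immediately.
        poller = client.begin_recognize_receipts(receipt=receipt, locale="en-US")

    results = []
    for recognized in poller.result():  # list of RecognizedForm
        results.append({
            name: (field.value, field.confidence)
            for name, field in recognized.fields.items()
        })
    return results
```

Typical use would be `analyze_receipt(make_client(endpoint, key), "receipt.jpg")`, with the endpoint and key copied from the resource created in the portal. Note that the `Items` field's value is itself a list of nested fields, so it needs further unpacking if line items are required.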
Sample Input and Output
I've used the receipt below as input.
This article describes the high-level steps for using a prebuilt ML model to read information from a sales receipt, assuming the reader already knows how to use Python, VS Code, and Jupyter Notebook, and how to import Python modules.
If you are new to any of these, I recommend watching the video below, which explains this article from start to end.
Thursday, July 1, 2021
In this article, we will learn how to read, or extract, text from an image, whether the text is handwritten or printed.
Reading text involves two things: Computer Vision and NLP (Natural Language Processing). Computer Vision reads the text, and NLP makes sense of the identified text. In this article, I’ll focus specifically on the text extraction part.
How Computer Vision Performs Text Extraction
To execute this text extraction task, Computer Vision provides us with two APIs:
- OCR API
- Read API
The OCR API works with many languages and is well suited to relatively small amounts of text. If an image contains a lot of text, i.e. a text-dominated image, the Read API is the better option.
The OCR API returns its results as Regions, Lines, and Words. A region is an area of the image that contains text, so the output hierarchy is: regions, the lines of text in each region, and the words in each line.
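The Regions, Lines, Words hierarchy can be flattened back into plain text with two nested loops. This sketch assumes an `OcrResult`-like object: anything with `.regions`, each region with `.lines`, each line with `.words`, and each word with `.text`, which mirrors the OCR API's JSON response. The helper name `ocr_lines` is hypothetical.

```python
from typing import Any, List


def ocr_lines(ocr_result: Any) -> List[str]:
    """Flatten an OCR API result into plain text lines.

    Walks Regions >> Lines >> Words and rebuilds each line by joining
    its words with single spaces.
    """
    lines = []
    for region in ocr_result.regions:
        for line in region.lines:
            lines.append(" ".join(word.text for word in line.words))
    return lines
```

Because the API reports words rather than full lines, joining with a single space is an approximation; the original spacing is only recoverable from the words' bounding boxes.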
The Read API works very well with images that are heavily loaded with text; the best example of a text-dominated image is a scanned or printed document. Here the output hierarchy is Pages, Lines, and Words. Because this API deals with a large number of lines and words, it works asynchronously, so it does not block our application while the whole document is being read. The OCR API, by contrast, works synchronously.
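The asynchronous pattern means two steps: submit the image, then poll for the result. A minimal sketch, assuming the `azure-cognitiveservices-vision-computervision` SDK's `ComputerVisionClient`, where `read(url, raw=True)` returns a response whose `Operation-Location` header ends with the operation ID and `get_read_result(operation_id)` reports the status. The helper name `read_text`, the one-second poll interval, and the injectable `sleep` parameter are assumptions for illustration.

```python
import time
from typing import Any, Callable, List

# Status values still in progress. The SDK's OperationStatusCodes enum is
# assumed to be str-based, so comparing against plain strings works.
PENDING = ("notStarted", "running")


def read_text(client: Any, image_url: str, poll_seconds: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> List[List[str]]:
    """Submit an image to the Read API and poll until the result is ready.

    Returns the text lines of each page (Pages >> Lines).
    """
    # 1. Submit: the service replies at once with an Operation-Location header.
    response = client.read(image_url, raw=True)
    operation_id = response.headers["Operation-Location"].split("/")[-1]

    # 2. Poll: the service works in the background; we check back every
    #    poll_seconds (this thread waits, but other work could run instead).
    while True:
        result = client.get_read_result(operation_id)
        if result.status not in PENDING:
            break
        sleep(poll_seconds)

    if result.status != "succeeded":
        raise RuntimeError(f"Read operation ended with status {result.status}")

    # 3. Collect Pages >> Lines (each line also carries its .words).
    return [[line.text for line in page.lines]
            for page in result.analyze_result.read_results]
```

Injecting `sleep` is only a design choice to make the polling loop easy to test; in a notebook the default `time.sleep` is fine.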
The table below summarizes when to use which API:
| OCR API | Read API |
| --- | --- |
| Good for relatively small text | Good for text-dominated images, i.e. scanned docs |
| Output hierarchy: Regions >> Lines >> Words | Output hierarchy: Pages >> Lines >> Words |
| Works synchronously | Works asynchronously |
Do watch the attached video for the demo and code walkthrough:
Monday, June 21, 2021
This video explains how to evaluate any machine learning classification model and which metrics are available to do so. It contains simple, easy-to-follow examples with calculations, along with a brief overview of the confusion matrix.
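As a companion to the video, here is a plain-Python sketch for a binary classifier (1 = positive) that counts the four confusion-matrix cells and derives accuracy, precision, recall, and F1 from them. The function name `binary_metrics` and the choice to return 0.0 when a denominator is zero are assumptions, not necessarily the conventions used in the video.

```python
from typing import Dict, Sequence


def binary_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix counts plus common metrics for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, with actual labels `[1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and predictions `[1, 1, 1, 0, 1, 1, 0, 0, 0, 0]`, the counts are TP = 3, FN = 1, FP = 2, TN = 4, giving accuracy 0.7, precision 0.6, recall 0.75, and F1 of about 0.67.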