This article shows how to use the Cloud Vision text recognition API from Google’s Firebase Machine Learning with Flutter to build a simple Android Optical Character Recognition (OCR) application.
What is Google’s Firebase Machine Learning?
Firebase Machine Learning is a mobile SDK that brings Google’s machine learning expertise to Android and iOS apps in a powerful yet easy-to-use package. Firebase ML provides convenient APIs that help you use your custom TensorFlow Lite models in your mobile apps.
Firebase ML provides two key capabilities around on-device custom models:
- Custom model deployment: Deploy custom models to your users’ devices by uploading them to Firebase’s servers. Your Firebase-enabled app will download the model to the device on demand. This allows you to keep your app’s initial install size small, and you can swap the ML model without having to republish your app.
- AutoML Vision Edge: This service helps you create your own on-device custom image classification models with an easy-to-use web interface. Then, you can seamlessly host the models you create with the service mentioned above.
Read more about Firebase Machine Learning.
What is OCR?
OCR stands for Optical Character Recognition. It is used to recognize the characters in pictures, and the recognized text can then be used to create digital copies of the images that are easy for a computer to read.
It is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo or from subtitle text superimposed on an image.
Applications of OCR:
As more documents are handled digitally, OCR engines have been developed into many kinds of domain-specific OCR applications, such as receipt OCR, invoice OCR, check OCR, and legal billing document OCR.
Some applications of OCR include:
- In airports, for passport recognition and information extraction
- Automatic insurance documents key information extraction
- Traffic sign recognition
- Extracting business card information into a contact list
- Making textual versions of printed documents more quickly
- Making electronic images of printed documents searchable, e.g. Google Books
- Converting handwriting in real-time to control a computer (pen computing)
- Assistive technology for blind and visually impaired users
- Automatic number plate recognition
Tools / Requirements:
IDE: Android Studio / VS Code with Flutter package
Language: Dart, Kotlin
Plug-ins / API : Firebase ML’s Cloud Vision’s text recognition API
Make sure you have downloaded one of the required IDEs and integrated it with Flutter.
We will start from the default application that Flutter generates and tailor it to our objective, which is to build a simple Optical Character Recognition application. After making the necessary changes to the basic structure of the Android application, the UI of your application should look similar to this:
Now for the Firebase integration: sign in to the Firebase console with a Google account, create and configure a Firebase project, and add Firebase to the Android project.
Next, deploy the Cloud Function that bridges your app and the Cloud Vision API. Add Firebase Auth to the app, and finally add the necessary dependencies to your application.
The Vision API can detect and extract text from images. There are two annotation features that support optical character recognition (OCR):
- TEXT_DETECTION detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words, and their bounding boxes.
- DOCUMENT_TEXT_DETECTION also extracts text from an image, but the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.
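Whichever feature you pick ends up as the `type` field of the request that is sent to the Cloud Function. The following is only a minimal sketch of how such a request body could be assembled, assuming the image is already available as a byte array and that the function expects a single request in the Vision API's `AnnotateImageRequest` shape. The helper name `buildAnnotateRequest` is hypothetical and not taken from the original project, which may well build the JSON with a library such as Gson instead.

```kotlin
import java.util.Base64

// Hypothetical helper (not from the original project): builds the JSON body
// for one image and one OCR feature.
// `featureType` must be "TEXT_DETECTION" or "DOCUMENT_TEXT_DETECTION".
fun buildAnnotateRequest(imageBytes: ByteArray, featureType: String): String {
    require(featureType == "TEXT_DETECTION" || featureType == "DOCUMENT_TEXT_DETECTION") {
        "Unsupported OCR feature: $featureType"
    }
    // Standard base64 output contains only A-Z, a-z, 0-9, '+', '/' and '=',
    // so it can be embedded in a JSON string without escaping.
    val content = Base64.getEncoder().encodeToString(imageBytes)
    return """{"image":{"content":"$content"},"features":[{"type":"$featureType"}]}"""
}
```

Building the string by hand keeps the sketch free of dependencies. In a real app a JSON library is the safer choice, especially once optional fields such as language hints are added.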
Load the image as a bitmap and encode it as a base64 string. Then create the JSON request that names one of the two annotation features above, and use it to invoke the Cloud Function. The recognized text is returned in the fullTextAnnotation object of the response and can be stored in whatever format you need, preferably as a string.
val annotation = task.result!!.asJsonArray[0].asJsonObject["fullTextAnnotation"].asJsonObject
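Besides the complete text, the fullTextAnnotation object nests pages, blocks, paragraphs, words, and symbols. As a rough illustration of how text can be reassembled from that hierarchy, the sketch below uses hypothetical, simplified data classes (`OcrPage`, `OcrBlock`, and so on) that stand in for the parsed JSON. It assumes the response has already been deserialized into them, and it ignores bounding boxes and detected break types.

```kotlin
// Hypothetical, simplified mirrors of the fullTextAnnotation hierarchy
// (page > block > paragraph > word > symbol). These are illustrative types,
// not classes provided by Firebase or the Vision API.
data class OcrSymbol(val text: String)
data class OcrWord(val symbols: List<OcrSymbol>)
data class OcrParagraph(val words: List<OcrWord>)
data class OcrBlock(val paragraphs: List<OcrParagraph>)
data class OcrPage(val blocks: List<OcrBlock>)

// A word's text is the concatenation of its symbols.
fun wordText(word: OcrWord): String =
    word.symbols.joinToString("") { it.text }

// Joins words with single spaces; a simplification that ignores break types.
fun paragraphText(paragraph: OcrParagraph): String =
    paragraph.words.joinToString(" ") { wordText(it) }

// Emits one line per paragraph, in document order.
fun pageText(page: OcrPage): String =
    page.blocks.flatMap { it.paragraphs }.joinToString("\n") { paragraphText(it) }
```

If only the plain string is needed, reading the `text` field of fullTextAnnotation directly is simpler. Walking the hierarchy is mainly useful when you want per-paragraph or per-word results.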
The recognized text can then be copied and pasted into other applications. To support this, add another function to the Android application that copies the text to the clipboard.
In this way you can build your own OCR scanner using the Cloud Vision text recognition API from Google’s Firebase Machine Learning with Flutter.
GitHub repository for the application: https://github.com/AmimaShifa/scan_text_from_image
Note: Although the Firebase ML toolkit used to build the application in the GitHub repository above was recently deprecated, its functions can still be used for reference.