Tuesday, February 6, 2018

[ENGINEERING BLOG] Major changes to the API and license key formats

Besides working on new features and making plenty of SDK releases in the last year, as described in this post, we invested a lot of effort into research and development in order to upgrade our proprietary mobile OCR technology. In parallel, we were busy preparing a new API for our mobile SDKs, one that will be able to support all the new features we have planned. We are also changing our licensing subsystem and introducing a new license key format to increase extensibility and flexibility.

This is the biggest change we have made in the past 5 years, so we invite developers to read this blog post in full.

Reading time: 25-35 min

Why the change?


In the last two years, we have shifted from traditional text recognition to a deep-learning approach. Our research team designed a custom machine learning system for OCR and is continuously working on new models of state-of-the-art neural networks, while our development team makes sure that DeepOCR runs fast on mobile devices and requires only minimal memory. This enables high accuracy and speed for even the most complex use cases. DeepOCR technology is already implemented in our award-winning product BlinkReceipt and it also powers the recognition of handwritten problems in the Photomath app.

Microblink SDKs are used in a wide variety of use cases, from scanning identity documents with BlinkID, payslips and invoices with PhotoPay, and various predefined data with BlinkInput, to barcode and QR code scanning with PDF417. Now it's time to prepare all SDKs for the implementation of DeepOCR, but it's not as straightforward as one might think. Such a variety of use cases cannot be solved with a single DeepOCR model, so support for using multiple models within an SDK is needed.

This is why we decided to change the licensing subsystem, make a backward-incompatible change to the API, and introduce a new license key format. The new API and the new license keys are necessary to support all the information required to run DeepOCR and to support other new features that we plan to add in 2018.

The release of the new API provides some additional key benefits for developers:

  • the integration of SDKs is easier and more flexible;
  • the SDKs are now faster and smaller; 
  • the interaction of objects within the API involves much less overhead when calling into the native library that does all the processing.

We understand that this type of major change requires additional development effort on integration so we will be available to help you at every stage of the development. Please don't hesitate to contact us for support.

Since the new license key format is not backward compatible with the current format, and since we use semantic versioning for our SDKs, we need to raise the major version number of all our SDKs.

The new versions for the Microblink SDKs will be:
  • PDF417.mobi SDK 
    • for Android: version 7.0.0
    • for iOS: version 7.0.0
  • BlinkInput SDK
    • for Android: version 4.0.0
    • for iOS: version 4.0.0
  • BlinkID SDK
    • for Android: version 4.0.0
    • for iOS: version 4.0.0
  • PhotoPay SDK
    • for Android: version 7.0.0
    • for iOS: version 7.0.0
As you may notice, we decided to increase the iOS versions by more than one version number. This is to reduce any risk of confusion and ensure that the same version number is used for both Android and iOS SDKs, as well as for the wrappers (PhoneGap, Xamarin, React Native).

Please note that existing license keys will not work with the new SDK versions, although they will continue to work with the existing SDK versions. Likewise, new license keys cannot be used with the old SDK versions.

What has changed?

In this section, we will describe all the changes in our SDKs, namely:
  • the change in the license key formats, specifically the licensing subsystem and licensing API;
  • the change in handling the recognizers and parsers;
  • the introduction of a new concept: the processor.

License key formats


Because of the ever-increasing number of features that clients require from us, we decided that we needed a new license key format that would support these present and future demands. Technically, adding support for that required increasing the size of the binary layout of the license buffer, which meant that our license keys could no longer be formed from 8 groups of 8 alphanumeric characters.
Therefore, the new license keys are now distributed in three different formats, and the client decides which one to use:
  • as a file;
  • as base64-encoded strings;
  • as raw buffers.

We recommend using the license key as a file, as it's the easiest way to manage multiple license keys (demo and production). Instead of having different license setup code for your test and production app, you can now have the same code while using different license files within the assets of your app.

We simplified the API for setting up the license key. Instead of having several different ways of setting up the same license key, especially in Android, there is now a unified way to set up license keys in both the Android and iOS SDKs.

For example, in Android, a class called MicroblinkSDK allows you to set the license key in three ways:
  • as a path to the file within your assets folder;
  • as a base64 string;
  • as a raw buffer. 
The choice is yours. A similar class exists in iOS and can be used in a similar manner.
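To make this concrete, here is a minimal Android sketch of how the license setup might look. The method names (setLicenseFile, setLicenseKey, setLicenseBuffer) and exact parameters are assumptions used for illustration only, so please check the SDK documentation for the actual signatures; the iOS setup follows the same pattern.

    // Sketch only (Android, Java); SDK import paths are omitted and the
    // method names below are assumptions - consult the documentation.
    public class MyApplication extends android.app.Application {
        @Override
        public void onCreate() {
            super.onCreate();

            // Recommended: license stored as a file in the app's assets folder.
            // The same code works for demo and production builds - only the
            // bundled license file differs.
            MicroblinkSDK.setLicenseFile("license/microblink.license", this);

            // Alternatively, as a base64-encoded string:
            // MicroblinkSDK.setLicenseKey("<base64-encoded license>", this);

            // Or as a raw buffer:
            // MicroblinkSDK.setLicenseBuffer(licenseBytes, this);
        }
    }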

One of the greatest problems with the licensing system in the old API arose when a developer set a license key that didn't allow usage of a specific recognizer and then activated that recognizer. The developer was informed about the licensing error at the point when the native library was starting up and after the camera had already been initialized. This information was delivered via asynchronous callback, which was difficult to handle and confusing for most developers. Sometimes developers would simply ignore the callback and then wonder why the scanning wasn't working.

With the new API, this is no longer possible. We expect a developer to set the license key as early as possible during the startup of an app. Whenever a specific recognizer, detector, processor, or parser that is not allowed by the license key is instantiated, an exception is thrown in Android and an NSError is returned in iOS. Thus, it will be much more difficult for a developer to go into production with an invalid license key.
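As a rough sketch of what this fail-fast behavior means in practice on Android (the concrete exception class is not named here because it is SDK-specific; treat the catch clause as an assumption):

    // Sketch only (Android, Java). A recognizer that is not covered by the
    // license key now fails at construction time, not after camera startup.
    try {
        BarcodeRecognizer barcodeRecognizer = new BarcodeRecognizer();
    } catch (RuntimeException licenseError) {
        // Catching the SDK's licensing error as a RuntimeException is an
        // assumption; the point is that the error surfaces immediately
        // and synchronously, during development.
        android.util.Log.e("License", "Recognizer not allowed by license", licenseError);
    }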

In addition, we now ensure that demo license keys always inform a user when a demo version is being used, so that app testers can easily notice if the production version is using a demo license key.

Recognizers, Parsers, Detectors, Processors, Templating API


All our existing clients are already familiar with the concept of a Recognizer. Some of them are also already familiar with the concepts of a Parser and Detector, which are available only within BlinkInput, BlinkID, and PhotoPay SDKs. However, in the new API, we're introducing a new concept: the Processor. In order to explain the processor concept, let's first review the concepts behind the recognizer, parser, and detector.

Recognizer


The recognizer has always been the main unit of recognition within Microblink SDKs. Basically, a recognizer is the most abstract object that serves a specific use case. For example, BarcodeRecognizer is an object that knows how to scan barcodes on images received from a camera, while MRTDRecognizer is an object that knows how to find the machine readable zone of a travel document on a camera frame, perform OCR on that zone, and extract relevant document information from it.

As you can see, a recognizer is quite a complex object with many responsibilities:

  1. It manages the detection of objects like barcodes, ID cards, payslips, and machine readable zones.
  2. It performs image correction and the dewarping of detected objects.
  3. It performs optical character or barcode recognition.
  4. It intelligently interprets the recognized data in order to produce the final result.

Recognizers are not new and have existed in all Microblink SDKs from the very first version, but initially they were internal objects. Developers could only interact with them by creating RecognizerSettings objects that configured the expected behavior of a specific recognizer. When recognition finished, developers then needed to typecast the given BaseRecognitionResult to the RecognitionResult specific to that recognizer. This, however, proved rather confusing, as it was not always clear that a specific RecognitionResult could only be produced by the specific recognizer configured with the specific RecognizerSettings.

Now, this process has been simplified. A developer simply needs to instantiate a specific recognizer object, configure it, and give it to the RecognizerRunner object, which will use it to perform the desired recognition. After the recognition, that same specific recognizer will internally contain its recognition result, which a developer can then obtain by calling an appropriate getter method.

This makes the recognizers long-lived, stateful objects that live within an app and change their internal state while performing recognition. This is probably the biggest change developers will face when integrating the new version of the Microblink SDK, but once you get used to it, you will find that recognition is much simpler to handle than it was before.
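A minimal Android sketch of this new lifecycle might look roughly like the following. The recognize() and getResult() calls, and the way a frame is handed to the runner, are assumptions for illustration; the real integration is typically driven by the camera UI components described later in this post.

    // Sketch only (Android, Java); SDK imports omitted, method names assumed.

    // 1) Instantiate and configure a specific recognizer.
    MRTDRecognizer mrtdRecognizer = new MRTDRecognizer();

    // 2) Hand it to the RecognizerRunner, which performs the recognition.
    //    (cameraFrame stands for an image obtained from the camera.)
    RecognizerRunner recognizerRunner = RecognizerRunner.getSingletonInstance();
    recognizerRunner.recognize(cameraFrame, mrtdRecognizer);  // hypothetical call

    // 3) The same recognizer object now holds its own result - no more
    //    typecasting of a generic BaseRecognitionResult.
    MRTDRecognizer.Result result = mrtdRecognizer.getResult();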



There is one special type of recognizer that is very flexible and configurable - the Templating Recognizer. It is used as part of the Templating API, which allows you to define its behavior manually. To perform the detection of objects, a detector is required. Locations within that detection are then used to identify the parts of the detected object that need perspective correction, and the settings for performing OCR on the corrected images are defined. Finally, the parsers that extract structured information from the OCR result are defined.

With the new API, we upgraded the flexibility of the Templating Recognizer and added a new processor concept that can be used within the Templating API. This is explained in more detail below.

Detector


The detector is an object that knows how to find a certain object in a camera image. BlinkID developers are likely familiar with DocumentDetector, which can find cards and checks in images, and MRTDDetector, which can find documents containing a machine readable zone in images. Those two detectors will remain in the BlinkID and PhotoPay SDKs, while other detectors will be removed from the SDKs.

Previously, developers interacted with detectors in a similar manner as with recognizers: they created a specific DetectorSettings object and associated it with a special recognizer called DetectorRecognizer by using the DetectorRecognizerSettings object. Then, during the operation of DetectorRecognizer, after it had internally performed the detection and before continuing to the next step, it returned the concrete DetectionResult via MetadataListener (or the didOutputMetadata callback in iOS). This asymmetry was even more confusing than in the case of recognizers, especially because the same callback could receive detection results from internal detectors within recognizer objects, and no one really knew where those results were coming from.

In the new API, a developer will simply create a specific detector and associate it with DetectorRecognizer directly. After DetectorRecognizer internally performs detection using the specified detector, its detection results will remain saved within the specific detector and will be available to the developer via the provided getter method - in the same way as the recognizer's result is available via the specific recognizer's getter method.

Using detectors will now be the same as using recognizers, which we believe will make things a lot easier for developers.
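As a sketch of that symmetry on Android (constructor and getter names are assumptions):

    // Sketch only (Android, Java); SDK imports omitted, names assumed.

    // Create a specific detector...
    DocumentDetector documentDetector = new DocumentDetector();

    // ...and associate it directly with a DetectorRecognizer.
    DetectorRecognizer detectorRecognizer = new DetectorRecognizer(documentDetector);

    // After recognition, the detection result stays inside the detector itself,
    // available through a getter - exactly like a recognizer's result.
    DocumentDetector.Result detection = documentDetector.getResult();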

Parser


Parsers are objects that can extract structured data from the raw OCR result. BlinkInput, BlinkID, and PhotoPay developers will already be familiar with the concept, especially when using the field-by-field scanning feature. With the field-by-field scanning feature, each parser tries to extract specific information from the OCR result obtained by performing OCR over a small area of a camera frame in the user interface.

In previous versions of the SDKs, parsers always produced their results as strings, which proved confusing for some use cases, such as date parsing: the date parser would return the string exactly as it came from the OCR engine and, although it internally knew which part of the date was the day, which part was the month, and which part was the year, it had no way to communicate that back to the developer.

Moreover, in order to obtain a specific parser result, the developer had to know the exact name of the parser and the exact name of the parser group in which the parser was placed. To make things even more confusing, when using BlinkInputRecognizer for field-by-field scanning it was possible to use multiple parser groups over a single image, while when using DetectorRecognizer or MRTDRecognizer (i.e., Templating Recognizers) the name of the parser group was actually the name of a location within the detected document and there was always a single parser group for each decoding location.

Has that confused you? I bet it has! To address this issue, we thought long and hard about how to make this concept easier to use without losing all the flexibility it provided. We love symmetry, so we thought it would be a good idea to organize parsers in the same way as recognizers and detectors are organized. So, we did it.

The parser is now a stateful object, just like the recognizer or detector. Developers will create a specific parser and then associate it with ParserGroupProcessor (more on that later), which will be associated with either BlinkInputRecognizer (for the field-by-field scan) or with the Templating Recognizer. Then, after the parser performs extraction from the OCR result, it will save the extraction result internally, and that result will be available to the developer via the provided getter method, just like the recognizer provides its result via its own getter method.

This means that developers will no longer need to worry about assigning arbitrary strings to parser names and then using those strings to obtain parsed results from some obscure BlinkInputRecognitionResult; the parser's result will now be available within the parser object itself.
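Sketched on Android, the new parser flow could look roughly like this. The DateParser name, its accessors, and the constructors are assumptions; only the object relationships follow the description above.

    // Sketch only (Android, Java); SDK imports omitted, names assumed.

    // A parser is now a stateful object you create yourself...
    DateParser dateOfBirthParser = new DateParser();

    // ...grouped into a ParserGroupProcessor...
    ParserGroupProcessor parserGroupProcessor = new ParserGroupProcessor(dateOfBirthParser);

    // ...which is attached to BlinkInputRecognizer for field-by-field scanning
    // (or to a Templating Recognizer when using the Templating API).
    BlinkInputRecognizer blinkInputRecognizer = new BlinkInputRecognizer(parserGroupProcessor);

    // After scanning, the result lives inside the parser itself and is structured,
    // so a date is no longer just the raw string returned by the OCR engine.
    DateParser.Result dateResult = dateOfBirthParser.getResult();
    int day = dateResult.getDay();      // hypothetical accessors
    int month = dateResult.getMonth();
    int year = dateResult.getYear();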

Processor


Some might ask: "What about parser groups? Where did they disappear to?"
In the above story about parsers, you probably noticed that in the old API, parsers were grouped into parser groups, where every parser within the same group would perform extraction of the same OCR result calculated for the entire parser group. You also probably noticed the discrepancy between the field-by-field scan and the Templating API, where you could use multiple parser groups on the same image in the field-by-field scan, but only a single parser group on the dewarped image within Templating Recognizer.

We were thinking: "How do we avoid that discrepancy and also provide more flexibility within the Templating API?" or, for example, "How do we ensure that recognition performed with the Templating API is not considered complete if the part of the document that should contain a person's face does not contain one?" We knew we needed something like a parser, but one that works on the image instead of the OCR result, just like a recognizer, yet can be used within the Templating Recognizer. Well, that led us to the Processor.

The processor is an object that can perform recognition of the image. Unlike the recognizer, the processor cannot be used alone - it must be used within the Templating API. The above-mentioned ParserGroupProcessor is a special processor (it acts as a parser group did in the old API) that performs OCR on a given image using the same rules as the old parser group, and then runs every parser bundled within it against that OCR result. If a developer needs a dewarped image, ImageReturnProcessor can be used to simply save the image that was provided to it. In future releases, we plan to add lots of new processors for various use cases.

The architecture of the processor object is the same as the architecture of the recognizer, parser, and detector. A developer will create the processor and associate it with the Templating Recognizer. After the recognition is finished, the developer will obtain the result from the processor.
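For example, capturing a dewarped image with ImageReturnProcessor might look like this on Android. The getter names are assumptions, and the processor still has to be placed into a processor group of a Templating Recognizer, as shown in the next section.

    // Sketch only (Android, Java); SDK imports omitted, getter names assumed.

    // ImageReturnProcessor simply stores the image it was given to process.
    ImageReturnProcessor imageReturnProcessor = new ImageReturnProcessor();

    // ...it gets attached to a ProcessorGroup within a Templating Recognizer...

    // Once recognition is finished, the saved (dewarped) image is read back
    // from the processor itself, following the same getter pattern as above.
    Image dewarpedImage = imageReturnProcessor.getResult().getRawImage();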

Templating API


If you are familiar with our Templating API, you might now ask: "Where are the classifiers? How do we define decoding locations?"
Well, decoding locations are now defined within ProcessorGroup, which contains
  • one or more processors;
  • a location of interest within a document;
  • an instruction on how to perform image correction and dewarping.
The Templating Recognizer uses the chosen instruction to perform image correction and dewarping of the desired location and then runs the processors within the given processor group on the corrected image.
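A hedged Android sketch of a single processor group might therefore look like this; the Rectangle coordinates, the DPIBasedDewarpPolicy name, and the constructor shape are assumptions that simply mirror the three ingredients listed above.

    // Sketch only (Android, Java); SDK imports omitted, names assumed.

    Parser fullNameParser = new RawParser();   // any parser that extracts the field

    ProcessorGroup fullNameGroup = new ProcessorGroup(
            new Rectangle(0.05f, 0.20f, 0.60f, 0.10f),   // location of interest on the document
            new DPIBasedDewarpPolicy(300),               // how to correct and dewarp that location
            new ParserGroupProcessor(fullNameParser)     // one or more processors to run on it
    );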


What about classifiers?


We changed those too. In the old API, a developer had to define a single document classifier that needed to provide a classification of the document based on the parser results obtained in the pre-classification stage of Templating Recognizer's processing in order to continue processing with the document-specific parsers. Yes, we know that was a complex sentence, but it describes the very complex process that developers had to follow in order to use Templating API to correctly recognize the custom document.

Now, in order to provide a better abstraction, we created Class, which is an object containing two collections of processor groups and a classifier. The two collections of processor groups within Class are:
  • the classification processor group collection;
  • the non-classification processor group collection.
This process goes as follows:
  1. All processor groups within the classification collection perform processing.
  2. The classifier decides whether the object being recognized belongs to the current class and, if so, the processor groups within the non-classification collection perform processing.
  3. Finally, the Templating Recognizer simply contains one or more of these Class objects.
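Putting these steps together, an Android sketch could look like the following. The TemplatingClass and TemplatingClassifier names and the setter methods are assumptions chosen to mirror the structure described above (the concept is simply called "Class" in this post, but that name clashes with java.lang.Class in a code example); documentNumberGroup, fullNameGroup, and dateOfBirthGroup are ProcessorGroup objects built as in the previous sketch.

    // Sketch only (Android, Java); SDK imports omitted, names assumed.

    TemplatingClass idFrontSideClass = new TemplatingClass();

    // 1) Processor groups used for classification are always processed first.
    idFrontSideClass.setClassificationProcessorGroups(documentNumberGroup);

    // 2) The classifier inspects their results and decides whether the
    //    scanned object belongs to this class.
    idFrontSideClass.setTemplatingClassifier(new TemplatingClassifier() {
        @Override
        public boolean classify(TemplatingClass currentClass) {
            return documentNumberParser.getResult().isParsed();  // hypothetical check
        }
    });

    // 3) If it does, the non-classification processor groups are processed too.
    idFrontSideClass.setNonClassificationProcessorGroups(fullNameGroup, dateOfBirthGroup);

    // Finally, the Templating Recognizer just holds one or more such classes.
    myTemplatingRecognizer.setTemplatingClasses(idFrontSideClass);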

OK, you have lost me back at the recognizer. Do I need to use this Templating API?


In the most common cases, the Templating API is not used. The Templating API is a very flexible API that can be used to perform the recognition of almost any document, and with the new release, it has become even more flexible than it was in the old API. However, flexibility comes with increased complexity, and we are aware of that.

If we simplify it too much, then developers will not be able to add support for scanning custom documents, such as loyalty cards, or will be very constrained in what they can do. The Templating API would then not be flexible enough for many practical use cases, and that would make our SDKs useless for those who want to add support for documents by themselves. Adding lots of flexibility makes the Templating API very complex, but also very powerful.

Hence, we decided to make the Templating API flexible and powerful, at the cost of it being more complex. The Templating API has always been and will always be a tool for more advanced developers - typically those who specialize in Microblink technology.

Platform-specific changes: iOS, Android, cross-platform


The changes described above apply to all platforms. However, there are some additional changes to mention that are specific to Android and iOS SDKs.

Name unification


A big problem in the old API was that the same concepts had different names in the Android and iOS SDKs. This was a problem when a developer became familiar with the Android documentation but then needed to port their code to iOS. Code porting was not straightforward, as some recognizers and UI elements had different names and even some basic API objects were named completely differently (for example, PPCameraCoordinator in iOS was basically the same as RecognizerView in Android - but who knew that without asking our support engineers?).

The new API, however, has unified naming across platforms. The only differences in names are now those due to a specific platform's naming conventions; for instance, the DirectAPI singleton will now be called RecognizerRunner in Android and MBRecognizerRunner in iOS. Similarly, in iOS there is now MBRecognizerRunnerViewController, and in Android there are RecognizerRunnerView and RecognizerRunnerFragment. In the same way, other components will have similar, if not the same, names, as you will see from the new and updated documentation accompanying each SDK release.

Images are now part of the results


In the new API, besides the scanned text, the results (in the recognizer and processor) can also contain images. This is especially important for the BlinkID SDK. It will now be much easier to obtain images of documents, as well as images of faces and signatures from those documents. These images will no longer be sent to an image callback. Instead, they will be part of the specific recognizer's result, just like the extracted OCR data.
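As a short hedged sketch of what that means for an Android integration (the getter names are assumptions):

    // Sketch only (Android, Java); getter names assumed.
    MRTDRecognizer.Result result = mrtdRecognizer.getResult();

    Image faceImage = result.getFaceImage();                 // image of the holder's face
    Image fullDocumentImage = result.getFullDocumentImage(); // dewarped image of the document

    // No separate image callback needed - images travel with the extracted data.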

Note for Android


In order to support this, we needed to change the way recognizer objects are passed between activities via Intent. The problem is that Android enforces very strict limits on the size of data transferred via Intent, so it is not possible to transfer images that way. You can find details about this in the documentation and the troubleshooting part of the new README. Also, make sure to check the updated sample integration apps to see the changes.

iOS specific changes


Specifically for iOS, there are several notable changes to mention.
  1. Since the recognizer object is now a stateful object that gets mutated while it performs recognition, we needed to change the way results are delivered via a delegate. Previously, this happened in the didOutputResults: method, which was always called on the main thread. Now, it happens in the didFinishScanning: method, which will always be called on the background processing thread. The reason is that while this method is being executed, recognition cannot continue, since the same thread is busy processing the callback. This gives you the opportunity to pause scanning while still on the processing thread and so prevent changes to the recognizer's result that could occur as new camera frames arrive while the block is being dispatched from the processing thread to the main thread.
  2. There is no longer a didOutputMetadata delegate method. Instead, there is a separate delegate for each metadata item that can be obtained during processing. This way, it is much clearer which methods need to be implemented if specific metadata needs to be obtained.
  3. The segment scan overlay has been renamed to MBFieldByFieldOverlayViewController and will now be part of the SDK. This means that integrating the field-by-field scanning feature into your app will be much easier, as you will no longer be required to copy lots of code from our sample app into your app to get that behavior. Using the field-by-field overlay will now be as simple as using any other scanning overlay.

For more details about the iOS changes, you should always check the updated sample integration apps and documentation.

Android specific changes


Specifically for Android, there are two major changes. First, just like in iOS, we removed MetadataListener and introduced separate callbacks for each metadata item that can be obtained during processing. This makes it much easier to manage events reported by the recognition process.
Second, we introduced the RecognizerRunnerFragment for a more flexible integration of the built-in UI.

RecognizerRunnerFragment


One of the questions developers often asked us was how they could embed our built-in UI into their application's UI. Unfortunately, with the old API that was not possible. Developers could either use our built-in activities or use RecognizerView to create their own scanning UI from scratch. Creating a custom UI from scratch was too much effort for some, yet they still needed our scanning UI within their layout. This usually resulted in developers using our built-in activities instead of presenting the scanning interface as they originally intended, or in a poorly integrated RecognizerView that caused weird bugs and crashes in the final app.

Therefore, the new API introduces the RecognizerRunnerFragment. We created a fragment that controls the RecognizerRunnerView and can be skinned with different built-in overlays. Furthermore, every built-in activity is now actually implemented so that it presents the RecognizerRunnerFragment in full screen and adds a specific overlay to it. This is very similar to how the iOS integration works. Developers now have a way to simply present our built-in scanning UI somewhere within their application layout, without being forced to navigate away to a new activity.

When using RecognizerRunnerFragment or RecognizerRunnerView, notification that scanning has completed is obtained via ScanResultListener, just like before when using RecognizerView in the old API. However, there are some differences in behavior. Most notably, just like in iOS, ScanResultListener's onScanningDone method is no longer invoked on the UI thread. Instead, it is invoked on the background processing thread to give you the opportunity to pause scanning and prevent changes to the recognizer's result object while the runnable block is being dispatched from the processing thread to the UI thread.
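A hedged Android sketch of handling that callback, assuming the code lives inside an Activity hosting a RecognizerRunnerFragment (the pauseScanning() call, the displayResults() helper, and the exact callback signature are assumptions; only the threading rule comes from the paragraph above):

    // Sketch only (Android, Java); SDK imports omitted, some names assumed.
    private final ScanResultListener scanResultListener = new ScanResultListener() {
        @Override
        public void onScanningDone(RecognitionSuccessType successType) {
            // We are still on the background processing thread here, so pause
            // scanning first - this keeps the recognizers' results from being
            // overwritten by the next camera frame while we switch threads.
            recognizerRunnerFragment.getRecognizerRunnerView().pauseScanning();  // hypothetical call

            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    // Safe to read results from the recognizer objects and update the UI.
                    displayResults();  // hypothetical helper in the hosting Activity
                }
            });
        }
    };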

For more details about the changes in Android, you should check the updated sample integration apps and documentation.

PhoneGap, Xamarin, React Native


All the above-described changes affect only the native Android and iOS SDKs. Existing APIs used within the Cordova/PhoneGap, Xamarin, and React Native wrappers will remain the same. However, the bridging code for the native SDK will need to be updated. We will do that for our official plugins; however, if you created your own wrapper around our Android and iOS SDKs, then you will need to update it according to the new API changes.

How does all of this affect me?


To be clear, updating to the new SDK will not work straight out of the box. You will need to adapt your application to the new API. This means that you will need to get new license keys for all your applications and change the integration code. Depending on the complexity of your app, this may take from a couple of minutes to a couple of weeks, so make sure you are prepared to do the work.

But fear not! If you used the most basic level of integration, there will be only a small set of changes that you will need to apply to your app. However, if you created a custom scanning UI using Microblink SDK or if you used the Templating API to add support for scanning some custom document types, then you will need to be prepared to make some larger changes to your codebase, which may take up to a couple of weeks. But rest assured, we want the upgrade process to be as easy as possible for you, so don't hesitate to ask our engineering teams if you need help.

To help you plan the changes to your applications ahead of time, we are announcing the SDK release schedule below.

New SDK version release schedule

PDF417 SDK

The PDF417 Android SDK was released on 22nd January with detailed documentation, and the PDF417 iOS SDK is scheduled for release in the first week of February. These SDKs will give you a glimpse of the new Recognizer architecture, and you will have the chance to test the new license key formats.

BlinkInput SDK

In February, we also plan to release the BlinkInput SDK for both Android and iOS with the new API. This release will also contain a preview version of our next-generation DeepOCR engine. DeepOCR will be optional, so you can try it out in your own experiments, and we would welcome your feedback on how we can improve it. This release will give you the opportunity to play with the new Templating API, the new field-by-field scan, and the new Parser and Processor architectures.

BlinkID SDK and PhotoPay SDK

After we release BlinkInput SDK, the plan is to release the new BlinkID SDK and PhotoPay SDK during March and April. However, these releases will depend on the feedback and the number of issues we receive from developers who have tried the new API in BlinkInput or PDF417. 

After we make sure the new API works flawlessly, we will continue porting the BlinkID and PhotoPay SDKs to the new API.

We encourage you to try the new API with PDF417 as soon as possible and to please give us your feedback. We are still actively working on the new API so your feedback will be very valuable to us.

Ultimately, we truly hope that you will enjoy using our products with the new API at least as much as we enjoyed creating it for you.

For feedback and help with integration, please contact us on help.microblink.com.
