S3 - OCR - Serverless OCR Text Recognition
S3-OCR is an application that runs OCR on a given PDF file as quickly as possible.
Overview
PDF-OCR is an application that performs OCR on a given PDF file as quickly as possible. VirtualPostMail had been using an older application for this purpose, but it could not process large PDFs and was not compatible with all of the available servers. It frequently needed 2 to 3 days to process a request, and sometimes could not finish at all. Our partners at VirtualPostMail relied on us to build a new application with improved performance, so that their customers have a fast and pleasant experience with their services.
Kickoff
Our job was to build a new application from scratch. The new app is designed to run on the Amazon Web Services (AWS) platform using the Serverless Framework. Large PDFs are now split between different servers and their pages are processed in parallel, which significantly lowers processing time while the costs remain the same. We leveraged AWS cloud services to build an application with increased flexibility, scalability and reliability. PDF-OCR is integrated with other systems so that the entire process runs automatically.
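The split-and-parallelize idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production code: the names `chunk_pages`, `fake_ocr` and `ocr_document` are hypothetical, the OCR step is a placeholder, and local threads stand in for the separate servers that handle each group of pages.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Inclusive, 1-based (first_page, last_page) pair.
PageRange = Tuple[int, int]


def chunk_pages(total_pages: int, chunk_size: int) -> List[PageRange]:
    """Split a document of `total_pages` pages into consecutive page ranges."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def fake_ocr(page_range: PageRange) -> str:
    """Placeholder for the real OCR call (assumption: one call per page range)."""
    first, last = page_range
    return f"text of pages {first}-{last}"


def ocr_document(
    total_pages: int,
    chunk_size: int,
    ocr_chunk: Callable[[PageRange], str] = fake_ocr,
    max_workers: int = 8,
) -> List[str]:
    """Run `ocr_chunk` on every page range concurrently, keeping page order."""
    ranges = chunk_pages(total_pages, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of `ranges`, so the
        # recognized text can be joined back together in page order.
        return list(pool.map(ocr_chunk, ranges))


if __name__ == "__main__":
    for text in ocr_document(total_pages=10, chunk_size=4):
        print(text)
```

Because each range is independent, total processing time is bounded by the slowest chunk rather than by the sum of all pages, which is where the speed-up on large PDFs comes from.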
Timeline
After agreeing with the client on the purpose and requirements of the application, we started by drawing an architecture diagram of the app. Such a diagram helps system designers and developers visualize the high-level, overall structure and check that the system meets its goals. We decided to use AWS Step Functions, which coordinates multiple AWS services into serverless workflows so that applications can be built and updated quickly. Within those workflows, AWS Lambda functions do the processing, which keeps this feature-rich application fast. When set up properly, Step Functions automatically triggers and tracks each step and retries when there are errors, so the application executes in order and as expected.
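A workflow of this kind might be described in Amazon States Language roughly as follows. This is a hedged sketch, not the actual PDF-OCR state machine: the state names (`SplitPdf`, `OcrPages`, `OcrPage`, `MergeResults`), the Lambda ARNs, the `$.pages` path and the retry numbers are all illustrative assumptions. The definition is built as a Python dictionary and serialized to JSON.

```python
import json

# Hypothetical Lambda ARN prefix; replace with real function ARNs.
LAMBDA_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"

# Retry policy applied to each task (values are illustrative).
RETRY_POLICY = [
    {
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
    }
]

definition = {
    "Comment": "Sketch of a split / parallel OCR / merge workflow",
    "StartAt": "SplitPdf",
    "States": {
        # Step 1: a Lambda splits the PDF and returns a list under "pages".
        "SplitPdf": {
            "Type": "Task",
            "Resource": LAMBDA_PREFIX + "split-pdf",
            "Retry": RETRY_POLICY,
            "Next": "OcrPages",
        },
        # Step 2: a Map state runs the OCR Lambda once per item, in parallel.
        "OcrPages": {
            "Type": "Map",
            "ItemsPath": "$.pages",
            "MaxConcurrency": 10,
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "OcrPage",
                "States": {
                    "OcrPage": {
                        "Type": "Task",
                        "Resource": LAMBDA_PREFIX + "ocr-page",
                        "Retry": RETRY_POLICY,
                        "End": True,
                    }
                },
            },
            "Next": "MergeResults",
        },
        # Step 3: a Lambda merges the per-page results into one output.
        "MergeResults": {
            "Type": "Task",
            "Resource": LAMBDA_PREFIX + "merge-results",
            "Retry": RETRY_POLICY,
            "End": True,
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The `Retry` blocks are what let Step Functions re-run a failed step without custom code, and the `Map` state is one way to express the per-page parallelism; the real application may structure its states differently.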
Key Results
- OCR application that runs on AWS services using the Serverless Framework
- Fast application able to process a large amount of data in a short period of time
- Highly usable custom software for VirtualPostMail