# Character Recognition for Telugu Language
OCR (optical character recognition) is the recognition of printed or handwritten text characters by a computer. It involves photo-scanning the text character by character, analyzing the scanned image, and then translating each character image into a character code, such as ASCII, of the kind commonly used in data processing.

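
ASCII defines only 128 codes, none of them for Telugu letters, so Telugu OCR output is stored as Unicode text instead; the Telugu block occupies code points U+0C00 to U+0C7F. The standard-library Python sketch below inspects a sample word. `WORD` and the helper names are illustrative only, not part of any OCR library.

```python
from __future__ import annotations

import unicodedata

# Illustrative OCR output: the Telugu word "అమ్మ" written with explicit escapes.
WORD = "\u0c05\u0c2e\u0c4d\u0c2e"


def is_telugu(ch: str) -> bool:
    """True if ch lies in the Unicode Telugu block, U+0C00..U+0C7F."""
    return 0x0C00 <= ord(ch) <= 0x0C7F


def code_points(text: str) -> list[str]:
    """Format each character of text as a U+XXXX code point."""
    return [f"U+{ord(ch):04X}" for ch in text]


def fits_in_ascii(text: str) -> bool:
    """ASCII only covers code points 0..127, so any Telugu letter fails."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for ch in WORD:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    print("fits in ASCII:", fits_in_ascii(WORD))
    print("UTF-8 bytes:", WORD.encode("utf-8").hex(" "))
```

Note that one visible syllable (here "మ్మ") can span several code points, because the virama (U+0C4D) joins consonants into a conjunct.
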
In OCR processing, the scanned image or bitmap is analyzed for light and dark areas in order to identify each letter or digit. When a character is recognized, it is converted into a character code (ASCII for English text, Unicode for scripts such as Telugu). Special circuit boards and computer chips designed expressly for OCR can be used to speed up the recognition process.

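
Separating light and dark areas is usually done by binarization: choosing a gray-level threshold and marking each pixel as ink or background. One common way to pick that threshold automatically is Otsu's method, which chooses the level that best separates the two classes of pixels. The dependency-free sketch below assumes a grayscale image stored as nested lists of 0..255 values; it illustrates the technique and is not necessarily what a given OCR engine does internally, since engines typically apply their own thresholding.

```python
from __future__ import annotations


def otsu_threshold(pixels: list[int]) -> int:
    """Pick the gray level t (0..255) that maximises the between-class variance
    when pixels <= t are treated as dark (ink) and the rest as light (paper)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of their gray levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image: list[list[int]]) -> list[list[int]]:
    """Map a grayscale image (rows of 0..255 values) to 1 = ink, 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # A tiny 3x3 "scan": dark strokes (20) on light paper (230).
    scan = [[230, 230, 20], [230, 20, 230], [20, 230, 230]]
    for row in binarize(scan):
        print(row)
```

The recognizer then works on the resulting ink mask rather than on raw gray levels.
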
Libraries use OCR to digitize and preserve their holdings. OCR is also used to process checks and credit card slips and to sort mail: postal services sort large volumes of letters and magazines every day with OCR machines, which considerably speeds up mail delivery.

Applications of OCR include:

- Data entry for business documents, e.g. checks, passports, invoices, bank statements and receipts
- Automatic number plate recognition
- Passport recognition and information extraction at airports
- Automatic extraction of key information from insurance documents
- Extracting business card information into a contact list
- Quickly creating text versions of printed documents, e.g. book scanning for Project Gutenberg
- Making electronic images of printed documents searchable, e.g. Google Books
- Converting handwriting in real time to control a computer (pen computing)
- Defeating CAPTCHA anti-bot systems (which are specifically designed to prevent OCR), or testing the robustness of such systems
- Assistive technology for blind and visually impaired users

## Tesseract: an optical character recognition (OCR) engine

Tesseract is an OCR engine with Unicode support and the ability to recognize more than 100 languages out of the box, including Telugu. It can also be trained to recognize other languages.

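
Assuming the `tesseract` executable and the Telugu trained data (`tel.traineddata`) are installed, Tesseract can be driven from Python through its command-line interface. In the sketch below, `-l` selects the language model (several can be combined with `+`), and passing `stdout` as the output base prints the recognized text instead of writing a file. The helper names `tesseract_command` and `recognize` are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def tesseract_command(image_path: str, langs: list[str]) -> list[str]:
    """Build a Tesseract CLI call; several languages are joined with '+'."""
    if not langs:
        raise ValueError("at least one language code is required")
    # "stdout" as the output base makes Tesseract print the text
    # instead of writing a .txt file next to the image.
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs)]


def recognize(image_path: str, langs: tuple[str, ...] = ("tel",)) -> str:
    """Run Tesseract on one image; needs the binary and the traineddata files."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, list(langs)),
        capture_output=True,
        encoding="utf-8",  # decode Tesseract's UTF-8 output explicitly
        check=True,        # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For a page that mixes Telugu and English, `recognize("page.png", ("tel", "eng"))` would load both models.
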
Tesseract is used for text detection on mobile devices, in video, and in Gmail image spam detection.
Tesseract was originally developed as proprietary software at Hewlett-Packard Labs. In 2005, HP open-sourced it in collaboration with the University of Nevada, Las Vegas. Since 2006 it has been actively developed by Google and many open-source contributors.

Tesseract matured with version 3.x, which added support for many image formats and, gradually, a large number of scripts (languages). Tesseract 3.x is based on traditional computer vision algorithms. In recent years, deep learning methods have surpassed traditional machine learning techniques by a wide margin in accuracy across many areas of computer vision, handwriting recognition being one prominent example. It was therefore only a matter of time before Tesseract gained a deep learning based recognition engine: Tesseract 4 introduced one built on LSTM (long short-term memory) neural networks.
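
In Tesseract 4 and later, the recognition engine is chosen with the `--oem` (OCR engine mode) option: 0 for the legacy engine only, 1 for the LSTM engine only, 2 for both combined, and 3 for the default, based on what is available. The Python sketch below only assembles such a command line for Telugu; `ocr_command` and `OEM_MODES` are hypothetical names, and running the result assumes `tel.traineddata` is installed.

```python
from __future__ import annotations

# Values of Tesseract's --oem (OCR engine mode) option, Tesseract 4 and later.
OEM_MODES = {
    "legacy": 0,       # traditional 3.x-style engine only
    "lstm": 1,         # LSTM neural-network engine only
    "legacy+lstm": 2,  # both engines combined
    "default": 3,      # based on what is available
}


def ocr_command(image_path: str, lang: str = "tel", engine: str = "lstm") -> list[str]:
    """Build a Tesseract command line that selects the recognition engine.

    Only assembles the argument list. The legacy modes also need traineddata
    that contains legacy models, which LSTM-only model files do not include.
    """
    if engine not in OEM_MODES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {sorted(OEM_MODES)}")
    return ["tesseract", image_path, "stdout", "-l", lang, "--oem", str(OEM_MODES[engine])]


if __name__ == "__main__":
    print(" ".join(ocr_command("page.png")))
```

Requesting the LSTM engine explicitly makes the choice visible in scripts instead of relying on the default mode.
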
