By C.V. Jawahar, Anand Kumar, A. Phaneendra, K.J. Jinesh (auth.), Venu Govindaraju, Srirangaraj (Ranga) Setlur (eds.)
Optical character recognition (OCR) is a key enabling technology critical to creating indexed, digital library content, and it is especially valuable for Indic scripts, for which there has been very little digital access.
Indic scripts, the ancient Brahmi scripts prevalent in the Indian subcontinent, present some challenges for OCR that are different from those faced with Latin and Oriental scripts. But properly utilized, OCR can help to make Indic digital archives practically accessible to researchers and lay users alike by creating searchable indexes and machine-readable text repositories.
This unique guide/reference is the very first comprehensive book on the subject of OCR for Indic scripts, providing an overview of the state-of-the-art research in this field as well as other issues related to facilitating query and retrieval of Indic documents from digital libraries. All major research groups working in this area are represented in this book, which is divided into sections on recognition of Indic scripts and retrieval of Indic documents.
Topics and features:
- Contains contributions from the leading researchers in the field
- Discusses data set creation for OCR development
- Describes OCR systems that cover eight different scripts: Bangla, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Tamil, and Urdu (Perso-Arabic)
- Explores the challenges of Indic script handwriting recognition in the online domain
- Examines the development of handwriting-based text input systems
- Describes ongoing work to increase access to Indian cultural heritage materials
- Provides a section on the enhancement of text and images obtained from historical Indic palm leaf manuscripts
- Investigates different techniques for word spotting in Indic scripts
- Reviews mono-lingual and cross-lingual information retrieval in Indic languages
This is an excellent reference for researchers and graduate students studying OCR technology and methodologies. This volume will contribute to opening up the rich Indian cultural heritage embodied in millions of ancient and contemporary books spanning topics such as science, literature, medicine, astronomy, mathematics and philosophy.
Venu Govindaraju, FIEEE, FIAPR, is a Distinguished Professor of Computer Science and Engineering at the University at Buffalo. He has over 20 years of research experience in pattern recognition, information retrieval and biometrics. His seminal work on handwriting recognition was at the core of the first handwritten address interpretation system used by the U.S. Postal Service.
Srirangaraj Setlur, SMIEEE, is a Principal Research Scientist at the University at Buffalo. He has over 15 years of research experience in pattern recognition that includes NSF-sponsored work on multilingual OCR technologies for digital libraries and other applications. His work on postal automation has led to technology adopted by the U.S. Postal Service, and Royal Mail in the U.K.
Extra resources for Guide to OCR for Indic Scripts: Document Recognition and Retrieval
The second approach is efficiently implemented by the Viterbi algorithm. For an m-character string, this is equivalent to traversing a trellis graph of N × m nodes, where N is the size of the alphabet. If the negative log of the transition probability is the weight associated with each edge of the trellis, and the negative log of the confusion probability is the weight associated with each node, then the path with minimum cumulative weight represents the desired correction output.
Among Indian scripts, a complete OCR system with post-recognition error correction reported by Chaudhuri and Pal was later modified using confusion matrix-based frequency and dictionary-based error correction approaches as follows.
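The trellis search described above can be sketched in Python as follows. This is a minimal illustration, not the implementation from the book: the function name `viterbi_correct`, the dictionary layout of the probability tables, the toy values, and the omission of an initial-character prior (treated as uniform) are all assumptions made for the example.

```python
import math


def viterbi_correct(observed, alphabet, transition, confusion):
    """Return the minimum-weight path through the N x m trellis.

    transition[a][b]: assumed P(next character is b | current character is a)
    confusion[c][o]:  assumed P(OCR outputs o | true character is c)
    Edge weight = -log(transition), node weight = -log(confusion).
    """
    if not observed:
        return ""

    def cost(p):
        # Negative log probability; impossible events get infinite weight.
        return -math.log(p) if p > 0 else math.inf

    n = len(alphabet)
    # best[j]: minimum cumulative weight of any path ending at alphabet[j].
    best = [cost(confusion[c].get(observed[0], 0.0)) for c in alphabet]
    back = []  # back[t][j]: best predecessor index for column t + 1

    for o in observed[1:]:
        new_best, pointers = [], []
        for c in alphabet:
            def step(i):
                return best[i] + cost(transition[alphabet[i]].get(c, 0.0))
            i_min = min(range(n), key=step)
            new_best.append(step(i_min) + cost(confusion[c].get(o, 0.0)))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last column.
    j = min(range(n), key=lambda i: best[i])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return "".join(alphabet[i] for i in reversed(path))


# Hypothetical two-letter alphabet in which characters tend to alternate.
TOY_ALPHABET = "ab"
TOY_TRANSITION = {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.9, "b": 0.1}}
TOY_CONFUSION = {"a": {"a": 0.8, "b": 0.2}, "b": {"a": 0.2, "b": 0.8}}
```

With these toy tables, the recognized string "aab" is corrected to "bab": the repeated "aa" is improbable under the transition model, and changing the first character gives the lowest total weight.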
Proposition 1. If, for any erroneous string S, the longest substring match in the forward dictionary occurs for the first k1 characters, then the error must lie within the first k1 + 1 characters of S. The rest of the characters of S are error free.
Proof. Suppose the contrary, i.e., let the error not lie in the first k1 + 1 characters. Since the first k1 + 1 characters are then error free, we could find at least one word in the dictionary whose first k1 + 1 characters match those of the string S. This is a contradiction, since the longest dictionary match occurred for the first k1 characters, not for k1 + 1 characters.
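Proposition 1 can be illustrated with a short Python sketch that computes k1 and the window in which the error must lie. The function names and the sample word list are hypothetical, and the sketch assumes that the "forward dictionary" is an ordinary word list matched from the first character of S.

```python
def longest_prefix_match(s, dictionary):
    """Return k1: the length of the longest prefix of s that is also
    a prefix of at least one dictionary word."""
    # Collect every prefix of every word (fine for a small illustrative list;
    # a trie would be the usual choice for a real dictionary).
    prefixes = {word[:i] for word in dictionary for i in range(len(word) + 1)}
    k1 = 0
    while k1 < len(s) and s[: k1 + 1] in prefixes:
        k1 += 1
    return k1


def error_window(s, dictionary):
    """Return the first k1 + 1 characters of s, the span that
    Proposition 1 says must contain the error."""
    k1 = longest_prefix_match(s, dictionary)
    return s[: min(k1 + 1, len(s))]


# Hypothetical word list for illustration only.
SAMPLE_WORDS = ["recognition", "record", "retrieval"]
```

For the misrecognized string "recagnition", the longest matching prefix is "rec" (k1 = 3), so the error is confined to "reca".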
Based on the content. The bounding box information of these blocks is also stored in the schema. These text blocks may be further segmented into lines within blocks, words within lines, and Aksharas within words. The OCR may need information such as font name, font size, and scan resolution, which can be accessed from the metadata. To facilitate this, a reference to the metadata information file is given in each page of the annotation. Sometimes, a single page of a collection may have different font styles, font sizes, scan resolutions, or print quality.
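The annotation hierarchy described above (page, text block, line, word, Akshara, with a metadata reference on each page) could be modeled roughly as below. This is only a sketch: the class names, field names, and the bounding-box convention are assumptions for illustration and are not taken from the book's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed convention: (left, top, right, bottom) in pixels.
BBox = Tuple[int, int, int, int]


@dataclass
class Akshara:
    bbox: BBox
    text: str = ""


@dataclass
class Word:
    bbox: BBox
    aksharas: List[Akshara] = field(default_factory=list)


@dataclass
class Line:
    bbox: BBox
    words: List[Word] = field(default_factory=list)


@dataclass
class TextBlock:
    bbox: BBox
    lines: List[Line] = field(default_factory=list)


@dataclass
class PageAnnotation:
    # Reference to the metadata file holding font name, size, scan resolution.
    metadata_ref: str
    blocks: List[TextBlock] = field(default_factory=list)

    def count_aksharas(self) -> int:
        """Total number of Aksharas annotated on this page."""
        return sum(
            len(word.aksharas)
            for block in self.blocks
            for line in block.lines
            for word in line.words
        )
```

Keeping the metadata as a reference on the page, rather than copying font and resolution fields into every block, mirrors the text's description of a per-page pointer to a metadata file.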