
MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION, MUMBAI

A Project Report on Optical Character Recognition

Submitted in partial fulfillment for the award of Diploma in Computer Engineering

By
Lavanya R. Patil
Mayuri D. Sonavane
Divya R. Patil

DEPARTMENT OF COMPUTER ENGINEERING
S.S.V.P.S's Bapusaheb Shivajirao Deore Polytechnic, Dhule
(Institute Code: 0059)
2013-2014


Guided by Ms. Vishakha R. Bhadane



CERTIFICATE
This is to certify that the project entitled "Optical Character Recognition" has been carried out in our premises by Lavanya R. Patil, Mayuri D. Sonavane, and Divya R. Patil

under my guidance, in partial fulfillment for the award of Diploma in Computer Engineering of the Maharashtra State Board of Technical Education, Mumbai during the academic year 2013 - 2014.

Place: Dhule
Date: March 22, 2014

Ms. Vishakha R. Bhadane, Guide

Ms. S. R. Patil, H.O.D.

Prof. Pramod B. Kachave, Principal


ACKNOWLEDGEMENT
We express our profuse thanks to our HOD, Mrs. S. R. Patil, for permitting us to develop the project "Optical Character Recognition", and to our Guide, Ms. Vishakha R. Bhadane, for ably and sincerely assisting us not only in preparing the computer program and project work but also in preparing the project report. We wish to express our thanks to our respected teachers and all the teaching staff, who have been a constant source of encouragement and assistance. Last but not least, we are grateful for the college library and the computer lab facilities made available by our college.

Lavanya R. Patil Mayuri D. Sonavane Divya R. Patil

ABSTRACT

Optical Character Recognition, often called OCR, is a complex technology that converts images containing text into editable formats. The existing system has several problems, chiefly that a scanned document cannot be converted into an editable format. The Optical Character Recognition system is developed to overcome this drawback. OCR allows you to process scanned books, screenshots, and photos with text and obtain editable documents such as TXT, DOC, or PDF files.

PAGE INDEX

CONTENTS

1. INTRODUCTION
1.1 Overview
1.2 Need For System
1.3 Limitations of Existing System
1.4 Proposed System
1.5 Problem Definition

2. REQUIREMENT ANALYSIS
2.1 Developer Requirements
2.2 User Requirements

3. DESIGN
3.1 System Architecture
3.2 Flow Control Diagrams

4. HARDWARE AND SOFTWARE
4.1 Hardware Requirement
4.2 Software Requirement

5. IMPLEMENTATION
5.1 Forms

6. CONCLUSION AND FUTURE SCOPE

REFERENCES

CHAPTER: 1
INTRODUCTION

1.1 Project Overview
Optical character recognition, often called OCR, is a process used to convert scanned images of text into electronic text so that digitized texts can be searched, indexed, and retrieved. An OCR system consists of a normal scanner and some special software. The OCR software examines the scanned page and changes the letters into a form that can be edited or processed by a normal word processing package.

In other words, OCR is a process by which text characters can be input to a computer by providing the computer with an image. The computer uses an OCR engine, a computer program with the specific function of guessing which letter (recognizable to a computer) an image (recognizable to a human) represents. The engine translates images of typewritten text (usually captured by a scanner) into machine-editable text, or translates pictures of characters into a standard encoding scheme representing them. OCR began as a field of research in artificial intelligence and computer vision.

Suppose you wanted to digitize a magazine article or a printed contract. You could spend hours retyping and then correcting misprints. Or you could convert all the required materials into digital format in several minutes using a scanner (or a digital camera) and Optical Character Recognition software.

1.2 Need For System


With an Optical Character Recognition system, text in images can be converted into an editable format. The system allows processing of scanned books, screenshots, and photos with text to obtain editable documents such as TXT, DOC, and PDF files.

1.3 Limitations of Existing System

Text from a source with a font size of less than 12 points will result in errors.
Most document formatting is lost during text scanning, except for paragraph marks and tab stops.
Scanning of plain text files or spreadsheet printouts usually works; however, the data needs to be reformatted.
The characters "a" and "2" are not supported.

1.4 Proposed System


OCR is a complex technology that converts images with text into editable formats. OCR allows you to process scanned books, screenshots, and photos with text and obtain editable formats such as TXT, DOC, or PDF files. This technology is widely used in many areas, and the most advanced OCR systems can handle almost all types of images, even complex ones such as scanned magazine pages with images and columns, or photos from a mobile phone.

1.5 Problem Definition

Much of the information available on the World Wide Web is obtained from old books published many years ago. Using software, it becomes easy to scan the books and convert the pages into computer text. Scanning a page from a book creates a picture of the page, and a picture is not easily editable. Although it is easily readable by the human eye, the computer cannot "see" the words in the picture.
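To illustrate why the computer cannot "see" the words, the following minimal Java sketch shows what a scanned page looks like to a program: a grid of pixel values that must first be separated into ink and paper. This is only an illustration and not the code of the project; the class name PixelGrid, its methods, and the threshold of 128 are assumptions made for the example.

```java
import java.awt.image.BufferedImage;

class PixelGrid {
    // Converts an image into a grid of booleans: true = dark pixel (ink),
    // false = light pixel (paper). The threshold is compared against the
    // average of the red, green, and blue channels (0 to 255).
    public static boolean[][] toInkGrid(BufferedImage image, int threshold) {
        int height = image.getHeight();
        int width = image.getWidth();
        boolean[][] ink = new boolean[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int red = (rgb >> 16) & 0xFF;
                int green = (rgb >> 8) & 0xFF;
                int blue = rgb & 0xFF;
                int gray = (red + green + blue) / 3;
                ink[y][x] = gray < threshold;
            }
        }
        return ink;
    }

    // Renders the grid as text so the pixel pattern can be inspected.
    public static String render(boolean[][] ink) {
        StringBuilder out = new StringBuilder();
        for (boolean[] row : ink) {
            for (boolean cell : row) {
                out.append(cell ? '#' : '.');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a scanned page: a white 3x3 image with a black vertical stroke.
        BufferedImage page = new BufferedImage(3, 3, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 3; y++) {
            for (int x = 0; x < 3; x++) {
                page.setRGB(x, y, 0xFFFFFF);
            }
        }
        for (int y = 0; y < 3; y++) {
            page.setRGB(1, y, 0x000000);
        }
        System.out.print(render(toInkGrid(page, 128)));
    }
}
```

The program only knows which pixels are dark. Deciding that the dark pixels form a particular letter is the task of the recognition step.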

CHAPTER: 2
REQUIREMENT ANALYSIS

Function of the system: forms containing character images are scanned through a scanner. The recognition engine of the system then interprets the images, turning images of text or printed characters into ASCII data (machine-readable characters).
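As a small illustration of what "ASCII data" means here, the sketch below converts a piece of already recognized text into its numeric ASCII codes. The class name AsciiCodes and its method are hypothetical and are not part of the project code.

```java
import java.util.Arrays;

class AsciiCodes {
    // Returns the ASCII code of every character in the text.
    // Characters outside the 7-bit ASCII range (above 127) are rejected.
    public static int[] toAsciiCodes(String text) {
        int[] codes = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException("Not an ASCII character: " + c);
            }
            codes[i] = c;
        }
        return codes;
    }

    public static void main(String[] args) {
        // Prints [79, 67, 82]
        System.out.println(Arrays.toString(toAsciiCodes("OCR")));
    }
}
```

Once the characters are stored as codes like these, any word processing package can edit, search, or index them.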

2.2 Developer Requirements:


Scanned images
JDK 1.6
All the documents in which text is available as pictures

2.3 User Requirements:


Recognition of unclear text or documents

CHAPTER: 3
DESIGN

Design creates a representation or model of the software and, unlike the requirements, details the components that are necessary to implement the system. Design allows you to model the system or product that is to be built; this model can be assessed for quality and improved before code is generated and end users become involved in large numbers. Design is the place where software quality is established.

In this chapter, we see the details of how this system actually works: the working of the proposed system, the flow control diagram of the system, and the database design used in the system.

Design is the first step in the development phase for any engineered product or system. The designer's goal is to produce a model or representation of an entity that will later be built. Once system requirements have been specified and analyzed, system design is the first of the three technical activities (design, code, and test) that are required to build and verify software.

The importance of design can be stated with a single word: quality. Design is the place where quality is fostered in software development. It provides us with representations of software that can be assessed for quality, and it is the only way that we can accurately translate a customer's view into a finished software product or system. Software design serves as a foundation for all the software engineering steps that follow. Without a strong design we risk building an unstable system, one that will be difficult to test and whose quality cannot be assessed until the last stage.

3.1 System Architecture

Fig 3.1.1 System Architecture

The system architecture shows that an input image containing a number of characters first undergoes preprocessing, after which features are extracted from the image. Character images are stored in the module library, and the extracted features are matched against this library to identify each letter.
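The matching step of this architecture can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the project's actual implementation: the class TemplateMatcher is hypothetical, each character is assumed to be already preprocessed into a small grid of ink ('#') and paper ('.') cells, all grids are assumed to have the same size, and the "feature" compared is simply the number of cells that differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TemplateMatcher {
    // Library of known characters: each letter maps to its stored ink pattern.
    private final Map<Character, String[]> library = new LinkedHashMap<Character, String[]>();

    public void addTemplate(char letter, String[] pattern) {
        library.put(letter, pattern);
    }

    // Counts the cells in which two patterns of equal size differ.
    public static int distance(String[] a, String[] b) {
        int diff = 0;
        for (int y = 0; y < a.length; y++) {
            for (int x = 0; x < a[y].length(); x++) {
                if (a[y].charAt(x) != b[y].charAt(x)) {
                    diff++;
                }
            }
        }
        return diff;
    }

    // Returns the library letter whose pattern is closest to the input,
    // or '?' if the library is empty.
    public char recognize(String[] glyph) {
        char best = '?';
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<Character, String[]> entry : library.entrySet()) {
            int d = distance(glyph, entry.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TemplateMatcher matcher = new TemplateMatcher();
        matcher.addTemplate('I', new String[] {".#.", ".#.", ".#."});
        matcher.addTemplate('L', new String[] {"#..", "#..", "###"});
        matcher.addTemplate('T', new String[] {"###", ".#.", ".#."});

        // A slightly damaged 'L' (one ink cell missing), as an unclear scan might produce.
        String[] scanned = {"#..", "#..", "##."};
        System.out.println(matcher.recognize(scanned)); // prints L
    }
}
```

Choosing the closest pattern rather than requiring an exact match lets the sketch tolerate small defects in the scanned image, which is why the damaged input is still reported as L.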

3.2 Flow Control Diagram

Fig 3.2.2 System Architecture

3.3 Flow Diagram

Fig 3.3 Flow Diagram

CHAPTER: 4
HARDWARE AND SOFTWARE REQUIREMENTS

4.1 Hardware Requirement:

Operating System - Windows 7
Processor - Pentium 4
RAM - 1 GB
Hard Disk - 60 GB

4.2 Software Requirement:

JDK 1.6
Java Optical Character Library
TextPad
ODBC drivers for MS-Access
MS-Office

5.0 Implementation Details

5.1 Forms

Fig 5.1 Character Recognition Image

The above window opens when the system starts. The user then has to click on the Character/Text Recognition button. When the user clicks on that button, the list of scanned images is opened.

Fig 5.2 List of Scanned Images

The above window lists all the stored scanned images. The user now selects the image to be recognized. The list can also be updated with the latest scanned images.


Fig 5.3 Input Image

The above window shows the scanned image that the user wants to recognize. The user has to click on the Recognize button to get the desired output.


Fig 5.4 Output Image

The above window is the output window, in which the recognized text is displayed.


6.0 CONCLUSION & FUTURE SCOPE

Conclusion:
A is recognized as 9. The scanned image is successfully displayed.

Future Scope:
Older books with unclear text can be made legible, so that the important information in them can be restored. Image recognition can help to restore photographs of historical figures. It can also be helpful in barcode recognition.


REFERENCES
Websites
www.convert-in.com/sql2ora.htm
En.m.wikipedia.org/wiki/Federated_database_system
www.javasourcecode.com

Books
Java 2 - The Complete Reference (5th Edition)

