OFFLINE HAND WRITTEN DIGITS RECOGNITION USING SUPPORT VECTOR MACHINE


INTRODUCTION

1.1     Background of the Study

A digit is a numeric symbol (such as “2” or “5”) used in combinations (such as “25”) to represent numbers (such as the number 25) in positional numeral systems. The name “digit” comes from the fact that the 10 digits (Latin digiti meaning fingers) of the hands correspond to the 10 symbols of the common base 10 numeral system, that is, the decimal (ancient Latin adjective decem meaning ten) digits (O’Connor and Robertson, 2001).

In a given numeral system, if the base is an integer, the number of digits required is always equal to the absolute value of the base. For example, the decimal system (base 10) has ten digits (0 through 9), whereas binary (base 2) has two digits (0 and 1) (O’Connor and Robertson, 2001).

In a basic digital system, a numeral is a sequence of digits, which may be of arbitrary length. Each position in the sequence has a place value, and each digit has a value. The value of the numeral is computed by multiplying each digit in the sequence by its place value, and summing the results (Wheeler and Wheeler, 2001).
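The place-value rule described above can be illustrated with a short sketch. The function name `numeral_value` and the choice to store digits most-significant-first are assumptions made for this example only.

```python
def numeral_value(digits, base=10):
    """Return the value of a numeral given as a list of digits,
    most significant digit first (an assumption of this sketch)."""
    if base < 2:
        raise ValueError("base must be at least 2")
    value = 0
    count = len(digits)
    for position, digit in enumerate(digits):
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        # The place value of a position is the base raised to the
        # number of positions to its right.
        place_value = base ** (count - 1 - position)
        value += digit * place_value
    return value


print(numeral_value([2, 5]))             # 2*10 + 5*1
print(numeral_value([1, 0, 1], base=2))  # 1*4 + 0*2 + 1*1
```

The first call evaluates the numeral "25" in base 10, and the second evaluates "101" in base 2.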

Handwriting is writing created by a person with a writing utensil such as a pen or pencil. Handwriting includes both printing and cursive styles and is separate from formal calligraphy or typeface. Because each person’s handwriting is unique, it can be used to verify a document’s writer. The deterioration of a person’s handwriting is also a symptom or result of certain diseases (Srihari, Huang and Srinivasan, 2008).

Each person has their own unique style of handwriting, whether it is everyday handwriting or their personal signature. Even identical twins who share appearance and genetics don’t have the same handwriting. A person’s handwriting is like that person’s fingerprints: people might be able to copy it, but never write it in an identical way. The place where one grows up and the first language one learns melt together with the different distribution of force and ways of shaping words to create a unique style of handwriting for each person (Srihari, Huang and Srinivasan, 2008).

Penmanship is the technique of writing with the hand using a writing instrument. Today, this is most commonly done with a pen, or pencil, but throughout history has included many different implements. The various generic and formal historical styles of writing are called “hands” whilst an individual’s style of penmanship is referred to as “handwriting” (Nickell, 2003).

Handwriting recognition (HWR) is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed “off line” from a piece of paper by optical scanning (optical character recognition) or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed “on line”, for example by a pen-based computer screen surface, a generally easier task as there are more clues available (Holzinger, Stocker, Peischl and Simonic, 2012).

Handwriting recognition principally entails optical character recognition. However, a complete handwriting recognition system also handles formatting, performs correct segmentation into characters and finds the most plausible words (Holzinger, Stocker, Peischl and Simonic, 2012).

Off-line handwriting recognition involves the automatic conversion of text in an image into letter codes which are usable within computer and text-processing applications. The data obtained by this form is regarded as a static representation of handwriting. Off-line handwriting recognition is comparatively difficult, as different people have different handwriting styles. Off-line character recognition often involves scanning a form or document written sometime in the past. This means the individual characters contained in the scanned image will need to be extracted (Plamondon and Srihari, 2000).

After the extraction of individual characters occurs, a recognition engine is used to identify the corresponding computer character. Several different recognition techniques are currently available (Plamondon and Srihari, 2000).
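As an illustration of the character extraction step, the sketch below splits a binary image into candidate characters wherever a completely blank column occurs. This column-projection approach is only one simple possibility and is an assumption of this example, not a method taken from the cited work; the function name `segment_columns` is likewise hypothetical. It would not separate characters that touch or overlap.

```python
def segment_columns(binary):
    """Split a binary image (list of rows, 1 = ink, 0 = background)
    into column ranges that contain ink, separated by blank columns.
    Each range is returned as (start, end) with end exclusive."""
    width = len(binary[0])
    # A column "has ink" if any row has a 1 in that column.
    has_ink = [any(row[c] for row in binary) for c in range(width)]

    segments = []
    start = None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                      # a new character begins
        elif not ink and start is not None:
            segments.append((start, c))    # the character ended at c - 1
            start = None
    if start is not None:                  # character touches the right edge
        segments.append((start, width))
    return segments


image = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
]
print(segment_columns(image))
```

Each returned range can then be cropped from the image and passed to the recognition engine as one isolated symbol.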

In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall (Ben-Hur, Horn, Siegelmann and Vapnik, 2001).
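The idea of a binary linear classifier with a wide gap can be sketched in a few lines. The example below is a minimal illustration, assuming batch subgradient descent on the regularized hinge loss; practical SVM libraries typically use more sophisticated solvers. The function names, the toy data and the parameter values (`lam`, `lr`, `epochs`) are all assumptions of this sketch.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit w and b by batch subgradient descent on
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]    # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:                 # inside the gap or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Assign x to a category according to the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, linearly separable training set with labels +1 and -1.
points = [(2, 2), (3, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print(predict(w, b, (4, 1)), predict(w, b, (-1, -4)))
```

The two printed predictions are for new examples that were not in the training set; each is assigned to a category purely by the side of the separating hyperplane on which it falls.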

In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces (Ben-Hur, Horn, Siegelmann and Vapnik, 2001).
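A small worked example may make the kernel trick concrete. For two-dimensional inputs, the degree-2 polynomial kernel k(x, z) = (x . z)^2 gives the same number as an ordinary inner product taken after mapping each input to the three-dimensional feature vector (x1^2, sqrt(2) x1 x2, x2^2). The sketch below checks this numerically; the function names are assumptions of the example, and this kernel is chosen only because its feature map is easy to write down.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the input space."""
    return dot(x, z) ** 2


def explicit_map(x):
    """Explicit feature map whose inner product equals poly2_kernel
    for two-dimensional inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, 4.0)
implicit = poly2_kernel(x, z)                       # never builds the feature vectors
explicit = dot(explicit_map(x), explicit_map(z))    # builds them, then takes the product
print(implicit, explicit, math.isclose(implicit, explicit))
```

The kernel evaluates the inner product in the higher-dimensional space without ever constructing the mapped vectors, which is what allows an SVM to work in very high-dimensional feature spaces at modest cost.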

When data are not labeled, supervised learning is not possible, and an unsupervised learning approach is required, which attempts to find natural clustering of the data to groups, and then map new data to these formed groups. The clustering algorithm which provides an improvement to the support vector machines is called support vector clustering and is often used in industrial applications either when data are not labeled or when only some data are labeled as a preprocessing for a classification pass (Ben-Hur, Horn, Siegelmann and Vapnik, 2001).

1.2  Statement of the Problem

Handwritten digit recognition is a challenging problem that researchers have studied for a long time, and especially in recent years. Many fields deal with handwritten numbers, for example bank checks, postal mail, addresses on house name plates and examination forms that are filled in by hand; in each of these, the problem of digit recognition arises. The task is to let the computer understand digits written manually by users and represent them in a form that it can process. A system for recognizing isolated digits is one approach to dealing with such applications.

1.3       Aim and Objectives

The aim of this project is to develop a system that recognizes offline handwritten digits using Support Vector Machines (SVMs). The objectives are stated below:

  1. To acquire handwritten digit images for all the digits from different people.
  2. To pre-process all the images for image standardization.
  3. To extract features from all the images.
  4. To develop a classification model for digit recognition using SVMs with the extracted features.
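Objectives 2 and 3 can be sketched as follows. This is a minimal illustration only: the fixed threshold, the "zoning" features (fraction of ink pixels in each block of the image) and the function names are assumptions of this example, and the methods actually adopted in the project may differ.

```python
def binarize(gray, threshold=128):
    """Pre-processing: map a grayscale grid (0 = black ink, 255 = white
    paper) to a binary grid with 1 = ink and 0 = background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def zone_densities(binary, zones=2):
    """Feature extraction: split the image into zones x zones blocks and
    return the fraction of ink pixels in each block, row by row.
    Assumes the image has at least `zones` rows and columns."""
    height, width = len(binary), len(binary[0])
    features = []
    for zr in range(zones):
        r0, r1 = zr * height // zones, (zr + 1) * height // zones
        for zc in range(zones):
            c0, c1 = zc * width // zones, (zc + 1) * width // zones
            block = [binary[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(block) / len(block))
    return features


# A 4 x 4 grayscale image containing a vertical stroke, roughly like a "1".
gray = [
    [255, 0, 255, 255],
    [255, 0, 255, 255],
    [255, 0, 255, 255],
    [255, 0, 255, 255],
]
print(zone_densities(binarize(gray)))
```

The resulting fixed-length feature vector is the kind of input on which an SVM classifier could then be trained, one vector per digit image.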

1.4       Scope and Limitation of the Study

This proposed work seeks to determine how a system for recognition of handwritten digits would be implemented on a modern system. We will focus on the single-symbol off-line case of handwriting, i.e. the input is a static image containing one symbol. This research will discuss the different stages of a complete recognition system to give an overview of the challenges faced. The limitations are stated below:

  i.  Only one symbol will be recognized at a time.

  ii.  The classifier will handle only binary data and assumes that the images have already been pre-processed.

  iii.  Handwriting is subject to high variation in writing style between different writers. Sources of variation include size, rotation, elongation and slant, as well as differences in the shapes that writers give to the same digit.

1.5  Significance of the Study

In recent years there has been a constant increase in documents on paper. The number of mail pieces sent and checks written grows from year to year. Contrary to the belief that advances in electronic technology would help create a paperless society, they have helped increase the average amount of paper documents people and businesses handle every day. Therefore, there is a great need for machines able to read paper documents. 

1.6       Definition of Terms

  i.  Digit: a numeral that can be combined with others to write larger numbers, and that cannot itself be split into other numerals.

  ii.  Recognize: to match something or someone that one currently perceives to a memory of some previous encounter with the same entity.

  iii.  Handwriting: the act or process of writing done by hand, rather than typed or word-processed.

  iv.  Offline Handwriting Recognition: the automatic conversion of text in an image into letter codes which are usable within computer and text-processing applications.
