iSight: An Object Recognition Application for Visually Impaired Individuals

TABLE OF CONTENTS

ABSTRACT                                                                                                                           v

LIST OF TABLES                                                                                                                 ix

LIST OF FIGURES                                                                                                                x

LIST OF ABBREVIATIONS                                                                                               xi

CHAPTER 1: INTRODUCTION                                                                                       1

Table 1.1 Risk Assessment                                                                                                  5

CHAPTER 2: LITERATURE REVIEW                                                                          7

2.1 Introduction                                                                                           7
2.2 Historical Overview                                                                                    7
  2.2.1 Assistive Technologies                                                                             7
  2.2.2 Computer Vision                                                                                   11
2.3 Related Work                                                                                          14
2.4 Tools and Utilities                                                                                   15
  2.4.1 TensorFlow Lite                                                                                   15
  2.4.2 Android Text-to-Speech                                                                            16
2.5 Summary                                                                                               16

CHAPTER 3: REQUIREMENTS, ANALYSIS, AND DESIGN                                  18

3.8.6      User Interface Design                                                                                      31

CHAPTER 4: IMPLEMENTATION AND TESTING                                                 34

Test case TC-001 (User Login)                                                                   40

Test case TC-002 (User Registration)                                                                         41

Test case TC-003 (Object Detection)                                                                        42

Test case TC-004 (Text-to-Speech)                                                                             43

CHAPTER 5: DISCUSSION, CONCLUSION, AND RECOMMENDATIONS       46

REFERENCES                                                                                                                    49

APPENDICES                                                                                                                     51

Test case TC-001 (User Login)                                                                   57

Test case TC-002 (User Registration)                                                                         58

Test case TC-003 (Object Detection)                                                                        59

Test case TC-004 (Text-to-Speech)                                                                             60

LIST OF TABLES

TABLE 1.1   RISK ASSESSMENT                                                                                                                          5
TABLE 3.1   FUNCTIONAL REQUIREMENT SPECIFICATIONS                                                                                                   24
TABLE 3.2   NON-FUNCTIONAL REQUIREMENT SPECIFICATIONS                                                                                               25
TABLE 4.1   TEST SUITE FOR LOGIN                                                                                                                    40

TABLE 4.2   TEST SUITE FOR REGISTRATION                                                                                                                 41

TABLE 4.3  TEST SUITE FOR OBJECT DETECTION                                                                                                        42

TABLE 4.4 TEST SUITE FOR TEXT-TO-SPEECH                                                                                                              43

TABLE 4.5 TEST TRACEABILITY MATRIX                                                                                                                      44

TABLE 4.6 TEST REPORT SUMMARY                                                                                                                              44

LIST OF FIGURES

FIGURE 3.1    AGILE METHODOLOGY VS WATERFALL METHODOLOGY                                                                                            19
FIGURE 3.2    EXAMPLE OBJECT DETECTION TENSORFLOW LITE                                                                                              22
FIGURE 3.3    ANDROID TEXT-TO-SPEECH WORKFLOW                                                                                                       23
FIGURE 3.4    APPLICATION ARCHITECTURE                                                                                                              26
FIGURE 3.5    USE CASE DIAGRAM                                                                                                                      27
FIGURE 3.6    ACTIVITY DIAGRAM                                                                                                                      28
FIGURE 3.7    DATA-FLOW DIAGRAM                                                                                                                     29
FIGURE 3.8    ENTITY-RELATIONSHIP DIAGRAM                                                                                                           30
FIGURE 3.9    LOGIN PAGE                                                                                                                            31
FIGURE 3.10   REGISTRATION PAGE                                                                                                                     32
FIGURE 3.11   OBJECT RECOGNITION INTERFACE                                                                                                          33
FIGURE 4.1    LOGGED DETECTION RESULTS                                                                                                              36
FIGURE 4.2    LOGGED DETECTION RESULTS WITHOUT DELAY                                                                                                37

LIST OF ABBREVIATIONS

CPU   Central Processing Unit
ERD   Entity Relationship Diagram
IT    Information Technology
ML    Machine Learning
AI    Artificial Intelligence
CV    Computer Vision
CNN   Convolutional Neural Network
RNN   Recurrent Neural Network
RAM   Random Access Memory
UML   Unified Modeling Language

CHAPTER 1: INTRODUCTION

1.1 Overview

The aim of this project is to combine advances in smartphone technology, particularly processing power and camera quality, with advances in machine learning and computer vision to build a mobile application that helps the visually impaired carry out their day-to-day activities, and to contextualize this application for a Nigerian user base.

With the increasing ubiquity of smartphones, various day-to-day problems have found unique solutions. Multiple smartphone applications for the visually impaired have taken varying approaches: some take an emergency-service-based approach, while others recognize specific items, such as currencies, that are essential to daily life. For example, 'eyeNote' and 'LookTel' are applications that recognize currencies and audibly communicate them to the user, whereas a project like 'BlindSighted' notifies the user with a buzz whenever they are within close range of an object (Ghantous, Nahas, Ghamloush and Rida, 2014). This project takes the approach of audibly communicating objects to users when the mobile application recognizes them.

The following chapters of this thesis present the analysis, design, and implementation of this object recognition system for the visually impaired.

1.2 Background and Motivation

The World Health Organization (WHO) defines individuals with visual impairments as those who suffer from low vision or blindness (World Health Organization, 1992). Due to their ailment, these individuals face many challenges in their day-to-day activities.

Throughout human history, a myriad of devices and methods have been devised to overcome these difficulties: from traditional aids such as walking sticks and reading glasses, to Braille, a system of touch-based reading and writing, to, more recently, assistive devices. Assistive devices (specialized high- and low-technology tools designed for individuals with disabilities) increase the ability of visually impaired individuals to understand their environment. They include specialized screen-reading software, magnification programs, and DAISY book readers (Martiniello et al., 2019). Despite their established utility, widespread adoption of these devices has been hindered by factors such as cost and the negative perceptions associated with vision loss (Mulloy et al., 2014).

According to the World Health Organization, at least 2.2 billion people globally suffer from a visual impairment or blindness. Of these, at least 1 billion have a visual impairment that could have been prevented or is yet to be addressed (World Health Organization, 2020).

In the past few decades, smartphones and tablets have become increasingly popular and are now a staple of mainstream society. Over time, technological advancements have led to a large number of built-in accessibility tools being incorporated into these devices, which create and maximize accessibility for users with a diverse set of needs (Martiniello et al., 2019). Unlike traditional assistive devices, these devices have already achieved widespread adoption; furthermore, they are more affordable and less likely to draw attention to the user, avoiding negative perceptions. Alongside the built-in accessibility tools, smartphone operating systems provide developer platforms that allow developers to leverage the devices' capabilities to build third-party applications for users, among them assistive and accessibility applications.

Given the ubiquity of smartphones (there are currently about 3.5 billion smartphones worldwide, the majority running either iOS or Android), it makes sense to build applications, especially those geared towards accessibility, on these platforms. Leveraging this ubiquity allows accessibility tools to reach those who need them most, faster.

Furthermore, smartphone camera technology and computer vision algorithms have both been improving at a rapid rate. With object recognition, one could simulate seeing for the visually impaired better than the traditional methods currently available, without compromising on cost or societal perceptions. This can be achieved using technology that is available today: camera technology, computer vision algorithms, and a voice assistant, combined to build an object recognition application that labels surrounding objects and audibly communicates the recognized labels to the user.

Adaptability and ongoing support are facets of smartphones and smartphone applications that traditional assistive devices lack; alongside cost and stigma, their absence contributes to the abandonment of traditional assistive devices (Phillips and Proulx, 2018). This is another source of motivation for this project, as smartphone applications benefit from 'over the air' updates that continually improve the user experience. Moreover, applications can be contextualized to different demographics with respect to multiple criteria, for example age and geographic location. This is paramount in the case of object recognition applications built with artificial intelligence: object recognition models should be adaptable to various languages and audiences to ensure that all users can adequately benefit from them.

1.3 Statement of the Problem

The ability of an individual to recognize objects in their surroundings is a quintessential aspect of operating self-sufficiently. Carrying out even the most menial, routine tasks relies on this capability. Hence, operating independently can become extremely difficult for those who suffer from visual impairments.

Due to their ailment, visually impaired individuals can face many hurdles in tasks others might consider simple daily tasks. There have been attempts to address this through reading glasses, walking sticks, and even surgery; however, some of these methods are financially infeasible, while others are only workarounds. Assistive technologies have also been shown to increase users' access to their environment and to information, yet they have failed to achieve widespread adoption for a plethora of reasons, namely cost, lack of technical support, and the stigma attached to using these devices in public.

By leveraging the widespread adoption of smartphone technology, these issues can be curbed without compromising on cost, support, or stigma: computer vision and smartphone camera technology can be used to build an application, instantly and widely available, that recognizes objects in a user's surroundings and audibly communicates them to the user.

1.4 Aim and Objectives

This project proposes an object recognition system for the visually impaired, built as a mobile application running on Android. Android Studio and Java will be used to design the graphical user interface as well as the functionality. The application will provide an intuitive user interface that opens directly to the camera, labels nearby objects using object recognition models, and audibly communicates those labels to the user. Over time, the project will also evolve towards building further contextualized models for various demographics.
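
To make the intended pipeline concrete, the listing below is a minimal sketch of the detect-then-speak loop in Java, using the TensorFlow Lite Task Library and Android's built-in TextToSpeech engine. The class name, model file name ("model.tflite"), result limit, and score threshold are illustrative assumptions, not the project's final implementation.

// A minimal sketch of the detect-then-speak pipeline, assuming the
// TensorFlow Lite Task Library (task-vision) dependency and a bundled
// detection model named "model.tflite" (both illustrative assumptions).

import android.content.Context;
import android.graphics.Bitmap;
import android.speech.tts.TextToSpeech;

import org.tensorflow.lite.support.image.TensorImage;
import org.tensorflow.lite.task.vision.detector.Detection;
import org.tensorflow.lite.task.vision.detector.ObjectDetector;
import org.tensorflow.lite.task.vision.detector.ObjectDetector.ObjectDetectorOptions;

import java.io.IOException;
import java.util.List;
import java.util.Locale;

public class SpokenObjectDetector {

    private ObjectDetector detector;
    private TextToSpeech tts;
    private boolean ttsReady = false;

    public SpokenObjectDetector(Context context) throws IOException {
        // Keep only a few confident detections so the user is not
        // flooded with spoken labels; both values are assumptions.
        ObjectDetectorOptions options = ObjectDetectorOptions.builder()
                .setMaxResults(3)
                .setScoreThreshold(0.5f)
                .build();
        detector = ObjectDetector.createFromFileAndOptions(
                context, "model.tflite", options);

        tts = new TextToSpeech(context, status -> {
            if (status == TextToSpeech.SUCCESS) {
                // English for now; a localized voice could be swapped in
                // later when the models are contextualized.
                tts.setLanguage(Locale.UK);
                ttsReady = true;
            }
        });
    }

    // Runs the detector on one camera frame and speaks the top label.
    public void detectAndSpeak(Bitmap frame) {
        List<Detection> results = detector.detect(TensorImage.fromBitmap(frame));
        if (ttsReady && !results.isEmpty()) {
            String label = results.get(0).getCategories().get(0).getLabel();
            // QUEUE_FLUSH drops any pending speech so the newest
            // detection is what the user hears.
            tts.speak(label, TextToSpeech.QUEUE_FLUSH, null, "detection");
        }
    }
}

In this sketch, the score threshold keeps low-certainty labels from being spoken, and QUEUE_FLUSH ensures the most recent detection replaces any pending speech, which matters when camera frames arrive faster than the voice can speak.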

1.5 Significance of the Project

The implementation of this project has the potential to benefit the visually impaired and Nigerian society. It would immediately help the visually impaired in the execution of daily activities, which would have an overall positive impact on their lives.

This project could also shed light on how artificial intelligence, in its various facets, needs to be built to suit the different societies in which it is deployed, hopefully encouraging Nigerian developers to participate in building models that can capture the nuances and idiosyncrasies of Nigerian societies better than models built by other developers could.

By helping the visually impaired community and by encouraging Nigerian developers to build tools for Nigerians, this project will have a positive, lasting impact on Nigerian society as a whole. Furthermore, it sheds light on the importance of tailoring artificial intelligence and machine learning to each target demographic and audience.