TABLE OF CONTENTS
ABSTRACT v
CHAPTER 2: LITERATURE REVIEW 7
CHAPTER 3: REQUIREMENTS, ANALYSIS, AND DESIGN 18
- Overview 18
3.8.6 User Interface Design 31
CHAPTER 4: IMPLEMENTATION AND TESTING 34
Test case TC-001 (User Login) 40
Test case TC-002 (User Registration) 41
Test case TC-003 (Object Detection) 42
Test case TC-004 (Text-to-Speech) 43
- Test Traceability Matrix (for Unit Testing, Integration Testing, and System Testing) 44
- Test Report Summary (for Unit Testing, Integration Testing, and System Testing) 44
- Error Reports and Corrections 45
- Use Guide 45
- Summary 45
CHAPTER 5: DISCUSSION, CONCLUSION, AND RECOMMENDATIONS 46
Test case TC-001 (User Login) 57
Test case TC-002 (User Registration) 58
Test case TC-003 (Object Detection) 59
Test case TC-004 (Text-to-Speech) 60
LIST OF TABLES
TABLE 1.1 | RISK ASSESSMENT | 5 |
TABLE 3.1 | FUNCTIONAL REQUIREMENT SPECIFICATIONS | 24 |
TABLE 3.2 | NON-FUNCTIONAL REQUIREMENT SPECIFICATIONS | 25 |
TABLE 4.1 | TEST SUITE FOR LOGIN | 40 |
TABLE 4.2 | TEST SUITE FOR REGISTRATION | 41 |
TABLE 4.3 | TEST SUITE FOR OBJECT DETECTION | 42 |
TABLE 4.4 | TEST SUITE FOR TEXT-TO-SPEECH | 43 |
TABLE 4.5 | TEST TRACEABILITY MATRIX | 44 |
TABLE 4.6 | TEST REPORT SUMMARY | 44 |
LIST OF FIGURES
FIGURE 3.1 | AGILE METHODOLOGY VS WATERFALL METHODOLOGY | 19 |
FIGURE 3.2 | EXAMPLE OBJECT DETECTION TENSORFLOW LITE | 22 |
FIGURE 3.3 | ANDROID TEXT-TO-SPEECH WORKFLOW | 23 |
FIGURE 3.4 | APPLICATION ARCHITECTURE | 26 |
FIGURE 3.5 | USE CASE DIAGRAM | 27 |
FIGURE 3.6 | ACTIVITY DIAGRAM | 28 |
FIGURE 3.7 | DATA-FLOW DIAGRAM | 29 |
FIGURE 3.8 | ENTITY-RELATIONSHIP DIAGRAM | 30 |
FIGURE 3.9 | LOGIN PAGE | 31 |
FIGURE 3.10 | REGISTRATION PAGE | 32 |
FIGURE 3.11 | OBJECT RECOGNITION INTERFACE | 33 |
FIGURE 4.1 | LOGGED DETECTION RESULTS | 36 |
FIGURE 4.2 | LOGGED DETECTION RESULTS WITHOUT DELAY | 37 |
LIST OF ABBREVIATIONS
CPU | Central Processing Unit |
ERD | Entity Relationship Diagram |
IT | Information Technology |
ML | Machine Learning |
AI | Artificial Intelligence |
CV | Computer Vision |
CNN | Convolutional Neural Network |
RNN | Recurrent Neural Network |
RAM | Random Access Memory |
UML | Unified Modeling Language |
CHAPTER 1: INTRODUCTION
Overview
The aim of this project is to combine advances in smartphone technology, particularly in processing power and camera hardware, with advances in machine learning and computer vision to build a mobile application that helps the visually impaired carry out their day-to-day activities, and to contextualize this application for a Nigerian user base.
As smartphones have become increasingly ubiquitous, various day-to-day problems have found unique solutions. Multiple smartphone applications for the visually impaired have taken varying approaches: some are emergency-service based, while others recognize specific items, such as currencies, that are essential to daily life. For example, 'eyeNote' and 'LookTel' are applications that recognize currencies and audibly communicate the denomination to the user, whereas a project like 'BlindSighted' notifies the user with a vibration whenever they are within close range of an object (Ghantous, Nahas, Ghamloush and Rida, 2014). This project takes the approach of audibly communicating the names of objects recognized by the mobile application.
The following chapters of this thesis present the analysis, design, and implementation of this object detection system for the visually impaired.
Background and Motivation
Individuals with visual impairments are defined by the World Health Organization (WHO) as those who suffer from low vision or blindness (World Health Organization, 1992). Due to this impairment, these individuals face many challenges in their day-to-day activities.
Throughout human history, a myriad of devices and methods have been devised to overcome these difficulties: traditional aids such as walking sticks and reading glasses; Braille, a system of touch-based reading and writing; and, more recently, assistive devices. Assistive devices (specialized high- and low-technology tools designed for individuals with disabilities) increase the ability of visually impaired individuals to understand their environment. They range from specialized screen-reading software and magnification programs to DAISY book readers (Martiniello et al., 2019). Despite their established utility, widespread adoption of these devices has been hindered by factors such as cost and the negative perceptions associated with vision loss (Mulloy et al., 2014).
According to the World Health Organization, at least 2.2 billion people globally suffer from a visual impairment or blindness. Of these, at least 1 billion have a visual impairment that could have been prevented or has yet to be addressed (World Health Organization, 2020).
In the past few decades, smartphones and tablets have become increasingly popular and are now a staple of mainstream society. Over time, as a result of technological advancements, a large number of built-in accessibility tools have been incorporated into these devices, creating and maximizing accessibility for users with a diverse set of needs (Martiniello et al., 2019). Unlike traditional assistive devices, these devices have already achieved widespread adoption; furthermore, they are more affordable and less likely to draw attention to the user, avoiding negative perceptions. Alongside the built-in accessibility tools, smartphone operating systems provide developer platforms that allow third parties to leverage the device's capabilities to build applications for users, among them assistive and accessibility applications.
Given the ubiquity of smartphones – there are currently about 3.5 billion smartphones worldwide, the majority of which run either iOS or Android – it makes sense to build applications, especially those geared towards accessibility, on these platforms. Leveraging this ubiquity allows us to deliver accessibility faster to those who need it most.
Furthermore, smartphone camera technology and computer vision algorithms have both been improving at a rapid rate. With object recognition, one could simulate seeing for the visually impaired better than the traditional methods currently available, without compromising on cost or societal perception. All of this can be achieved with currently available technology: camera hardware, computer vision algorithms, and a text-to-speech engine, combined into an object recognition application that labels surrounding objects and audibly communicates each recognized label to the user.
Adaptability and continued support are facets of smartphones and smartphone applications that traditional assistive devices lack; alongside cost and stigma, their absence is another factor driving the abandonment of traditional assistive devices (Phillips and Proulx, 2018). This is a further motivation for this project, as smartphone applications benefit from 'over the air' updates that improve the user experience over time. Moreover, applications can be contextualized for different demographics along multiple criteria, for example age and geographic location. This is paramount in the case of object recognition applications built with artificial intelligence: object recognition models should be adaptable to various languages and audiences to ensure that all users can adequately benefit from them.
Statement of the Problem
The ability to recognize objects and one's surroundings is a quintessential aspect of operating self-sufficiently; even the most menial, routine tasks rely on this capability. Hence, operating independently can become extremely difficult for those who suffer from visual impairments.
Due to their impairment, visually impaired individuals can face many hurdles in tasks that others might consider simple. Attempts to address this range from reading glasses and walking sticks to surgery, yet these may be financially infeasible for some or amount to mere workarounds. Assistive technologies have also been shown to increase users' access to their environment and to information; however, they have failed to achieve widespread adoption for a plethora of reasons, namely cost, lack of technical support, and the stigma attached to using these devices in public.
By leveraging the widespread adoption of smartphone technology, these issues can be curbed without compromising on cost, support, or social acceptance, and a solution can be made instantly and widely available: an application that uses computer vision and the smartphone camera to recognize objects in a user's surroundings and audibly communicate them.
Aim and Objectives
This project proposes an object recognition system for the visually impaired, built as a mobile application running on Android. Android Studio and Java will be used to design the graphical user interface as well as the functionality. The application will provide an intuitive user interface that opens directly to the camera, labels nearby objects using object recognition models, and audibly communicates those labels to the user. Over time, the project will also evolve to include further contextualized models for various demographics.
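The core loop described above (capture a camera frame, run object recognition, speak the resulting label) can be sketched in Java as follows. This is a minimal, framework-agnostic sketch, not the project's actual implementation: the `FrameSource`, `ObjectDetector`, and `SpeechEngine` interfaces are hypothetical stand-ins for the Android camera API, a recognition model such as a TensorFlow Lite detector, and the platform text-to-speech engine, respectively.

```java
import java.util.List;

// Hypothetical abstractions over the camera, the recognition model,
// and the text-to-speech engine.
interface FrameSource { byte[] nextFrame(); }
interface ObjectDetector { List<String> detect(byte[] frame); }
interface SpeechEngine { void speak(String label); }

public class DetectAndSpeak {
    private final FrameSource camera;
    private final ObjectDetector detector;
    private final SpeechEngine tts;
    private String lastSpoken = "";

    public DetectAndSpeak(FrameSource camera, ObjectDetector detector, SpeechEngine tts) {
        this.camera = camera;
        this.detector = detector;
        this.tts = tts;
    }

    /** Process one camera frame: detect objects and announce new labels. */
    public void processFrame() {
        byte[] frame = camera.nextFrame();
        for (String label : detector.detect(frame)) {
            // Avoid repeating the same label on consecutive frames,
            // so the user is not overwhelmed by continuous speech.
            if (!label.equals(lastSpoken)) {
                tts.speak(label);
                lastSpoken = label;
            }
        }
    }
}
```

In the real application, the per-frame debouncing shown here would be one of several measures (alongside confidence thresholds and detection delays, discussed in later chapters) to keep the audio feedback useful rather than overwhelming.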
Significance of the Project
The implementation of this project has the potential to benefit both the visually impaired and Nigerian society. It would be immediately helpful to the visually impaired, aiding in the execution of daily activities and thereby having an overall positive impact on their lives.
This project could also shed some light on how artificial intelligence systems need to be contextualized for the societies in which they are deployed, hopefully encouraging Nigerian developers to participate in building models that capture the nuances and idiosyncrasies of Nigerian society better than models built by outside developers could.
By helping the visually impaired community and by encouraging Nigerian developers to build tools for Nigerians, this project will have a lasting positive impact on Nigerian society as a whole. Furthermore, it highlights the importance of tailoring artificial intelligence and machine learning to each target demographic and audience.