DESIGN & IMPLEMENTATION OF A NEURAL MACHINE TRANSLATION SYSTEM (LET’S TALK) FOR THE TRANSLATION OF HYAM TO ENGLISH


TABLE OF CONTENTS

DECLARATION

CERTIFICATION

APPROVAL

DEDICATION

ACKNOWLEDGEMENT

LIST OF TABLES

LIST OF FIGURES

ABBREVIATIONS & ACRONYMS

ABSTRACT

CHAPTER 1: INTRODUCTION

  • Overview
  • Background and Motivation
  • Problem Statement
  • Aim and Objectives
  • Significance of the Project
  • Risk Assessment for the Project
  • Scope/Organization

CHAPTER 2: LITERATURE REVIEW

  • Introduction
  • Historical Overview
  • Related Work
      • Microsoft Translator
      • Google Translator
      • iTranslate
  • Architecture of the Work
  • Summary

CHAPTER 3: REQUIREMENTS, ANALYSIS & DESIGN

  • Overview

CHAPTER 4: IMPLEMENTATION AND TESTING

CHAPTER 5: DISCUSSION, RECOMMENDATIONS & CONCLUSION

  • Overview
  • Objective Assessment
  • Limitations & Challenges
  • Future Improvements
  • Recommendations
  • Summary

REFERENCES

APPENDICES

Appendix A: Project Documentation/Executive Summary

Appendix B: Work Plan

Appendix C: Gantt Chart

LIST OF TABLES

Table 1 Matrix for Risk Assessment

Table 2 Matrix for Risk Amendment

Table 3 Hardware Requirements

Table 4 Hardware Requirements

Table 5 Functional Requirements

Table 6 Non-functional Requirements

LIST OF FIGURES

RBMT Workflow

SMT Workflow

Microsoft Translator

Google Translator

iTranslate

The Waterfall Model

The Prototype Model

The Iterative Model

The Iterative Model

Gender Chart

Age Chart

Current Level of Education

Nationality of Participants

Those Interested in Online Learning

Application Architecture for Customer

Application Architecture for Admin

Activity Diagram for Customer

Activity Diagram for Administrator

Use Case Diagram

Sequence Diagram

Entity Relationship Diagram

A screenshot of the training of the model using Google Colab

A screenshot of the training of the model using Google Colab

A screenshot of the training of the model using Google Colab

A screenshot of the training of the model using Google Colab

A screenshot of the training of the model using Google Colab

A screenshot of the training of the model using Google Colab

ABBREVIATIONS & ACRONYMS

RNN – Recurrent Neural Network

LSTM – Long Short-Term Memory

GRU – Gated Recurrent Unit

MT – Machine Translation

NMT – Neural Machine Translation

ABSTRACT

In Southern Kaduna, the Hyam community welcomes visitors every year, owing to its friendly and hospitable nature. In the past the Hyam people even sheltered Igbos who were threatened with violence by Hausas, hiding them and giving them safe passage back to the East. However, many Hyam people find it hard to communicate with visitors because, through lack of exposure, they have little or no knowledge of the lingua franca.

This project presents a model for translating between natural languages, specifically from Hyam to English. It is a Neural Machine Translation (NMT) system built with Recurrent Neural Networks (RNNs). The system consists of two models, an encoder and a decoder, each a large network of artificial neurons; together they predict the probability of a sequence of words in a sentence. The translation is learned in the hidden layers of both models, which are trained on parallel documents in the two languages.
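The following minimal sketch in Python with NumPy makes the encoder–decoder idea concrete.

- The encoder folds the source sentence into one hidden vector.
- The decoder uses that vector to predict each target word in turn.
- The probability of a sentence therefore factorizes as P(y | x) = ∏ P(y_t | y_<t, x).

It rests on several assumptions that are not the project's code:

- It uses plain (Elman) RNN cells rather than LSTM or GRU cells.
- It uses small hypothetical vocabularies of integer word ids.
- All names (`Seq2SeqRNN`, `bos_id`, `eos_id`) are hypothetical.
- The weights are random, so it shows only the forward computation, not a trained Hyam–English model.

```python
import numpy as np


class Seq2SeqRNN:
    """Minimal, untrained RNN encoder-decoder (forward pass only).

    Hypothetical sketch: weights are random; a real system would learn them
    from a Hyam-English parallel corpus by backpropagation.
    """

    def __init__(self, src_vocab_size, tgt_vocab_size, emb_dim=16, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        # Encoder: source embeddings plus input-to-hidden and hidden-to-hidden weights.
        self.src_emb = init(src_vocab_size, emb_dim)
        self.enc_Wx = init(emb_dim, hidden_dim)
        self.enc_Wh = init(hidden_dim, hidden_dim)
        # Decoder: target embeddings, recurrent weights and an output projection.
        self.tgt_emb = init(tgt_vocab_size, emb_dim)
        self.dec_Wx = init(emb_dim, hidden_dim)
        self.dec_Wh = init(hidden_dim, hidden_dim)
        self.out_W = init(hidden_dim, tgt_vocab_size)
        self.hidden_dim = hidden_dim

    def encode(self, src_ids):
        """Run the encoder RNN; the final hidden state summarizes the source sentence."""
        h = np.zeros(self.hidden_dim)
        for i in src_ids:
            h = np.tanh(self.src_emb[i] @ self.enc_Wx + h @ self.enc_Wh)
        return h

    def decoder_step(self, prev_id, h):
        """One decoder step: new hidden state and a softmax distribution over target words."""
        h = np.tanh(self.tgt_emb[prev_id] @ self.dec_Wx + h @ self.dec_Wh)
        logits = h @ self.out_W
        probs = np.exp(logits - logits.max())  # subtract max for numerical stability
        return h, probs / probs.sum()

    def sequence_log_prob(self, src_ids, tgt_ids, bos_id):
        """log P(y | x) = sum over t of log P(y_t | y_<t, x), feeding in the true previous word."""
        h = self.encode(src_ids)
        prev, total = bos_id, 0.0
        for y in tgt_ids:
            h, probs = self.decoder_step(prev, h)
            total += float(np.log(probs[y]))
            prev = y
        return total

    def greedy_translate(self, src_ids, bos_id, eos_id, max_len=10):
        """Pick the most probable word at each step until end-of-sentence or max_len."""
        h = self.encode(src_ids)
        prev, output = bos_id, []
        for _ in range(max_len):
            h, probs = self.decoder_step(prev, h)
            prev = int(np.argmax(probs))
            if prev == eos_id:
                break
            output.append(prev)
        return output
```

For example, `Seq2SeqRNN(50, 40).greedy_translate([3, 7, 12], bos_id=1, eos_id=2)` returns a list of target word ids. In a trained system the parameters would be fitted by maximizing `sequence_log_prob` over the parallel corpus. Greedy decoding is often replaced by beam search.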

CHAPTER 1: INTRODUCTION

Overview

Communication has always been a key part of human life. Language is one of the defining markers of culture, whether of a group, a society or a nation. Yet language can also be a barrier to communication. This happens when two or more people speak different languages and share no language in common. Unfortunately, this hinders growth and development in the world today.

Nigeria, a developing country, is living proof of this: ethnic discrimination occurs every day in workplaces and in appointments, to name just a few areas. One of the most effective ways, possibly the only way, to communicate across languages is to have a human translator "bridge the gap". In some cases, however, the translator may be biased or self-interested and may alter information for personal gain.

We developed a web-based application, Let’s Talk, which acts as a virtual, unbiased intermediary. It enables speakers of Hyam to understand English, and foreigners who understand only English to understand the Ham language, in both written and spoken form.

Background and Motivation

Natural Language Processing (NLP) is a subfield of Artificial Intelligence that deals with the automatic manipulation of human (natural) language by software. It helps computers and other digital devices to interpret, comprehend and use natural language, and so allows people to interact with computers in natural language. A well-known illustration is the Turing Test, proposed by Alan Turing. In it, a human judge holds conversations with both a human and a computer, neither of which the judge can see. If the judge cannot reliably tell the computer from the human, the computer is said to have passed the test.

Essentially, NLP breaks natural language down into smaller, less complex units called tokens, which may be words, characters and so on. It then attempts to understand the relationships between these tokens.
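Tokenization, and the mapping of tokens to the integer ids that a neural translator consumes, can be sketched in a few lines of Python. This is an illustrative assumption, not the project's preprocessing code:

- The helper names (`word_tokenize`, `char_tokenize`, `build_vocab`, `encode`) are hypothetical.
- The special symbols `<pad>`, `<s>`, `</s>` and `<unk>` are a common convention rather than anything specified in this report.
- The example sentences are English only because no Hyam text appears here.

```python
import re
from collections import Counter


def word_tokenize(sentence):
    """Split a sentence into lowercase word tokens and single punctuation tokens.

    In Python 3, \\w is Unicode-aware, so accented letters stay inside words.
    """
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def char_tokenize(sentence):
    """Split a sentence into character tokens, ignoring whitespace."""
    return [c for c in sentence if not c.isspace()]


def build_vocab(sentences, specials=("<pad>", "<s>", "</s>", "<unk>")):
    """Map each token to an integer id: special symbols first, then tokens by frequency."""
    counts = Counter(tok for s in sentences for tok in word_tokenize(s))
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in counts.most_common():
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to token ids, mapping unseen words to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in word_tokenize(sentence)]


if __name__ == "__main__":
    vocab = build_vocab(["good morning", "good night"])
    print(word_tokenize("Good morning, friend!"))
    print(encode("good evening", vocab))
```

With these two training sentences, "evening" is unseen and is encoded as the `<unk>` id. This is one reason small corpora limit translation quality for languages like Hyam.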

The acronym NLP is also used for Neuro-Linguistic Programming, a separate discipline that should not be confused with natural language processing. It emerged in the early 1970s, when John Grinder, then a professor of linguistics, and Richard Bandler, a psychology student, observed that people with the same amount of training, education and years of experience achieved very different results.

They then turned their attention to communication. They studied how people who succeeded in their respective fields interacted with others, looking at factors such as verbal language, eye movements, body language and gestures. Their subjects included successful practitioners such as Virginia Satir, a pioneer of family therapy; Fritz Perls, a founder of Gestalt therapy; and Milton H. Erickson, who is well known for his work in hypnotherapy.

From this study they identified the thinking patterns behind these practitioners' success. They then proposed that the brain can learn systematic procedures that bring about positive physical and emotional benefits. The resulting approach became known as Neuro-Linguistic Programming.

Natural language processing is applied in many areas today. These include machine translation, speech recognition, opinion mining, question answering, automatic summarization, chatbots, market intelligence, text classification, character recognition and spell checking.

Before the introduction of digital computers, some tasks were believed to require human intelligence. Translating from one language to another was one of them, and it was never expected to be done by a machine. Machine translation (MT) is the use of automated systems to translate a source language into a target language. It can be achieved by developing algorithms that enable the computer to understand the meaning of a language well enough to produce quality translations without human help. By the 1970s, the foundations of the first operational MT systems had been laid. Modern MT is mainly data-driven. Rather than analyzing the rules and structures of every language and writing algorithms to enforce them, statistical and, more recently, neural systems learn how to translate from large corpora of previously translated text.
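The statistical approach of learning translations from previously translated text can be illustrated with IBM Model 1. This classic word-alignment model uses expectation maximization to estimate word-translation probabilities t(e | f) from sentence-aligned pairs. The sketch rests on three assumptions:

- It is not the method used in this project, which is neural.
- It omits the NULL word of the full model.
- It uses a tiny German–English corpus chosen purely for illustration, because no Hyam data appears here.

`train_ibm_model1` is a hypothetical helper name.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=20):
    """Estimate t(e | f) with EM (IBM Model 1 without the NULL word).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (english_word, foreign_word) -> probability.
    """
    english_vocab = {e for _, es in corpus for e in es}
    # Start from a uniform distribution over English words.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: share each English word among the foreign words in its sentence.
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalize expected counts into probabilities t(e | f).
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


if __name__ == "__main__":
    corpus = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
        ("ein buch".split(), "a book".split()),
    ]
    t = train_ibm_model1(corpus)
    for f in ("das", "haus", "buch", "ein"):
        best = max((p, e) for (e, f2), p in t.items() if f2 == f)
        print(f, "->", best[1], round(best[0], 3))
```

No single sentence pair says that "haus" means "house". The model infers it because "das" keeps co-occurring with "the" across sentences, which "explains away" the word "the".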

ICT has achieved a great deal in communication. Social media apps let people communicate over long distances and meet loved ones or friends virtually. ICT has also helped with translating languages, offering a quick and effective way to translate information received; this is known as machine translation. Machine translation is still a growing field, given the complexity of language structure and the vast number of languages yet to be analyzed.

Nigeria is linguistically very diverse, with over five hundred languages spoken in the country. Despite the richness of so many languages, this diversity makes communication difficult. In Nigeria it fuels ethnic rivalry and division, which hinders our growth as a nation. The web application Let’s Talk has been developed to translate the Ham language, which is spoken by the Hyam people of Southern Kaduna. About 400,000 people in Nigeria speak the language. The application translates from Hyam to English and from English back to Hyam, so that a local and a foreigner can communicate more easily.

Furthermore, because Hyam is not a major language, there is a high risk that it will die out with the younger generation. Parents may not have had formal education while their children do, and those children may learn only English and never learn Hyam. The application would give these children, and not only foreigners, an opportunity to learn the language.

Finally, linguists who plan to study the language can use the application as a means of understanding its structure.

Problem Statement

  • Speech recognition remains imperfect. The system may struggle to recognize a user's voice because of background noise, disfluencies, vocabulary size and language perplexity.
  • Machine translation is often inaccurate. The system may fail to choose the right synonym, collocation or word meaning from its lexicon. A study of English–Lithuanian machine translation in both directions found that two-thirds of all sentences were translated incorrectly, which suggests that mobile applications have only a slim chance of translating accurately.
  • Quality issues: unlike a human translator, translation software cannot process the context in which a language is used. For instance, a word may have several meanings, and a speaker may signal a particular meaning through pronunciation and emotion, which the software may not grasp.
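The "language perplexity" mentioned in the first problem can be made concrete. Perplexity is the exponential of the average negative log-probability that a language model assigns to each token, PP = exp(-(1/N) Σ log p_i). The higher it is, the harder the text is for the model to predict.

The sketch below computes perplexity under an add-one-smoothed bigram model. It rests on these assumptions:

- The helper names are hypothetical.
- The toy English sentences are not Hyam data.
- It is not the project's evaluation code.

It shows that a sentence made of unseen words receives a much higher perplexity than one resembling the training text.

```python
import math
from collections import Counter


def perplexity(token_probs):
    """PP = exp(-(1/N) * sum(log p_i)) over the N predicted tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def bigram_token_probs(train_sentences, test_sentence):
    """Add-one (Laplace) smoothed bigram probability of each token in test_sentence.

    Sentences are lists of tokens; <s> and </s> mark sentence boundaries.
    """
    history_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sent in train_sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens[1:])  # tokens that can be predicted
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    test_tokens = ["<s>"] + test_sentence + ["</s>"]
    vocab.update(test_tokens[1:])  # unseen test words enlarge the vocabulary
    v = len(vocab)
    return [
        (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + v)
        for prev, word in zip(test_tokens[:-1], test_tokens[1:])
    ]


if __name__ == "__main__":
    train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    seen = perplexity(bigram_token_probs(train, ["the", "cat", "sat"]))
    unseen = perplexity(bigram_token_probs(train, ["a", "bird", "flew"]))
    print(f"seen: {seen:.2f}  unseen: {unseen:.2f}")
```

A uniform model over k words has perplexity k, so perplexity can be read as the effective number of choices the model faces at each word.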

Aim and Objectives

  • To create a mobile application that translates the Ham language into Nigeria’s lingua franca, and vice versa.
  • To enable the computer to understand and learn the structure, phonetics and spelling of Ham words.
  • To preserve the language as part of Ham culture.

Significance of the Project

This project is significant because people who know English will be exposed to the Ham language and be able to understand it better. The same applies to those who are only fairly conversant with Ham, and to those who cannot speak it but understand English.

Risk Assessment for the Project

Risk assessment is performed so that the project can be managed effectively if unforeseen problems arise before, during or after development. Risks should be eliminated from the project as far as possible, to maintain its efficiency and reliability for developers and customers alike.

Risk assessment, however, cannot be precise; if it were, the future of the app would be easy to predict. Nevertheless, it provides a starting point for discussing the symptoms of potential problems and how they might affect the system’s performance.
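A common way to start such an assessment is a risk matrix, which scores each risk as likelihood × impact and ranks the results. The sketch below rests on several assumptions:

- It uses a three-level scale (low = 1, medium = 2, high = 3).
- It uses rating bands of 1–2 low, 3–4 medium and 6–9 high.
- The example risks and the names `Risk` and `prioritize` are hypothetical.
- It does not reproduce the project's own risk matrix.

```python
from dataclasses import dataclass

# Hypothetical three-level scale; the project's own matrix may use a different one.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    description: str
    likelihood: str  # one of the keys of LEVELS
    impact: str      # one of the keys of LEVELS

    @property
    def score(self) -> int:
        """Risk score = likelihood x impact, giving a value from 1 to 9."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]

    @property
    def rating(self) -> str:
        """Map the score onto assumed bands: 1-2 low, 3-4 medium, 6-9 high."""
        if self.score >= 6:
            return "high"
        if self.score >= 3:
            return "medium"
        return "low"


def prioritize(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Illustrative risks only; not taken from the project's risk register.
    register = [
        Risk("Training session disconnects on Google Colab", "medium", "low"),
        Risk("Too little Hyam-English parallel text", "high", "high"),
        Risk("Noisy audio input for spoken translation", "medium", "medium"),
    ]
    for r in prioritize(register):
        print(f"{r.score:>2} {r.rating:<6} {r.description}")
```

Multiplying the two factors keeps a risk that is very likely but harmless from outranking one that is both likely and damaging. The bands decide which risks need a mitigation plan first.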