
Information System Project

PROJECT PROPOSAL

By A. Student

Research on speech recognition technique while building speech recognition bot

Project Instructor
Martin Stanton
Executive Summary

Many people dislike having to use a keyboard for every small task on their
computer, and others cannot use a computer at all because of a disability. The
main idea of the proposed project is to carry out detailed research into speech
recognition in order to create a "speech recognition bot" that helps individuals
use their computer without typing or clicking. The bot is intended to answer
users’ queries and perform certain tasks given by users, such as pressing
buttons, minimizing and maximizing windows, and opening a particular website or
program. In some cases the conversation between the user and the bot might even
turn funny, with the app giving witty remarks or telling jokes. The set of tasks
will be limited, but the application is intended to be engaging and useful to
its users.

Table of Contents

EXECUTIVE SUMMARY
TITLE
BACKGROUND TO PROJECT
AIMS
OBJECTIVES
JUSTIFICATION
SCOPE AND TECHNICAL CHALLENGES
DELIVERABLE
PROJECT PLAN
REFERENCES

Title
Research on speech recognition technique while building speech recognition bot

Background to Project
The concept of speech recognition dates back to the early 1950s, when it was all
about voice activation and recognition. Later, during the 1980s, the National
Science Foundation funded work in the field and the technology was used with
people with disabilities, including people with sclerosis and cerebral palsy. At
that time speech recognition programs were complex and limited: they either
supported only a small vocabulary or had to be used by one particular
individual. By the end of the 1990s speech recognition had improved greatly and
was being applied in many fields. Today many machines, gadgets and even vehicles
include speech recognition. Popular companies like Apple, Microsoft and Google
have their own speech assistants for their devices.

Speech is the most common way of communication amongst human beings.
Researchers have continuously been trying to make computers understand human
speech. “Research in automatic speech recognition by machine has been done for
almost four decades” (Rabiner and Juang, 1993, p. 6), and there have been a lot
of improvements. “Automatic speech recognition (ASR) can be defined as the
independent, computer‐driven transcription of spoken language into readable text
in real time” (Stuckless, 1994, p. 197). The definition suggests that speech
recognition records human speech and converts it to readable text, which the
computer then uses to follow the speaker's instructions. Speech recognition
relies on the following components to achieve its goal: digitization (analog to
digital), speech models (sound, words and language), speech synthesis (reply
from computer), a dictionary (set of words) and grammars (rules).

The proposed project is intended to contribute to improving speech recognition
in the future. It suggests a way to use human speech to provide proper answers
to users' questions, such as the current weather or time, and to control the
user's computer.

Based on my research, the proposed system can be achieved by converting audio
input to text, comparing it with the dictionary, arranging it into a proper
grammatical form, and giving a proper response using speech synthesis, key
listeners or a robot.
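The steps above can be sketched in plain Java. This is only a minimal
illustration under stated assumptions: the audio-to-text step is assumed to have
already produced a transcript string, and the class name, the dictionary words
and the reply labels (REPLY_TIME, REPLY_WEATHER) are hypothetical and not part
of the proposed design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PipelineSketch {
    // Hypothetical dictionary: the only words this sketch knows.
    static final Set<String> DICTIONARY = new HashSet<>(Arrays.asList(
            "what", "is", "the", "time", "weather", "open", "calculator"));

    // Compare the transcript with the dictionary and keep only known words.
    static List<String> filterKnownWords(String transcript) {
        List<String> known = new ArrayList<>();
        for (String word : transcript.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (DICTIONARY.contains(word)) {
                known.add(word);
            }
        }
        return known;
    }

    // Match the known words against a few very simple grammar rules
    // and return a label describing the response to give.
    static String respond(String transcript) {
        List<String> words = filterKnownWords(transcript);
        if (words.contains("what") && words.contains("time")) {
            return "REPLY_TIME";
        }
        if (words.contains("what") && words.contains("weather")) {
            return "REPLY_WEATHER";
        }
        if (words.size() >= 2 && words.get(0).equals("open")) {
            return "OPEN " + words.get(1);
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(respond("What is the time please"));
        System.out.println(respond("open calculator"));
    }
}
```

In a real system the returned label would be handed to speech synthesis or to
the component that controls the computer; here it is just a string so the flow
stays easy to follow.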

Aims
The project aims to take the user’s speech as input, reply to it with an answer,
and control the user’s computer based on that speech.

While building the application, the project also aims to find flaws in speech
recognition techniques and suggest possible ways to overcome them.

Objectives
The proposed project will make use of speech recognition in order to create a
bot that can communicate with and help both disabled and non-disabled people to
use their computer without typing and clicking.

Rank  Priority        Type       Objectives

1     Functional      Technical  Thorough research on the speech recognition topic and its fundamentals for human-computer interaction.

2     Functional      Technical  Study of the Sphinx-4 (open source) library to understand the processing mechanism.

3     Functional      Technical  Find out the best way to implement the grammar and dictionary for the speech recognition bot.

4     Functional      Personal   Enhance the knowledge of Java and project management techniques based on speech recognition libraries.

5     Non-Functional  Technical  Make the speech recognition feature applicable to every possible accent. (The system won't have this due to limited time for research.)

6     Non-Functional  Technical  Make a platform-independent program (OSX, Linux, Windows). (The program will only work on the Windows platform due to lack of resources.)

Justification
“Having a machine to understand fluently spoken speech has driven speech
research for more than 50 years. Although ASR technology is not yet at the point
where machines understand all speech, in any acoustic environment, or by any
person, it is used on a day‐to‐day basis in a number of applications and services”
(Docsoft, 2009). There have been many improvements in speech recognition, but it
is still not good enough to understand every individual's accent. This is
demonstrated in a YouTube video (Brajonnz, 2011) that uses the iPhone 4S’s
speech recognition facility in Scotland. The topic is vast and has yet to be
fully researched and optimized. A Google survey (Mack, 2014) has shown that 55
percent of teens between the ages of 13 and 17 use voice search at least twice
every day. The number of users of speech recognition facilities is therefore
increasing day by day, and I found the topic fascinating for research purposes.

Speech recognition allows people to use the computer with ease, and it can help
people with certain disabilities to use the machine even if they can’t use their
hands or fingers. Although the proposed application is mainly meant to control
the PC and answer users' queries, I hope it will also help to optimize speech
recognition at some point and help other researchers of this topic to get
information on speech recognition technology.

The proposed project won’t use any real person as a subject or collect their
data. The application is simply used by the user, and not a single piece of user
information is stored; hence there is no ethical or data protection issue for
this project.

By the end of the project I hope to understand the fundamentals of speech
recognition clearly and to have created the proposed application.

Scope and Technical Challenges


As the project’s title suggests, the project is intended to create an
application that helps people control the computer with their voice and get
answers to their queries.

The proposed project is not an AI agent; hence the application isn’t supposed to
improve its recognition ability through use, and it is not intended to learn new
words by itself. It is not supposed to open every single application; only
selected applications such as Google Chrome, Calculator, Paint and Photoshop are
likely to work.

It is aimed to open, minimize, maximize or close applications and scroll between
pages, but it is not planned to perform complex activities like filling in
forms. The application is supposed to understand only the English language.

The application will be provided with a crib sheet (quick reference). It won’t
be able to understand or perform any functionality other than what is listed in
the crib sheet.
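One way to picture this limited scope is a small lookup that accepts a phrase
only if both its action and its application appear on the crib sheet. The sketch
below is an assumption for illustration: the class name, the "action:application"
result format and the exact phrase wording are hypothetical, and only the four
window actions and four applications named above are included.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class CribSheetSketch {
    // Window actions named in the scope above.
    static final List<String> ACTIONS =
            Arrays.asList("open", "minimize", "maximize", "close");

    // Applications named in the scope above, lower-cased for matching.
    static final List<String> APPLICATIONS =
            Arrays.asList("google chrome", "calculator", "paint", "photoshop");

    // Returns "action:application" when the phrase is on the crib sheet,
    // otherwise an empty Optional (the bot would ignore the phrase).
    static Optional<String> parse(String phrase) {
        String text = phrase.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
        for (String action : ACTIONS) {
            if (text.startsWith(action + " ")) {
                String target = text.substring(action.length() + 1);
                if (APPLICATIONS.contains(target)) {
                    return Optional.of(action + ":" + target);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("Open Google Chrome"));
        System.out.println(parse("fill in the form"));
    }
}
```

Keeping the supported phrases in two short lists means the crib sheet given to
users and the behaviour of the program can be checked against each other easily.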

The proposed system will use Java SE for the speech recognition technique and
will be based on the MVC architecture. The open source library Sphinx-4 will be
used to assist with speech recognition. “Sphinx-4 is a pure Java speech
recognition library. It's very flexible in its configuration, and in order to carry out
speech recognition jobs quite a lot of objects depending on each other should be
instantiated” (sourceforge.net, 2014).
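As a rough idea of how the MVC split might look in Java SE, the sketch below
separates the three roles. It is an assumption for illustration, not the
project's design: all class and method names are hypothetical, and the
recognised text is passed in as a plain string instead of coming from Sphinx-4.

```java
import java.util.ArrayList;
import java.util.List;

public class MvcSketch {
    // Model: holds the utterances the bot has heard so far.
    static class BotModel {
        private final List<String> history = new ArrayList<>();

        void record(String utterance) {
            history.add(utterance);
        }

        int count() {
            return history.size();
        }

        String last() {
            return history.isEmpty() ? "" : history.get(history.size() - 1);
        }
    }

    // View: turns the model state into text shown (or spoken) to the user.
    static class BotView {
        String render(BotModel model) {
            return "Heard (" + model.count() + "): " + model.last();
        }
    }

    // Controller: receives recognised text, updates the model, asks the view to render.
    static class BotController {
        private final BotModel model;
        private final BotView view;

        BotController(BotModel model, BotView view) {
            this.model = model;
            this.view = view;
        }

        String onSpeech(String recognisedText) {
            model.record(recognisedText.trim());
            return view.render(model);
        }
    }

    public static void main(String[] args) {
        BotController controller = new BotController(new BotModel(), new BotView());
        System.out.println(controller.onSpeech("open calculator"));
    }
}
```

With this separation the speech recognition library only needs to talk to the
controller, so the recogniser could be replaced or mocked in tests without
touching the model or the view.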

A basic example of the application is shown below.

(Note: this is not a design; it is just an example of how the application works.)

Figure 1: Example of proposed system

A use case diagram shows the interaction of the user with the system. It is used
to determine the possible actions of the user in the system. The stick figure
represents the actor and the box represents the system boundary. The tasks
inside the box represent the use cases. I’ve used a use case diagram for this
project in order to limit the scope of the system.

Figure 2: Use case diagram for Virtual Speech Bot

During this project there might be many challenges, which can be addressed with
different techniques such as problem solving, analytical skills, critical review
of design decisions and innovative thinking.

The technical challenges of the project are shown with the help of a problem
solving technique (SWOT).

SWOT: SWOT is a problem solving technique that allows us to find out the
possible strengths, weaknesses, opportunities and threats of the project we are
going to start. It allows us to find out what could work to the advantage of the
project and what might put the project at risk.

Figure 2: SWOT Analysis for the proposed project

Deliverable
At the end of the project, a fully working speech recognition bot capable of
answering users' queries is supposed to be built. This bot is also expected to
be able to control the user's computer.

The following source files are expected at the end of the project in order to
create the speech recognition bot:

SpeechRecognitionBot > src > com > sagun > speechbot (Folder Path)

|- SpeechRecogBot.java
|- Grammar.java
|- SpeechRecogBot.config.xml

Some other associated documents are also supposed to be provided with the
project. Here is the list of those documents:

Requirement specification and Design documents: This document is very likely to
include the specification work done on the project topic as well as some
background research. It will cover the software design approach and modeling
approaches like object, dynamic and functional models.

Testing documents: This document will consist of various test cases,
benchmarking techniques, test plans and test scenarios. It will be helpful for
checking the robustness and reliability of the prepared system.

Working application and User Manual: This part of the documentation will consist
of fully working Java source files and class files, along with a user manual to
help new users of the system.

Project Plan
The estimated project timeline is shown below along with a Gantt chart. This is
only a rough estimate, so it might change during the course of the project.

Figure 3: Project Schedule

Figure 4: Gantt chart

References
Admin (2014). sourceforge.net [online]. Available from:
<http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4>. [Accessed 4th
October].

Brajonnz (2011). iPhone has problems with Scottish accents. [online]. Available
from: <http://www.youtube.com/watch?v=1EVnK-pNxaA>. [Accessed 4th Oct].

Creative Freedom (2009). microphone-5-icon [online]. Available from:
<http://www.iconeasy.com/icon/png/System/Shimmer/Microphone.png>.
[Accessed 30th Sep 2014].

Docsoft, Inc. (2009, June). What is Automatic Speech Recognition? (1st).
Oklahoma City, OK 73104. Available from:
<http://support.docsoft.com/help/whitepaper-asr.pdf>. [Accessed 5th October].

Eric Mack (2014). Google reveals our embarrassing voice search habits. 14th
October. Cnet [online]. [Accessed 15th October]. Available from:
<http://www.cnet.com/news/google-voice-search-siri-cortana-teen-study/>.

Lawrence Rabiner, Biing-Hwang Juang (1993). Fundamentals of speech
recognition. 1st ed. Englewood Cliffs, New Jersey 07632: PTR Prentice-Hall, Inc.

Stuckless, R. (1994). Developments in real-time speech-to-text communication for
people with impaired hearing. In M. Ross (Ed.). Baltimore, MD: York Press.
