Pra a Aaaaaaaaaaaaaaa
8/13/2019 Pra a Aaaaaaaaaaaaaaa
1/27
INDEX
1 Introduction
1.1 History
2 Speech Recognition
2.1 Performance of speech recognition systems
2.2 Hidden Markov model (HMM)-based speech recognition
2.3 Dynamic time warping (DTW)-based speech recognition
3 Speech Understanding
4 Text Generation
5 Speech Synthesis
6 Language Resources
7 Applications
8 Conclusion
1. Introduction
Information Technology deals with the acquisition, organization, storage, processing, transmission and delivery of information. Human beings collect various types of data with the intention of extracting information relevant to decision making. A large part of data processing is conducted using computers, thanks to their enormous capability for numerical computation. However, even today computers play the role of an assistant in decision making rather than that of a decision maker, and rightly so. They fulfil this role by presenting the information and knowledge gleaned from data processing to humans in a form which is easily interpretable by human beings. Quite often, people issue commands to the computer to refine the information following some methodology which is dynamically determined depending on the problem at hand. Thus, human decision making with the help of computers involves a dialogue between man and machine.
Communication among human beings is inherently multimodal, the visual and aural modes being the primary ones. Currently, the principal means of human-machine communication is heavily biased towards the convenience of the machine rather than that of man. Mouse and keyboard are the primary input devices, and the visual display unit is the primary output device. Using such interfaces requires special skills and a mental attitude which many people are not endowed with. This machine-centric mode of communication needs to be changed in favor of human-centric interfaces so that the benefit of the power of computers is shared by all people. While the visual mode is most effective in capturing information, speech remains the preferred and most convenient means of conveying information. The advantage of, and the compelling reason for, verbal communication has become even stronger today due to the convergence of computers and telecommunication systems, which allows people to access information on computers located remotely. Verbal communication involves natural language, and this brings to the fore the role of linguistics in information technology.
From the above discussion, it is clear that a human-centric interface to the computer is … they share information, thoughts and ideas effortlessly among themselves. Facilitating human-machine interaction using natural language involves several facets of human language technology: speech compression, recognition and understanding of speech and script, machine translation, text generation, and synthesis of speech and cursive script. Both forms of language, spoken and written, are useful for interaction with machines. Here, we confine ourselves to spoken language and discuss the role of linguistic knowledge in developing speech interfaces. The relevance of linguistics in speech recognition, speech understanding and speech synthesis will be dealt with in the following sections.
1.1 History
The first speech recognizer appeared in 1952 and consisted of a device for the recognition of single spoken digits. Another early device was the IBM Shoebox, exhibited at the 1964 New York World's Fair. One of the most notable domains for the commercial application of speech recognition in the United States has been health care, and in particular the work of the medical transcriptionist (MT). According to industry experts, at its inception speech recognition (SR) was sold as a way to completely eliminate transcription rather than to make the transcription process more efficient, and hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many, if not all, were reluctant to make. The biggest limitation to speech recognition automating transcription, however, is seen as the software. The nature of narrative dictation is highly interpretive and often requires artificial syntax systems, which are usually domain-specific, and natural language processing, which is usually language-specific. Each of these types of application presents its own particular goals and challenges.
2. Speech Recognition
Speech recognition, the process of translating a speech signal into a sequence of words, is at the heart of speech input devices. Although tremendous progress has been made in the area of speech recognition (SR) technology, most of it has come from advances in modeling speech sounds and their influence on sounds in the immediate vicinity, and not from adequate modeling of natural language. (Remember the proverb "A picture is worth more than a thousand words". Imagine an attempt to convey something to a person outside a glass wall using only gestures, without the benefit of speech.) The grammar is normally modeled in terms of statistical properties of language, not because engineers prefer statistical grammars but because there is no better working alternative in the form of language models with a strong foundation in formal linguistics. Phrase structure grammars, for example, comprise several hundreds or thousands of rules describing different phrase types. Each of these rules is annotated with features and sometimes also with expressions in a programming language. When such grammars reach a certain size, they become difficult to maintain, to extend and to reuse. The resulting systems might be sufficiently efficient for some applications, but they lack the processing speed needed for interactive systems (such as applications involving spoken input) or for systems that have to process large volumes of text (as in machine translation).
Context-free grammars and their probabilistic versions have been tried, but their success in modeling unseen data has been only partial. N-gram models are the most popular statistical language models, yet estimating their probabilities has remained a sparse-data problem despite the use of very large corpora relevant to the task domain. For example, after observing all trigrams (i.e., consecutive word triplets) in 38 million words' worth of newspaper articles, a full one third of the trigrams in new articles from the same source are novel.
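The sparse-data effect described above is easy to reproduce on a toy scale. The sketch below is an illustration, not the experiment cited: it counts the trigrams of a training text, measures what fraction of the trigrams in a new text were never seen, and shows that the maximum-likelihood estimate gives such unseen trigrams zero probability. The function names and sample sentences are hypothetical.

```python
from collections import Counter


def trigrams(words):
    """Return the consecutive word triplets of a token list."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def novel_trigram_rate(train_words, test_words):
    """Fraction of trigram tokens in the test text that never occur in training."""
    seen = set(trigrams(train_words))
    test = trigrams(test_words)
    if not test:
        return 0.0
    return sum(1 for t in test if t not in seen) / len(test)


def mle_trigram_prob(train_words, w1, w2, w3):
    """Maximum-likelihood estimate P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2 *)."""
    tri_counts = Counter(trigrams(train_words))
    history_counts = Counter(t[:2] for t in trigrams(train_words))
    history = history_counts[(w1, w2)]
    return tri_counts[(w1, w2, w3)] / history if history else 0.0


train = "the cat sat on the mat".split()
test = "the cat sat on a mat".split()
print(novel_trigram_rate(train, test))              # 0.5: two of four test trigrams are new
print(mle_trigram_prob(train, "sat", "on", "the"))  # 1.0
print(mle_trigram_prob(train, "sat", "on", "a"))    # 0.0: unseen, so probability zero
```

Practical N-gram models therefore apply smoothing or back-off to move some probability mass to unseen events; the sketch deliberately omits this.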
Moreover, current language models are extremely sensitive to changes in the style, topic or genre of the text on which they are trained. A statistical language model trained on newswire text from one company will see its perplexity (the geometric average branching factor of the language according to the model) double when applied to news of the same time period from a similar agency. The inadequacy of language modeling is evident in the performance of speech recognition systems in competitive DARPA evaluations: in the most recent test of SR systems with noisy telephone speech, the best SR system showed only 62% word accuracy. Most SR systems expect the user to speak grammatically correct sentences. This puts a heavy load on users, who must formulate syntactically correct sentences with no out-of-vocabulary words before speaking to the computer. A user-friendly speech input system should be able to handle speech disfluencies and a flexible grammar. This is where computational linguists can play a crucial role.
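Perplexity, as defined in the paragraph above, is the geometric mean of the inverse probabilities a model assigns to each word of a test text. A minimal sketch, assuming the per-word conditional probabilities have already been produced by some language model (the inputs below are made-up numbers):

```python
import math


def perplexity(word_probs):
    """Perplexity = 2 ** (-(1/N) * sum(log2 P(w_i | history))).

    This equals the geometric mean of 1 / P(w_i | history), i.e. the average
    branching factor the model faces at each word of the text.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    if any(p <= 0.0 for p in word_probs):
        return math.inf  # an unsmoothed zero probability makes perplexity infinite
    avg_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** (-avg_log2)


# A model that spreads probability evenly over ten digits has perplexity 10.
print(perplexity([0.1] * 5))
# Halving every word probability doubles the perplexity.
print(perplexity([0.05] * 5))
```

Read this way, a doubling of perplexity means the model needs, on average, one extra bit to encode each word of the new text.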
2.1 Performance of speech recognition systems
The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real-time factor. Other measures of accuracy include the Single Word Error Rate (SWER) and the Command Success Rate (CSR).
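Word error rate is conventionally computed by aligning the recognizer's output with a reference transcript using the minimum number of word substitutions (S), deletions (D) and insertions (I), and dividing by the number of reference words N, so WER = (S + D + I) / N. The real-time factor is processing time divided by the duration of the audio. The sketch below implements both with a standard dynamic-programming edit distance; the function names and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, using a minimum-edit-distance alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("the reference transcript must contain at least one word")
    # d[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,                 # deletion
                          d[i][j - 1] + 1,                 # insertion
                          d[i - 1][j - 1] + substitution)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """An RTF below 1.0 means the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds


print(word_error_rate("call the office now", "call office now please"))  # 0.5
print(real_time_factor(30.0, 60.0))  # 0.5
```

In the example, one deletion ("the") and one insertion ("please") against four reference words give a WER of 0.5. Because insertions are counted, WER can exceed 1.0; word accuracy is often reported as 1 - WER.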
Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation".
Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called 'enrollment') and may successfully capture continuous speech with a large vocabulary at a normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. 'Optimal conditions' usually assume that users:
have speech characteristics which match the training data,
can achieve proper speaker adaptation, and
work in a low-noise environment (e.g., a quiet office or laboratory space).
This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected. Speech recognition in video has become a popular search technology used by several video search companies.
Limited-vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.
Both acoustic modeling and language modeling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling also has many other applications, such as smart keyboards and document classification.
2.2 Hidden Markov model (HMM)-based speech recognition
Modern general-purpose speech recognition systems are generally based on hidden Markov models. These are statistical models which output a sequence of symbols or quantities. One possible reason why HMMs are used in speech recognition is that a speech signal can be viewed as a piecewise stationary or short-time stationary signal. That is, over a short time, in the range of 10 milliseconds, speech can be approximated as a stationary process. Speech could thus be thought of as a Markov model for many stochastic processes.
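The short-time stationarity assumption is what justifies chopping the waveform into short, overlapping frames before any analysis. The sketch below assumes 16 kHz audio, a 25 ms analysis window and a 10 ms frame shift; these are common choices rather than values taken from the text, and the function name is hypothetical.

```python
def frame_signal(samples, sample_rate=16000, frame_ms=25.0, hop_ms=10.0):
    """Split a waveform into overlapping frames short enough to treat as stationary.

    Returns a list of frames; trailing samples that do not fill a whole frame
    are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # 400 samples at 16 kHz
    hop = int(round(sample_rate * hop_ms / 1000.0))          # 160 samples at 16 kHz
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame and hop lengths must be positive")
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // hop
    return [samples[i * hop:i * hop + frame_len] for i in range(n_frames)]


one_second = [0.0] * 16000
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))  # 98 400
```

With a 10 ms shift, each second of audio yields roughly one hundred frames, i.e. one feature vector every 10 milliseconds.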
Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model would output a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short-time window of speech, taking the logarithm of its magnitude spectrum, decorrelating the result using a cosine transform, and then keeping the first (most significant) coefficients. The hidden Markov model will tend to have, in each state, a statistical distribution that is a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. Each word, or (for more general speech recognition systems) each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.
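The feature computation described above can be sketched directly. The version below is a plain cepstrum written in pure Python for clarity: it windows a frame, takes the magnitude of its discrete Fourier transform, applies a logarithm, decorrelates with a DCT-II, and keeps the first coefficients. Production front ends usually insert a mel-scale filterbank before the logarithm (giving MFCCs) and use an FFT; the function names, the Hamming window and the choice of 13 coefficients are assumptions for illustration.

```python
import math


def dft_magnitude(frame):
    """Magnitude of the discrete Fourier transform for bins 0..N/2 (O(N^2), for clarity)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags


def dct_ii(values):
    """Unnormalised DCT-II, the cosine transform that decorrelates the log spectrum."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n)]


def cepstral_coefficients(frame, num_coeffs=13, floor=1e-10):
    """Cepstral feature vector for one short-time frame of speech."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    # A Hamming window reduces spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    # The floor avoids log(0) for silent frames.
    log_spectrum = [math.log(max(m, floor)) for m in dft_magnitude(windowed)]
    return dct_ii(log_spectrum)[:num_coeffs]


tone = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(64)]
features = cepstral_coefficients(tone)
print(len(features))  # 13
```

Keeping only the low-order coefficients retains the smooth spectral envelope, which mostly reflects the vocal tract shape, and discards fine detail; this is why a small n such as 10 to 13 is enough.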
Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so that phonemes with different left and right contexts have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics, and in addition the system might use heteroscedastic linear discriminant analysis (HLDA); or it might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection …
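Two of the techniques named above are simple enough to sketch. Cepstral mean normalization, one common form of the cepstral normalization mentioned, subtracts each coefficient's average over the utterance; a fixed channel or recording effect multiplies the spectrum, so it appears as an additive offset in the cepstral domain and is largely removed. Delta coefficients are usually computed with a regression over neighbouring frames, and delta-delta coefficients by applying the same operation to the deltas. The two-frame window and the function names are assumptions.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    if not frames:
        return []
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def deltas(frames, window=2):
    """Regression deltas: d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2).

    Frames beyond the ends of the utterance are replaced by the first or last frame.
    """
    if not frames:
        return []
    last = len(frames) - 1
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                acc += n * (frames[min(t + n, last)][d] - frames[max(t - n, 0)][d])
            row.append(acc / denom)
        out.append(row)
    return out


cepstra = [[float(t), 2.0] for t in range(7)]  # toy 2-dimensional features
normalized = cepstral_mean_normalize(cepstra)
d = deltas(normalized)  # delta coefficients
dd = deltas(d)          # delta-delta coefficients
features = [c + x + y for c, x, y in zip(normalized, d, dd)]
print(d[3])  # [1.0, 0.0]: the first coefficient rises by 1 per frame
```

Each final feature vector concatenates the static, delta and delta-delta coefficients, tripling its dimension.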
2.3 Dynamic time warping (DTW)-based speech recognition
Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring the similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW.
A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g., time series) with certain restrictions, i.e., the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
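The dynamic-programming recursion behind DTW, and its use in the classic isolated-word setting where an utterance is compared against one stored template per word, can be sketched as follows. Only the basic monotonicity and continuity constraints are enforced; real systems often add slope or band constraints (for example a Sakoe-Chiba band) and compare cepstral vectors rather than the scalar toy features used here. The templates and names are hypothetical.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Cost of the best monotonic alignment between sequences a and b.

    D[i][j] is the cheapest way to align a[:i] with b[:j]; each step may advance
    a, b, or both, so one element can match several elements of the other
    sequence (this is the "warping"). For vector features pass e.g. math.dist.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # advance in a only
                                 D[i][j - 1],      # advance in b only
                                 D[i - 1][j - 1])  # advance in both
    return D[n][m]


def recognize(utterance, templates):
    """Return the vocabulary word whose template aligns most cheaply."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


templates = {"yes": [0, 2, 4, 2, 0], "no": [4, 4, 0, 0, 4]}
slow_yes = [0, 0, 2, 4, 4, 2, 0]                 # same shape, spoken more slowly
print(dtw_distance(slow_yes, templates["yes"]))  # 0.0
print(recognize(slow_yes, templates))            # yes
```

The slower utterance aligns with zero cost to the "yes" template even though the two sequences have different lengths, which is exactly the speaking-rate tolerance the paragraph describes.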
Further information
Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important … "Fundamentals of Speech Recognition" by Lawrence Rabiner (1993) can be useful for acquiring basic knowledge but may not be fully up to date. Other good sources are "Statistical Methods for Speech Recognition" by Frederick Jelinek and "Spoken Language Processing" (2001) by Xuedong Huang et al. More up to date is "Computer Speech" by Manfred R. Schroeder, second edition published in 2004. The recently updated textbook "Speech and Language Processing" (2008) by Jurafsky and Martin presents the basics and the state of the art for ASR. Good insight into the techniques used in the best modern systems can be gained by paying attention to government-sponsored evaluations such as those organized by DARPA; the largest speech recognition-related project …
3. Speech Understanding
Speech understanding involves the integration of speech recognition and natural language (NL) understanding. This integration has great advantages: SR can bring to NL prosodic information (information important for syntax and semantics but not well represented in text), and NL can bring to SR additional knowledge sources (e.g., syntax and semantics). The integration of these technologies presents technical challenges, as well as challenges related to the quite different cultures, techniques and beliefs of the people representing the component technologies. In large part, NL research has been pursued in computer science and linguistics departments, where the goal is to model language understanding, motivated by a desire to understand cognitive processes. Hence, the underlying theories tend to come from linguistics and psychology. Practical applications have been less important than deepening intuitions about human processes. Therefore, coverage of phenomena of theoretical interest (usually the rarer phenomena) has traditionally been more important than broad coverage. Speech recognition research, on the other hand, has largely been practiced in engineering departments with practical applications in mind. Techniques motivated by knowledge of human processes have therefore been less important than techniques that can be automatically developed or tuned, and broad coverage of a representative sample is more important than coverage of any particular phenomenon. The integration of SR and NL needs to overcome not only technical challenges but also the differences in motivation, interests, theoretical underpinnings, techniques, tools, and criteria for success of the two groups. However, both groups have much to gain from collaboration, and such a trend is visible around the world.
3.1 Integration of Speech Recognition and Natural Language Processing
SR is concerned with the acoustic attributes of words to a large extent, and with lexical and syntactic information to a lesser extent. Human speech understanding, on the other hand, involves the integration of a great variety of knowledge sources, including knowledge of the world or context, knowledge of the speaker and/or topic, lexical frequency, previous uses of a word or a semantically related topic, facial expressions, and prosody. Thus, integration of SR and NL has been a consistent goal. However, as grammatical coverage increases, standard NL techniques can become computationally difficult. Further, with increased coverage, NL tends to provide less constraint for SR. Simple-minded concatenation of an existing speech recognition system and an existing NL system is suboptimal because of the one-way flow of information: not only can errors in the SR system propagate, but there is also no way for the higher-level knowledge sources to help the SR system prune the search space. Most importantly, most NL systems deal with written language rather than spoken language. In written text one can expect grammatically correct sentences, whereas in an interactive dialogue, speech disfluencies such as restarts, revisions, repetitions, filler sounds and hesitations are common. Most NL systems are more concerned with the correct analysis of complete sentences than with methods for recovering interpretations when parses are incomplete, which is what spoken language understanding needs. Thus there is a need to reformulate the existing knowledge in NL systems and to devise computational models of spoken language. Traditionally, linguists have studied the properties of natural language and documented their observations in various forms. This knowledge has to be transformed into an algorithmic form as far as possible, so that the collective knowledge and wisdom of the linguistic community becomes more useful to humankind. This kind of collaborative work between linguists and engineers is especially relevant in a multilingual country such as India.
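One common way to let higher-level knowledge influence recognition without redesigning either component is N-best rescoring: the recognizer outputs its several best hypotheses with scores, and a language component re-ranks them. This is only a partial remedy for the one-way flow of information described above, because hypotheses pruned before the list is formed cannot be recovered. In the sketch, the scores, the weight and the toy "grammar" rule are all hypothetical.

```python
def rescore_nbest(nbest, knowledge_score, weight=1.0):
    """Re-rank recognizer hypotheses by recognizer score plus weighted NL score.

    nbest: list of (sentence, recognizer_log_score) pairs.
    knowledge_score: function mapping a sentence to a log-domain score from a
    higher-level component (grammar, semantics, dialogue context, ...).
    Returns the hypotheses sorted best-first.
    """
    return sorted(nbest,
                  key=lambda h: h[1] + weight * knowledge_score(h[0]),
                  reverse=True)


CITIES = {"delhi", "mumbai", "chennai"}
HOMOPHONES_OF_TO = {"two", "too"}


def toy_grammar_score(sentence):
    """Hypothetical rule: 'two' or 'too' directly before a city name is implausible."""
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        if w1 in HOMOPHONES_OF_TO and w2 in CITIES:
            return -5.0
    return 0.0


nbest = [("show trains two delhi", -11.5),  # recognizer's first choice
         ("show trains to delhi", -12.0)]
best_sentence, _ = rescore_nbest(nbest, toy_grammar_score)[0]
print(best_sentence)  # show trains to delhi
```

The acoustically preferred but ungrammatical hypothesis loses once the higher-level score is added; tighter integration would apply such constraints during the search itself.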
4. Text Generation
One of the purposes of interacting with computers is to access information. This information has to be drawn from a database and presented to the user. A user's spoken query is processed by a speech understanding system, which formulates a database query. The information in the database has to be transformed into natural language which can be presented to … of spoken output, given the data representation, context and dialogue state. In simple, specific task domains (for example, information about the availability of railway reservations), response sentence templates can be used to generate text. However, a versatile text generation module should generate coherent multi-sentential responses, interpreting and responding to users' subsequent utterances in the context of an ongoing interaction. Spoken language generation requires deciding what concepts to include and how to realize them in words. In addition, it needs to determine intonational forms for speech synthesis.
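For the simple task domains mentioned above, template-based generation amounts to choosing a sentence pattern from the database result and filling its slots. A minimal sketch, with hypothetical template wording, field names and train data:

```python
RESPONSE_TEMPLATES = {
    "available": "{seats} seats are available in {travel_class} on train {train} on {date}.",
    "waitlisted": "Train {train} on {date} is full in {travel_class}; "
                  "the waiting-list position would be {position}.",
    "not_found": "Sorry, I could not find train {train}.",
}


def generate_response(record):
    """Pick a template from the record's status and fill its slots from the record."""
    template = RESPONSE_TEMPLATES[record["status"]]
    return template.format(**record)  # record fields the template does not use are ignored


reply = generate_response({"status": "available", "seats": 12,
                           "travel_class": "sleeper class", "train": "101",
                           "date": "14 August"})
print(reply)  # 12 seats are available in sleeper class on train 101 on 14 August.
```

Such fixed templates cover only predictable, single-turn answers; they cannot produce the coherent, context-dependent multi-sentence responses expected of a versatile generation module.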
5. Speech Synthesis
The task of a speech synthesis module is to synthesize an intelligible, natural, easily interpreted and appropriate spoken version of the response, taking advantage of the context and dialogue state to emphasize certain information. The acoustic output needs to convey information about abstract units in syntax, semantics, discourse, and pragmatics. While the intelligibility of speech generated by current speech synthesis systems is good, the naturalness leaves much to be desired. The issue of intelligibility is primarily related to the generation and combination of speech sounds, whereas imparting naturalness involves incorporating suprasegmental information. This involves prosodic phrasing, i.e., chunking a long sentence into prosodic phrases. Patterns of variation in fundamental frequency, duration, amplitude or intensity, pauses, and speaking rate have been shown to carry information about such prosodic elements as lexical stress, phrase breaks, and declarative or interrogative sentence form. A high-quality speech synthesis system in any language has to deal with these issues in order to become acceptable to people as a means of delivering information.
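Real systems predict prosodic phrase breaks and intonation with trained models, but a deliberately naive rule-based sketch shows what the output of this step looks like: the sentence is chunked at punctuation (and at a hypothetical maximum phrase length), and the final phrase is labelled with a rising or falling contour depending on whether the sentence is interrogative. The labels, the length limit and the rules are assumptions for illustration only; for instance, not every question ends with a rise in natural speech.

```python
import re


def prosodic_phrases(sentence, max_words=8):
    """Chunk a sentence into (phrase, boundary_tone) pairs with toy rules.

    Breaks at commas, semicolons and colons; splits any longer chunk every
    max_words words; labels non-final phrases "continue" and the final phrase
    "rise" for a question or "fall" otherwise.
    """
    text = sentence.strip()
    final_tone = "rise" if text.endswith("?") else "fall"
    phrases = []
    for chunk in re.split(r"[,;:]", text.rstrip(".?!")):
        words = chunk.split()
        for i in range(0, len(words), max_words):
            phrases.append(" ".join(words[i:i + max_words]))
    return [(p, final_tone if k == len(phrases) - 1 else "continue")
            for k, p in enumerate(phrases)]


print(prosodic_phrases("Is the train to Delhi on time, or is it late?"))
# [('Is the train to Delhi on time', 'continue'), ('or is it late', 'rise')]
```

A synthesizer would then map each boundary label to pauses, duration changes and a fundamental-frequency contour.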
6. Language Resources
The term linguistic resources refers to (usually large) sets of language data and descriptions in machine-readable form, to be used in building, improving, or evaluating natural language (NL) and speech recognition and synthesis systems. Examples of linguistic resources are written and spoken corpora, lexical databases and grammars. The need for such linguistic resources in vast quantities is even more important for speech recognition systems, as they use statistical models for representing acoustic units as well as language.
In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. Many experts in the field anticipate that with increased use of speech recognition technology, the services provided may be redistributed rather than replaced. Speech recognition is also used to enable deaf people to understand the spoken word via speech-to-text conversion, which is very helpful.
Speech recognition can be implemented in the front end or the back end of the medical documentation process. Front-End SR is where the provider dictates into a speech recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document. It never goes through an MT/editor.
Back-End SR, or Deferred SR, is where the provider dictates into a digital dictation system, the voice is routed through a speech recognition machine, and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is currently widely used in the industry.
Many Electronic Medical Records (EMR) applications can be more effective and may be performed more easily when deployed in conjunction …
2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system.
3. More natural vocabulary and grammar, and shorter training times, would be useful, but only if very high recognition rates could be maintained.
Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.
Working with Swedish pilots flying in the JAS-39 Gripen cockpit, Englund (2004) found that recognition deteriorated with increasing G-loads. It was also concluded that adaptation greatly improved the results in all cases, and introducing models for breathing was shown to improve recognition scores significantly. Contrary to what might be expected, no effects of the speakers' broken English were found. It was evident that spontaneous speech caused problems for the recognizer, as could be expected. A restricted vocabulary and, above all, a proper syntax could thus be expected to improve recognition accuracy substantially.
The Eurofighter Typhoon, currently in service with the UK RAF, employs a speaker-dependent system, i.e., it requires each pilot to create a template. The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a ma … himself with two simple voice commands or to any of his wingmen with only five commands.
Helicopters
The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade on speech recognition system applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter. There has also been much useful work in Canada. Results have been encouraging, and voice applications have included control of communication radios, setting of navigation systems, and control of an automated target handover system.
As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done, both in speech recognition and in overall speech technology, in order to consistently achieve performance improvements in operational settings.
Battle management command centers generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in conjunction …
… speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task.
The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system. However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems and in other training applications. The U.S. Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation. Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using vocabulary specifically designed for the ATC task. Research in France has focused on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in the application of task-domain grammar constraints.
The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition from a number of different vendors, including UFA, Inc., and Adacel Systems Inc. (ASI). This software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo pilots.
Another approach to ATC simulation with speech recognition has been created by Supremes. The Supremes system is not constrained by the rigid grammars imposed by the underlying limitations of other recognition strategies.
Telephony and other domains
ASR in the field of telephony is now commonplace, and in the field of computer gaming and simulation it is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use.
The improvement of mobile processor speeds made speech-enabled Symbian and Windows Mobile smartphones feasible. Current speech-to-text programs are too large and require too much CPU power to be practical for the Pocket PC. Speech is used mostly as part of the user interface, for creating predefined or custom speech commands. Leading software vendors in this field are Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator) and SVOX.
People with disabilities can benefit from speech recognition programs. Speech recognition is especially useful for people who have difficulty using their hands, ranging from mild repetitive stress injuries …
… paper communication (essentially, they think of an idea but it is processed incorrectly, causing it to end up differently on paper) can benefit from the software.
7. Applications
Automatic translation
Automotive speech recognition (e.g., Ford Sync)
Telematics (e.g., vehicle navigation systems)
Court reporting (real-time voice writing)
Hands-free computing: voice command recognition computer user interface
Home automation
Interactive voice response
Mobile telephony, including mobile email
Multimodal interaction
Pronunciation evaluation in computer-aided language learning applications
Robotics
Video games, with possible expansion into the RTS genre following Tom Clancy's EndWar
Transcription (digital speech-to-text)
Speech-to-text (transcription of speech into mobile text messages)
Air traffic control speech recognition
8. Future Scope
In future, important …
… strengthening existing collaborations between linguists and speech engineers, as well as initiating new ones.