Pra a Aaaaaaaaaaaaaaa

8/13/2019 Pra a Aaaaaaaaaaaaaaa


INDEX

1. Introduction
1.1 History
2. Speech Recognition
2.1 Performance of speech recognition systems
2.2 Hidden Markov model (HMM)-based speech recognition
2.3 Dynamic time warping (DTW)-based speech recognition
3. Speech Understanding
4. Text Generation
5. Speech Synthesis
6. Language Resources
7. Applications
8. Conclusion
9. References

1. Introduction

Information Technology deals with the acquisition, organization, storage, processing, transmission and delivery of information. Human beings collect various types of data with the intention of extracting information relevant to decision making. A large part of data processing is conducted using computers thanks to their enormous capability for numerical computation. However, computers even today play the role of an assistant in decision making rather than the role of a decision maker, and rightly so. They fulfil this role by presenting the information and knowledge gleaned from data processing to humans in a form which is easily interpretable by human beings. Quite often, people issue commands to the computer to refine the information following some methodology which is dynamically determined depending on the problem at hand. Thus, the human decision making process with the help of computers involves a dialogue between man and machine.

Communication among human beings is inherently multimodal, visual and aural modes being the primary modes. Currently, the principal means of human-machine communication is heavily biased towards the convenience of the machine rather than that of man. Mouse and keyboard are the primary input devices and the visual display unit is the primary output device. Usage of such interfaces requires special skills and a mental attitude which many people are not endowed with. This machine-centric mode of communication needs to be changed in favor of human-centric interfaces so that the benefit of the power of computers is shared by all people. While the visual mode is most effective in capturing information, speech remains the preferred and most convenient means of conveying information. The advantage of (and the compelling reason for) verbal communication has become even stronger today due to the convergence of computers and telecommunication systems, which allows people to access information on computers located remotely. Verbal communication involves natural language, and this brings to the fore the role of linguistics in information technology.

From the above discussion, it is clear that a human-centric interface to computers is one that lets people interact with machines the way they share information, thoughts and ideas effortlessly among themselves. Facilitating human-machine interaction using natural language involves several facets of human language technology: speech compression, recognition and understanding of speech and script, machine translation, text generation, and synthesis of speech and cursive script. Both forms of language (spoken and written) are useful for interaction with machines. Here, we confine ourselves to the spoken language and discuss the role of linguistic knowledge in developing speech interfaces. The relevance of linguistics in speech recognition, speech understanding and speech synthesis will be dealt with in the following sections.

1.1 History

The first speech recognizer appeared in 1952 and consisted of a device for the recognition of single spoken digits. Another early device was the IBM Shoebox, exhibited at the 1964 New York World's Fair. One of the most notable domains for the commercial application of speech recognition in the United States has been health care and in particular the work of the medical transcriptionist (MT). According to industry experts, at its inception, speech recognition (SR) was sold as a way to completely eliminate transcription rather than make the transcription process more efficient, hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many if not all were reluctant to do. The biggest limitation to speech recognition automating transcription, however, is seen as the software. The nature of narrative dictation is highly interpretive and often requires "artificial syntax systems", which are usually domain-specific, and "natural language processing", which is usually language-specific. Each of these types of application presents its own particular goals and challenges.

2. Speech Recognition

Speech recognition, the process of translating a speech signal into a sequence of words, is at the heart of speech input devices. Although tremendous progress has been made in the area of speech recognition (SR) technology, most of it has come from advances in modeling speech sounds and their influence on sounds in the immediate vicinity, and not from adequate modeling of natural language. The grammar is normally modeled in terms of statistical properties of language, not because engineers prefer statistical grammar but because there is no better working alternative in the form of language models with a strong foundation in formal linguistics. Phrase structure grammars, for example, comprise several hundreds or thousands of rules describing different phrase types. Each of these rules is annotated by features and sometimes also by expressions in a programming language. When such grammars reach a certain size they become difficult to maintain, to extend and to reuse. The resulting systems might be sufficiently efficient for some applications but they lack the speed of processing needed for interactive systems (such as applications involving spoken input) or systems that have to process large volumes of text (as in machine translation).

Footnotes: Remember the proverb "A picture is worth more than a thousand words." Imagine an attempt to convey something to a person outside a glass wall using only gestures (without the benefit of speech).

Context-free grammars and their probabilistic versions have been tried and their success in modeling unseen data has been only partial. Estimation of N-gram probabilities, the most popular statistical language model, has remained a sparse estimation problem despite the usage of a very large corpus relevant to the task domain. For example, after observing all trigrams (i.e., consecutive word triplets) in 38 million words' worth of newspaper articles, a full one third of trigrams in new articles from the same source are novel.
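The sparsity of trigram statistics can be illustrated with a small sketch. It measures the fraction of trigram occurrences in held-out text that never appeared in the training text. The function names and the toy sentences are assumptions made for illustration only; the newspaper corpus mentioned above is not reproduced here.

```python
from typing import List, Set, Tuple

Trigram = Tuple[str, str, str]


def trigrams(words: List[str]) -> List[Trigram]:
    """Return all consecutive word triplets in order of appearance."""
    return [(words[i], words[i + 1], words[i + 2]) for i in range(len(words) - 2)]


def novel_trigram_rate(train_words: List[str], test_words: List[str]) -> float:
    """Fraction of test trigram occurrences that were never seen in training."""
    seen: Set[Trigram] = set(trigrams(train_words))
    test = trigrams(test_words)
    if not test:
        return 0.0
    unseen = sum(1 for t in test if t not in seen)
    return unseen / len(test)


# Toy data (hypothetical): even closely related text contains new triplets.
train = "the cat sat on the mat and the dog sat on the rug".split()
test = "the cat sat on the rug and the dog ran".split()
rate = novel_trigram_rate(train, test)
print(f"novel trigram rate: {rate:.3f}")
```

On real corpora the same measurement would be run over millions of words; the point of the sketch is only that unseen triplets appear as soon as the test text departs slightly from the training text.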

Moreover, current language models are extremely sensitive to changes in the style, topic or genre of the text on which they are trained. A statistical language model trained with newswire text from one company will see its perplexity (the geometric average branching factor of the language according to the model) doubled when applied to news of the same time period from a similar agency. The inadequacy of language modeling is evident in the performance of speech recognition systems in competitive DARPA evaluations. In the most recent test of SR systems with noisy telephone speech, the best SR system showed only 62% word accuracy. Most SR systems expect the user to speak grammatically correct sentences. This puts a lot of load on users to formulate such syntactically correct sentences with no out-of-vocabulary words, prior to speaking to the computer. A user-friendly speech input system should be able to handle speech disfluencies and flexible grammar. This is where computational linguists can play a crucial role.
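Perplexity, as used above, can be computed as the exponential of the average negative log-probability that a model assigns to each word. The following is a minimal sketch under simplifying assumptions: a unigram model with add-one smoothing over a fixed, closed vocabulary (no out-of-vocabulary words), with function names and toy data chosen only for illustration. Real systems use N-gram models of higher order.

```python
import math
from collections import Counter
from typing import Dict, List


def train_unigram(words: List[str], vocab: List[str]) -> Dict[str, float]:
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(words)
    total = len(words) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(model: Dict[str, float], words: List[str]) -> float:
    """exp of the average negative log-probability per word."""
    log_sum = sum(math.log(model[w]) for w in words)
    return math.exp(-log_sum / len(words))


vocab = ["a", "b", "c", "d"]
model = train_unigram("a a a b".split(), vocab)

in_domain = perplexity(model, "a a b a".split())      # text resembling the training data
out_of_domain = perplexity(model, "c d c d".split())  # text with a different word distribution
print(in_domain, out_of_domain)
```

A model that spreads its probability uniformly over V words has perplexity V, which is why perplexity is read as an average branching factor. Text that differs from the training data receives lower probabilities and therefore a higher perplexity.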

2.1 Performance of speech recognition systems

The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with word error rate (WER), whereas speed is measured with the real time factor. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).
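The two headline measures can be sketched in a few lines. Word error rate is commonly computed as the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words; the real time factor is processing time divided by audio duration. The function names and example sentences below are illustrative assumptions.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,          # deletion of a reference word
                          curr[j - 1] + 1,      # insertion of a hypothesis word
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds


reference = "call the main office now".split()
hypothesis = "call main office right now".split()
wer = word_error_rate(reference, hypothesis)  # one deletion and one insertion
rtf = real_time_factor(5.0, 10.0)
print(wer, rtf)
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a complement of accuracy in the strict sense.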

Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation".

Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called 'enrollment') and may successfully capture continuous speech with a large vocabulary at normal pace with a very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. 'Optimal conditions' usually assume that users:

have speech characteristics which match the training data,
can achieve proper speaker adaptation, and
work in a low-noise environment (e.g. quiet office or laboratory space).

This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected. Speech recognition in video has become a popular search technology used by several video search companies.

Limited vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.

Both acoustic modeling and language modeling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling has many other applications such as smart keyboard and document classification.

2.2 Hidden Markov model (HMM)-based speech recognition

Modern general-purpose speech recognition systems are generally based on Hidden Markov Models. These are statistical models which output a sequence of symbols or quantities. One possible reason why HMMs are used in speech recognition is that a speech signal could be viewed as a piecewise stationary signal or a short-time stationary signal. That is, one could assume that over a short time in the range of 10 milliseconds, speech could be approximated as a stationary process. Speech could thus be thought of as a Markov model for many stochastic processes.

Another reason why HMMs are popular is because they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model would output a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech and decorrelating the spectrum using a cosine transform, then taking the first (most significant) coefficients. The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians which will give a likelihood for each observed vector. Each word, or (for more general speech recognition systems) each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.
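How such a model scores a sequence of feature vectors can be sketched with the forward algorithm in the log domain. This is a minimal illustration, not a description of any particular recognizer: the two-state left-to-right topology, the one-dimensional features, the transition probabilities and all function names are assumptions, and each state here carries a one-component mixture although the code accepts several components.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, mean vector, variance vector).
Component = Tuple[float, Sequence[float], Sequence[float]]


def safe_log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) for log-domain probabilities."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_diag_gaussian(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))


def log_emission(x: Sequence[float], mixture: List[Component]) -> float:
    """Log likelihood of one vector under a state's Gaussian mixture."""
    return log_sum_exp([safe_log(w) + log_diag_gaussian(x, mu, var)
                        for w, mu, var in mixture])


def forward_log_likelihood(obs, log_init, log_trans, mixtures) -> float:
    """Log likelihood of the whole observation sequence, summed over all state paths."""
    n = len(log_init)
    alpha = [log_init[s] + log_emission(obs[0], mixtures[s]) for s in range(n)]
    for x in obs[1:]:
        alpha = [log_sum_exp([alpha[r] + log_trans[r][s] for r in range(n)])
                 + log_emission(x, mixtures[s]) for s in range(n)]
    return log_sum_exp(alpha)


def two_state_model(mean_first: float, mean_second: float):
    """Hypothetical left-to-right word model for 1-D features."""
    log_init = [safe_log(1.0), safe_log(0.0)]
    log_trans = [[safe_log(0.6), safe_log(0.4)],
                 [safe_log(0.0), safe_log(1.0)]]
    mixtures = [[(1.0, [mean_first], [1.0])],
                [(1.0, [mean_second], [1.0])]]
    return log_init, log_trans, mixtures


observations = [[0.1], [0.2], [0.9], [1.1]]
score_low = forward_log_likelihood(observations, *two_state_model(0.0, 1.0))
score_high = forward_log_likelihood(observations, *two_state_model(5.0, 6.0))
best = "low" if score_low > score_high else "high"
print(best)
```

Recognition of isolated words then amounts to evaluating each word model on the same observations and choosing the one with the highest likelihood. Continuous recognition would instead search over concatenated models, typically with the Viterbi algorithm.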

Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based pro...
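Two of the feature-level techniques named above, cepstral normalization and delta coefficients, are simple enough to sketch. The version below assumes per-utterance cepstral mean subtraction and the widely used regression formula for deltas with a window of two frames and edge frames repeated at the boundaries; these specific choices, and the function names, are assumptions rather than details given in the text.

```python
from typing import List

Frames = List[List[float]]  # one list of cepstral coefficients per time frame


def cepstral_mean_normalize(frames: Frames) -> Frames:
    """Subtract the per-dimension mean computed over the whole utterance."""
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def delta(frames: Frames, window: int = 2) -> Frames:
    """Regression-based time derivative; boundary frames are repeated."""
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out: Frames = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[0])):
            num = sum(n * (frames[min(t + n, last)][d] - frames[max(t - n, 0)][d])
                      for n in range(1, window + 1))
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamics(frames: Frames) -> Frames:
    """Append delta and delta-delta coefficients to each static frame."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy 1-D "cepstra" rising by one unit per frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
normalized = cepstral_mean_normalize(ramp)
features = add_dynamics(normalized)
print(features[2])
```

Because the delta is a difference, it is unaffected by the mean subtraction; the normalization removes a constant channel offset while the deltas describe how the spectrum changes over time.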

2.3 Dynamic time warping (DTW)-based speech recognition

Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected, even if in one video the person was walking slowly and if in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW.

A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
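The alignment described above can be sketched as a classic dynamic program. The version below assumes one-dimensional sequences, an absolute-difference local cost and the basic step pattern (advance in one sequence, the other, or both); real recognizers compare vectors of cepstral coefficients and often add slope constraints. The function names and the toy templates are illustrative assumptions.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Minimum cumulative cost of warping sequence a onto sequence b."""
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(utterance: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


templates = {"rising": [0.0, 1.0, 2.0, 3.0], "falling": [3.0, 2.0, 1.0, 0.0]}
slow_rising = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]  # same shape, spoken at half speed
label = recognize(slow_rising, templates)
print(label)
```

The slowly spoken sequence matches the "rising" template with zero cost because each template value is simply repeated, which is exactly the variation in speaking speed that the warping is meant to absorb.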

Further information

Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. The book "Fundamentals of Speech Recognition" by Lawrence Rabiner can be useful to acquire basic knowledge but may not be fully up to date (1993). Another good source can be "Statistical Methods for Speech Recognition" by Frederick Jelinek and "Spoken Language Processing (2001)" by Xuedong Huang etc. More up to date is "Computer Speech", by Manfred R. Schroeder, second edition published in 2004. The recently updated textbook "Speech and Language Processing (2008)" by Jurafsky and Martin presents the basics and the state of the art for ASR. A good insight into the techniques used in the best modern systems can be gained by paying attention to government-sponsored evaluations such as those organized by DARPA (the largest speech recognition-related pro...

3. Speech Understanding

Speech understanding involves integration of speech recognition and natural language (NL) understanding. This integration has great advantages: to NL, SR can bring prosodic information (information important for syntax and semantics but not well represented in text); NL can bring to SR additional knowledge sources (e.g., syntax and semantics). The integration of these technologies presents technical challenges, and challenges related to the quite different cultures, techniques and beliefs of the people representing the component technologies. In large part, NL research has been pursued in computer science and linguistics departments; the goal is to model language understanding, motivated by a desire to understand cognitive processes. Hence, the underlying theories tend to be from linguistics and psychology. Practical applications have been less important than increasing intuitions about human processes. Therefore, coverage of phenomena of theoretical interest (usually the more rare phenomena) has traditionally been more important than broad coverage. On the other hand, speech recognition research has largely been practiced in engineering departments with practical applications in mind. Techniques motivated by knowledge of human processes have therefore been less important than techniques that can be automatically developed or tuned, and broad coverage of a representative sample is more important than coverage of any particular phenomenon. The integration of SR and NL needs to overcome not only technical challenges but also the differences in motivation, interests, theoretical underpinnings, techniques, tools, and criteria for success of the two groups. However, both groups have much to gain from collaboration and such a trend is visible around the world.

3.1 Integration of Speech Recognition and Natural Language Processing

SR is concerned with acoustic attributes of words to a large extent, and with lexical and syntactic information to a lesser extent. On the other hand, human speech understanding involves the integration of a great variety of knowledge sources, including knowledge of the world or context, knowledge of the speaker and/or topic, lexical frequency, previous uses of a word or a semantically related topic, facial expressions, and prosody. Thus, integration of SR and NL has been a consistent goal. However, as grammatical coverage increases, standard NL techniques can become computationally difficult. Further, with increased coverage, NL tends to provide less constraint for SR. Simple-minded concatenation of an existing speech recognition system and an existing NL system is suboptimal due to the unidirectional flow of information. Not only can errors in the SR system propagate, but there is also no way for the higher-level knowledge sources to help the SR system in pruning the search space. Most importantly, most NL systems deal with written language rather than spoken language. In the former case, one can expect grammatically correct sentences, whereas in an interactive dialogue, speech disfluencies such as restarts, revisions, repetitions, filler sounds and hesitations are common. Most NL systems are concerned more with correct analyses of complete sentences than with methods for recovering interpretations when parses are incomplete, which is what spoken language understanding needs. Thus there is a need to reformulate existing knowledge of NL systems and to devise computational models of spoken language. Traditionally, linguists have studied the properties of natural language and documented the observations in various forms. This knowledge has to be transformed to an algorithmic form as far as possible so that the collective knowledge and wisdom of the linguistic community becomes more useful to humankind. This kind of collaborative work between linguists and engineers is especially relevant in a multilingual country such as India.

4. Text Generation

One of the purposes of interacting with computers is to access information. This information has to be drawn from a database and be presented to the user. A spoken query of a user is processed by a speech understanding system which formulates a database query. The information in the database has to be transformed into natural language which can be presented to the user in the form of spoken output given the data representation, context and dialogue state. In simple, specific task domains (for example, information about the availability of railway reservations), response sentence templates can be used to generate text. However, a versatile text generation module should generate coherent multi-sentential responses, interpreting and responding to users' subsequent utterances in the context of an ongoing interaction. Spoken language generation requires deciding what concepts to include and how to realize them in words. In addition, it needs to determine intonational forms for speech synthesis.
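The template-based approach mentioned for simple task domains can be sketched as follows. The record fields, the template wording and the railway example values are all hypothetical; the sketch only shows how a database result might be slotted into a fixed response sentence before being passed to speech synthesis.

```python
from string import Template
from typing import Dict

# Hypothetical response templates for a railway reservation enquiry.
TEMPLATES: Dict[str, Template] = {
    "available": Template(
        "$seats seats are available in $travel_class on the $train on $date."),
    "waitlisted": Template(
        "No seats are available in $travel_class on the $train on $date; "
        "the waiting list position is $position."),
}


def generate_response(record: Dict[str, object]) -> str:
    """Choose a template from the database record and fill in its slots."""
    key = "available" if int(record["seats"]) > 0 else "waitlisted"
    return TEMPLATES[key].substitute({k: str(v) for k, v in record.items()})


record = {"seats": 12, "travel_class": "sleeper class",
          "train": "Deccan Express", "date": "14 August"}
response = generate_response(record)
print(response)
```

Templates of this kind are easy to build and predictable, but each new kind of answer needs a new template, which is why the text above calls for more versatile generation in open-ended dialogue.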

5. Speech Synthesis

The task of a speech synthesis module is to synthesize an intelligible, natural, easily interpreted and appropriate spoken version of the response, taking advantage of the context and dialogue state to emphasize certain information. The acoustic evidence needs to inform abstract units in syntax, semantics, discourse, and pragmatics. While the intelligibility of speech generated by current speech synthesis systems is good, the naturalness leaves much to be desired. The issue of intelligibility is primarily related to the generation and combination of speech sounds whereas imparting naturalness involves incorporating suprasegmental information. This involves prosodic phrasing, i.e., chunking a long sentence into prosodic phrases. Patterns of variation in fundamental frequency, duration, amplitude or intensity, pauses, and speaking rate have been shown to carry information about such prosodic elements as lexical stress, phrase breaks, and declarative or interrogative sentence form. A high-quality speech synthesis system in any language has to deal with these issues in order to become acceptable to people as a means of delivery of information.

6. Language Resources

The term linguistic resources refers to (usually large) sets of language data and descriptions in machine-readable form, to be used in building, improving, or evaluating natural language (NL) and speech recognition and synthesis systems. Examples of linguistic resources are written and spoken corpora, lexical databases and grammars. The need for such linguistic resources in vast quantities is even more important for speech recognition systems as they use statistical models for representing acoustic units as well as language.

In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. Many experts in the field anticipate that with increased use of speech recognition technology, the services provided may be redistributed rather than replaced. Speech recognition is used to enable deaf people to understand the spoken word via speech-to-text conversion, which is very helpful.

Speech recognition can be implemented in the front end or back end of the medical documentation process. Front-End SR is where the provider dictates into a speech recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document. It never goes through an MT/editor.

Back-End SR or Deferred SR is where the provider dictates into a digital dictation system, and the voice is routed through a speech recognition machine and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is being widely used in the industry currently.

Many Electronic Medical Records (EMR) applications can be more effective and may be performed more easily when deployed in con...

2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system.

3. More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.

Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.

Working with Swedish pilots flying in the JAS 39 Gripen cockpit, Englund (2004) found recognition deteriorated with increasing G-loads. It was also concluded that adaptation greatly improved the results in all cases and introducing models for breathing was shown to improve recognition scores significantly. Contrary to what might be expected, no effects of the broken English of the speakers were found. It was evident that spontaneous speech caused problems for the recognizer, as could be expected. A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially.

The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent system, i.e. it requires each pilot to create a template. The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a ma... himself with two simple voice commands or to any of his wingmen with only five commands.

Helicopters

The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade in speech recognition systems applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter. There has also been much useful work in Canada. Results have been encouraging, and voice applications have included control of communication radios; setting of navigation systems; and control of an automated target handover system.

As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done both in speech recognition and in overall speech technology, in order to consistently achieve performance improvements in operational settings.

Battle Management command centers generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study speech recognition equipment was tested in con...

... speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task.

The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system. However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems, and in other training applications. The U.S. Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation. Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using specific vocabulary specifically designed for the ATC task. Research in France has focused on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in application of task-domain grammar constraints.

The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition from a number of different vendors, including UFA, Inc., and Adacel Systems Inc (ASI). This software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo pilots.

Another approach to ATC simulation with speech recognition has been created by Supremis. The Supremis system is not constrained by rigid grammars imposed by the underlying limitations of other recognition strategies.

Telephony and other domains

ASR in the field of telephony is now commonplace and in the field of computer gaming and simulation is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use.

The improvement of mobile processor speeds made feasible the speech-enabled Symbian and Windows Mobile smartphones. Current speech-to-text programs are too large and require too much CPU power to be practical for the Pocket PC. Speech is used mostly as a part of the user interface, for creating predefined or custom speech commands. Leading software vendors in this field are Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator) and SVOX.

People with disabilities can benefit from speech recognition programs. Speech recognition is especially useful for people who have difficulty using their hands, ranging from mild repetitive stress in... paper communication (essentially they think of an idea but it is processed incorrectly, causing it to end up differently on paper) can benefit from the software.

7. APPLICATIONS

Automatic translation
Automotive speech recognition (e.g., Ford Sync)
Telematics (e.g. vehicle navigation systems)
Court reporting (real-time voice writing)
Hands-free computing: voice command recognition computer user interface
Home automation
Interactive voice response
Mobile telephony, including mobile email
Multimodal interaction
Pronunciation evaluation in computer-aided language learning applications
Robotics
Video games, with possible expansion into the RTS genre following Tom Clancy's EndWar
Transcription (digital speech-to-text)
Speech-to-text (transcription of speech into mobile text messages)
Air Traffic Control speech recognition

8. FUTURE SCOPE

In future important ... strengthening existing collaborations between linguists and speech engineers as well as initiating new ones.

9. BIBLIOGRAPHY

1. C. G. Kratzenstein, Sur la naissance de la formation des voyelles, J. Phys., Vol. 21, pp. 358-380, 1782.
2. H. Dudley and T. H. Tarnoczy, The Speaking Machine of Wolfgang von Kempelen, J. Acoust. Soc. Am., Vol. 22, pp. 151-166, 1950.
3. Sir Charles Wheatstone, The Scientific Papers of Sir Charles Wheatstone, London: Taylor and Francis, 1879.
4. J. L. Flanagan, Speech Analysis, Synthesis and Perception, Second Edition, Springer-Verlag, 1972.
5. D. B. Fry and P. Denes, The Design and Operation of the Mechanical Speech Recognizer at University College London, J. British Inst. Radio Engr., Vol. 19, No. 4, pp. 211-229, 1959.
6. T. B. Martin, A. L. Nelson and H. J. Zadell, Speech Recognition by Feature Abstraction Techniques, Tech. Report AL-TDR-64-176, Air Force Avionics Lab, 1964.
