Pra a Aaaaaaaaaaaaaaa


  • 8/13/2019 Pra a Aaaaaaaaaaaaaaa


    INDEX

    1 Introduction

    1.1 History

    2 Speech Recognition

    2.1 Performance of speech recognition systems

    2.2 Hidden Markov model (HMM)-based speech recognition

    2.3 Dynamic time warping (DTW)-based speech recognition

    3 Speech Understanding

    4 Text Generation

    5 Speech Synthesis

    6 Language Resources

    7 Applications

    8 Conclusion

    9 References

    1. Introduction

    Information Technology deals with the acquisition, organization, storage, processing, transmission and delivery of information. Human beings collect various types of data with the intention of extracting information relevant to decision making. A large part of data processing is conducted using computers, thanks to their enormous capability for numerical computation. However, even today computers play the role of an assistant in decision making rather than that of a decision maker, and rightly so. They fulfill this role by presenting the information and knowledge gleaned from data processing to humans in a form that is easy for them to interpret. Quite often, people issue commands to the computer to prune the information following some methodology which is dynamically determined by the problem at hand. Thus, human decision making with the help of computers involves a dialogue between man and machine.

    Communication among human beings is inherently multimodal, with the visual and aural modes being the primary ones. Currently, the principal means of human-machine communication is heavily biased towards the convenience of the machine rather than that of man. The mouse and keyboard are the primary input devices and the visual display unit is the primary output device. Using such interfaces requires special skills and a mental attitude with which many people are not endowed. This machine-centric mode of communication needs to be changed in favor of human-centric interfaces so that the benefit of the power of computers is shared by all people. While the visual mode is most effective in capturing information, speech remains the preferred and most convenient means of conveying information. The advantage of (and the compelling reason for) verbal communication has become even stronger today due to the convergence of computers and telecommunication systems, which allows people to access information on computers located remotely. Verbal communication involves natural language, and this brings to the fore the role of linguistics in information technology.

    From the above discussion, it is clear that a human-centric interface to the computer should let people interact with it the way they share information, thoughts and ideas effortlessly among themselves. Facilitating human-machine interaction using natural language involves several facets of human language technology: speech compression, recognition and understanding of speech and script, machine translation, text generation, and synthesis of speech and cursive script. Both forms of language, spoken and written, are useful for interaction with machines. Here, we confine ourselves to spoken language and discuss the role of linguistic knowledge in developing speech interfaces. The relevance of linguistics in speech recognition, speech understanding and speech synthesis will be dealt with in the following sections.

    1.1 History

    The first speech recognizer appeared in 1952 and consisted of a device for the recognition of single spoken digits. Another early device was the IBM Shoebox, exhibited at the 1964 New York World's Fair. One of the most notable domains for the commercial application of speech recognition in the United States has been health care, and in particular the work of the medical transcriptionist (MT). According to industry experts, at its inception speech recognition (SR) was sold as a way to completely eliminate transcription rather than to make the transcription process more efficient, and hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many, if not all, were reluctant to make. The biggest limitation to speech recognition automating transcription, however, is seen as the software. The nature of narrative dictation is highly interpretive and often requires artificial syntax systems, which are usually domain-specific, and natural language processing, which is usually language-specific. Each of these types of application presents its own particular goals and challenges.

    2. Speech Recognition

    Speech recognition, the process of translating a speech signal into a sequence of words, is at the heart of speech input devices. Although tremendous progress has been made in speech recognition (SR) technology, most of it has come from advances in modeling speech sounds and their influence on sounds in the immediate vicinity, and not from adequate modeling of natural language. (Remember the proverb "A picture is worth more than a thousand words". Imagine an attempt to convey something to a person outside a glass wall using only gestures, without the benefit of speech.)

    The grammar is normally modeled in terms of statistical properties of language, not because engineers prefer statistical grammars but because there is no better working alternative in the form of language models with a strong foundation in formal linguistics. Phrase structure grammars, for example, comprise several hundreds or thousands of rules describing different phrase types. Each of these rules is annotated with features and sometimes also with expressions in a programming language. When such grammars reach a certain size they become difficult to maintain, to extend and to reuse. The resulting systems might be sufficiently efficient for some applications, but they lack the processing speed needed for interactive systems (such as applications involving spoken input) or for systems that have to process large volumes of text (as in machine translation).
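    As an illustration of what a (very small) phrase structure grammar looks like in practice, here is a minimal sketch: a hypothetical toy grammar written in Chomsky normal form and the standard CYK recognizer, which checks whether a word sequence can be derived from the start symbol S. The grammar, its rules and the example sentences are assumptions made for illustration; real grammars run to hundreds or thousands of rules with feature annotations, each written and maintained by hand.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (CNF): every rule is either
# A -> B C (two non-terminals) or A -> word (a single terminal).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL_RULES = {
    "the": {"Det"},
    "a": {"Det"},
    "pilot": {"N"},
    "radio": {"N"},
    "tunes": {"V"},
}


def cyk_recognize(words, start="S"):
    """Return True if the grammar derives `words` from `start` (CYK algorithm)."""
    n = len(words)
    if n == 0:
        return False
    # table[i][length - 1] holds the non-terminals spanning words[i:i + length].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = set(LEXICAL_RULES.get(word, set()))
    for length in range(2, n + 1):            # span length
        for i in range(n - length + 1):       # span start
            for split in range(1, length):    # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for pair in product(left, right):
                    table[i][length - 1] |= BINARY_RULES.get(pair, set())
    return start in table[0][n - 1]


print(cyk_recognize("the pilot tunes a radio".split()))  # True
print(cyk_recognize("pilot the tunes radio a".split()))  # False
```

    Even this five-word lexicon needs a separate rule for every phrase type; adding agreement features, subcategorization or spoken-language disfluencies multiplies the rule set quickly.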

    Context-free grammars and their probabilistic versions have been tried, and their success in modeling unseen data has been only partial. N-gram models, the most popular statistical language models, still face a sparse estimation problem when their probabilities are estimated, even when a very large corpus relevant to the task domain is used. For example, after observing all trigrams (i.e., consecutive word triplets) in 38 million words' worth of newspaper articles, a full one third of the trigrams in new articles from the same source are novel.
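    The sparse-data effect behind the "one third of trigrams are novel" observation can be measured directly. The sketch below runs on a tiny made-up corpus (not the 38-million-word newspaper data): it collects the trigrams seen in training text and reports what fraction of the trigrams in new text were never observed.

```python
def trigrams(tokens):
    """Return the list of consecutive word triplets in a token sequence."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def novel_trigram_rate(train_tokens, test_tokens):
    """Fraction of trigrams in the test text that never occur in the training text."""
    seen = set(trigrams(train_tokens))
    test = trigrams(test_tokens)
    if not test:
        return 0.0
    unseen = sum(1 for t in test if t not in seen)
    return unseen / len(test)


# Toy data, for illustration only.
train = "the minister said the talks were useful and the talks will continue".split()
test = "the minister said the talks will resume next week".split()
print(round(novel_trigram_rate(train, test), 3))  # 0.429 (3 of 7 test trigrams unseen)
```

    Any trigram counted as unseen gets zero probability under a plain relative-frequency estimate, which is why N-gram models need smoothing or back-off.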


    Moreover, current language models are extremely sensitive to changes in the style, topic or genre of the text on which they are trained. A statistical language model trained with newswire text from one news agency will see its perplexity (the geometric average branching factor of the language according to the model) doubled when applied to news of the same time period from a similar agency. The inadequacy of language modeling is evident in the performance of speech recognition systems in competitive DARPA evaluations. In the most recent test of SR systems with noisy telephone speech, the best SR system showed only 62% word accuracy. Most SR systems expect the user to speak grammatically correct sentences. This puts a heavy load on users, who must formulate syntactically correct sentences with no out-of-vocabulary words before speaking to the computer. A user-friendly speech input system should be able to handle speech disfluencies and flexible grammar. This is where computational linguists can play a crucial role.
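    Perplexity, described above as the geometric average branching factor, is the exponential of the average negative log-probability a model assigns to each word of a test text. A minimal sketch, assuming a bigram model with add-one (Laplace) smoothing; both choices are made here for brevity, and practical systems use stronger smoothing and much more data. The toy sentences are invented.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing; a teaching sketch."""

    def __init__(self, sentences):
        self.unigrams = Counter()   # counts of words used as contexts
        self.bigrams = Counter()
        self.vocab = {"</s>"}
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            self.vocab.update(words)
            self.unigrams.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))

    def prob(self, prev, word):
        v = len(self.vocab) + 1  # +1 reserves probability mass for unknown words
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + v)

    def perplexity(self, sentences):
        """exp of the average negative log-probability per predicted token."""
        log_sum, count = 0.0, 0
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                log_sum += math.log(self.prob(prev, word))
                count += 1
        return math.exp(-log_sum / count)


train = [s.split() for s in ["the pilot tunes the radio", "the pilot lowers the gear"]]
lm = BigramLM(train)
in_domain = [["the", "pilot", "tunes", "the", "gear"]]
off_domain = [["stocks", "fell", "sharply", "today"]]
print(lm.perplexity(in_domain) < lm.perplexity(off_domain))  # True
```

    The off-domain sentence consists entirely of unseen words, so the model is far more "surprised" by it; the same mechanism, in milder form, drives the perplexity increase across news sources mentioned above.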

    2.1 Performance of speech recognition systems

    The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real-time factor (RTF). Other measures of accuracy include the Single Word Error Rate (SWER) and the Command Success Rate (CSR).
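    WER is conventionally computed by aligning the recognized words with a reference transcript using word-level edit distance and counting substitutions, deletions and insertions; the real-time factor is processing time divided by audio duration. A minimal sketch of both; the function names and the example utterance are our own.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit distance) alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


print(word_error_rate("set the radio to channel nine",
                      "set radio to channel five"))  # 0.333... (2 errors / 6 words)
print(real_time_factor(3.0, 6.0))                    # 0.5
```

    Because insertions are counted, WER can exceed 100% for a hypothesis much longer than the reference.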

    Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation".

    Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called 'enrollment') and may successfully capture continuous speech with a large vocabulary at normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. 'Optimal conditions' usually assume that users:

    have speech characteristics which match the training data,

    can achieve proper speaker adaptation, and

    work in a clean, low-noise environment (e.g. a quiet office or laboratory space).

    This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected. Speech recognition in video has become a popular search technology used by several video search companies.

    Limited-vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.

    Both acoustic modeling and language modeling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling also has many other applications, such as smart keyboards and document classification.

    2.2 Hidden Markov model (HMM)-based speech recognition

    Modern general-purpose speech recognition systems are generally based on Hidden Markov Models. These are statistical models which output a sequence of symbols or quantities. One possible reason why HMMs are used in speech recognition is that a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal. That is, over a short time, in the range of 10 milliseconds, speech can be approximated as a stationary process. Speech could thus be thought of as a Markov model for many stochastic processes.
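    The core computation behind HMM-based scoring is the forward algorithm, which gives the probability that a model generated an observed sequence by summing over all hidden state paths. A minimal sketch; the two-state left-to-right model and its probabilities are invented for illustration, and the observations are discrete symbols rather than the real-valued feature vectors a recognizer actually uses.

```python
def forward_probability(observations, start_p, trans_p, emit_p):
    """P(observations | HMM), summed over all hidden state paths (forward algorithm)."""
    states = list(start_p)
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())


# Hypothetical left-to-right model: a "sil" (silence) state followed by a "voiced" state.
start_p = {"sil": 1.0, "voiced": 0.0}
trans_p = {"sil": {"sil": 0.6, "voiced": 0.4},
           "voiced": {"sil": 0.0, "voiced": 1.0}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "voiced": {"low": 0.2, "high": 0.8}}

print(forward_probability(["low", "high", "high"], start_p, trans_p, emit_p))
```

    The forward recursion costs time proportional to the sequence length times the square of the number of states, instead of enumerating every path; for this example it equals the sum over the three non-zero paths (sil-sil-sil, sil-sil-voiced, sil-voiced-voiced), about 0.251.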

    Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model outputs a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), one every 10 milliseconds. The vectors consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech, taking the logarithm of the magnitude spectrum and decorrelating it using a cosine transform, then keeping the first (most significant) coefficients. The hidden Markov model tends to have in each state a statistical distribution that is a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. Each word, or (for more general speech recognition systems) each phoneme, has a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.
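    The feature and likelihood steps described above can be sketched in plain Python. This follows the simplified pipeline in the text (window, Fourier transform, log magnitude, cosine transform); it omits the mel filterbank that most practical front ends add, uses a slow naive DFT instead of an FFT, and the frame, mixture weights, means and variances are hypothetical values chosen only to exercise the code.

```python
import cmath
import math


def cepstral_coefficients(frame, num_coeffs):
    """Cepstrum-style features for one short-time frame:
    Hamming window -> DFT magnitude -> log -> DCT-II -> first coefficients."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Naive O(n^2) DFT over the non-negative frequency bins.
    spectrum = [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n)))
                for k in range(n // 2 + 1)]
    log_spec = [math.log(max(mag, 1e-10)) for mag in spectrum]  # floor avoids log(0)
    m = len(log_spec)
    # DCT-II decorrelates the log spectrum; low-order terms describe its overall shape.
    return [sum(log_spec[j] * math.cos(math.pi * c * (j + 0.5) / m) for j in range(m))
            for c in range(num_coeffs)]


def diag_gmm_log_likelihood(x, weights, means, variances):
    """log p(x) under a mixture of diagonal-covariance Gaussians (log-sum-exp)."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comps.append(ll)
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))


# Hypothetical 16-sample frame: a 1 kHz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(16)]
features = cepstral_coefficients(frame, num_coeffs=4)
# Hypothetical two-component mixture for one HMM state.
weights = [0.5, 0.5]
means = [[0.0] * 4, [1.0] * 4]
variances = [[1.0] * 4, [1.0] * 4]
print(len(features), diag_gmm_log_likelihood(features, weights, means, variances))
```

    Diagonal covariances keep the per-state likelihood cheap (one term per dimension), which is one reason decorrelating the features with the cosine transform matters.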

    Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so that phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or the system might skip the delta and delta-delta coefficients and use splicing and an LDA-based pro

    2.3 Dynamic time warping (DTW)-based speech recognition

    Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW.

    A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) under certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
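    In template-based isolated-word recognition, DTW compares an incoming utterance with stored word templates and picks the closest. A minimal sketch of the classic algorithm on one-dimensional sequences; real systems align sequences of feature vectors (such as cepstral vectors) and usually add slope or window constraints, and the template values here are made up.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping: minimum cumulative cost of aligning a and b
    along a monotonic, continuous warping path."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


template = [0, 1, 2, 3, 2, 1, 0]                   # e.g. a stored word template
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same shape, spoken more slowly
other = [3, 3, 0, 0, 3, 3, 0]

print(dtw_distance(template, slow))       # 0.0: the slow version warps onto the template
print(dtw_distance(template, other) > 0)  # True
```

    A plain point-by-point comparison could not even be applied to `template` and `slow`, since their lengths differ; the warping path absorbs the difference in speaking rate.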

    Further information

    Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important … "Fundamentals of Speech Recognition" by Lawrence Rabiner (1993) can be useful to acquire basic knowledge but may not be fully up to date. Another good source can be "Statistical Methods for Speech Recognition" by Frederick Jelinek and "Spoken Language Processing (2001)" by Xuedong Huang et al. More up to date is "Computer Speech" by Manfred R. Schroeder, second edition published in 2004. The recently updated textbook "Speech and Language Processing (2008)" by Jurafsky and Martin presents the basics and the state of the art for ASR. A good insight into the techniques used in the best modern systems can be gained by paying attention to government-sponsored evaluations such as those organized by DARPA (the largest speech recognition-related pro

    3 Speech Understanding

    Speech understanding involves the integration of speech recognition and natural language (NL) understanding. This integration has great advantages: to NL, SR can bring prosodic information (information important for syntax and semantics but not well represented in text); NL can bring to SR additional knowledge sources (e.g., syntax and semantics). The integration of these technologies presents technical challenges, as well as challenges related to the quite different cultures, techniques and beliefs of the people representing the component technologies. In large part, NL research has been pursued in computer science and linguistics departments, where the goal is to model language understanding, motivated by a desire to understand cognitive processes. Hence, the underlying theories tend to come from linguistics and psychology. Practical applications have been less important than increasing intuitions about human processes. Therefore, coverage of phenomena of theoretical interest (usually the rarer phenomena) has traditionally been more important than broad coverage. On the other hand, speech recognition research has largely been practiced in engineering departments with practical applications in mind. Techniques motivated by knowledge of human processes have therefore been less important than techniques that can be automatically developed or tuned, and broad coverage of a representative sample is more important than coverage of any particular phenomenon. The integration of SR and NL needs to overcome not only technical challenges but also the differences in motivation, interests, theoretical underpinnings, techniques, tools, and criteria for success of the two groups. However, both groups have much to gain from collaboration, and such a trend is visible around the world.

    3.1 Integration of Speech Recognition and Natural Language Processing

    SR is concerned to a large extent with the acoustic attributes of words, and to a lesser extent with lexical and syntactic information. On the other hand, human speech understanding involves the integration of a great variety of knowledge sources, including knowledge of the world or context, knowledge of the speaker and/or topic, lexical frequency, previous uses of a word or a semantically related topic, facial expressions, and prosody. Thus, integration of SR and NL has been a consistent goal. However, as grammatical coverage increases, standard NL techniques can become computationally difficult. Further, with increased coverage, NL tends to provide less constraint for SR.

    Simple-minded concatenation of an existing speech recognition system and an existing NL system is suboptimal because information flows in one direction only. Not only can errors in the SR system propagate, but there is also no way for the higher-level knowledge sources to help the SR system prune its search space. Most importantly, most NL systems deal with written language rather than spoken language. In written text one can expect grammatically correct sentences, whereas in an interactive dialogue, speech disfluencies such as restarts, revisions, repetitions, filler sounds and hesitations are common. Most NL systems are concerned more with correct analyses of complete sentences than with methods for recovering interpretations when parses are incomplete, which is exactly what spoken language understanding needs. Thus there is a need to reformulate the existing knowledge of NL systems and to devise computational models of spoken language. Traditionally, linguists have studied the properties of natural language and documented their observations in various forms. This knowledge has to be transformed into an algorithmic form as far as possible so that the collective knowledge and wisdom of the linguistic community becomes more useful to humankind. This kind of collaborative work between linguists and engineers is especially relevant in a multilingual country such as India.
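    One common way to let higher-level knowledge influence recognition, instead of a strict one-way pipeline, is N-best rescoring: the recognizer proposes several hypotheses with acoustic scores, and a language or domain model re-ranks them. A minimal sketch, assuming the recognizer already supplies the N-best list with acoustic log-scores; the toy "domain model", the weight and all scores are hypothetical.

```python
import math


def rescore_nbest(nbest, lm_logprob, lm_weight=1.0, word_penalty=0.0):
    """Re-rank (words, acoustic_logscore) pairs by
    acoustic + lm_weight * LM log-probability + word_penalty * number of words."""
    def combined(item):
        words, acoustic = item
        return acoustic + lm_weight * lm_logprob(words) + word_penalty * len(words)
    return sorted(nbest, key=combined, reverse=True)


# Hypothetical knowledge source: favours words that fit a railway-enquiry domain.
DOMAIN_WORDS = {"show", "trains", "to", "delhi"}


def toy_lm_logprob(words):
    return sum(math.log(0.2 if w in DOMAIN_WORDS else 0.01) for w in words)


nbest = [
    (["show", "trains", "two", "delhi"], -120.0),  # acoustically best, but odd
    (["show", "trains", "to", "delhi"], -121.5),
]
best_words, _ = rescore_nbest(nbest, toy_lm_logprob, lm_weight=2.0)[0]
print(" ".join(best_words))  # show trains to delhi
```

    N-best rescoring is still a loose coupling: it can only choose among hypotheses the recognizer already produced, so tighter integration pushes such knowledge into the search itself.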

    4 Text Generation

    One of the purposes of interacting with computers is to access information. This information has to be drawn from a database and presented to the user. A spoken query from a user is processed by a speech understanding system, which formulates a database query. The information in the database has to be transformed into natural language which can be presented to … of spoken output, given the data representation, context and dialogue state. In simple, specific task domains (for example, information about the availability of railway reservations), response sentence templates can be used to generate text. However, a versatile text generation module should generate coherent multi-sentential responses, interpreting and responding to users' subsequent utterances in the context of an ongoing interaction. Spoken language generation requires deciding what concepts to include and how to realize them in words. In addition, it needs to determine intonational forms for speech synthesis.
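    The template approach for simple task domains can be sketched with Python's standard `string.Template`. The template texts, status names and slot names below are hypothetical, and the sketch covers only the simple single-response case; coherent multi-sentence generation and choosing intonational forms are beyond it.

```python
from string import Template

# Hypothetical response templates, keyed by the status a database query returns.
TEMPLATES = {
    "available": Template("$seats seats are available in $cls on train $train on $date."),
    "waitlist": Template("Train $train on $date is full in $cls; "
                         "your waiting-list position would be $position."),
    "no_train": Template("Sorry, there is no train from $src to $dst on $date."),
}


def generate_response(result):
    """Choose a template from result["status"] and fill its slots from the same dict."""
    return TEMPLATES[result["status"]].substitute(result)


print(generate_response({"status": "available", "seats": 12, "cls": "sleeper class",
                         "train": "4321", "date": "14 August"}))
# 12 seats are available in sleeper class on train 4321 on 14 August.
```

    `substitute` raises `KeyError` if a slot is missing, which surfaces incomplete query results instead of silently producing a half-filled sentence.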

    5 Speech Synthesis

    The task of a speech synthesis module is to synthesize an intelligible, natural, easily interpreted and appropriate spoken version of the response, taking advantage of the context and dialogue state to emphasize certain information. The acoustic evidence needs to inform abstract units in syntax, semantics, discourse, and pragmatics. While the intelligibility of speech generated by current speech synthesis systems is good, the naturalness leaves much to be desired. The issue of intelligibility is primarily related to the generation and combination of speech sounds, whereas imparting naturalness involves incorporating suprasegmental information. This involves prosodic phrasing, i.e., chunking a long sentence into prosodic phrases. Patterns of variation in fundamental frequency, duration, amplitude or intensity, pauses, and speaking rate have been shown to carry information about such prosodic elements as lexical stress, phrase breaks, and declarative or interrogative sentence form. A high-quality speech synthesis system in any language has to deal with these issues in order to become acceptable to people as a means of delivering information.
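    Prosodic phrasing can be approximated with simple rules before a synthesizer assigns pitch and duration. The sketch below is a naive heuristic chosen for illustration (break at punctuation, before a conjunction once a phrase has three or more words, and at a maximum phrase length, here 8 words), plus a crude rise/fall cue for interrogative versus declarative form; the conjunction list and thresholds are assumptions, and real systems learn phrasing from data using much richer syntactic and prosodic features.

```python
import re

# Hypothetical list of words before which a phrase break is allowed.
CONJUNCTIONS = {"and", "but", "or", "because", "which", "while"}


def prosodic_phrases(sentence, max_words=8):
    """Naive rule-based prosodic phrasing of one sentence into word chunks."""
    phrases, current = [], []
    for token in re.findall(r"[\w']+|[,;:.?!]", sentence):
        if token in ",;:.?!":          # punctuation always closes a phrase
            if current:
                phrases.append(current)
                current = []
            continue
        too_long = len(current) >= max_words
        conj_break = token.lower() in CONJUNCTIONS and len(current) >= 3
        if current and (too_long or conj_break):
            phrases.append(current)
            current = []
        current.append(token)
    if current:
        phrases.append(current)
    return [" ".join(p) for p in phrases]


def final_contour(sentence):
    """Crude sentence-type cue: rising pitch at the end of a question, falling otherwise."""
    return "rise" if sentence.rstrip().endswith("?") else "fall"


text = ("The train to Delhi leaves at nine, and the return train "
        "leaves in the evening because of track work.")
print(prosodic_phrases(text))
print(final_contour("Does the train stop at Agra?"))  # rise
```

    Each returned phrase would then receive its own pitch contour and a pause at its boundary, which is where the naturalness discussed above is won or lost.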

    6 Language Resources

    The term linguistic resources refers to (usually large) sets of language data and descriptions in machine-readable form, to be used in building, improving, or evaluating natural language (NL) and speech recognition and synthesis systems. Examples of linguistic resources are written and spoken corpora, lexical databases and grammars. The need for such linguistic resources in vast quantities is even more important for speech recognition systems, as they use statistical models for representing acoustic units as well as language.

    In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. Many experts in the field anticipate that with increased use of speech recognition technology, the services provided may be redistributed rather than replaced. Speech recognition is also used to enable deaf people to understand the spoken word via speech-to-text conversion, which is very helpful.

    Speech recognition can be implemented in the front end or the back end of the medical documentation process. Front-End SR is where the provider dictates into a speech recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document. It never goes through an MT/editor.

    Back-End SR or Deferred SR is where the provider dictates into a digital dictation system, the voice is routed through a speech recognition machine, and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is currently widely used in the industry. Many Electronic Medical Records (EMR) applications can be more effective and may be performed more easily when deployed in con

    2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system.

    3. More natural vocabulary and grammar, and shorter training times, would be useful, but only if very high recognition rates could be maintained.

    Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.

    Working with Swedish pilots flying in the JAS 39 Gripen cockpit, Englund (2004) found that recognition deteriorated with increasing G-loads. It was also concluded that adaptation greatly improved the results in all cases, and introducing models for breathing was shown to improve recognition scores significantly. Contrary to what might be expected, no effects of the speakers' broken English were found. It was evident that spontaneous speech caused problems for the recognizer, as could be expected. A restricted vocabulary and, above all, a proper syntax could thus be expected to improve recognition accuracy substantially.

    The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent system, i.e. it requires each pilot to create a template. The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a ma

    himself with two simple voice commands or to any of his wingmen with only five commands.

    Helicopters

    The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade on speech recognition system applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter. There has also been much useful work in Canada. Results have been encouraging, and voice applications have included control of communication radios, setting of navigation systems, and control of an automated target handover system.

    As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done, both in speech recognition and in overall speech technology, in order to consistently achieve performance improvements in operational settings.

    Battle management command centers generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in con

    speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task.

    The U.S. 8a)al Training ?&uip$ent "enter has sponsored a nu$+er o#

    de)elop$ents o# prototype AT" trainers using speech recognition. Generally' the

    recognition accuracy #alls short o# pro)iding grace#ul interaction +et%een the trainee and

    the syste$. *o%e)er' the

    prototype training syste$s ha)e de$onstrated a signi#icant potential #or )oice

    interaction in these syste$s' and in other training applications. The U.S. 8a)y has

    sponsored a large scale e##ort in AT" training syste$s' %here a co$$ercial speech

    recognition unit %as integrated %ith a co$plex training syste$ including displays and

    scenario creation. Although the recogni(er %as constrained in )oca+ulary' one o# the

    goals o# the training progra$s %as to teach the controllers to spea, in a constrained

    language' using speci#ic )oca+ulary speci#ically designed #or the AT" tas,. Research in

    rance has #ocused on the application o# speech recognition in AT" training syste$s'

    directed at issues +oth in speech recognition and in application o# tas, do$ain gra$$ar

    constraints.
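    To make the idea of task-domain grammar constraints concrete, the following is a minimal Python sketch, not a description of any of the systems above. It assumes a hypothetical recognizer that returns an N-best list of (transcript, score) pairs, with higher scores being better and numbers already rendered as digits. The phraseology pattern ("<callsign> climb|descend and maintain flight level <nnn>") and the callsign list are invented for illustration. The grammar discards every hypothesis that falls outside the constrained language and keeps the best-scoring one that remains.

```python
import re

# Hypothetical, simplified ATC phraseology (an illustrative assumption, not a real
# ATC grammar): "<callsign> climb|descend and maintain flight level <three digits>".
CALLSIGNS = {"alpha one", "bravo two", "delta three"}  # hypothetical vocabulary
COMMAND = re.compile(
    r"^(?P<callsign>[a-z]+ [a-z]+) (?P<verb>climb|descend) and maintain "
    r"flight level (?P<level>\d{3})$"
)


def parse_command(text):
    """Return (callsign, verb, level) if text is in the constrained language, else None."""
    m = COMMAND.match(text)
    # Reject anything outside the pattern or with a callsign not in the vocabulary.
    if m is None or m.group("callsign") not in CALLSIGNS:
        return None
    return m.group("callsign"), m.group("verb"), int(m.group("level"))


def best_valid_hypothesis(nbest):
    """Pick the highest-scoring hypothesis that the grammar accepts.

    nbest: list of (transcript, score) pairs from a hypothetical recognizer.
    """
    for text, _score in sorted(nbest, key=lambda pair: pair[1], reverse=True):
        parsed = parse_command(text)
        if parsed is not None:
            return parsed
    return None  # no hypothesis fits the task grammar


nbest = [
    ("alpha one climb and maintain flight lever two four zero", -3.1),
    ("alpha won climb and maintain flight level 240", -3.2),
    ("alpha one climb and maintain flight level 240", -3.4),
]
print(best_valid_hypothesis(nbest))  # ('alpha one', 'climb', 240)
```

    Here the top-scoring hypothesis ("flight lever ...") and the one with a misrecognized callsign ("alpha won") are both rejected, so the third hypothesis is chosen. A practical recognizer would more likely apply such constraints inside the decoding search rather than as a post-filter; filtering an N-best list is simply the easiest way to show the effect.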

    The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition from a number of different vendors, including UFA, Inc., and Adacel Systems Inc. (ASI). This software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo-pilots.

    Another approach to ATC simulation with speech recognition has been created by Supremes. The Supremes system is not constrained by rigid grammars imposed by the underlying limitations of other recognition strategies.

    Telephony and other domains

    ASR in the field of telephony is now commonplace, and in the field of computer gaming and simulation it is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use.

    The improvement of mobile processor speeds made speech-enabled Symbian and Windows Mobile smartphones feasible. Current speech-to-text programs are too large and require too much CPU power to be practical for the Pocket PC. Speech is used mostly as part of the user interface, for creating predefined or custom speech commands.

    Leading software vendors in this field are Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator) and SVOX.

    People with disabilities can benefit from speech recognition programs. Speech recognition is especially useful for people who have difficulty using their hands, ranging from mild repetitive stress in

    paper communication (essentially, they think of an idea but it is processed incorrectly, causing it to end up differently on paper) can benefit from the software.

    7. APPLICATIONS

    Automatic translation

    Automotive speech recognition (e.g., Ford Sync)

    Telematics (e.g., vehicle navigation systems)

    Court reporting (real-time voice writing)

    Hands-free computing: voice command recognition computer user interface

    Home automation

    Interactive voice response

    Mobile telephony, including mobile email

    Multimodal interaction

    Pronunciation evaluation in computer-aided language learning applications

    Robotics

    Video games, with possible expansion into the RTS genre following Tom Clancy's EndWar

    Transcription (digital speech-to-text)

    Speech-to-text (transcription of speech into mobile text messages)

    Air traffic control speech recognition

    8. FUTURE SCOPE

    In future important

    strengthening existing collaborations between linguists and speech engineers as well as initiating new ones.

    9. BIBLIOGRAPHY

    1. C. G. Kratzenstein, "Sur la naissance de la formation des voyelles," J. Phys., Vol. 21, pp. 358-380, 1782.

    2. H. Dudley and T. H. Tarnoczy, "The Speaking Machine of Wolfgang von Kempelen," J. Acoust. Soc. Am., Vol. 22, pp. 151-166, 1950.

    3. Sir Charles Wheatstone, The Scientific Papers of Sir Charles Wheatstone, London: Taylor and Francis, 1879.

    4. J. L. Flanagan, Speech Analysis, Synthesis and Perception, Second Edition, Springer-Verlag, 1972.

    5. D. B. Fry and P. Denes, "The Design and Operation of the Mechanical Speech Recognizer at University College London," J. British Inst. Radio Engr., Vol. 19, No. 4, pp. 211-229, 1959.

    6. T. B. Martin, A. L. Nelson and H. J. Zadell, "Speech Recognition by Feature Abstraction Techniques," Tech. Report AL-TDR-64-176, Air Force Avionics Lab, 1964.