
IDIAP RESEARCH REPORT

Dalle Molle Institute for Perceptual Artificial Intelligence
P.O. Box 592, Martigny, Valais, Switzerland
phone +41 27 721 77 11, fax +41 27 721 77 12
e-mail secretariat@idiap.ch, internet http://www.idiap.ch

Improving Face Authentication Using Virtual Samples

Norman Poh Hoon Thian (a), Sébastien Marcel (a), Samy Bengio (a)

IDIAP-RR 02-40, October 2002

Published in the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'03), Section III, pages 233-236, Hong Kong.

(a) IDIAP, CP 592, 1920 Martigny, Switzerland


Abstract. In this paper, we present a simple yet effective way to improve a face verification system by generating multiple virtual samples from the single image corresponding to an access request. These images are generated using simple geometric transformations. This method is often used during training to improve the accuracy of a neural network model by making it robust against minor translation, scale and orientation changes. The main contribution of this paper is to introduce such a method during testing. By generating N images from a single image and propagating them through a trained network model, one obtains N scores. By merging these scores using a simple mean operator, we show that the variance of the merged scores is decreased by a factor between 1 and N. An experiment carried out on the XM2VTS database achieves new state-of-the-art performance.


1 INTRODUCTION

1.1 Problem Definition

Biometric authentication (BA) is the problem of verifying an identity claim using a person's behavioural and physiological characteristics. BA is becoming an important alternative to traditional authentication methods such as keys ("something one has", i.e., by possession) or PIN numbers ("something one knows", i.e., by knowledge) because it relies on "who one is", i.e., on biometric information. It is therefore not susceptible to misplacement, forgetfulness or reproduction. Examples of biometric sources are fingerprints, face, voice, hand geometry and retina scans. A general introduction to biometrics can be found in [5].

Biometric data is often noisy because biometric devices fail to capture the plastic nature of biometric traits (e.g. a fingerprint deformed by varying pressure), and because of corruption by environmental noise, variability over time and occlusion by the user's accessories. The noisier the data, the less reliable the biometric system becomes. Current biometric-based security systems (devices, algorithms, architectures) still have room for improvement, particularly in their accuracy, their tolerance to various noisy environments and their scalability as the number of individuals increases. The focus of this study is to improve system accuracy by directly reducing the noise through multiple virtual samples, when multiple real samples are not available.

1.2 Related work in the literature

To the best of our knowledge, the closest work to ours in the literature is the one reported by Kittler et al. [1]. The fundamental difference is that they assume that multiple samples are available. In a real-life situation, where a face image is scanned and transferred over a communication line, obtaining multiple face images for each access may not be feasible. In this case, "virtual" samples can be used instead. Although there is no gain in information, this paper shows that accuracy can still be improved by reducing the variance of the scores obtained from the virtual samples. Moreover, this approach can easily be generalised to other pattern recognition problems.

An alternative to creating variations by geometric transformation is to synthesize virtual images from an approximated user-customized 3D model. This approach, although possibly more effective than the proposed method, is not considered here because of the potential inaccuracy of approximating the model in the first place. Our approach does not require such an estimation.

The rest of this paper is organised as follows: Section 2 explains the theoretical bounds on the expected gain from averaging scores; a description of the experiment can be found in Section 3; this is followed by conclusions.

2 VARIANCE REDUCTION VIA AVERAGING

2.1 Variance reduction

Let us assume that the measured relationship between a feature vector x_i and its associated score y_i can be written as:

    y_i = f(x_i) + \epsilon_i,    (1)

where f(\cdot) is the true relation and \epsilon_i is random additive noise with zero mean. The mean of y over N trials, denoted \bar{y}, is:

    \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.    (2)

With enough samples, the expected value of y, denoted E[y] and estimated by the mean of y, approximates the "true" measure:

    E[y] = E[f(x)] + E[\epsilon]    (3)
         = f(x).    (4)


Moreover, the variance of the averaged score can be written as:

    Var[\bar{y}] = \frac{1}{N} Var[\epsilon].    (5)

Therefore, when N scores of a single biometric source are averaged, the noise arising in classification can be reduced by a factor of N. The effect of the averaging in Equation 2 is best observed on synthetically generated data, as in Figure 1. Assume that in the original problem the genuine user scores follow a normal distribution with mean 1.0 and variance 0.9, denoted N(1, 0.9), and that the impostor scores follow a normal distribution N(-1, 0.6) (both plotted with '+'). If three confidence scores are available for each access, then according to Equation 5 the variance of each resulting distribution is reduced by a factor of three. Both resulting distributions are plotted with 'o'. Note the area where the two distributions cross, before and after averaging. This area corresponds to the zone where the minimum number of mistakes is committed, given that the threshold is optimal (1). The decrease of this area means an improvement in the recognition rate.

[Figure 1: Averaging score distributions in a two-class problem; the plot shows the genuine and impostor pdfs before ('+') and after ('o') averaging.]

In general, the more samples are used, the sharper (taller, with shorter tails at both ends) both the impostors' and the clients' score distributions become. The sharper they are, the smaller the area where the two distributions overlap. The smaller this area, the lower the number of mistakes committed.

(1) Optimal in the Bayes sense, when both the costs and the prior probabilities of the two types of errors are equal.
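To make the variance argument concrete, here is a minimal numpy sketch (ours, not the paper's code) that reproduces the synthetic example above with three independent scores per access, which is the idealised setting of Equation 5.

```python
import numpy as np

rng = np.random.default_rng(0)
n_access, n_virtual = 10000, 3

# Genuine ~ N(1, 0.9), impostor ~ N(-1, 0.6); rng.normal takes a standard
# deviation as its second argument, hence the square roots
gen = rng.normal(1.0, np.sqrt(0.9), size=(n_access, n_virtual))
imp = rng.normal(-1.0, np.sqrt(0.6), size=(n_access, n_virtual))

gen_avg, imp_avg = gen.mean(axis=1), imp.mean(axis=1)

# Variance drops by a factor of N = 3, as Equation 5 predicts
print(np.var(gen[:, 0]), np.var(gen_avg))   # ~0.90 -> ~0.30
print(np.var(imp[:, 0]), np.var(imp_avg))   # ~0.60 -> ~0.20

# With a fixed threshold at 0, the overlap (error) zone shrinks after averaging
err_single = 0.5 * (np.mean(gen[:, 0] < 0) + np.mean(imp[:, 0] >= 0))
err_avg = 0.5 * (np.mean(gen_avg < 0) + np.mean(imp_avg >= 0))
print(err_single, err_avg)
```

With correlated scores, as produced by virtual samples of a single image, the reduction is smaller; Section 2.2 bounds the achievable gain.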

2.2 Error reduction

The above discussion only holds when the scores are corrupted by uncorrelated, zero-mean noise. In reality, one knows that scores coming from virtual samples are dependent on the original image. What, then, are the upper and lower bounds on the gain? Here we refer to the work of Bishop [2, Chap. 9], who showed that by averaging the scores of N classifiers, a committee can perform better than a single classifier. The assumptions were that the classifiers were uncorrelated and that the error of each classifier had zero mean. He showed that:

    err = \frac{1}{N^2} \sum_{i=1}^{N} err_i    (6)
        = \frac{1}{N} \mathrm{mean}(err_i),    (7)

where err is the error of the committee and err_i is the error associated with the i-th classifier.


Note that the major difference between Bishop's context and ours is that his scores arise from the variation of N classifiers, whereas in our context the scores arise from the variation of the N "virtual" samples obtained from N geometric transformations. Hereinafter, the index i refers to a sample.

Because the assumption that scores obtained from virtual samples are uncorrelated is false, the error reduction obtained with the mean operator will not be N as in Equation 7, but less. The equation should rather be written as:

    err = \frac{1}{\beta} \mathrm{mean}(err_i), \quad 1 \le \beta \le N,    (8)

where \beta can be understood as the "gain" in error reduction. It shows that the maximum gain from averaging scores is N with respect to the average performance over the individual virtual samples. This is, in practice, not attainable since the scores are correlated. The minimum gain, according to Equation 8, is 1, which means that there is no gain, but nothing is lost in the combination either. This can be understood as follows: if the errors made by the individual virtual scores are fully dependent, i.e., in the extreme case they make exactly the same error (\forall i,j: err_i = err_j), then mean(err_i) = err_i = err, which implies \beta = 1.

As in the case of a committee of classifiers, by averaging the N scores from N transformed images, the gain factor in error reduction with respect to a single input image is in the range [1, N]. Therefore, score averaging is a simple yet effective way to increase system accuracy.

3 EXPERIMENT

3.1 Database and Protocols

The XM2VTS face database is used here because it is a benchmark database with well-defined protocols called the Lausanne Protocols [3]. The XM2VTS database contains synchronized image and speech data recorded from 295 subjects during four sessions taken at one-month intervals. In each session, two recordings were made, each consisting of a speech shot and a head-rotation shot.

The database was divided into three sets: a training set, an evaluation set, and a test set. The training set was used to build client models, while the evaluation set was used to compute the decision function (for instance by estimating thresholds, or the parameters of a fusion algorithm). Finally, the test set was used only to estimate the performance of the system.

The 295 subjects were divided into a set of 200 clients, 25 evaluation impostors, and 70 test impostors. Two different evaluation configurations were defined; they differ in the distribution of the client training and client evaluation data. In configuration I (LP1), the client training and client evaluation data were drawn from the same recording sessions, which might lead to biased estimation on the evaluation set and hence poor performance on the test set. In configuration II (LP2), on the other hand, the client evaluation and client test sets were drawn from different recording sessions, which might lead to more realistic results. More details can be found in [3].

In this database, each access is represented by only one face image. We can increase the number of images by applying geometric transformations, thereby obtaining multiple "virtual" samples from a single access. For each virtual image, features are extracted in the same way as for a real face image. Both the feature extraction and the geometric transformations are explained in the sections below.

3.2 Features

In the XM2VTS database, a bounding box is placed on the face according to manually located eye coordinates. This assumes perfect face detection. The face is cropped and the extracted sub-image is downsized to a 30 x 40 image. After enhancement and smoothing, the face image becomes a feature vector of dimension 1200.
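The following sketch illustrates this cropping and downsizing step (ours, for illustration only: the paper's enhancement and smoothing steps are replaced by a crude normalisation, and the image and bounding box are hypothetical).

```python
import numpy as np

def face_features(img: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the eye-based bounding box and downsize to 30 x 40 pixels."""
    x0, y0, x1, y1 = box
    face = img[y0:y1, x0:x1].astype(np.float64)
    # Nearest-neighbour downsizing to 40 rows x 30 columns
    rows = np.linspace(0, face.shape[0] - 1, 40).astype(int)
    cols = np.linspace(0, face.shape[1] - 1, 30).astype(int)
    small = face[np.ix_(rows, cols)]
    # Stand-in for the paper's enhancement and smoothing (an assumption)
    small = (small - small.mean()) / (small.std() + 1e-8)
    return small.ravel()  # 1200-dimensional feature vector

# Example on a synthetic 8-bit grey image with a hypothetical bounding box
img = np.random.default_rng(0).integers(0, 256, size=(120, 100)).astype(np.uint8)
feat = face_features(img, box=(10, 10, 90, 110))
assert feat.shape == (1200,)
```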


In addition to these normalised features, RGB (Red-Green-Blue) histogram features are used. To construct this additional feature set, a skin-colour look-up table must first be built from a large number of colour images containing only skin. In a second step, face images are filtered according to this look-up table. Unavoidably, non-skin pixels are captured as well; this noise is passed on to the classifier, which must discriminate its degree of relevance. For each colour channel, a histogram is built using 32 discrete bins. Hence the histograms of the three channels, when concatenated, form a feature vector of 96 elements. More details about this method, including experiments, can be found in [4].

3.3 Geometric Transformations

Given an access image, N geometric transformations are performed. This number is calculated as N = 2 x A x B, i.e., the mirrored number of shifted and scaled face patterns, where A = (number of shifts) x 8 + 1 is the total number of shifts, in 8 directions, including the original frame, for each scale, and B = (number of scales) x 2 + 1 is the total number of scales, in 2 directions (zooming in and zooming out), including the original scale. In the experiment, 4 shifts and 2 scales are used. This produces 330 virtual images per original image (see the sketch after Table 1).

In the following experiments, we compare the system from [4] (denoted "original") to our system (denoted "averaged"). In the original system, geometric transformations were applied to the training set only, while in the averaged system they were also applied to the evaluation and test sets.

The training set is used to train an MLP for each client, and the evaluation set is used to stop the training with an early-stopping criterion. At the end of training, the trained MLP model is applied to the evaluation set again to estimate the global threshold that optimises the Equal Error Rate (EER). Once all parameters are set, including the threshold, the trained MLP model is applied to the test set. The Half Total Error Rate (HTER) thus obtained on the test set is said to be a priori; if instead the threshold optimised the EER on the test set itself, it would be called a posteriori. Of course, the a priori results are more realistic. In the experiment, the optimised client-dependent MLPs had 20 hidden units each.

3.4 Results

The experiments are carried out on the LP1 and LP2 configurations of the XM2VTS database. The results are shown in Tables 1 and 2. Odd lines in these tables show the HTERs of the original approach, while even lines show the HTERs after averaging virtual scores. In all comparisons, the improvements are clear. The HTERs in Table 1 are a posteriori and thus not realistic, but they nevertheless give an idea of the expected improvements. The HTERs in Table 2 are a priori. The corresponding DET curves for Table 2 are shown in Figure 2. As expected, the performance obtained by averaging is always superior. Moreover, to the best of our knowledge, the newly obtained a priori results appear to be the best published on this benchmark database.

Table 1: Performance of averaging scores versus the original approach, based on a posteriori selected thresholds

    Data sets   Models     FA[%]   FR[%]   HTER[%]
    LP1 Eval    Original   1.667   1.667   1.667
    LP1 Eval    Averaged   1.333   1.333   1.333
    LP2 Eval    Original   1.250   1.250   1.250
    LP2 Eval    Averaged   1.107   1.000   1.054
    LP1 Test    Original   1.817   1.750   1.783
    LP1 Test    Averaged   1.692   1.750   1.721
    LP2 Test    Original   1.726   1.750   1.738
    LP2 Test    Averaged   1.514   1.500   1.507
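As a check on the counting argument in Section 3.3, the following sketch enumerates a transformation parameter set T of the stated size (the pixel step of the shifts and the scale factors are our own illustrative assumptions; only the counts come from the paper):

```python
from itertools import product

n_shifts, n_scales = 4, 2

# A = n_shifts x 8 + 1: shifts of 1..n_shifts steps in 8 directions,
# plus the original (unshifted) frame
directions = [(dx, dy) for dx, dy in product((-1, 0, 1), repeat=2)
              if (dx, dy) != (0, 0)]
shifts = [(0, 0)] + [(m * dx, m * dy)
                     for m in range(1, n_shifts + 1) for dx, dy in directions]

# B = n_scales x 2 + 1: zooming in and out, plus the original scale
# (the 1.05 step per scale is a hypothetical choice)
scales = [1.0] + [1.05 ** k for k in range(1, n_scales + 1)] \
               + [1.05 ** -k for k in range(1, n_scales + 1)]

# Each shifted and scaled pattern is also mirrored: the factor 2 in N = 2AB
T = [(s, z, mirror) for s in shifts for z in scales for mirror in (False, True)]

A, B = len(shifts), len(scales)
print(A, B, len(T))  # 33 5 330, i.e., N = 2 * A * B
```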


Table 2: Performance of averaging scores versus the original approach, based on a priori selected thresholds

    Data sets   Models     FA[%]   FR[%]   HTER[%]
    LP1 Test    Original   1.230   2.750   1.990
    LP1 Test    Averaged   1.474   1.750   1.612
    LP2 Test    Original   1.469   2.250   1.860
    LP2 Test    Averaged   1.285   1.750   1.518

[Figure 2: DET curves (FA[%] versus FR[%]) on the XM2VTS test sets: (a) LP1 configuration, (b) LP2 configuration; each panel compares Test (Mean) against Test (Original).]

0.5

1

2

5

10

20

40

0.10.2 0.5 1 2 5 10 20 40

FR

[%]

FA [%]

DET curve

LP1 Test (Mean) LP1 Test (Original)

(a) LP1 on�guration 0.10.2

0.5

1

2

5

10

20

40

0.10.2 0.5 1 2 5 10 20 40

FR

[%]

FA [%]

DET curve

LP2 Test (Mean) LP2 Test (Original)

3.5 Analysis of the distribution of virtual scores

One way to examine the effectiveness of this method is to look at the probability density function (pdf) of the 330 virtual scores for a false rejection and for a correct acceptance, as shown in Figure 3. When an upright frontal image of a client is given within a certain allowed range of transformations, one obtains a sharply peaked pdf (with very low variance) around the mean of 1. The MLP associated with client 006, in this case, was trained to respond with 1 for a genuine access and -1 for an impostor access. When the original image falls outside the allowed transformation range, the pdf of the virtual scores has a large variance and a mean displaced away from 1. Note that a logarithmic scale is used for the probability axis to amplify the changes in the distribution across the score range [-1, 1].

While a single image normally produces only one score, a set of virtual images has the advantage of producing additional information: the score distribution. One way to summarise this distribution is by its variance. For instance, in the example above, the variance in the correct acceptance case is 1.5670e-05 while the variance in the false rejection case is 0.0181. Clearly, the variance of the virtual scores provides supplementary information that the original approach cannot. In general, the pdf (not just the variance) could probably provide useful insights to improve this method further.

[Figure 3: Examples of "bad" and "good" photos and the corresponding distributions of virtual scores for client 006: (a) false rejection, (b) correct acceptance, (c) corresponding histograms of the false-rejection and correct-acceptance pdfs, with probability on a natural log scale.]

3.6 Variance and error reduction

This section examines the relationship between the reduction of variance and the reduction of error. The hypothesis is that when N virtual scores are averaged (N = 330 in our case), Equation 5 predicts a variance reduction by a factor of N, assuming that the scores are independent, which in our case they unfortunately are not. To measure the degree of independence, we introduce a variance reduction ratio, defined as:

    \beta = \frac{Var[y]}{Var[y_v]},    (9)

where y are the client or impostor scores of the original method and y_v the corresponding scores of the virtual method (each being the average over the 330 virtual samples of an access).

These values are shown in Table 3.

Table 3: The gain factor β between the scores of the virtual-sample method and those of the original samples

    Data sets   Pdf of access type   Gain factor β
                                     LP1      LP2
    Eval        Client pdf           1.2716   1.2561
    Eval        Impostor pdf         1.0960   1.0769
    Test        Client pdf           1.1689   1.2675
    Test        Impostor pdf         1.1642   1.0507

In all cases, β > 1. Unfortunately, β is very close to 1 and very far from N. This is expected given the strong dependency among the virtual scores. In all cases, the variances of each data set (client and impostor accesses) are reduced systematically.

What about the gain factor in terms of HTER? These values are readily available from Table 1, by dividing the HTER on each odd line by the HTER on the corresponding even line; this definition of the gain follows from Equation 8. The error reduction for the evaluation and test sets of both the LP1 and LP2 configurations is shown in Table 4.

Table 4: The gain factor of error reduction according to Table 1

    Data sets   Gain factor
    LP1 Eval    1.251
    LP2 Eval    1.186
    LP1 Test    1.010
    LP2 Test    1.153

Note that the variance reduction (Table 3) and the error reduction (Table 4) are somewhat proportional: in general, if there is a reduction in the variance of the client or impostor pdf, there will be a reduction in the classification error (specifically, the HTER in our case).
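A minimal sketch of how the variance reduction ratio of Equation 9 can be estimated from raw score arrays (the scores and the dependence model, a shared per-access noise term plus small independent perturbations, are synthetic and ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n_access, n_virtual = 600, 330

# Virtual scores of one access share a common noise term (strong correlation)
shared = rng.normal(0.0, np.sqrt(0.81), size=(n_access, 1))
indep = rng.normal(0.0, np.sqrt(0.09), size=(n_access, n_virtual))
scores = 1.0 + shared + indep

y = scores[:, 0]               # original method: one score per access
y_v = scores.mean(axis=1)      # virtual method: average over 330 virtual scores

beta = np.var(y) / np.var(y_v)  # Equation 9
print(round(beta, 3))           # ~1.1: above 1 but far from N, as in Table 3
```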


In our opinion, it is necessary to investigate this "intuition" further.

As can be observed, these gain factors are very close to the lower bound of 1, which means that the gain is small. This may be due to the high correlation among the virtual scores. Nevertheless, the fact that an improvement is guaranteed still makes our approach very attractive.

Finally, an appropriate question to ask is: by how much does the virtual-samples approach win over the original, real-samples approach? To answer this question, we computed the total error (the sum of the false acceptance and false rejection errors) for both methods. The difference between these two errors, i.e., err(θ) - err_v(θ), is plotted in Figure x.

4 CONCLUSION

By applying N geometric transformations to the face image of a given access, it is shown that one can reduce the variance of the original score by a factor of up to N. Furthermore, taking into account that these N image samples are dependent on the original image, the classification error with respect to the original method is shown to be reduced by a factor between 1 and N.

Put formally, our proposed approach can be summarised as computing:

    y = \frac{1}{|T|} \sum_{t \in T} f(h(g(x, t)))    (10)

instead of y = f(h(x)) on the test set, where t \in T is a set of geometric transformation parameters applied by g (the transformation function) to the input x, h is a feature extraction function, and f is a classifier trained on h(g(x, t)) over t \in T, with x sampled from the training set. Equation 10 explains why this method is robust against minor geometric transformations: the score is integrated over the space of these transformations and hence achieves invariance over this space. (A sketch of this test-time rule is given after the references.)

This method has the advantage of being simple to implement. Furthermore, it does not require multiple real examples. This makes it easily extendable to many general classification and regression problems. The only added complexity during testing is proportional to the number of artificially generated samples, given that a suitable transformation for a given data set can be defined.

Future work will consist of proposing a theoretical model of the criteria and conditions necessary for sample averaging to work. At present, the relationship between variance reduction and error reduction has not been thoroughly investigated. Such an analysis should eventually reveal the criteria for success or failure of this approach, i.e., when performance degrades.

References

[1] J. Kittler, G. Matas, K. Jonsson, and M. U. R. Sanchez. Combining Evidence in Personal Identity Verification Systems. Pattern Recognition Letters, 18(9):845-852, September 1997.

[2] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1999.

[3] J. Lüttin. Evaluation Protocol for the XM2FDB Database (Lausanne Protocol). IDIAP Research Report COM-05, 1998.

[4] S. Marcel and S. Bengio. Improving Face Verification using Skin Color Information. Proceedings of the 16th International Conference on Pattern Recognition, 2002.

[5] A. K. Jain, R. Bolle and S. Pankanti. Biometrics: Person Identification in Networked Society. Kluwer Publications, 1999.
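For concreteness, here is a minimal sketch of the test-time rule in Equation 10; `transform`, `extract` and `classify` are hypothetical toy stand-ins for g, h and f, not the models used in the paper.

```python
import numpy as np

def virtual_score(x, params, transform, extract, classify):
    """Equation 10: average the classifier output over all virtual samples."""
    return float(np.mean([classify(extract(transform(x, t))) for t in params]))

# Toy stand-ins for g, h and f (assumptions, for illustration only)
transform = lambda x, t: np.roll(x, t)                # g: shift by t positions
extract = lambda v: v / (np.linalg.norm(v) + 1e-8)    # h: trivial normaliser
classify = lambda v: float(np.tanh(v.sum()))          # f: a score in [-1, 1]

x = np.random.default_rng(2).normal(size=1200)
print(virtual_score(x, params=range(-2, 3), transform=transform,
                    extract=extract, classify=classify))
```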


APPENDIX

This section is a follow-up based on the observations reported above. It describes several further attempts to combine the virtual scores, using the median operator, GMM (Gaussian Mixture Model) methods and entropy methods. The result based on the mean operator is then analysed in detail, particularly in comparison with the original approach (which uses only real samples).

5 Combining virtual scores

It has been shown that averaging the scores of virtual samples can increase performance. However, it is not clear how the distribution, or density, of the scores of the virtual samples can be used. This section proposes several methods to do so:

- Median operator. It is the operator closest to the mean and is known to be robust against outliers.

- Entropy method based on global models. The pdfs of the global client and impostor scores are first estimated. These pdfs are then compared to the pdf of a given access, which can be calculated from all the virtual scores associated with the access request. The authentication task then becomes the matching of two pdfs using the Kullback distance.

- Entropy method based on local models. This is similar to the global-model method except that local models are used, i.e., models estimated only from the client or impostor accesses associated with a given user-specific classifier.

- Gaussian Mixture Model. A GMM is very useful for matching sequences that are assumed to be identically and independently distributed. The virtual scores can be regarded as coming from some distribution, which can be estimated using a mixture (weighted sum) of Gaussians. During an access, the set of virtual scores obtained is regarded as a sequence and matched against GMMs computed a priori, yielding a likelihood. Two GMMs are needed: one associated with the client and one with the impostors.

The entropy methods and the GMM are explained in the following sections.

5.1 Entropy method

The entropy method requires that the density of the data be estimated. According to [2, Chap. 2], there are several ways to estimate a density: histograms, Parzen windows and GMMs. These methods receive a set of data and output a density function. The histogram suffers from the need to define the width of each bin: larger bins produce a smoother density estimate but do not give an accurate estimate of the density at a given value y. Thus, the Parzen window and the GMM are used. The Parzen window is described below; the GMM is discussed further on.

5.1.1 Density estimation using the Parzen window

Given a set of scores y_i, i = 1, ..., N, a density estimate can be calculated as:

    \tilde{p}(y') = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h} H\left(\frac{y' - y_i}{h}\right),    (11)

where H(u) is a kernel function taking a scalar u. When H(u) is a Gaussian function, Equation 11 becomes:

    \tilde{p}(y') = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{(2\pi h^2)^{1/2}} \exp\left(-\frac{(y' - y_i)^2}{2h^2}\right).    (12)
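A minimal numpy sketch of the Gaussian Parzen window of Equation 12 (the function name is ours; the bandwidth h = 0.2 and the [-1.2, 1.2] sampling range follow the settings discussed below):

```python
import numpy as np

def parzen_pdf(y_grid, scores, h=0.2):
    """Gaussian Parzen window estimate of Equation 12 at the points y_grid."""
    y_grid = np.atleast_1d(y_grid)[:, None]
    kernels = np.exp(-((y_grid - scores) ** 2) / (2.0 * h ** 2))
    return (kernels / np.sqrt(2.0 * np.pi * h ** 2)).mean(axis=1)

# Client-like MLP scores concentrated near +1, as described in the text
scores = np.clip(np.random.default_rng(3).normal(1.0, 0.1, 330), -1.0, 1.0)
grid = np.linspace(-1.2, 1.2, 1000)
pdf = parzen_pdf(grid, scores)
print(grid[np.argmax(pdf)])  # the estimated density peaks near +1
```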


In practice, y' is sampled between two bounds. In our case, y' is restricted to [-1.2, 1.2] because the MLPs used are trained to give scores between -1 and 1, so values outside this range are not very useful. 1000 samples of y' within this range are drawn for each client and impostor pdf over the evaluation sets of LP1 and LP2. The parameter h, which corresponds to the variance of the kernel, controls the smoothness of the resulting pdf.

We attempted to use cross-validation to estimate an optimal value of h. In each fold of the cross-validation, one held-out set is used for testing while the rest is used for training. The training set y_i is used to estimate \prod_j \tilde{p}(y'_j), with the y'_j being scores from the test set. The goal is to select, among several candidate values, the h that maximises this product. Unfortunately, such cross-validation breaks down as h takes on smaller and smaller values (equivalent to a higher capacity), because y and y' are very similar. In fact, these scores (y and y') are extremely concentrated at +1 for the client scores and at -1 for the impostor scores. As a consequence, h was fixed arbitrarily to 0.2.

In the actual implementation, the negative sum of logarithms, -\sum_j \ln \tilde{p}(y'_j), is used to overcome numerical precision problems: the product of many small values quickly underflows to zero in finite precision.

The larger h is, the smoother the resulting pdf. Note that this method is similar to a histogram, except that it gives a smoother estimate of the pdf. Another major difference is that the Parzen window method centres its kernels on the data points, contrary to the histogram, which has fixed bins. The resulting pdfs are shown in Figure 4. Note that in both protocol configurations, the pdf of the original client scores and that of its extension (the virtual-sample method) are not directly comparable, because the extended method has 330 times more data than the original.

[Figure 4: The client and impostor pdfs of the evaluation sets of the (a) LP1 and (b) LP2 configurations, showing the original and extended client and impostor pdfs.]

5.1.2 Entropy method based on global models

Entropy is used to compare two pdfs derived from sets of virtual scores Y. In our case, one pdf comes from a global model (client or impostor), denoted p(Y), and the other pdf comes from the set of virtual scores of a given access, denoted q(Y). Both pdfs are estimated by the Parzen window method described earlier, and both are sampled at the same i-th locations in score space, denoted y_i. The entropy-based distance of a given access distribution q(Y) can then be defined as:

    L(p, q) = -\sum_i p(y_i) \ln \frac{q(y_i)}{p(y_i)}.    (13)

This quantity can be regarded as a distance measuring how similar q(y) is to p(y), but not the other way round, i.e., the distance is not symmetric. On its own it does not give discriminative information.


To obtain such information, the entropies with respect to a client model and an impostor model should be used together. Let L(p_{w1}, q) be the entropy of q(y) with respect to the client model and L(p_{w2}, q) be that of q(y) with respect to the impostor model. Then the difference between these two entropies can be defined as:

    \Delta = L(p_{w2}, q) - L(p_{w1}, q).    (14)

\Delta > 0 means that the entropy with respect to the impostor model is larger than that with respect to the client model. Therefore \Delta > 0 reflects how likely it is that a set of virtual scores belongs to a client.

5.1.3 Entropy method based on local models

Instead of using two global models to represent the client and impostor score pdfs, one can also represent a client and impostor score pdf for each client. This is done by replacing p_{w1} and p_{w2} in Equation 14 with two local models:

    \Delta = L(p^n_{w2}, q) - L(p^n_{w1}, q),    (15)

where n is an index unique to a client. One possible problem with this approach is that there is not enough data to estimate p^n_{w1} correctly, because client accesses are limited during training.

5.1.4 Gaussian Mixture Model

Given a claim for the genuine client w1's identity and a set of N virtual scores Y = {y_i}_{i=1}^{N} supporting the claim, the average log-likelihood of the claimant being the true claimant is calculated as:

    L(Y | \lambda_{w1}) = \frac{1}{N} \sum_{i=1}^{N} \log p(y_i | \lambda_{w1}),    (16)

where

    p(y | \lambda) = \sum_{j=1}^{M} m_j \, N(y; \mu_j, \sigma_j)    (17)

and

    \lambda = \{ m_j, \mu_j, \sigma_j \}_{j=1}^{M}.    (18)

Here \lambda_{w1} is the model for person w1, M is the number of mixture components, m_j is the weight of mixture j (with the constraint \sum_{j=1}^{M} m_j = 1), and N(y; \mu, \sigma) is a Gaussian density with mean \mu and variance \sigma:

    N(y; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu)^2}{2\sigma^2}\right).    (19)

The number of Gaussian components is estimated using 5-fold cross-validation on a given training set.

The impostor model is constructed in a similar way according to Equations 16, 17 and 18, with Y being all the scores belonging to the impostors w2.

An opinion on the claim is found using:

    \Delta(Y) = L(Y | \lambda_{w1}) - L(Y | \lambda_{w2}).    (20)

The opinion reflects the likelihood that a given claimant is the true claimant (i.e., a low opinion suggests that the claimant is an impostor, while a high opinion suggests that the claimant is the true claimant).
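A sketch of the GMM-based opinion of Equations 16-20; scikit-learn's GaussianMixture is our stand-in for the paper's GMM training, and the synthetic scores and M = 2 components are illustrative (the paper selects M by 5-fold cross-validation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Training scores for the client (w1) and impostor (w2) models (synthetic)
client_train = rng.normal(1.0, 0.1, size=(2000, 1))
impostor_train = rng.normal(-1.0, 0.1, size=(2000, 1))

# Equations 17-18: mixtures of one-dimensional Gaussians
gmm_w1 = GaussianMixture(n_components=2, random_state=0).fit(client_train)
gmm_w2 = GaussianMixture(n_components=2, random_state=0).fit(impostor_train)

# One access: N = 330 virtual scores treated as an i.i.d. sequence Y
Y = rng.normal(1.0, 0.1, size=(330, 1))

# score() returns the average log-likelihood of Equation 16;
# their difference is the opinion of Equation 20
opinion = gmm_w1.score(Y) - gmm_w2.score(Y)
print(opinion)  # high: the claimant is likely the true claimant
```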


Table 5: Different combination methods for virtual scores on LP1

    Method            HTER
                      Evaluation   Test a posteriori   Test a priori
    Original          1.667        1.783               1.875
    Mean              1.333        1.721               1.612
    Median            1.667        1.750               1.667
    GMM               1.518        1.741               1.709
    Global Entropy    1.333        1.734               1.606
    Local Entropy *   0.499        3.000               2.186

    (*) indicates a biased estimate.

Table 6: Different combination methods for virtual scores on LP2

    Method            HTER
                      Evaluation   Test a posteriori   Test a priori
    Original          1.250        1.738               1.737
    Mean              1.054        1.507               1.518
    Median            1.238        1.750               1.547
    GMM               1.034        1.500               1.493
    Global Entropy    1.218        1.500               1.559
    Local Entropy *   0.251        2.500               2.043

    (*) indicates a biased estimate.

5.2 Experiment Results

Using the methods discussed above, experiments were carried out on the LP1 and LP2 protocols. The results are shown in Tables 5 and 6. Except for the local entropy method, all the methods improve on the original approach. The GMM seems to perform best on the LP2 protocol but is among the worst on the LP1 protocol. It is therefore difficult to single out a best combination method. It is surprising that the mean operator, the simplest of these methods, is among the best ways to merge the virtual scores on both protocols.

6 Half total error rate and classification error rate revisited

In biometric authentication, the HTER is often used as an important criterion. In this section, we clarify the relationship between two types of error used as evaluation criteria.

Let us define the false rejection count with respect to a threshold \theta as:

    FR(\theta) = \| \{ y \mid y \in w_1 \wedge y < \theta \} \|,    (21)

where w_1 is the client class and \|\cdot\| denotes the cardinality (number of elements) of a set. Similarly, the false acceptance count with respect to a threshold \theta can be defined as:

    FA(\theta) = \| \{ y \mid y \in w_2 \wedge y \ge \theta \} \|,    (22)

where w_2 is the impostor class. In other words, a client score is counted as a false rejection when it is below the threshold, and an impostor score is counted as a false acceptance when it is above or equal to the threshold.
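The counting definitions of Equations 21-22 translate directly into code (a minimal sketch; the array names are ours):

```python
import numpy as np

def fr_count(client_scores: np.ndarray, theta: float) -> int:
    """Equation 21: client scores strictly below the threshold."""
    return int(np.sum(client_scores < theta))

def fa_count(impostor_scores: np.ndarray, theta: float) -> int:
    """Equation 22: impostor scores at or above the threshold."""
    return int(np.sum(impostor_scores >= theta))

rng = np.random.default_rng(5)
clients = rng.normal(1.0, np.sqrt(0.3), 1000)     # N(1, 0.3), as in Figure 5
impostors = rng.normal(-1.0, np.sqrt(0.2), 1000)  # N(-1, 0.2)
print(fr_count(clients, 0.0), fa_count(impostors, 0.0))
```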


We now relate these two counts to the corresponding probability densities, assumed known for both the client and impostor classes. FR(\theta) can be written as:

    FR(\theta) = \|w_1\| \int_{-\infty}^{\theta} p(y|w_1) \, dy.    (23)

Similarly, FA(\theta) can be written as:

    FA(\theta) = \|w_2\| \int_{\theta}^{\infty} p(y|w_2) \, dy.    (24)

According to the Bayesian rule, the probability of committing an error (denoted ERR hereinafter), taking the class priors into account, can be written as:

    ERR(\theta) = \int_{-\infty}^{\theta} p(y|w_1) \, dy \; P(w_1) + \int_{\theta}^{\infty} p(y|w_2) \, dy \; P(w_2).    (25)

An intuitive way to understand this probability of error is that the densities of false rejection and false acceptance are weighted by P(w_1) and P(w_2), where the class priors (weights) sum to one, i.e., P(w_1) + P(w_2) = 1.

Making use of Equations 23 and 24, we can rewrite Equation 25 as:

    ERR(\theta) = \frac{FR(\theta)}{\|w_1\|} P(w_1) + \frac{FA(\theta)}{\|w_2\|} P(w_2).    (26)

In biometric authentication, P(w_1) and P(w_2) are often unknown, or assumed to be unknown (such is the case during testing: P(w_1) and P(w_2) are known under laboratory conditions but are deliberately assumed unknown). In either situation, one sets P(w_1) = P(w_2) = 1/2, and Equation 26 simplifies to:

    HTER(\theta) = \frac{1}{2} \left( \frac{FR(\theta)}{\|w_1\|} + \frac{FA(\theta)}{\|w_2\|} \right).    (27)

This error is commonly called the Half Total Error Rate (HTER).

To compare the HTER with the probability of classification error (ERR), ERR can be rewritten as:

    ERR(\theta) = \frac{FR(\theta)}{\|w_1\|} \cdot \frac{\|w_1\|}{\|w_1\| + \|w_2\|} + \frac{FA(\theta)}{\|w_2\|} \cdot \frac{\|w_2\|}{\|w_1\| + \|w_2\|}
                = \frac{FR(\theta) + FA(\theta)}{\|w_1\| + \|w_2\|},    (28)

using P(w_1) = \|w_1\| / (\|w_1\| + \|w_2\|) and P(w_2) = \|w_2\| / (\|w_1\| + \|w_2\|).

To give an idea of how these two errors behave, we generated client and impostor score sets artificially. The client scores have density N(1, 0.3) (mean 1, variance 0.3) while the impostor scores have density N(-1, 0.2). These two distributions are shown in Figure 5(a). In the first case (called the balanced class configuration), the client and impostor sets each have 1000 access scores. In the second case (called the unbalanced class configuration), the client set has 1000 scores while the impostor set has 10000 accesses, i.e., unbalanced by a factor of 10. The HTER and ERR curves (as functions of the threshold \theta) for these two cases are shown in Figures 5(b) and (c). Note that, due to the unbalanced class prior, the ERR is affected while the HTER is not.

[Figure 5: Artificially generated scores and their HTER(\theta) and ERR(\theta) curves: (a) the client and impostor pdfs, (b) HTER(\theta) for the balanced and unbalanced classes, (c) ERR(\theta) for the balanced and unbalanced classes.]
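The different behaviour of HTER (Equation 27) and ERR (Equation 28) under an unbalanced class prior can be reproduced with a short sketch (synthetic scores as in Figure 5; the tenfold imbalance follows the text):

```python
import numpy as np

rng = np.random.default_rng(6)
theta = 0.0
clients = rng.normal(1.0, np.sqrt(0.3), 1000)

for n_imp in (1000, 10000):  # balanced vs. unbalanced impostor class
    impostors = rng.normal(-1.0, np.sqrt(0.2), n_imp)
    fr = np.sum(clients < theta)                       # Equation 21
    fa = np.sum(impostors >= theta)                    # Equation 22
    hter = 0.5 * (fr / len(clients) + fa / n_imp)      # Equation 27
    err = (fr + fa) / (len(clients) + n_imp)           # Equation 28
    # ERR shifts with the class prior; HTER does not
    print(n_imp, round(float(hter), 4), round(float(err), 4))
```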

Note that, in reality, the errors committed in FA and FR have different costs. Let C_FA and C_FR be the costs of FA and FR, respectively. Then Equations 27 and 28 can be written in terms of cost as:

    C_{HTER}(\theta) = \frac{1}{2} \left( \frac{FR(\theta)}{\|w_1\|} C_{FR} + \frac{FA(\theta)}{\|w_2\|} C_{FA} \right)    (29)

and

    C_{ERR}(\theta) = \frac{FR(\theta) \, C_{FR} + FA(\theta) \, C_{FA}}{\|w_1\| + \|w_2\|},    (30)

respectively. These two types of error (or cost) have an important impact on our results, as described in the next section.

7 Analysis of results using the mean operator

Using the HTER criterion of Equation 27, we explicitly calculated the HTER as a function of \theta on the evaluation set of LP1, for the experiment using the original samples and for the one using virtual samples combined with the mean operator. The HTER curves are shown in Figure 6(a). This graph has a form similar to Figure 5(b), as expected. However, it gives little information on how much one method wins over the other. To visualise this information, we introduce the difference of costs:

    \Delta(\theta) = C_{ORL}(\theta) - C_{VIR}(\theta),    (31)

where C_ORL is the cost of using the original samples and C_VIR is the cost of using the virtual samples. In both cases, the cost can be evaluated using the HTER criterion (C_HTER(\theta)) or the ERR criterion (C_ERR(\theta)). We distinguish the two here because they give different results.

7.1 Equal cost assumption

In this section, the costs of FA and FR are assumed equal, i.e., both 1. Different values of these costs will be used later.

The difference according to the HTER criterion, \Delta_HTER(\theta), is shown in Figure 6(b), and Figure 6(c) shows a zoomed-in version of (b). The blue circles mark the positions where \Delta_HTER(\theta) is positive, i.e., where the virtual method's C_HTER(\theta) is smaller than the original method's, which is desirable.

Using the ERR criterion of Equation 28, the ERR curves for both the original and virtual methods are shown in Figure 6(d). As expected, the figure is similar to Figure 5(c), with tails that are hard to perceive owing to the highly unbalanced class prior. In this particular data set (the LP1 evaluation set), the client set has 600 scores and the impostor set has 40000 scores.

The cost difference between the original and virtual methods, as a function of the threshold \theta, is shown in Figure 6(e), with a zoomed-in version in Figure 6(f).

Comparing the HTER (Figures 6(a-c)) and ERR (Figures 6(d-f)) criteria, one can observe that the virtual method wins over the original method more convincingly under the ERR criterion, because the winning positions are almost always continuous within two bounds [a, b], where a > -1 and b < 1. This is unfortunately not the case for the HTER criterion.


[Figure 6: Comparison of the original and virtual methods (merged using the mean operator) under the HTER and classification error (ERR) criteria, based on the evaluation data set of LP1 (40000 impostor accesses versus 600 client accesses): (a) HTER curves, (b) difference of HTER with winning positions marked, (c) zoom-in of (b), (d) ERR curves, (e) difference of ERR with winning positions marked, (f) zoom-in of (e).]

7.2 Explanation for the winning bound in ERR and HTER

We believe that these winning positions being bounded in [a, b] under the ERR criterion is not a coincidence. The same holds for the discontinuous winning positions under the HTER criterion.

To analyse this behaviour, it is useful to know how much one additional FA or FR contributes to the overall cost. This quantity is just the derivative of the cost. For the HTER and ERR criteria, the derivatives can be calculated from Equations 29 and 30 as:

    \frac{\delta C_{HTER}(\theta)}{\delta \theta} = \frac{C_{FR}}{2\|w_1\|} \frac{\delta FR(\theta)}{\delta \theta} + \frac{C_{FA}}{2\|w_2\|} \frac{\delta FA(\theta)}{\delta \theta}    (32)

and

    \frac{\delta C_{ERR}(\theta)}{\delta \theta} = \frac{C_{FR}}{\|w_1\| + \|w_2\|} \frac{\delta FR(\theta)}{\delta \theta} + \frac{C_{FA}}{\|w_1\| + \|w_2\|} \frac{\delta FA(\theta)}{\delta \theta},    (33)

respectively.

In biometric applications, \|w_2\| \gg \|w_1\|. If everything else is considered equal, i.e., C_FA = C_FR = 1 and |\delta FA(\theta)/\delta\theta| \approx |\delta FR(\theta)/\delta\theta|, then under the HTER criterion one additional FR contributes 1/(2\|w_1\|) to the cost while one additional FA contributes only 1/(2\|w_2\|). The contribution of FA is thus downplayed, because 1/(2\|w_2\|) \ll 1/(2\|w_1\|).

The minimum cost can be found by setting the derivative of the cost function to zero. We study only the HTER criterion, because it is the more relevant one in this application. Setting Equation 32 to zero, together with Equations 23 and 24, gives:


    0 = \frac{C_{FR}}{2\|w_1\|} \frac{\delta FR(\theta)}{\delta \theta} + \frac{C_{FA}}{2\|w_2\|} \frac{\delta FA(\theta)}{\delta \theta}

    0 = \frac{C_{FR}}{2\|w_1\|} \frac{\delta}{\delta \theta} \left( \|w_1\| \int_{-\infty}^{\theta} p(y|w_1) \, dy \right) + \frac{C_{FA}}{2\|w_2\|} \frac{\delta}{\delta \theta} \left( \|w_2\| \int_{\theta}^{\infty} p(y|w_2) \, dy \right)

    0 = \frac{C_{FR}}{2} p(\theta^*|w_1) - \frac{C_{FA}}{2} p(\theta^*|w_2).    (34)

Indeed, there exists a single threshold \theta^* that optimises the cost criterion. The wider the region of score space where the virtual method wins over the original method (the winning positions), the higher the probability that the virtual method wins overall, because \theta^* then has a higher probability of falling into one of these winning positions.

7.3 Unequal cost assumption

What if the costs of FA and FR are different? From Equation 32 one can intuitively predict that a high C_FA will favourably increase the contribution of \delta FA(\theta)/\delta\theta, which is otherwise downplayed by the very large \|w_2\|. Indeed, this is often the case, because a false acceptance is very serious in high-security applications. According to the NIST standard, C_FA = 10 while C_FR = 1.

Using these conventions, we calculated the cost according to the HTER for the cases (C_FR = 1, C_FA = 10) and (C_FR = 10, C_FA = 1). The HTER criterion is used because it is a more realistic and relevant criterion in biometric authentication than the ERR criterion. The cost curves are shown in Figure 7.

In a real application, the threshold \theta takes the value that best optimises the cost function. We calculated the optimal threshold \theta^* together with the corresponding minimum HTER-based cost. The results are shown in Table 7.

Table 7: Minimum HTER-based cost and optimal threshold \theta^* of the original and virtual methods on the evaluation set of LP1, under different cost assumptions

    C_FR   C_FA   Original method          Virtual method
                  Min. cost   \theta^*     Min. cost   \theta^*
    1      1      0.0313      -0.5910      0.0280      -0.6503
    1      10     0.0641       0.2322      0.0635       0.0759
    10     1      0.1343      -0.9754      0.1095      -0.9221

Under all three cost assumptions for FA and FR, the virtual method appears robust. Although the virtual method is not guaranteed to win over the original method over the whole score space, it is at least better at the optimal threshold.

It is instructive to observe in Figures 7(a) and (b) how the virtual method wins over the original method. A high cost of FA favours the virtual method. Inversely, a high cost of FR favours the original method. However, even in this disadvantageous situation, Figure 7(d) shows that the virtual method still wins over the original method within a very narrow bound of [a, b] values.

It should be emphasised that the threshold \theta does not take an arbitrary value: it takes the specific value that minimises the cost. As long as this optimal threshold, \theta^*, falls within [a, b], the virtual method will always be beneficial.

One possible explanation for the existence of a bound [a, b], often but not necessarily continuous, within which the virtual method wins over the original method, is the reduction of variance obtained by using multiple virtual samples. When the variance is reduced for both the client and impostor pdfs, the peaks of the distributions become higher than those of the original pdfs, while the tails become thinner. As a result, the overlapping region, where errors are made, shrinks. When computing the difference between the cost functions of the original and virtual methods, the virtual method wins in the region (the winning positions) where the client and impostor pdfs overlap. This intuitively shows why the virtual method is useful.

[Figure 7: Comparison of the original and virtual methods merged using the mean operator, according to the HTER-based cost on the evaluation data set of LP1: (a) cost for (C_FR = 1, C_FA = 10), (b) difference of the costs in (a) with winning positions marked, (c) cost for (C_FR = 10, C_FA = 1), (d) difference of the costs in (c) with winning positions marked.]

In practice, we found that such winning positions are often not continuous, due to the differing class priors and costs of FA and FR under the HTER criterion.

It is well known in regression problems that a reduction of variance gives a more accurate output function. In the two-class classification problem studied here, reducing the variance by averaging over virtual samples does lead to improved classification performance under both the ERR and HTER criteria. Furthermore, the gain under the ERR criterion is more consistent than the gain under the HTER criterion. In addition, a high cost of FA, common in biometric authentication, favours the virtual-sample method. In conclusion, the virtual-sample method is an effective way of improving a general biometric authentication system when only one sample is available.

Acknowledgement

The authors wish to thank the Swiss National Science Foundation for supporting this work through the National Centre of Competence in Research (NCCR) on "Interactive Multimodal Information Management (IM2)". Special thanks go to Christine Marcel, who provided the trained MLP models.