Post on 01-Dec-2014
Sketch-Finder – A New Approach for Sketch-Based
Image Retrieval
Carlos Alberto F. Pimentel Filho fragapimentel@gmail.com
Arnaldo de Albuquerque Araújo (UFMG) Michel Crucianu (CNAM)
Introduction
Content-Based Image Retrieval (CBIR)
Sketch-Based Image Retrieval (SBIR)
Mind-Finder Approach
Sketch-Finder Approach
Experiments
Conclusion
Future Work
Content-Based Image Retrieval
Query-by-Sketch:
Query-by-Painting:
Query-by-Example:
Query-by-Icon:
Query-by-Text:
Sketch-Based Image Retrieval
SBIR fills two gaps in image retrieval:
(i) It allows specifying details such as object position, scale and rotation.
(ii) It allows image retrieval when there is no example image to use.
Our goal: to retrieve, from large datasets, images visually similar to the query sketch's object shape, at a similar scale, position and rotation.
Sketch-Based Image Retrieval: Why Do We Care?
Web Image Retrieval
Personal Image Retrieval
Mobile Image Retrieval
Video Retrieval
Sketch-Based Image Retrieval
Query-by-Sketch:
Position Sensitive:
Object's shape at similar scale, position and rotation.
Approaches:
Mind-Finder (EI)
Sketch-Finder
Compact Hash Bits
Object Sensitive:
Object's shape at any scale, position and rotation
Approaches (BoW):
HOG
GF-HOG
FISH
SYM-FISH
Mind-Finder (Edgel-Index)
Edgel-Index
* Compares matching of edgels
* Huge number of edgels for a big dataset
* Edgel: an edge pixel with position and orientation
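The edgel-index idea can be sketched as a plain inverted index over quantized edgels. This is a simplified illustration, not Cao et al.'s exact structure: the number of orientation channels and all names here are assumptions.

```python
from collections import defaultdict

N_ORIENT = 6  # number of orientation channels (assumed)

def quantize(x, y, theta):
    """Map an edgel (position + orientation in degrees) to a discrete key."""
    channel = int(theta / (180.0 / N_ORIENT)) % N_ORIENT
    return (x, y, channel)

class EdgelIndex:
    def __init__(self):
        # posting list: quantized edgel -> set of image IDs containing it
        self.postings = defaultdict(set)

    def add_image(self, image_id, edgels):
        for x, y, theta in edgels:
            self.postings[quantize(x, y, theta)].add(image_id)

    def query(self, sketch_edgels):
        """Rank images by how many sketch edgels they share."""
        scores = defaultdict(int)
        for x, y, theta in sketch_edgels:
            for image_id in self.postings.get(quantize(x, y, theta), ()):
                scores[image_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because every distinct (position, orientation) cell gets its own posting list, the number of lists grows quickly with the dataset, which is the scalability problem noted on this slide.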
Sketch-Finder
Image processing flow (dataset):
Query flow:
Contour Detection & Threshold
Orientation and Dilation
Wavelet Transform
Wedgel: a wavelet-domain edgel.
Contour signature: set of wedgels.
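As a rough illustration of how a contour map can be compressed into a small signature, here is one level of a 2D Haar transform in pure Python, keeping only the largest-magnitude coefficients. The transform depth and coefficient selection in the actual Sketch-Finder may differ; this is a sketch of the idea only.

```python
def haar2d(mat):
    """One level of the 2D Haar wavelet transform (pure Python sketch).
    `mat` is a square list-of-lists with an even side length."""
    def rows(m):
        out = []
        for row in m:
            avg = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
            dif = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
            out.append(avg + dif)
        return out

    def transpose(m):
        return [list(col) for col in zip(*m)]

    # transform the rows, then the columns
    return transpose(rows(transpose(rows(mat))))

def signature(mat, n_coeffs):
    """Keep the n largest-magnitude wavelet coefficients as a sparse
    signature of (row, col, value) triples."""
    coeffs = haar2d(mat)
    flat = [(abs(v), i, j, v) for i, row in enumerate(coeffs)
            for j, v in enumerate(row)]
    flat.sort(reverse=True)
    return [(i, j, v) for _, i, j, v in flat[:n_coeffs]]
```

Storing only a fixed number of coefficients per contour map is what keeps the index small and the comparison cheap.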
Similarity Measure
Why Wavelet Transform?
Indexing Structure
Dataset Evaluation
For the evaluation, we compare the Edgel-Index [1] with the Sketch-Finder.
* Paris Dataset: 6,412 images [2]
* ImageNet Dataset: subset of 535K images [3]

[1] Yang Cao et al., Edgel index for large-scale sketch-based image search.
[2] Visual Geometry Group – http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/index.html
[3] ImageNet: http://www.image-net.org/
Genetic Algorithm
* Population: a set of 100 configurations
* Selection of the best results (mAP@20)
* Crossover
* Mutation
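The evolution loop outlined above might look like the following schematic sketch. The parameter bounds follow Table 1 and the crossover/mutation probabilities follow Section 5.2, but the selection details are simplified assumptions, not the exact procedure used in the paper.

```python
import random

# Parameter bounds (L, H) per factor a-f, taken from Table 1.
BOUNDS = [(3, 15), (15, 30), (30, 45), (0, 45), (20, 50), (0.18, 0.30)]
PC, PM = 0.90, 0.10  # crossover and mutation probabilities from the text

def random_individual():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def evolve(population, fitness):
    """One generation: elitism plus crossover/mutation of above-average
    solutions (a simplified stand-in for the paper's selection scheme)."""
    scored = sorted(population, key=fitness, reverse=True)
    avg = sum(fitness(ind) for ind in population) / len(population)
    parents = [ind for ind in scored if fitness(ind) >= avg]
    if len(parents) < 2:
        parents = scored[:2]
    next_gen = [scored[0]]                         # keep the best solution
    while len(next_gen) < len(population):
        a, b = random.sample(parents, 2)
        child = list(a)
        if random.random() < PC:                   # crossover: 3 of 6 parameters
            for j in random.sample(range(6), 3):
                child[j] = b[j]
        if random.random() < PM:                   # mutation: 2 random parameters
            for j in random.sample(range(6), 2):
                lo, hi = BOUNDS[j]
                child[j] = random.uniform(lo, hi)
        next_gen.append(child)
    return next_gen
```

Because the best individual is always carried over, the best fitness seen never decreases across generations.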
Some user sketches
The Paris dataset images were collected by the Visual Geometry Group (VGG)¹ from Flickr. To collect the sketches for the Paris dataset queries, we asked volunteers to draw one sketch for each of the 11 categories of Paris landmarks present in the dataset (La Défense, Tour Eiffel, Hôtel des Invalides, Musée du Louvre, Moulin Rouge, Musée d'Orsay, Notre Dame, Panthéon, Pompidou, Sacré Cœur and Arc de Triomphe). We selected 10 users, composing a set of 110 sketches. The sketches and the ground-truth are available². Fig. 7 presents some sketches collected for the Paris dataset query evaluation.

Also, the Paris dataset was used to compare the effectiveness of our approach with the sketch-finder [9] and the mind-finder [3]. This effectiveness was evaluated considering the precision of the z best rank positions, and in this paper we used the 20 best positions, as in [17].

Figure 7: Examples of the Paris sketch dataset.

To evaluate the efficiency of [9], [3] and our approach, we used the CPU time and I/O on a big dataset with more than 535K images. These images were issued from ImageNet³ as described in [9], and we performed the experiments with 75 queries.
5.1 Parameter Relevance Evaluation

To discover the most relevant parameters of our approach, we applied the 2^k factorial design [12]. We have six parameters, or factors, and since in the 2^k factorial design each of the k factors has two alternative levels (the higher and the lower value of the factor), our analysis has 2^6, or 64, possible parameter combinations.
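Relevance percentages like the ones reported later in Table 1 can be estimated along these lines. This is a minimal sketch of 2^k main-effect analysis only; the full method in [12] also accounts for interaction terms.

```python
from itertools import product

def factorial_effects(k, response):
    """Estimate main-effect importance in a 2^k full factorial design.
    `response` maps each tuple of levels (-1/+1 per factor) to the
    measured outcome (here that would be the retrieval precision)."""
    runs = list(product((-1, 1), repeat=k))
    n = len(runs)                                  # 2^k experimental runs
    effects = []
    for j in range(k):
        hi = sum(response[r] for r in runs if r[j] == 1)
        lo = sum(response[r] for r in runs if r[j] == -1)
        effects.append((hi - lo) / n)
    # sum of squares per factor: contrast^2 / n
    ss = [n * e * e for e in effects]
    total = sum(ss)
    return [100.0 * s / total for s in ss]         # relevance in percent
```

With k = 6 this evaluates all 64 configurations once and attributes the observed variation in precision to each factor.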
The parameters and the effect of each one are described in the following:

a) Radius of map growth for the first wavelet transform: this parameter corresponds to the first grown edgel map G^{r1}_θ pattern used for the wavelet transform;

b) Radius of map growth for the second wavelet transform: this parameter corresponds to the second G^{r2}_θ used for the wavelet transform;

c) Radius of map growth for the third wavelet transform: this parameter corresponds to the third G^{r3}_θ used for the wavelet transform;

d) Radius of map growth for pixel matching (OCM): this parameter corresponds to the radius r_OCM used in the OCM. In our approach, 0 means not using the pixel matching, i.e., only measuring the wavelet similarity (W_{Q,T});

e) Number of wavelet coefficients: this parameter corresponds to the number of wavelet coefficients used to represent the most discriminant information of each grown edgel map in the wavelet compressed domain. For this parameter, half of the coefficients are negative and the other half positive;

f) Threshold of the image contours: this parameter is used to threshold the UCM image for indexing.

¹ Visual Geometry Group – http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/index.html
² Paris sketches and ground-truth – https://sites.google.com/site/sketchretrieval/
³ ImageNet – http://www.imagenet.org/

Table 1: 2^k factorial design model

Factor                 L      H      R
a) Edgel map 1         3      15     13.3%
b) Edgel map 2         15     30     0.5%
c) Edgel map 3         30     45     0.2%
d) Edgel map (OCM)     0      45     36.2%
e) Coefficients        20     50     13.5%
f) Threshold           0.18   0.30   36.3%

Table 1 presents the lowest (L) and highest (H) values used to test each factor, as well as the relevance (R) of each factor in percentage.

The two most important parameters in our set are the threshold of the image contours and the radius of the OCM similarity comparison, together accounting for more than 70% of the impact on precision. The variation of the number of wavelet coefficients from the lowest to the highest value, and the first radius of the grown edgel map, each account for almost 14% of the impact, while the variation of the second and third radius maps is almost insignificant.

The difference between our approach and the one presented in [9] is the addition of the oriented chamfer matching for comparing sketch and image contours. The effect of this parameter in the 2^k factorial design indicates the improvement in effectiveness of our proposal over [9].
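A minimal brute-force sketch of the oriented chamfer matching idea follows: within a radius r_OCM, a query edge pixel counts as matched if the target image has an edge pixel of similar orientation nearby. The function name and the orientation tolerance are assumptions; the paper's exact cost function may differ.

```python
import math

def ocm_score(query_edges, target_edges, r_ocm, max_angle_diff=22.5):
    """Fraction of query edge pixels (x, y, theta in degrees) that find
    a nearby, similarly oriented edge pixel in the target image."""
    if not query_edges:
        return 0.0
    matched = 0
    for qx, qy, qtheta in query_edges:
        for tx, ty, ttheta in target_edges:
            d = math.hypot(qx - tx, qy - ty)
            # orientation difference on a 180-degree circle
            a = abs(qtheta - ttheta) % 180.0
            a = min(a, 180.0 - a)
            if d <= r_ocm and a <= max_angle_diff:
                matched += 1
                break
    return matched / len(query_edges)
```

Setting r_ocm = 0 effectively disables this pixel-level verification, which matches the meaning of parameter d) above; in practice a distance transform would replace the inner loop for speed.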
5.2 Parameter Tuning with a Genetic Algorithm

The parameters described in Section 5.1 need to be well set in order to obtain the best performance of our approach. Genetic algorithms are robust search and optimization techniques for finding the global optimum in a multimodal landscape. In this section, we show how the parameters of our approach were chosen through a genetic evolution approach [15].

To set the parameters, due to the large number of experiments, we used a small dataset: the Paris dataset, with the ground-truth for the sketches that we collected.

The parameters and their intervals were chosen as described in Section 5.1. We started the genetic algorithm with a population of one hundred random different configurations, where each parameter had a random value between the lowest and highest limits (L and H) presented in Table 1. On each "evolution", the best-fitness solution was preserved for the next iteration or "generation". The other solutions with above-average fitness were preserved for crossover and mutation, while solutions with below-average fitness were discarded. Random pairs of the best solutions were used to generate two new solutions through crossover and mutation of parameters. For each high-fitness pair, with a probability Pc of 90% we applied a crossover to three of the six random parameters, and with a probability Pm of 10% we applied a mutation to two random parameters inside the limits. Table 1 presents the lowest and highest limits (L and H) used in the mutation for each parameter.

To represent and evaluate each individual parameter configuration with a single fitness value, we used the area under the precision×recall [5] curve of the 20 first results. We considered this area a summary of the best rank positions because a good retrieval solution must present the expected results in the very first positions.

Some Results (Paris Dataset)
Extensive experiments were conducted to achieve the best ranking with the proposed approach. More than 25 genetic generations, each with 100 individual parameter sets, were performed, which gave more than 2,500 experiments. Each experiment built one index solution and performed 110 sketch queries on it, i.e., 275,000 queries in total.
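The fitness criterion used in these experiments — the area under the precision×recall curve of the top-20 results — can be computed along these lines (a sketch assuming a step-wise curve, with rectangles added at each relevant hit):

```python
def top_k_pr_area(ranked_relevance, num_relevant, k=20):
    """Area under the precision x recall curve of the k first results.
    `ranked_relevance` lists booleans (True = relevant) in rank order;
    `num_relevant` is the total number of relevant images."""
    area, hits, prev_recall = 0.0, 0, 0.0
    for i, rel in enumerate(ranked_relevance[:k], start=1):
        if rel:
            hits += 1
            recall = hits / num_relevant
            precision = hits / i
            # rectangle rule: precision gained on each recall step
            area += precision * (recall - prev_recall)
            prev_recall = recall
    return area
```

A perfect ranking scores 1.0, and pushing relevant results toward the first positions raises the score, which is exactly why this area rewards good early precision.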
5.3 Evaluation of Sketch Retrieval Effectiveness and Efficiency

The experiments on sketch retrieval used the best parameters obtained by the genetic algorithm, averaged over 110 queries. For the present approach, this configuration is respectively 14, 27, 45, 29, 46 and 0.18 for the parameters a, b, c, d, e and f described in Section 5.1.

For the sketch-finder, we used the same parameters in common with our approach. This configuration is respectively 14, 27, 45, 46 and 0.18 for the parameters a, b, c, e and f presented in Section 5.1.

Regarding the evaluation of the Mind-Finder, we applied the same parameters to the steps in common with the sketch-finder and our approach, i.e., 256×256 image resolution, the same contour detection (UCM) and threshold = 0.18. For the radius r we experimented with several configurations between 25 and 65, finding that r = 45 brings the best fitness on the 110 queries of the Paris dataset, using the same fitness criteria as in our approach. The selected parameters were used as defaults for the Mind-Finder in the comparisons with the sketch-finder and our approach.

The experiments were performed on a machine with an Intel Xeon X5670 CPU at 2.93 GHz and 72 GB of RAM.

For the effectiveness evaluation, we considered the 20 best rank positions. According to the results, our approach outperforms the Sketch-Finder (SF) and the Mind-Finder (MF) in terms of effectiveness, as shown by the precision curve of the 20 best rank positions presented in Fig. 8.
Figure 8: Best z ranked images.
Fig. 9 shows some queries performed by our approach using the Paris dataset. The first image on each line is the query image and the following images are the four best-ranked images. Some of these results on the Paris dataset are also available⁴.

To compare the efficiency of the three approaches, we evaluated the query CPU time, the I/O and the size of the indexes. To evaluate the CPU time and I/O of the queries, we used the subset of ImageNet with 75 queries.

⁴ Query results of our approach on the Paris dataset – https://sites.google.com/site/sketchretrieval/
Figure 9: Query results for the Paris dataset.
Table 2: CPU query time in seconds

          Sketch-Finder   Mind-Finder   Our
CPU AVG   6.56            394.43        31.66
CPU SD    1.13            401.57        11.73

Table 3: I/O in bytes

          Sketch-Finder   Mind-Finder   Our
I/O AVG   2.89 × 10^8     2.96 × 10^9   1.52 × 10^9
I/O SD    1.58 × 10^7     2.79 × 10^8   4.08 × 10^8

Table 4: Index size in bytes

Sketch-Finder   Mind-Finder    Our
2.14 × 10^9     2.61 × 10^10   2.05 × 10^11
Regarding the CPU cost, we did not consider the I/O time used to retrieve the inverted lists of IDs. With this strategy we can simulate the CPU time without considering I/O variations and main memory limitations. The benchmark of the CPU cost is presented in seconds in Table 2, with the average time (AVG) and standard deviation (SD) of the 75 queries.

Regarding the I/O, we measured it in bytes, considering the size of all inverted lists of IDs used in each of the 75 queries. Table 3 presents the average I/O and the standard deviation.

The index of our approach is the largest among the evaluated approaches due to the storage of the preprocessed grown edgel maps used in the OCM similarity comparison; as an advantage, however, this index enables faster queries than the approach of [3], see Table 2. The total index size of each approach is presented in Table 4.
6. CONCLUSION

This work presented an approach for SBIR using both compressed-domain and pixel-domain indexes. The compressed-domain index allows the comparison between the sketch and the image dataset contours with a small amount of data, while the pixel domain is used to improve the precision by applying a spatial pixel consistency verification using the oriented chamfer matching.
Effectiveness: Precision vs. Recall
We used the VGG ground-truth for the Paris dataset and built one for the ImageNet subset.
The same sketches were used to evaluate Mind-Finder and Sketch-Finder.
Average Precision vs. Recall (75 queries)
Efficiency Comparison - CPU
Efficiency Comparison – I/O
Conclusion
Sketch-Finder:
• The number of retrieved inverted files is reduced to a small and fixed number;
• The volume of indexed data is 5% of the Edgel-Index;
• Retrieval is faster due to the smaller amount of data.
Future Work
Build an Android Sketch-Finder application
Thank You!
Carlos Alberto Fraga Pimentel Filho fragapimentel@gmail.com