Artificial Intelligence - Home

Artificial Intelligence

Multiple Classifier Systems (MCS)

Prof. Fabio Augusto Faria

1st semester 2017

Topics

Motivation

Diversity

Types of Multiple Classifier Systems (MCS)

MCS techniques

Approach proposed in the PhD thesis

Types of classifier and ensemble selection

Genetic Programming

Support Vector Machine

K-nearest neighbors

Motivation

Region identification in remote sensing images (ISR)

Object recognition

Biometric recognition

* Image source: Google Images

Motivation

“No Free Lunch” theorem: no single learning method or feature is the best for every problem.

Learning methods

Features

* Image source: Google Images

Motivation

“No Free Lunch” theorem; the alternative is information fusion.

Fusion of learning methods

Fusion of features

* Image source: Google Images

Multiple Classifier Systems (MCS)

Fusion of learning methods

Fusion of features

* Ensemble or stacking of classifiers.

Motivation

Reasons for using MCS

Extracted from [1]

Statistical: training with few examples.

Computational: searching for the global optimum along different paths.

Representational: no hypothesis h in the hypothesis space H represents the true function.

Diversity

Measures the degree of agreement/disagreement among classifiers;

“Diversity is an essential property for MCS” [3];

Kuncheva shows the relationship between diversity and effectiveness in MCS;

Diversity is an elusive concept, not trivial to define and use in practice, Brown et al. 2005.

Is “useful diversity” a myth?, Kuncheva 2003.

“Diversity in Classifier Ensembles: Fertile Concept or Dead End?”, Roli et al. 2013.

Diversity

[Figure: outputs of classifiers C1, C2, and C3 compared against G]

Classifier   Accuracy
C1           70%
C2           70%
C3           60%

Which are the best candidates for combination?

D(C1,C2) = 2    D(C1,C3) = 0    D(C2,C3) = 1

Diversity Measures

Relationship matrix

Double-fault measure

Q-statistic

Interrater agreement κ

Correlation coefficient

Disagreement measure

Extracted from [5]
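The five pairwise measures listed above can all be derived from the relationship matrix of two classifiers. The sketch below uses the pairwise definitions commonly given in the ensemble literature (e.g., [3]); the function names, the 0/1 correctness-vector input, and the choice of returning 0.0 when a denominator vanishes are assumptions made for this illustration, not the notation of [5].

```python
import math

def relationship_counts(correct_i, correct_j):
    """Relationship matrix of two classifiers from 0/1 correctness vectors."""
    pairs = list(zip(correct_i, correct_j))
    n11 = sum(1 for a, b in pairs if a and b)          # both correct
    n10 = sum(1 for a, b in pairs if a and not b)      # only the first correct
    n01 = sum(1 for a, b in pairs if not a and b)      # only the second correct
    n00 = sum(1 for a, b in pairs if not a and not b)  # both wrong
    return n11, n10, n01, n00

def diversity_measures(correct_i, correct_j):
    """Pairwise diversity measures; 0.0 is returned when a denominator is zero."""
    n11, n10, n01, n00 = relationship_counts(correct_i, correct_j)
    n = n11 + n10 + n01 + n00
    cross = n11 * n00 - n01 * n10
    q_den = n11 * n00 + n01 * n10
    rho_den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    kappa_den = (n11 + n10) * (n01 + n00) + (n11 + n01) * (n10 + n00)
    return {
        "double_fault": n00 / n,
        "disagreement": (n01 + n10) / n,
        "q_statistic": cross / q_den if q_den else 0.0,
        "correlation": cross / rho_den if rho_den else 0.0,
        "kappa": 2 * cross / kappa_den if kappa_den else 0.0,
    }

# Toy example: each vector marks which validation samples a classifier got right.
measures = diversity_measures([1, 1, 0, 0], [1, 0, 1, 0])
```

Low double-fault and high disagreement both indicate that the two classifiers tend to fail on different samples, which is what makes them good candidates for combination.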

Diversity

How to obtain diversity?

Use:

(1) The same learning model with different training sets

(2) Different learning models with the same training set

(3) Different learning models with different types of outputs

(4) Outputs as features to train another learning model (meta-learning)

MCS Architectures

Parallel

Serial

Extracted from [4]

Majority Voting

Votes with equal weight (1)

Test example → Classifier 1 → c1
Test example → Classifier 2 → c2
Test example → Classifier 3 → c3
Test example → Classifier n → c2

Final class: c2
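A minimal sketch of majority voting, using the votes shown on this slide (c1, c2, c3, c2). The function name is an assumption made for this example, and ties are resolved in favor of the label seen first, which is a simplification.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label with the most votes; every vote has weight 1.

    Ties go to the label that appears first in `votes`.
    """
    return Counter(votes).most_common(1)[0][0]

final_class = majority_vote(["c1", "c2", "c3", "c2"])
```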

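Weighted Majority Voting

Votes with different weights (classifier confidence)

Test example → Classifier 1 (weight 0.8) → c1
Test example → Classifier 2 (weight 0.7) → c2
Test example → Classifier 3 (weight 0.9) → c3
Test example → Classifier n (weight 0.5) → c2

Final class: the class with the largest sum of weights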
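Weighted majority voting can be sketched as follows, assuming the weights on this slide (0.8, 0.7, 0.9, 0.5) belong to classifiers 1, 2, 3, and n in that order. The function name is hypothetical.

```python
from collections import defaultdict

def weighted_majority_vote(votes, weights):
    """Return the label whose votes have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# c2 collects 0.7 + 0.5, which beats c3 (0.9) and c1 (0.8).
final_class = weighted_majority_vote(["c1", "c2", "c3", "c2"], [0.8, 0.7, 0.9, 0.5])
```

Under these assumed weights the final class is c2 even though the single most confident classifier voted for c3.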

Bootstrap Aggregating (Bagging)

Created by L. Breiman in 1996;

Uses the “bootstrap” approach (sampling the training set with replacement) to create diversity among the classifiers;

Makes learning models (e.g., decision trees, artificial neural networks) more “stable”.

Extracted from [5]

Extracted from [6]
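The sketch below illustrates Bagging under simplifying assumptions: one-dimensional inputs, binary labels 0/1, and a threshold rule (decision stump) as the base learner. The function names and the toy data are made up for this example; the slides do not prescribe a base learner.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Fit a 1-D threshold rule: predict 1 if x >= t, else 0."""
    candidates = sorted(set(xs)) + [float("inf")]  # inf means "always predict 0"
    def errors(t):
        return sum((x >= t) != bool(y) for x, y in zip(xs, ys))
    best_t = min(candidates, key=errors)
    return lambda x: int(x >= best_t)

def bagging(xs, ys, n_models=11, seed=0):
    """Train each model on a bootstrap sample and combine them by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict

xs = list(range(10))
ys = [0] * 5 + [1] * 5
predict = bagging(xs, ys)
```

Each bootstrap sample leaves out some training examples and repeats others, so the stumps end up with different thresholds; the vote averages out these differences.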

Random Forest (RF)

Created by L. Breiman and A. Cutler in 2001;

Extension of Bagging;

Uses the “random subset of the features” approach to create more diversity among the classifiers;

Originally proposed for use with decision trees.

Extracted from [6]
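To make the two sources of randomness concrete, here is a deliberately tiny sketch: a forest of depth-1 trees, where each tree is trained on a bootstrap sample and its single split may only use a random subset of the features. Binary labels 0/1, the function names, and the toy data are assumptions for this example; a practical Random Forest grows deeper trees and redraws the feature subset at every split.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Best threshold on one feature: predict 1 if row[feature] >= t, else 0."""
    values = [r[feature] for r in rows]
    candidates = sorted(set(values)) + [float("inf")]
    def errors(t):
        return sum((v >= t) != bool(y) for v, y in zip(values, labels))
    t = min(candidates, key=errors)
    return errors(t), t

def random_forest_of_stumps(rows, labels, n_trees=15, max_features=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample, as in Bagging
        sample = [rows[i] for i in idx]
        sample_y = [labels[i] for i in idx]
        subset = rng.sample(range(n_features), max_features)  # features allowed at this split
        _, t, f = min(fit_stump(sample, sample_y, f) + (f,) for f in subset)
        trees.append((f, t))
    def predict(row):
        votes = Counter(int(row[f] >= t) for f, t in trees)
        return votes.most_common(1)[0][0]
    return predict

rows = [(0, 0), (1, 1), (2, 2), (3, 3), (6, 6), (7, 7), (8, 8), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predict = random_forest_of_stumps(rows, labels)
```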

Adaptive Boosting (AdaBoost)

Created by Y. Freund and R. E. Schapire in 1996;

Serial MCS architecture;

Uses training-example weighting to build a strong classifier from “weak” classifiers.

Extracted from [5]

Extracted from [6]

Annotations on the algorithm figure:

// weights for the training examples

// error of hypothesis i
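The two annotations above (example weights and the error of hypothesis i) are the core of the algorithm. The following is a minimal sketch of binary AdaBoost, assuming labels in {+1, -1}, one-dimensional inputs, and threshold rules as weak classifiers; the function names and the toy data are illustrative choices, not taken from [5] or [6].

```python
import math

def stump_predict(x, t, s):
    """Weak classifier: output s if x >= t, otherwise -s (s is +1 or -1)."""
    return s if x >= t else -s

def train_weighted_stump(xs, ys, w):
    """Return (weighted error, threshold, sign) of the best threshold rule."""
    best = None
    for t in sorted(set(xs)):
        for s in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w) if stump_predict(x, t, s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                                 # weights for the training examples
    ensemble = []
    for _ in range(rounds):
        err, t, s = train_weighted_stump(xs, ys, w)   # error of hypothesis i
        err = min(max(err, 1e-10), 1 - 1e-10)         # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)       # vote of hypothesis i
        ensemble.append((alpha, t, s))
        # Increase the weight of misclassified examples, decrease the others.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, s))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    def predict(x):
        score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
        return 1 if score >= 0 else -1
    return predict

# No single threshold separates these labels, but three weighted stumps do.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
predict = adaboost(xs, ys, rounds=3)
```

The models are trained one after another because each round's weights depend on the mistakes of the previous hypothesis, which is why the architecture is serial.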

Error-Correcting Output Codes (ECOC)

Created by T. G. Dietterich and G. Bakiri in 1991;

Represents each class by a binary code (0s and 1s);

Uses a one-vs-all (OVA) approach over the classifiers to obtain a binary code;

The final classification uses the Hamming distance: the class whose code is closest to the obtained code is chosen.

Example: each class is represented by 15 digits. Note that one of the challenges lies in the representation itself: there may be irrelevant columns f (classifiers).

Extracted from [7]
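The decoding step can be sketched with a small, made-up codebook (3 classes, 6 bits) instead of the 15-digit codes in the figure. The class names, code words, and function names are hypothetical. With a minimum Hamming distance d between code words, floor((d - 1) / 2) bit errors can be corrected.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(bits, codebook):
    """Return the class whose code word is closest to the classifier outputs."""
    return min(codebook, key=lambda c: hamming(bits, codebook[c]))

def min_hamming_distance(codebook):
    return min(hamming(a, b) for a, b in combinations(codebook.values(), 2))

# Hypothetical codebook: one row per class, one column per binary classifier.
codebook = {
    "A": [0, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 0, 0],
    "C": [0, 0, 1, 1, 1, 1],
}
d = min_hamming_distance(codebook)
correctable = (d - 1) // 2

# Code word of class B with one wrong bit (index 3 flipped).
predicted = ecoc_decode([1, 1, 1, 0, 0, 0], codebook)
```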

My PhD work!

Classifier = descriptor + learning method

Faria et al. (Journal PRL 2014)

Framework for Classifier Selection and Fusion

Main steps to build the final classifier:

1) Classifier training and validation

2) Classifier selection

3) SVM meta-learning classification



Dataset

Training set (T): used to train the classifier.

Validation set (V): used to compute the classifier opinion for each sample.




Selection Process

[Figure: Input → SELECTION → Output]


SVM “Features”

We use the opinions of the selected classifiers as the SVM learning input!

[Figure: base classifier opinions → SVM → final class]

Faria et al. (Journal PRL 2014)
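As an illustration of what "opinions as SVM input" means, the sketch below builds the meta-level feature matrix: one row per sample, one column per selected base classifier. The three threshold classifiers and the sample values are invented for this example, and the SVM training step itself is left out; in the framework, these rows together with the true labels would be the meta-learner's training data.

```python
def opinion_matrix(classifiers, samples):
    """One row per sample, one column per selected base classifier."""
    return [[clf(x) for clf in classifiers] for x in samples]

# Hypothetical selected base classifiers, each returning a class label (0 or 1).
selected = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

validation_samples = [(0.9, 0.2), (0.1, 0.8), (0.7, 0.9)]
meta_features = opinion_matrix(selected, validation_samples)
```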








Comparison

Diversity (recap): the numbers (1)-(4) refer to the ways of obtaining diversity listed earlier.

Comparison Table (Diversity in MCS)

Acronym | Diversity | Classifier | Advantage | Disadvantage
Bagging | (1) | simple | Simple; noise tolerant. | Unstable.
Boosting | (1)* weighted training set | weak | Usually gets better results in many applications. | Overfitting in the presence of noise; high time cost.
MV | (2) | simple | Simple. | High dependence on the good accuracy of the simple classifiers.
WMV | (2) | simple | Simple; decreases the influence of the “bad” classifiers. | High dependence on the good accuracy of the simple classifiers.
ECOC | (1)* manipulated output values | binary | If the minimum Hamming distance is d, the code can correct at least ⌊(d-1)/2⌋ single-bit errors. | Not appropriate for a large number of classes.
RSM/RT | (1)* different features | any | Avoids the curse of dimensionality in large feature spaces. | How to define the dimensionality of the subspace.
Our work | (2) and (4) | any | Simple; flexible (feature, learning, and fusion); classifier selection (time). | How to define the trade-off between learning methods and features to get enough diversity.

Classifier Selection

Extracted from [2]

(a) Static ensemble selection;
(b) Dynamic classifier selection;
(c) Dynamic ensemble selection.

Selection Techniques

1) Selection based on Consensus;

2) Selection based on Kendall Correlation;

3) Selection based on Rank Aggregation.

Selection based on Consensus

a) Compute the diversity measures and create R lists;

b) Sort the R lists by diversity measure score;

c) Keep the top t entries of each of the R lists;

d) Count the number of occurrences of each classifier;

e) Select the classifiers |C*| that satisfy a defined threshold T.
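The counting and thresholding steps can be sketched as follows. The sketch assumes that each ranked list holds pairs of classifiers already sorted by one diversity measure, and that "satisfy the threshold" means appearing at least T times in the top-t entries; both are assumptions for this illustration, as are the function name and the toy rankings.

```python
from collections import Counter

def consensus_selection(ranked_lists, t, threshold):
    """Select classifiers that occur at least `threshold` times in the top-t pairs."""
    counts = Counter()
    for ranking in ranked_lists:
        for pair in ranking[:t]:   # top t pairs of this ranked list
            counts.update(pair)    # one occurrence for each classifier in the pair
    return sorted(c for c, k in counts.items() if k >= threshold)

# Two hypothetical ranked lists (R = 2), one per diversity measure.
ranked_lists = [
    [("C1", "C2"), ("C1", "C3"), ("C2", "C3")],
    [("C1", "C3"), ("C1", "C2"), ("C2", "C3")],
]
selected = consensus_selection(ranked_lists, t=2, threshold=3)
```

Here C1 appears four times in the top-2 entries while C2 and C3 appear twice each, so only C1 passes a threshold of 3.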



Selection based on Kendall Correlation


Consider two diversity measures among all available diversity measures.

Kendall tau rank correlation

Measures the association between two ranked lists, i.e., the similarity of the orderings of the data when ranked by each of the lists.

$$\tau(A, B) = \frac{n_c - n_d}{\frac{1}{2}\, n (n - 1)}$$

where $A$ and $B$ are two ranked lists, $n_c$ is the number of concordant pairs, $n_d$ is the number of discordant pairs, and $n$ is the number of positions in the ranked lists.
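A small sketch of the Kendall tau computation, assuming both ranked lists contain the same items with no ties: tau is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs n(n - 1)/2. The function name and the example lists are made up for illustration.

```python
from itertools import combinations

def kendall_tau(list_a, list_b):
    """Kendall tau between two rankings of the same items (no ties assumed)."""
    pos_a = {item: i for i, item in enumerate(list_a)}
    pos_b = {item: i for i, item in enumerate(list_b)}
    concordant = discordant = 0
    for x, y in combinations(list_a, 2):
        # A pair is concordant when both lists order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(list_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Items are identifiers of classifier pairs, ranked by two diversity measures.
tau = kendall_tau(["p1", "p2", "p3", "p4"], ["p1", "p3", "p2", "p4"])
```

A value near 1 means the two diversity measures rank the classifier pairs almost identically, so keeping both adds little information; a value near -1 means the rankings are nearly reversed.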

Kendall tau rank correlation (example)

Make different ranked lists.

Assign an identifier to each pair of classifiers.

Compute Kendall tau for each pair of diversity measures.

a) Validation matrix MV;

b) R lists sorted by diversity measure scores;

c) Select the ranked lists to be used in the next step;

d) R lists with top t;

e) Count the number of occurrences of each classifier;

f) Select the classifiers |C*| that satisfy a defined threshold T.


Selection based on Rank Aggregation

Combination of Diversity Measures


Combination of Evaluation Measures



Final Ranking of Pairs of Classifiers

Combination of Evaluation Measures


* normalized value in the range [0,1].


References

1. T. G. Dietterich, Ensemble Methods in Machine Learning, in: Proceedings of the 1st International Workshop on Multiple Classifier Systems, Springer, London, UK, 2000, pp. 1–15.

2. A. H. R. Ko, R. Sabourin, A. S. Britto Jr., From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition 41 (5) (2008) 1718–1731.

3. L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2004.

4. M. Woźniak, M. Graña, E. Corchado, A survey of multiple classifier systems as hybrid systems, Information Fusion 16 (2014) 3–17.

5. F. A. Faria, A Framework for Pattern Classifier Selection and Fusion, PhD thesis, IC-UNICAMP, 2014.

6. M. Ponti, Combining Classifiers: from the creation of ensembles to the decision fusion, tutorial at SIBGRAPI 2010.

7. T. G. Dietterich, G. Bakiri, Solving Multiclass Learning Problems via Error-Correcting Output Codes, 1995.
