Post on 24-May-2015

Aprendizagem Computacional – Gladys Castillo, Universidade de Aveiro

Bayesian Network Classifiers
Part I – Naive Bayes
The Supervised Classification Problem

A classifier is a function f: X → C that assigns a class label c ∈ C = {c1, …, cm} to objects described by a set of attributes X = {X1, X2, …, Xn}.

Given: a dataset D with N labeled examples of <X, C>.
Build: a classifier, a hypothesis hC: X → C that can correctly predict the class labels of new objects.

Learning Phase: a supervised learning algorithm takes the dataset D = {<x(1), c(1)>, <x(2), c(2)>, …, <x(N), c(N)>} as input and outputs the hypothesis hC.

Classification Phase: the class attached to a new example x(N+1) is c(N+1) = hC(x(N+1)) ∈ C.
Inputs: attribute values of x(N+1). Output: class of x(N+1).
[Figure: examples plotted in the (X1, X2) attribute space; crosses and circles mark the two classes "give credit" and "don't give credit".]
Statistical Classifiers

Treat the attributes X = {X1, X2, …, Xn} and the class C as random variables. A random variable is characterized by its probability density function f(x).

[Figure: probability density function of a random variable and a few observations.]

Statistical classifiers give the probability P(cj | x) that x belongs to a particular class, rather than a simple classification: instead of having the map X → C, we have X → P(C | X).

The class c* attached to an example is the class with the largest P(cj | x).
Bayesian Classifiers

"Bayesian" because the class c* attached to an example x is determined by Bayes' theorem. From P(X, C) = P(X | C)·P(C) and P(X, C) = P(C | X)·P(X) it follows that

P(C | X) = P(C)·P(X | C) / P(X)

Bayes' theorem is the main tool in Bayesian inference: we can combine the prior distribution and the likelihood of the observed data in order to derive the posterior distribution.
Bayes' Theorem – Example

Given: a doctor knows that meningitis causes stiff neck 50% of the time; the prior probability of any patient having meningitis is 1/50,000; the prior probability of any patient having stiff neck is 1/20.

If a patient has stiff neck, what is the probability he/she has meningitis?

P(M | S) = P(S | M)·P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002

posterior ∝ prior × likelihood:   P(C | X) = P(C)·P(X | C) / P(X)

Before observing the data, our prior beliefs can be expressed in a prior probability distribution that represents the knowledge we have about the unknown features. After observing the data, our revised beliefs are captured by a posterior distribution over the unknown features.
Bayesian Classifier

How to determine P(cj | x) for each class cj? By Bayes' theorem:

P(cj | x) = P(cj)·P(x | cj) / P(x)

P(x) can be ignored because it is the same for all the classes (a normalization constant).

Maximum a posteriori classification: the Bayesian classification rule assigns x to the class with the largest posterior probability:

hBayes(x) = argmax_{j=1…m} P(cj)·P(x | cj)
Aprendizagem Computacional Gladys Castillo, U.A. Aprendizagem Computacional Gladys Castillo, U.A.7
“Naïve” because of its very naïve independence assumption:
Naïve Bayes (NB) Classifier
all the attributes are conditionally independent given the class
Duda and Hart (1973); Langley (1992)
P(x | cj) can be decomposed into a product
of n terms, one term for each attribute
“Bayes” because the class c* attached to an example x is
determined by the Bayes’ Theorem
)|()(max)(1
*jj
...mjBayes cPcParghc xx
when the attribute space is high dimensional direct estimation is hard unless
we introduce some assumptions
n
ijiij
...mjNB cxXPcParghc
11
* )|()(max)(xNB
Classification Rule
Naïve Bayes (NB) – Learning Phase (Statistical Parameter Estimation)

Given a training dataset D of N labeled examples (assuming complete data):

1. Estimate P(cj) for each class cj:

P̂(cj) = Nj / N

2. Estimate P(Xi = xk | cj) for each value xk of the attribute Xi and for each class cj.

Xi discrete:

P̂(Xi = xk | cj) = Nijk / Nj

where Nj is the number of examples of the class cj, and Nijk is the number of examples of the class cj having the value xk for the attribute Xi.

Xi continuous, two options:
- The attribute is discretized and then treated as a discrete attribute.
- A Normal distribution is usually assumed:

P(Xi = xk | cj) = g(xk; μij, σij),   g(x; μ, σ) = (1 / (√(2π)·σ)) · exp(−(x − μ)² / (2σ²))

where the mean μij and the standard deviation σij are estimated from D.
Continuous Attributes – Normal or Gaussian Distribution

2. Estimate P(Xi = xk | cj) for a value of the attribute Xi and for each class cj.

For real attributes a Normal distribution is usually assumed:

Xi | cj ~ N(μij, σ²ij), where the mean μij and the standard deviation σij are estimated from D:

P(Xi = xk | cj) = g(xk; μij, σij),   g(x; μ, σ) = (1 / (√(2π)·σ)) · exp(−(x − μ)² / (2σ²))

The probability density function f(x) is symmetrical around its mean.

[Figure: density curve of N(0, 2).]

Example: for a variable X ~ N(74, 36), the density at the value 66 is given by f(66) = g(66; 74, 6) = 0.0273.
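The example above can be checked numerically. A minimal sketch of the Gaussian density g (the function name is mine, not from the slides):

```python
import math

def gaussian_density(x, mu, sigma):
    """Normal density g(x; mu, sigma) used by naive Bayes for continuous attributes."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# X ~ N(74, 36), i.e. mu = 74, sigma = 6: density at x = 66
print(round(gaussian_density(66, 74, 6), 4))  # 0.0273
```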
Naïve Bayes – Probability Estimates

Binary classification problem (example from John & Langley, 1995). Two classes: + (positive) and − (negative). Two attributes: X1, discrete, which takes the values a and b; X2, continuous.

Training dataset D:

Class  X1  X2
+      a   1.0
+      b   1.2
+      a   3.0
−      b   4.4
−      b   4.5

1. Estimate P(cj) for each class cj:
P̂(C = +) = 3/5,  P̂(C = −) = 2/5

2. Estimate P(X1 = xk | cj) for each value of X1 and each class cj:
P̂(X1 = a | +) = 2/3,  P̂(X1 = b | +) = 1/3
P̂(X1 = a | −) = 0/2,  P̂(X1 = b | −) = 2/2

For X2 a Normal distribution is assumed:
P(X2 = x | +) = g(x; 1.73, 1.10), with μ2+ = 1.73, σ2+ = 1.10
P(X2 = x | −) = g(x; 4.45, 0.07), with μ2− = 4.45, σ2− = 0.07
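All of these estimates can be reproduced from the five training examples. A sketch (function and variable names are mine):

```python
from collections import Counter
from statistics import mean, stdev

# Training set D from John & Langley (1995): (X1, X2, class)
D = [("a", 1.0, "+"), ("b", 1.2, "+"), ("a", 3.0, "+"),
     ("b", 4.4, "-"), ("b", 4.5, "-")]

# 1. Class priors: P^(cj) = Nj / N
N = len(D)
priors = {c: n / N for c, n in Counter(c for _, _, c in D).items()}

# 2a. Discrete attribute X1: P^(X1 = xk | cj) = Nijk / Nj
def p_x1(value, cls):
    in_class = [x1 for x1, _, c in D if c == cls]
    return in_class.count(value) / len(in_class)

# 2b. Continuous attribute X2: sample mean and standard deviation per class
x2_params = {}
for cls in ("+", "-"):
    xs = [x2 for _, x2, c in D if c == cls]
    x2_params[cls] = (mean(xs), stdev(xs))

print(priors)                           # {'+': 0.6, '-': 0.4}
print(p_x1("a", "+"), p_x1("a", "-"))   # 2/3 and 0.0
print({c: (round(m, 2), round(s, 2)) for c, (m, s) in x2_params.items()})
# mu2+ = 1.73, sigma2+ = 1.10; mu2- = 4.45, sigma2- = 0.07
```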
Probability Estimates – Discrete Attributes
Adapted from © Tan, Steinbach, Kumar, Introduction to Data Mining.

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

For each class, P̂(cj) = Nj / N:
P(No) = 7/10, P(Yes) = 3/10

For each attribute value and class, P̂(Xi = xk | cj) = Nijk / Nj, where to compute Nijk we count the number of examples of the class cj having the value xk for the attribute Xi.

Examples:
P(Status = Married | No) = 4/7
P(Refund = Yes | Yes) = 0
Probability Estimates – Continuous Attributes
Adapted from © Tan, Steinbach, Kumar, Introduction to Data Mining.

For each attribute-class pair (Xi, cj):

P(Xi = xk | cj) = g(xk; μij, σij) = (1 / (√(2π)·σij)) · exp(−(xk − μij)² / (2σ²ij))

Example for (Income, Class = No), using the training dataset of the previous slide. If Class = No, the sample mean is 110 and the sample standard deviation is 54.54 (σ² = 2975), so

P(Income = 120 | No) = (1 / (√(2π) × 54.54)) · exp(−(120 − 110)² / (2 × 2975)) = 0.0072
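These numbers can be reproduced from the Class = No rows of the table. A sketch (variable names are mine):

```python
import math
from statistics import mean, stdev

# Taxable income (in K) of the 7 examples with Evade = No
income_no = [125, 100, 70, 120, 60, 220, 75]

mu = mean(income_no)      # 110
sigma = stdev(income_no)  # sample standard deviation, ~54.54 (sigma^2 ~ 2975)

# Gaussian density of Income = 120 under the Class = No parameters
p = math.exp(-((120 - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
print(round(p, 4))  # 0.0072 = P(Income = 120 | No)
```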
The Balance-scale Problem

Dataset from the UCI repository; it was generated to model psychological experimental results.

Each example has 4 numerical attributes: the left weight (Left_W), the left distance (Left_D), the right weight (Right_W) and the right distance (Right_D).

Each example is classified into one of 3 classes describing the balance scale: it tips to the right (Right), tips to the left (Left), or is balanced (Balanced).

3 target rules:
If Left_D × Left_W > Right_D × Right_W, it tips to the left.
If Left_D × Left_W < Right_D × Right_W, it tips to the right.
If Left_D × Left_W = Right_D × Right_W, it is balanced.
The Balance-scale Problem
Adapted from © João Gama's slides "Aprendizagem Bayesiana".

Balance-Scale dataset:

Left_W  Left_D  Right_W  Right_D  Class
1       5       4        2        Right
2       5       3        2        Left
3       4       6        2        Balanced
…       …       …        …        …

Discretization is applied: each attribute is mapped to 5 intervals.
The Balance-scale Problem – Learning Phase

Build the contingency tables from the 565 training examples.

Class counters:
Left: 260, Balanced: 45, Right: 260 (Total: 565)

Attribute: Left_W
Class     I1  I2  I3  I4  I5
Left      14  42  61  71  72
Balanced  10   8   8  10   9
Right     86  66  49  34  25

Attribute: Left_D
Class     I1  I2  I3  I4  I5
Left      16  38  59  70  77
Balanced   8  10   9  10   8
Right     90  57  49  37  27

Attribute: Right_W
Class     I1  I2  I3  I4  I5
Left      87  63  49  33  28
Balanced   8  10  10   9   8
Right     16  37  58  70  79

Attribute: Right_D
Class     I1  I2  I3  I4  I5
Left      91  65  44  35  25
Balanced   8  10   8  10   9
Right     17  37  57  67  82

Assuming complete data, the computation of all the required estimates requires a single scan through the data, an operation of time complexity O(N·n), where N is the number of training examples and n the number of attributes.
The Balance-scale Problem – Classification Phase

How does NB classify this example?

Left_W  Left_D  Right_W  Right_D  Class
1       5       4        2        ?

We need to estimate the posterior probabilities P(cj | x) for each class. The class counters and contingency tables are used to compute them:

P(cj | x) ∝ P(cj) × P(Left_W = 1 | cj) × P(Left_D = 5 | cj) × P(Right_W = 4 | cj) × P(Right_D = 2 | cj),  cj ∈ {Left, Balanced, Right}

The class attached to this example is the class with the largest posterior probability:

P(Left | x) = 0.277796,  P(Balanced | x) = 0.135227,  P(Right | x) = 0.586978 (max)  →  Class = Right

hNB(x) = argmax_{j=1…m} P(cj) · ∏_{i=1…n} P(Xi = xi | cj)
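This step can be sketched with the class counters and contingency tables above. A sketch, assuming the counts were recovered correctly from the tables; the exact posterior values depend on details such as smoothing, but the winning class is Right:

```python
# Class counters and per-attribute contingency tables (counts per interval I1..I5)
counters = {"Left": 260, "Balanced": 45, "Right": 260}
N = 565
tables = {
    "Left_W":  {"Left": [14, 42, 61, 71, 72], "Balanced": [10, 8, 8, 10, 9], "Right": [86, 66, 49, 34, 25]},
    "Left_D":  {"Left": [16, 38, 59, 70, 77], "Balanced": [8, 10, 9, 10, 8], "Right": [90, 57, 49, 37, 27]},
    "Right_W": {"Left": [87, 63, 49, 33, 28], "Balanced": [8, 10, 10, 9, 8], "Right": [16, 37, 58, 70, 79]},
    "Right_D": {"Left": [91, 65, 44, 35, 25], "Balanced": [8, 10, 8, 10, 9], "Right": [17, 37, 57, 67, 82]},
}

x = {"Left_W": 1, "Left_D": 5, "Right_W": 4, "Right_D": 2}  # intervals, 1-based

def nb_scores(example):
    """Unnormalized P(cj) * prod_i P(Xi = xi | cj) for each class."""
    scores = {}
    for cls, nj in counters.items():
        s = nj / N  # prior P(cj)
        for attr, interval in example.items():
            s *= tables[attr][cls][interval - 1] / nj  # P(Xi = xk | cj)
        scores[cls] = s
    return scores

scores = nb_scores(x)
total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
print(max(posterior, key=posterior.get))  # Right
```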
Iris Dataset

Dataset from the UCI repository, due to the statistician Ronald Fisher.

Three flower types (classes): Setosa, Virginica, Versicolour.
Four continuous attributes: sepal width and length, petal width and length.

[Photo: Iris virginica. Robert H. Mohlenbrock. USDA NRCS. 1995. Northeast wetland flora: Field office guide to plant species. Northeast National Technical Center, Chester, PA. Courtesy of USDA NRCS Wetland Science Institute.]
Iris Dataset

[Figures: dataset overview.]
Scatter Plot Array of Iris Attributes

The attributes petal width and petal length provide a moderate separation of the Iris species.
Naïve Bayes – Iris Dataset, Continuous Attributes
Normal Probability Density Functions

Attribute: PetalWidth. Densities P(PetalWidth | Setosa), P(PetalWidth | Versicolor), P(PetalWidth | Virginica):

Class Iris-setosa: mean 0.244, standard deviation 0.107
Class Iris-versicolor: mean 1.326, standard deviation 0.198
Class Iris-virginica: mean 2.026, standard deviation 0.275
Naïve Bayes – Iris Dataset, Continuous Attributes
Model

Class Iris-setosa (0.327):
Attribute sepallength: mean 5.006, standard deviation 0.352
Attribute sepalwidth: mean 3.418, standard deviation 0.381
Attribute petallength: mean 1.464, standard deviation 0.174
Attribute petalwidth: mean 0.244, standard deviation 0.107

Class Iris-versicolor (0.327):
Attribute sepallength: mean 5.936, standard deviation 0.516
Attribute sepalwidth: mean 2.770, standard deviation 0.314
Attribute petallength: mean 4.260, standard deviation 0.470
Attribute petalwidth: mean 1.326, standard deviation 0.198

Class Iris-virginica (0.327):
Attribute sepallength: mean 6.588, standard deviation 0.636
Attribute sepalwidth: mean 2.974, standard deviation 0.322
Attribute petallength: mean 5.552, standard deviation 0.552
Attribute petalwidth: mean 2.026, standard deviation 0.275
Naïve Bayes – Iris Dataset, Continuous Attributes
Classification Phase

[Screenshot: classified examples; the predicted class is the one with the maximum posterior value.]
Naïve Bayes – Iris Dataset, Continuous Attributes

How does NB classify this example? Estimate P(cj | x) for each class:

P(cj | x) = P(cj) × P(sepalLength = 5 | cj) × P(sepalWidth = 3 | cj) × P(petalLength = 2 | cj) × P(petalWidth = 2 | cj),  cj ∈ {setosa, versicolor, virginica}

The class attached to this example is the class with the largest posterior probability:

P(setosa | x) = 0,  P(versicolor | x) = 0.995 (max),  P(virginica | x) = 0.005  →  Class = versicolor
Naïve Bayes – Iris Dataset, Continuous Attributes

Estimate P(cj | x) for the versicolor class. From the model, class Iris-versicolor (0.327): attribute sepallength mean 5.936, standard deviation 0.516; sepalwidth mean 2.770, standard deviation 0.314; petallength mean 4.260, standard deviation 0.470; petalwidth mean 1.326, standard deviation 0.198.

P(versicolor | x) = P(versicolor) × P(sepalLength = 5 | versicolor) × P(sepalWidth = 3 | versicolor) × P(petalLength = 2 | versicolor) × P(petalWidth = 2 | versicolor)
P(versicolor | x) = 0.327 × g(5; 5.936, 0.516) × g(3; 2.770, 0.314) × g(2; 4.260, 0.470) × g(2; 1.326, 0.198)

hNB(x) = argmax_{j=1…m} P(cj) · ∏_{i=1…n} P(Xi = xi | cj)
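Carrying out this computation for all three classes, with the parameter values listed in the model slide, recovers the posteriors above. A sketch (the data layout is mine):

```python
import math

def g(x, mu, sigma):
    """Gaussian density g(x; mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# (mean, standard deviation) per attribute, per class, from the NB model
model = {
    "setosa":     {"prior": 0.327, "params": [(5.006, 0.352), (3.418, 0.381), (1.464, 0.174), (0.244, 0.107)]},
    "versicolor": {"prior": 0.327, "params": [(5.936, 0.516), (2.770, 0.314), (4.260, 0.470), (1.326, 0.198)]},
    "virginica":  {"prior": 0.327, "params": [(6.588, 0.636), (2.974, 0.322), (5.552, 0.552), (2.026, 0.275)]},
}

x = [5, 3, 2, 2]  # sepallength, sepalwidth, petallength, petalwidth

scores = {}
for cls, m in model.items():
    s = m["prior"]
    for xi, (mu, sigma) in zip(x, m["params"]):
        s *= g(xi, mu, sigma)
    scores[cls] = s

total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
print(max(posterior, key=posterior.get))  # versicolor
print(round(posterior["versicolor"], 3))  # 0.995
```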
Naïve Bayes – Continuous Attributes: Discretization

For continuous attributes, discretization can be used instead of assuming a Normal distribution.

[Screenshot: attribute histograms after BinDiscretization, bins = 3.]
Naïve Bayes – Continuous Attributes: Discretization
Model

Class Iris-setosa (0.327):
Attribute sepallength: range1 0.940, range2 0.060, range3 0.000
Attribute sepalwidth: range1 0.020, range2 0.720, range3 0.260
Attribute petallength: range1 1.000, range2 0.000, range3 0.000
Attribute petalwidth: range1 1.000, range2 0.000, range3 0.000

Class Iris-versicolor (0.327):
Attribute sepallength: range1 0.220, range2 0.720, range3 0.060
Attribute sepalwidth: range1 0.460, range2 0.540, range3 0.000
Attribute petallength: range1 0.000, range2 0.960, range3 0.040
Attribute petalwidth: range1 0.000, range2 0.980, range3 0.020

Class Iris-virginica (0.327):
Attribute sepallength: range1 0.020, range2 0.640, range3 0.340
Attribute sepalwidth: range1 0.380, range2 0.580, range3 0.040
Attribute petallength: range1 0.000, range2 0.120, range3 0.880
Attribute petalwidth: range1 0.000, range2 0.100, range3 0.900
Naïve Bayes – Continuous Attributes: Discretization

We can build a conditional probability table (CPT) for each attribute.

Class probabilities:
Setosa 0.327, Versicolor 0.327, Virginica 0.327

Attribute Sepallength:
Class       Range1  Range2  Range3
Setosa      0.940   0.060   0.000
Versicolor  0.220   0.720   0.060
Virginica   0.020   0.640   0.340

Attribute Sepalwidth:
Class       Range1  Range2  Range3
Setosa      0.020   0.720   0.260
Versicolor  0.460   0.540   0.000
Virginica   0.380   0.580   0.040
Naïve Bayes – Continuous Attributes: Discretization

Attribute Petallength:
Class       Range1  Range2  Range3
Setosa      1.000   0.000   0.000
Versicolor  0.000   0.960   0.040
Virginica   0.000   0.100   0.900

Attribute Petalwidth:
Class       Range1  Range2  Range3
Setosa      1.000   0.000   0.000
Versicolor  0.000   0.980   0.020
Virginica   0.000   0.100   0.900
Naïve Bayes – Iris Dataset, Discretized Attributes
Classification (Implementation) Phase

[Screenshot: discretized examples and their posterior values; the predicted class is the one with the maximum value.]
Naïve Bayes – Classification Phase

How to classify this example?

sepallength  sepalwidth  petallength  petalwidth
5            3           2            2

Example with discretized attributes:

sepallength  sepalwidth  petallength  petalwidth
r1           r2          r1           r3

We need to compute the posterior probabilities for each class:

P(setosa | x) = P(setosa) × P(sepalLength = r1 | setosa) × P(sepalWidth = r2 | setosa) × P(petalLength = r1 | setosa) × P(petalWidth = r3 | setosa)

P(versicolor | x) = P(versicolor) × P(sepalLength = r1 | versicolor) × P(sepalWidth = r2 | versicolor) × P(petalLength = r1 | versicolor) × P(petalWidth = r3 | versicolor)

P(virginica | x) = P(virginica) × P(sepalLength = r1 | virginica) × P(sepalWidth = r2 | virginica) × P(petalLength = r1 | virginica) × P(petalWidth = r3 | virginica)
Naïve Bayes – Continuous Attributes: Discretization

Using the class probabilities and CPTs for the discretized example (sepallength = r1, sepalwidth = r2, petallength = r1, petalwidth = r3):

P(setosa | x) = P(setosa) × P(sepalLength = r1 | setosa) × P(sepalWidth = r2 | setosa) × P(petalLength = r1 | setosa) × P(petalWidth = r3 | setosa)
P(setosa | x) = 0.327 × 0.940 × 0.720 × 1.000 × 0.000 = 0
Naïve Bayes – Continuous Attributes: Discretization

P(versicolor | x) = P(versicolor) × P(sepalLength = r1 | versicolor) × P(sepalWidth = r2 | versicolor) × P(petalLength = r1 | versicolor) × P(petalWidth = r3 | versicolor)
P(versicolor | x) = 0.327 × 0.220 × 0.540 × 0.000 × 0.020 = 0
Naïve Bayes – Continuous Attributes: Discretization

P(virginica | x) = P(virginica) × P(sepalLength = r1 | virginica) × P(sepalWidth = r2 | virginica) × P(petalLength = r1 | virginica) × P(petalWidth = r3 | virginica)
P(virginica | x) = 0.327 × 0.020 × 0.580 × 0.000 × 0.900 = 0

If all the class probabilities are zero, we cannot determine the class for this example.
Naïve Bayes – Laplace Correction

To avoid zero probabilities due to zero counters, we can implement the Laplace correction. To calculate the conditional probabilities, instead of using the estimate

P̂(Xi = xk | cj) = Nijk / Nj

we use the Laplace correction:

P̂(Xi = xk | cj) = (Nijk + 1) / (Nj + k)

where Nijk is the number of examples in D such that Xi = xk and C = cj, Nj is the number of examples in D of class cj, and k is the number of possible values of Xi.
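A sketch of the corrected estimate. The example counts assume the standard iris setup sketched above: k = 3 ranges per attribute and Nj = 50 examples per class:

```python
def laplace_estimate(n_ijk, n_j, k):
    """P^(Xi = xk | cj) with Laplace correction: (Nijk + 1) / (Nj + k)."""
    return (n_ijk + 1) / (n_j + k)

# A zero counter no longer produces a zero probability:
print(laplace_estimate(0, 50, 3))   # 1/53 ~ 0.0189 instead of 0.0
# Non-zero counters are only slightly shrunk, e.g. 47/50 = 0.94 becomes:
print(laplace_estimate(47, 50, 3))  # 48/53 ~ 0.906
```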
Bayesian SPAM Filters – Binary Classification Problem

Spambase dataset from the UCI repository.

Two classes: 0 – is not SPAM, 1 – is SPAM.
57 continuous attributes: some word frequencies, some character frequencies, capital-letter frequencies.
Bayesian SPAM Filters – Implementation in RapidMiner

[Screenshot: RapidMiner process with a discretization method, a feature subset selection method, learning, testing, and an evaluation method.]

In this dataset there are no missing values; otherwise we would first need to use a method to replace the missing values.
Bayesian SPAM Filters – Confusion Matrix

Concept learning problem: is an e-mail SPAM?

True Positive (TP) = number of examples classified as positive which truly are positive
False Positive (FP) = number of examples classified as positive which are negative
False Negative (FN) = number of examples classified as negative which are positive

SPAM precision – percentage of the e-mails classified as SPAM which truly are SPAM:
precision = TP / (TP + FP)

SPAM recall (sensitivity, true positive rate) – percentage of the e-mails that are SPAM which are classified as SPAM:
recall = TPR = TP / (TP + FN)
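Both measures follow directly from the confusion-matrix counts. A sketch; the counts below are hypothetical, not results from the Spambase experiment:

```python
def precision(tp, fp):
    # fraction of e-mails classified as SPAM that truly are SPAM
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of the true SPAM e-mails that are classified as SPAM
    return tp / (tp + fn)

# Hypothetical confusion-matrix counts: TP = 80, FP = 20, FN = 10
print(precision(80, 20))  # 0.8
print(recall(80, 10))     # 0.888...
```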
Experimental Results

[Chart: accuracy with continuous attributes, with discretization after FSS, and with discretization before FSS.]

Legend – order of the pre-processing operators used in each case:
1. Feature Selection (CFS)
2. Genetic Algorithm (CFS)
3. Wrapper
4. Feature Selection (CFS), then Minimal Entropy Discretization
5. Genetic Algorithm (CFS), then Minimal Entropy Discretization
6. Feature Selection (CFS), then Frequency Discretization
7. Wrapper, then Minimal Entropy Discretization
8. Minimal Entropy Discretization, then Feature Selection (CFS)
9. Minimal Entropy Discretization, then Genetic Algorithm (CFS)
10. Minimal Entropy Discretization, then Wrapper
Naïve Bayes Performance

NB is one of the simplest and most effective classifiers, but it has a very strong, unrealistic independence assumption: all the attributes are conditionally independent given the value of the class.

[Chart: bias-variance decomposition of the test error of Naive Bayes on the Nursery dataset, for training sets of 500 to 12500 examples.]

In practice the independence assumption is violated → HIGH BIAS, which can lead to poor classification.

However, NB remains effective thanks to its low variance: fewer parameters to estimate → LOW VARIANCE.
Improving Naïve Bayes

Two directions:
- Reducing the bias resulting from the modeling error, by relaxing the attribute independence assumption. One natural extension: Bayesian network classifiers.
- Reducing the bias of the parameter estimates, by improving the probability estimates computed from data.

Relevant works:
Webb and Pazzani (1998) – "Adjusted probability naive Bayesian induction", in LNCS v. 1502
J. Gama (2001, 2003) – "Iterative Bayes", in Theoretical Computer Science, v. 292
Friedman, Geiger and Goldszmidt (1998) – "Bayesian Network Classifiers", in Machine Learning, 29
Pazzani (1995) – "Searching for attribute dependencies in Bayesian Network Classifiers", in Proc. of the 5th Workshop on Artificial Intelligence and Statistics
Keogh and Pazzani (1999) – "Learning augmented Bayesian classifiers…", in Theoretical Computer Science, v. 292