Bayes classifier

In statistical classification, the Bayes classifier is the classifier having the smallest probability of misclassification of all classifiers using the same set of features.

Definition

Suppose a pair $(X,Y)$ takes values in $\mathbb{R}^{d}\times \{1,2,\dots ,K\}$, where $Y$ is the class label of an element whose features are given by $X$. Assume that the conditional distribution of $X$, given that the label $Y$ takes the value $r$, is given by

$$(X\mid Y=r)\sim P_{r}\quad \text{for}\quad r=1,2,\dots ,K,$$

where "$\sim$" means "is distributed as", and where $P_{r}$ denotes a probability distribution.

A classifier is a rule that assigns to an observation $X=x$ a guess or estimate of what the unobserved label $Y=r$ actually was. In theoretical terms, a classifier is a measurable function $C:\mathbb{R}^{d}\to \{1,2,\dots ,K\}$, with the interpretation that $C$ classifies the point $x$ to the class $C(x)$. The probability of misclassification, or risk, of a classifier $C$ is defined as

$$\mathcal{R}(C)=\operatorname{P}\{C(X)\neq Y\}.$$
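For intuition, the risk can be computed exactly when the pair $(X,Y)$ has a finite joint distribution. In that case $\mathcal{R}(C)$ is the total probability of the pairs $(x,r)$ with $C(x)\neq r$. The sketch below assumes a hypothetical toy joint pmf `JOINT` over three feature values and $K=2$ classes. The helper name `risk` is also illustrative and does not come from any library.

```python
from fractions import Fraction

# Hypothetical toy joint pmf P(X = x, Y = r): three feature values, K = 2 classes.
# Exact fractions avoid floating-point noise; the six entries sum to 1.
JOINT = {
    ("a", 1): Fraction(3, 10), ("a", 2): Fraction(1, 10),
    ("b", 1): Fraction(1, 10), ("b", 2): Fraction(2, 10),
    ("c", 1): Fraction(1, 10), ("c", 2): Fraction(2, 10),
}


def risk(classifier, joint):
    """R(C) = P(C(X) != Y): sum the probability of every pair C gets wrong."""
    return sum((p for (x, r), p in joint.items() if classifier(x) != r), Fraction(0))


# Always guessing class 1 is wrong exactly when Y = 2.
print(risk(lambda x: 1, JOINT))  # 1/2
# A rule that looks at the feature does better on this distribution.
rule = {"a": 1, "b": 2, "c": 2}
print(risk(rule.get, JOINT))  # 3/10
```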

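The Bayes classifier is

$$C^{\text{Bayes}}(x)=\underset{r\in \{1,2,\dots ,K\}}{\operatorname{argmax}}\operatorname{P}(Y=r\mid X=x).$$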

In practice, as in most of statistics, the difficulties and subtleties lie in modeling the probability distributions effectively, in this case $\operatorname{P}(Y=r\mid X=x)$. The Bayes classifier is a useful benchmark in statistical classification, because no classifier using the same features can achieve a lower risk.
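When the priors and the class-conditional distributions $P_r$ are known, the Bayes classifier can be evaluated directly through Bayes' rule, $\operatorname{P}(Y=r\mid X=x)\propto\operatorname{P}(Y=r)\,p_r(x)$, where $p_r$ is the density of $P_r$. The Python sketch below assumes a hypothetical one-dimensional model with $K=2$ equally likely classes, $P_1=N(-1,1)$ and $P_2=N(1,1)$. The names `posterior`, `bayes_classifier` and `monte_carlo_risk` are illustrative. For this model the Bayes rule thresholds at $x=0$ and its risk is $\Phi(-1)\approx 0.159$, so the simulated error rate should land close to that value.

```python
import math
import random

# Hypothetical model: K = 2 classes with equal priors and
# class-conditional distributions P_1 = N(-1, 1), P_2 = N(1, 1) in d = 1.
PRIORS = {1: 0.5, 2: 0.5}
MEANS = {1: -1.0, 2: 1.0}
SIGMA = 1.0


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior(x):
    """P(Y = r | X = x) by Bayes' rule: prior times class density, normalised."""
    joint = {r: PRIORS[r] * normal_pdf(x, MEANS[r], SIGMA) for r in PRIORS}
    total = sum(joint.values())
    return {r: v / total for r, v in joint.items()}


def bayes_classifier(x):
    """C_Bayes(x) = argmax_r P(Y = r | X = x)."""
    post = posterior(x)
    return max(post, key=post.get)


def sample(rng):
    """Draw (X, Y): first Y from the two-class priors, then X ~ P_Y."""
    y = 1 if rng.random() < PRIORS[1] else 2
    return rng.gauss(MEANS[y], SIGMA), y


def monte_carlo_risk(classifier, n=100_000, seed=0):
    """Estimate R(C) = P(C(X) != Y) as the error rate on n simulated pairs."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x, y = sample(rng)
        errors += classifier(x) != y
    return errors / n


bayes_risk = 0.5 * math.erfc(1 / math.sqrt(2))  # Phi(-1), the exact Bayes risk here
print(monte_carlo_risk(bayes_classifier), bayes_risk)
```

Both estimates use the same seed, so they are computed on the same simulated sample. A rule with a shifted threshold, such as `lambda x: 1 if x < 0.5 else 2`, should therefore show a larger estimated risk than `bayes_classifier`.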

The excess risk of a general classifier $C$ (possibly depending on some training data) is defined as $\mathcal{R}(C)-\mathcal{R}(C^{\text{Bayes}})$. Since the Bayes classifier has the smallest risk, this quantity is non-negative, and it is important for assessing the performance of different classification techniques. A classifier is said to be consistent if the excess risk converges to zero as the size of the training data set tends to infinity.
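The sketch below illustrates both definitions on a hypothetical discrete joint distribution with $K=3$ classes. The pmf `JOINT` and all helper names are invented for illustration. The code first enumerates all deterministic rules to check that each one has non-negative excess risk. It then builds a simple plug-in classifier from $n$ simulated training pairs and prints its excess risk for growing $n$. The plug-in rule estimates the joint pmf by counting, a standard example of a consistent method on a finite feature space. It stands in here for a general learning algorithm.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf P(X = x, Y = r) with x in {0, 1, 2} and K = 3 classes.
JOINT = {
    (0, 1): Fraction(20, 100), (0, 2): Fraction(5, 100), (0, 3): Fraction(5, 100),
    (1, 1): Fraction(5, 100), (1, 2): Fraction(25, 100), (1, 3): Fraction(5, 100),
    (2, 1): Fraction(5, 100), (2, 2): Fraction(5, 100), (2, 3): Fraction(25, 100),
}
XS = sorted({x for x, _ in JOINT})
CLASSES = sorted({r for _, r in JOINT})


def risk(rule):
    """R(C) for a rule given as a dict {x: predicted class}."""
    return sum((p for (x, r), p in JOINT.items() if rule[x] != r), Fraction(0))


def bayes_rule():
    # P(Y = r | X = x) is proportional to P(X = x, Y = r), so the argmax is the same.
    return {x: max(CLASSES, key=lambda r: JOINT[(x, r)]) for x in XS}


def excess_risk(rule):
    return risk(rule) - risk(bayes_rule())


def plug_in_rule(n, seed=0):
    """Estimate the joint pmf by counts over n i.i.d. samples, then take the argmax.

    Unseen or tied cells fall back to the first class listed in CLASSES.
    """
    rng = random.Random(seed)
    cells = list(JOINT)
    counts = Counter(rng.choices(cells, weights=[float(JOINT[c]) for c in cells], k=n))
    return {x: max(CLASSES, key=lambda r: counts[(x, r)]) for x in XS}


# Every deterministic rule on this finite space has non-negative excess risk.
all_rules = [dict(zip(XS, labels)) for labels in product(CLASSES, repeat=len(XS))]
assert min(excess_risk(rule) for rule in all_rules) == 0

for n in (10, 100, 10_000):
    print(n, excess_risk(plug_in_rule(n)))
```

The class posteriors in this toy pmf are well separated, so the largest sample should already recover the Bayes rule and give zero excess risk.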

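Considering the components $x_{i}$ of $x$ to be mutually independent conditional on the class, we get the naive Bayes classifier, where

$$C^{\text{Bayes}}(x)=\underset{r\in \{1,2,\dots ,K\}}{\operatorname{argmax}}\operatorname{P}(Y=r)\prod_{i=1}^{d}P_{r}(x_{i}).$$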
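One common instance, assumed here for illustration, is Gaussian naive Bayes. It takes each one-dimensional $P_r(x_i)$ to be a normal density whose mean and variance are estimated from training data. The sketch below evaluates the product in log space to avoid numerical underflow. The function names, the toy training set, and the small variance floor `1e-9` are illustrative choices, not part of the definition.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate P(Y = r) and, for each class r and feature i, a normal mean/variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for r, rows in rows_by_class.items():
        d = len(rows[0])
        means = [sum(row[i] for row in rows) / len(rows) for i in range(d)]
        # Maximum-likelihood variances plus a tiny floor so no variance is zero.
        variances = [
            sum((row[i] - means[i]) ** 2 for row in rows) / len(rows) + 1e-9
            for i in range(d)
        ]
        model[r] = (len(rows) / len(X), means, variances)
    return model


def log_normal_pdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def predict(model, x):
    """argmax_r [log P(Y = r) + sum_i log P_r(x_i)], the naive Bayes rule in log space."""
    def score(r):
        prior, means, variances = model[r]
        return math.log(prior) + sum(
            log_normal_pdf(xi, m, v) for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=score)


# Hypothetical training data: two features, classes 1 and 2.
X_train = [(0.0, 0.2), (0.3, -0.1), (-0.2, 0.1), (5.1, 4.9), (4.8, 5.2), (5.3, 5.0)]
y_train = [1, 1, 1, 2, 2, 2]
model = fit_gaussian_nb(X_train, y_train)
print(predict(model, (0.1, 0.0)), predict(model, (5.0, 5.0)))  # 1 2
```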

Properties

The proof that the Bayes classifier is optimal, and hence that the Bayes error rate is minimal, proceeds as follows.

Define the variables: risk $R(h)$, Bayes risk $R^{*}$, and the classes to which points can be assigned, $\{0,1\}$, so that $Y$ takes values in $\{0,1\}$. Let the posterior probability of a point belonging to class 1 be $\eta(x)=\Pr(Y=1\mid X=x)$. Define the classifier $h^{*}$ as

$$h^{*}(x)=\begin{cases}1&\text{if }\eta(x)\geqslant 0.5,\\0&\text{otherwise.}\end{cases}$$

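Then we have the following results:

(a) $R(h^{*})=R^{*}$, i.e., $h^{*}$ is a Bayes classifier;

(b) for any classifier $h$, the excess risk satisfies $R(h)-R^{*}=2\,\mathbb{E}_{X}\left[|\eta(X)-0.5|\cdot \mathbb{I}_{\{h(X)\neq h^{*}(X)\}}\right]$;

(c) $R^{*}=\mathbb{E}_{X}\left[\min(\eta(X),1-\eta(X))\right]$;

(d) $R^{*}=\frac{1}{2}-\frac{1}{2}\mathbb{E}\left[|2\eta(X)-1|\right]$.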

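Proof of (a): For any classifier $h$, we have

$$\begin{aligned}R(h)&=\mathbb{E}_{XY}\left[\mathbb{I}_{\{h(X)\neq Y\}}\right]\\&=\mathbb{E}_{X}\mathbb{E}_{Y\mid X}\left[\mathbb{I}_{\{h(X)\neq Y\}}\right]\\&=\mathbb{E}_{X}\left[\eta(X)\,\mathbb{I}_{\{h(X)=0\}}+(1-\eta(X))\,\mathbb{I}_{\{h(X)=1\}}\right]\end{aligned}$$

where the second line was derived through Fubini's theorem.

At each $x$ the integrand equals $\eta(x)$ if $h(x)=0$ and $1-\eta(x)$ if $h(x)=1$, so $R(h)$ is minimised by taking, for every $x$,

$$h(x)=\begin{cases}1&\text{if }\eta(x)\geqslant 1-\eta(x),\\0&\text{otherwise.}\end{cases}$$

Since $\eta(x)\geqslant 1-\eta(x)$ exactly when $\eta(x)\geqslant 0.5$, this is the classifier $h^{*}$. Therefore the minimum possible risk is the Bayes risk, $R^{*}=R(h^{*})$.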
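The argument can be checked by brute force on a finite feature space. The sketch below assumes a hypothetical uniform marginal pmf for $X$ over four points and hypothetical values of $\eta(x)$. It evaluates $R(h)=\mathbb{E}_X[\eta(X)\mathbb{I}_{\{h(X)=0\}}+(1-\eta(X))\mathbb{I}_{\{h(X)=1\}}]$ for all $2^4$ classifiers and confirms that the threshold rule $h^{*}$ attains the minimum. The point with $\eta(x)=0.5$ shows that the minimiser need not be unique.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal pmf of X and posterior eta(x) = Pr(Y = 1 | X = x).
P_X = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4)}
ETA = {0: Fraction(1, 10), 1: Fraction(2, 5), 2: Fraction(1, 2), 3: Fraction(9, 10)}


def risk(h):
    """R(h) = E_X[eta(X) 1{h(X)=0} + (1 - eta(X)) 1{h(X)=1}] for h given as a dict."""
    return sum(P_X[x] * (ETA[x] if h[x] == 0 else 1 - ETA[x]) for x in P_X)


# h*(x) = 1 if eta(x) >= 1/2, else 0.
h_star = {x: 1 if ETA[x] >= Fraction(1, 2) else 0 for x in P_X}

# Enumerate every classifier h: {0, 1, 2, 3} -> {0, 1}.
all_h = [dict(zip(P_X, labels)) for labels in product((0, 1), repeat=len(P_X))]
bayes_risk = min(risk(h) for h in all_h)
print(risk(h_star), bayes_risk)  # 11/40 11/40
```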

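Proof of (b):

$$\begin{aligned}R(h)-R^{*}&=R(h)-R(h^{*})\\&=\mathbb{E}_{X}\left[\eta(X)\mathbb{I}_{\{h(X)=0\}}+(1-\eta(X))\mathbb{I}_{\{h(X)=1\}}-\eta(X)\mathbb{I}_{\{h^{*}(X)=0\}}-(1-\eta(X))\mathbb{I}_{\{h^{*}(X)=1\}}\right]\\&=\mathbb{E}_{X}\left[|2\eta(X)-1|\,\mathbb{I}_{\{h(X)\neq h^{*}(X)\}}\right]\\&=2\,\mathbb{E}_{X}\left[|\eta(X)-0.5|\,\mathbb{I}_{\{h(X)\neq h^{*}(X)\}}\right]\end{aligned}$$

The integrand in the second line is zero wherever $h(X)=h^{*}(X)$. Where they differ, it equals $|2\eta(X)-1|$, because $h^{*}$ always incurs the smaller of $\eta(X)$ and $1-\eta(X)$.

Proof of (c):

$$\begin{aligned}R(h^{*})&=\mathbb{E}_{X}\left[\eta(X)\mathbb{I}_{\{h^{*}(X)=0\}}+(1-\eta(X))\mathbb{I}_{\{h^{*}(X)=1\}}\right]\\&=\mathbb{E}_{X}\left[\min(\eta(X),1-\eta(X))\right]\end{aligned}$$

Proof of (d):

$$\begin{aligned}R(h^{*})&=\mathbb{E}_{X}\left[\min(\eta(X),1-\eta(X))\right]\\&=\frac{1}{2}-\mathbb{E}_{X}\left[\max\left(\eta(X)-\tfrac{1}{2},\tfrac{1}{2}-\eta(X)\right)\right]\\&=\frac{1}{2}-\frac{1}{2}\mathbb{E}\left[|2\eta(X)-1|\right]\end{aligned}$$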
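The identities in (b), (c) and (d) can also be checked exactly on a finite space, where each expectation over $X$ is a finite sum. The sketch below assumes a hypothetical marginal pmf and posterior $\eta$. It verifies, for every classifier $h$, that $R(h)-R^{*}=2\,\mathbb{E}_X[|\eta(X)-\tfrac12|\,\mathbb{I}_{\{h(X)\neq h^{*}(X)\}}]$. It also checks that $R^{*}=\mathbb{E}_X[\min(\eta(X),1-\eta(X))]=\tfrac12-\tfrac12\mathbb{E}[|2\eta(X)-1|]$. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite feature space: marginal pmf of X and eta(x) = Pr(Y = 1 | X = x).
P_X = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
ETA = {0: Fraction(1, 5), 1: Fraction(3, 4), 2: Fraction(1, 2)}
HALF = Fraction(1, 2)


def expect(f):
    """E_X[f(X)] as an exact finite sum."""
    return sum(P_X[x] * f(x) for x in P_X)


def risk(h):
    return expect(lambda x: ETA[x] if h[x] == 0 else 1 - ETA[x])


h_star = {x: 1 if ETA[x] >= HALF else 0 for x in P_X}
r_star = risk(h_star)

# (c) and (d): two closed forms for the Bayes risk.
assert r_star == expect(lambda x: min(ETA[x], 1 - ETA[x]))
assert r_star == HALF - HALF * expect(lambda x: abs(2 * ETA[x] - 1))

# (b): the excess-risk identity holds for every classifier h.
for labels in product((0, 1), repeat=len(P_X)):
    h = dict(zip(P_X, labels))
    gap = expect(lambda x: abs(ETA[x] - HALF) * (h[x] != h_star[x]))
    assert risk(h) - r_star == 2 * gap

print(r_star)  # 4/15
```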

General case

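The general case, that the Bayes classifier minimises classification error when each element can belong to any of $n$ categories, follows from the tower property of conditional expectation:

$$\begin{aligned}\operatorname{P}(h(X)\neq Y)&=\mathbb{E}_{X}\mathbb{E}_{Y\mid X}\left[\mathbb{I}_{\{h(X)\neq Y\}}\mid X\right]\\&=\mathbb{E}_{X}\left[\sum_{k=1}^{n}\Pr(Y=k\mid X)\,\mathbb{I}_{\{h(X)\neq k\}}\right]\\&=\mathbb{E}_{X}\left[1-\Pr(Y=h(X)\mid X)\right]\end{aligned}$$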

This is minimised by minimising each term of the expectation separately. For each observation $x$, the classifier $h(x)=\arg\max_{k}\Pr(Y=k\mid X=x)$ makes $1-\Pr(Y=h(x)\mid X=x)$ as small as possible.
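For $n$ categories the same pointwise argument can be checked by enumeration. The sketch below assumes a hypothetical finite feature space with a marginal pmf `P_X` and a posterior table `POST[x][k]` $=\Pr(Y=k\mid X=x)$ for $n=3$. It computes the risk through the tower form $\mathbb{E}_X[1-\Pr(Y=h(X)\mid X)]$. It then confirms that the argmax rule has the smallest risk among all $3^3$ deterministic classifiers. Ties, as at `"w"`, are broken by taking the first class.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal pmf of X and posteriors Pr(Y = k | X = x) for n = 3 classes.
P_X = {"u": Fraction(1, 2), "v": Fraction(3, 10), "w": Fraction(1, 5)}
POST = {
    "u": {1: Fraction(1, 2), 2: Fraction(3, 10), 3: Fraction(1, 5)},
    "v": {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(3, 5)},
    "w": {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)},
}
CLASSES = (1, 2, 3)


def risk(h):
    """P(h(X) != Y) = E_X[1 - Pr(Y = h(X) | X)] by the tower property."""
    return sum(P_X[x] * (1 - POST[x][h[x]]) for x in P_X)


# Bayes rule: pick the most probable class for each observation x.
h_bayes = {x: max(CLASSES, key=lambda k: POST[x][k]) for x in P_X}

all_h = [dict(zip(P_X, labels)) for labels in product(CLASSES, repeat=len(P_X))]
assert all(risk(h_bayes) <= risk(h) for h in all_h)
print(h_bayes, risk(h_bayes))
```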