constituents

- x \in \left\{0,1\right\}^{m} for m binary features
- y \in \left\{0,1\right\} for labels
- \phi_{j|y=1} for p\left(x_{j} = 1 | y=1\right)
- \phi_{j|y=0} for p\left(x_{j} = 1 | y=0\right)
- \phi_{y} for p\left(y=1\right), the "class prior"

assumption

ASSUME: the features in x are conditionally independent given y. That is, we assume that:

\begin{align} p\left(x|y\right) &= p\left(x_1, x_2, \dots, x_{m} | y\right) \\ &= p\left(x_1|y\right) p\left(x_2|y, x_1\right) p\left(x_3|y, x_1, x_2\right) \dots p\left(x_{m}|y, x_1, \dots, x_{m-1}\right) \end{align}

This is insane to compute! But if we assume the features of x are conditionally independent given y, we can write:

\begin{equation} p\left(x|y\right) = p\left(x_1|y\right) p\left(x_2|y\right) \dots p\left(x_m|y\right) = \prod_{j=1}^{m} p\left(x_{j}|y\right) \end{equation}
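The factorized conditional can be sketched numerically. A minimal example, where phi and x are made-up values (three features), computes p\left(x|y\right) as a product of per-feature Bernoulli terms:

```python
import numpy as np

# Hypothetical per-feature parameters phi_j = p(x_j = 1 | y) for m = 3 features.
phi = np.array([0.9, 0.2, 0.7])
# One binary feature vector x.
x = np.array([1, 0, 1])

# Under conditional independence:
# p(x | y) = prod_j phi_j^{x_j} * (1 - phi_j)^{1 - x_j}
p_x_given_y = np.prod(phi**x * (1 - phi)**(1 - x))
print(p_x_given_y)  # 0.9 * 0.8 * 0.7 = 0.504
```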

requirements

To find the best parameters, we do maximum likelihood parameter learning: maximize the joint likelihood of the training data,

\begin{equation} \mathcal{L}\left(\phi\right) = \prod_{i=1}^{n} p\left(x^{(i)}, y^{(i)} \mid \phi\right) \end{equation}

You get exactly what you expect:

\begin{equation} \phi_{y} = p\left(y=1\right) = \frac{\sum_{i=1}^{n} 1\left\{y^{(i)}=1\right\}}{n} \end{equation}
\begin{equation} \phi_{j|y=1} = p\left(x_{j} = 1 | y=1\right)= \frac{\sum_{i=1}^{n}1 \left\{x_{j}^{(i)} =1, y^{(i)}= 1\right\}}{\sum_{i=1}^{n} 1\left\{y^{(i)}=1\right\}} \end{equation}
\begin{equation} \phi_{j|y=0} = p\left(x_{j} = 1 | y=0\right)= \frac{\sum_{i=1}^{n}1 \left\{x_{j}^{(i)} =1, y^{(i)}= 0\right\}}{\sum_{i=1}^{n} 1\left\{y^{(i)}=0\right\}} \end{equation}
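The three closed-form estimates above are just label counts and per-feature co-occurrence counts. A sketch, assuming a small made-up dataset X (n examples by m binary features) and labels y:

```python
import numpy as np

# Illustrative toy data: 4 examples, 3 binary features.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
y = np.array([1, 1, 0, 0])

# phi_y = fraction of examples with y = 1
phi_y = np.mean(y == 1)
# phi_{j|y=1} = fraction of y=1 examples with x_j = 1 (MLE, no smoothing)
phi_j_y1 = X[y == 1].mean(axis=0)
# phi_{j|y=0} = fraction of y=0 examples with x_j = 1
phi_j_y0 = X[y == 0].mean(axis=0)

print(phi_y, phi_j_y1, phi_j_y0)
```

Note that phi_j_y1 already shows the problem discussed next: feature 2 is never 1 among the y=0 examples, so its unsmoothed estimate is exactly 0.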

To classify, you can just check which of p\left(y=1|x\right) and p\left(y=0|x\right) is more likely using Bayes' rule.

additional information

pseudocounting

One problem with this approach is that it won't handle OOD (out-of-distribution) text that well. In particular, suppose you never see a particular feature being 1 in the training set; then the MLE gives:

\begin{equation} \phi_{k|y=1} = p\left(x_{k} = 1 | y= 1\right) = 0 \end{equation}

for some k, and any test example with x_k = 1 gets probability 0 under that class. So in practice, we estimate the probabilities with added pseudocounts: Laplace smoothing. \begin{equation} \phi_{j|y=1} = p\left(x_{j} = 1 | y=1\right)= \frac{1+\sum_{i=1}^{n}1 \left\{x_{j}^{(i)} =1, y^{(i)}= 1\right\}}{2+\sum_{i=1}^{n} 1\left\{y^{(i)}=1\right\}} \end{equation}

\begin{equation} \phi_{j|y=0} = p\left(x_{j} = 1 | y=0\right)= \frac{1+ \sum_{i=1}^{n}1 \left\{x_{j}^{(i)} =1, y^{(i)}= 0\right\}}{2+ \sum_{i=1}^{n} 1\left\{y^{(i)}=0\right\}} \end{equation}
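Putting the smoothed estimates and the Bayes-rule comparison together: a sketch, again with made-up data (X, y, and the test vector are illustrative). The denominators p\left(x\right) cancel when comparing posteriors, so it suffices to compare the joints p\left(x|y\right)p\left(y\right), done in log space for numerical stability:

```python
import numpy as np

# Illustrative toy data: 4 examples, 3 binary features.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
y = np.array([1, 1, 0, 0])

def smoothed(X, y, label):
    # Laplace smoothing: (1 + count of x_j = 1 with this label)
    #                    / (2 + count of examples with this label)
    mask = y == label
    return (1 + X[mask].sum(axis=0)) / (2 + mask.sum())

phi_y = y.mean()             # p(y = 1)
phi1 = smoothed(X, y, 1)     # smoothed p(x_j = 1 | y = 1)
phi0 = smoothed(X, y, 0)     # smoothed p(x_j = 1 | y = 0)

def predict(x):
    # Compare log p(x|y=1) p(y=1) vs log p(x|y=0) p(y=0); p(x) cancels.
    log_p1 = np.log(phi_y) + np.sum(x * np.log(phi1) + (1 - x) * np.log(1 - phi1))
    log_p0 = np.log(1 - phi_y) + np.sum(x * np.log(phi0) + (1 - x) * np.log(1 - phi0))
    return int(log_p1 > log_p0)

print(predict(np.array([1, 1, 0])))  # classifies as y = 1
```

Because every smoothed estimate is strictly between 0 and 1, no single unseen feature can zero out a class.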