Neural networks are a non-linear learning architecture that combines matrix multiplications with entry-wise non-linear operations.

two-layer constituents

Consider a two-layer neural network with:

m hidden units
d-dimensional input x \in \mathbb{R}^{d}

requirements

\begin{align} &\forall j \in \left\{1, \dots, m\right\} \\ &z_{j} = {w_{j}^{(1)}}^{T} x + b_{j}^{(1)} \\ &a_{j} = \text{ReLU}\left(z_{j}\right) \\ &a = \left(a_1, \dots, a_{m}\right)^{T} \in \mathbb{R}^{m} \\ &h_{\theta}\left(x\right) = {w^{(2)}}^{T} a + b^{(2)} \end{align}

z_{j} are the hidden units, a_{j} are the activated hidden units, and h_{\theta} is the prediction function.

vectorized two-layer constituents

m hidden units per layer
d input dimension

requirements

\begin{equation} W^{(1)} = \mqty[{w_1^{(1)}}^{T} \\ \dots \\ {w_m^{(1)}}^{T}] \end{equation}

which is an m \times d matrix. So this gives:

\begin{equation} \mqty[z_1 \\ \dots \\ z_{m}] = \mqty[{w_1^{(1)}}^{T} \\ \dots \\ {w_m^{(1)}}^{T}] \mqty[x_1 \\ \dots \\ x_{d}] + \mqty[b_1^{(1)} \\ \dots \\ b_m^{(1)}] \end{equation}

where z \in \mathbb{R}^{m \times 1}, W^{(1)} \in \mathbb{R}^{m \times d}, x \in \mathbb{R}^{d \times 1}, b^{(1)} \in \mathbb{R}^{m \times 1}. Writing this as matrix operations:

\begin{equation} z = W^{(1)} x + b^{(1)} \end{equation}

and

\begin{equation} a = \text{ReLU}\left(z\right) \end{equation}

with:

\begin{equation} h_{\theta}\left(x\right) = {w^{(2)}}^{T} a + b^{(2)} \end{equation}
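The vectorized two-layer forward pass can be sketched in numpy as follows; the shapes mirror the equations above, while the random initialization and dimensions (d = 4, m = 3) are purely illustrative:

```python
import numpy as np

# Minimal sketch of the two-layer forward pass; values are illustrative.
rng = np.random.default_rng(0)
d, m = 4, 3                      # input dimension, hidden units

x = rng.normal(size=(d, 1))      # x in R^{d x 1}
W1 = rng.normal(size=(m, d))     # W^{(1)} in R^{m x d}
b1 = rng.normal(size=(m, 1))     # b^{(1)} in R^{m x 1}
w2 = rng.normal(size=(m, 1))     # w^{(2)} in R^{m}
b2 = rng.normal()                # b^{(2)}, a scalar

z = W1 @ x + b1                  # z = W^{(1)} x + b^{(1)}
a = np.maximum(z, 0)             # a = ReLU(z), applied entry-wise
h = (w2.T @ a + b2).item()       # h_theta(x) = w^{(2)T} a + b^{(2)}

print(z.shape, a.shape)          # (3, 1) (3, 1)
```

`np.maximum(z, 0)` applies the ReLU entry-wise, which is exactly what the non-bold ReLU on a vector means in the equations above.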

multi-layer

\begin{equation} a^{(1)} = \text{ReLU}\left(W^{(1)} x + b^{(1)}\right) \end{equation}

\begin{equation} a^{(2)} = \text{ReLU}\left(W^{(2)} a^{(1)} + b^{(2)}\right) \end{equation}

and so on…

\begin{equation} a^{(r-1)} = \text{ReLU}\left(W^{(r-1)} a^{(r-2)} + b^{(r-1)}\right) \end{equation}
\begin{equation} h_{\theta}\left(x\right) = W^{(r)} a^{(r-1)} + b^{(r)} \end{equation}
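The r-layer recursion above is just a loop: ReLU after every hidden layer, and a purely affine final layer. A sketch, with made-up widths m_1 = 8, m_2 = 6, m_r = 1 and d = 5:

```python
import numpy as np

# Sketch of the multi-layer forward pass; widths and values are illustrative.
rng = np.random.default_rng(0)
d = 5                                  # input dimension
widths = [8, 6, 1]                     # m_1, m_2, m_r

# W^{(k)} has shape m_k x m_{k-1}, with m_0 = d.
dims = [d] + widths
params = [(rng.normal(size=(dims[k + 1], dims[k])),
           rng.normal(size=(dims[k + 1], 1)))
          for k in range(len(widths))]

a = rng.normal(size=(d, 1))            # a^{(0)} = x
for k, (W, b) in enumerate(params):
    z = W @ a + b
    a = np.maximum(z, 0) if k < len(params) - 1 else z  # last layer stays linear

# Parameter count: (d+1)m_1 + (m_1+1)m_2 + (m_2+1)m_3 = 48 + 54 + 7 = 109.
n_params = sum(W.size + b.size for W, b in params)
print(a.shape, n_params)               # (1, 1) 109
```

Counting `W.size + b.size` per layer reproduces the closed-form parameter count, since each layer contributes m_k \times m_{k-1} weights plus m_k biases.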

metadata

total number of neurons: m_1 + m_2 + \dots + m_{r}
number of parameters: \left(d+1\right) m_1 + \left(m_{1}+1\right)m_{2} + \dots + \left(m_{r-1}+1\right)m_{r}

additional information

The training objective of a neural network is non-convex: it admits local optima, and we cannot in general guarantee finding a global optimum.

neuron

Consider first a single-neuron neural network in one dimension. For instance, let's think of a slightly non-linear case first:

\begin{align} h_{\theta}\left(x\right) &= \max \left(wx+b, 0\right) \end{align}

It has two parameters, \theta = \left(w, b\right) \in \mathbb{R}^{2}. Such a function is called the ReLU (rectified linear unit) function. What if we have multiple input features? Consider x \in \mathbb{R}^{d}, w \in \mathbb{R}^{d}, and b \in \mathbb{R}. Now:

\begin{equation} h_{\theta} \left(x\right) = \text{ReLU}\left(w^{\top}x + b\right) \end{equation}
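A single ReLU neuron on a multi-dimensional input is a dot product, a bias shift, and a clip at zero. A tiny worked sketch (the particular w, b, x values are made up):

```python
import numpy as np

# One ReLU neuron: h_theta(x) = ReLU(w^T x + b); values are illustrative.
w = np.array([1.0, -2.0, 0.5])
b = -0.5
x = np.array([2.0, 0.5, 1.0])

pre = w @ x + b        # w^T x + b = 2.0 - 1.0 + 0.5 - 0.5 = 1.0
h = max(pre, 0.0)      # ReLU clips negative pre-activations to zero
print(h)               # 1.0
```

With a pre-activation of 1.0 the neuron is "active" and passes its value through; had `pre` been negative, the output would be exactly 0.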

We refer to the ReLU function as an "Activation Function".

neurons

We can write latent units in terms of the input units, together with parameters that weight them; for instance:

\begin{equation} a_1 = \text{ReLU}\left(\theta_{1} x_1 + \theta_{2} x_2 + \theta_{3}\right) \end{equation}

Instead of hand-picking these connections, we can simply connect every neuron to every input (a fully connected layer), resulting in:

\begin{equation} a_1 = \text{ReLU}\left(w_1^{T} x + b_1\right) \end{equation}
\begin{equation} a_2 = \text{ReLU}\left(w_2^{T} x + b_2\right) \end{equation}

and so on.

why would the neurons learn different things?

Because random initialization breaks symmetry: each neuron starts from different weights, descends toward a different local minimum, and so specializes.

some activation functions

see Activation Function

see also

Neural Networks

kernel methods vs deep learning

Instead of using the Kernel Trick and a hand-designed feature map to extract features yourself, deep learning promises to learn the correct feature map through multiple non-linear layers. Consider \beta as the parameters of a fully-connected neural network up to its last layer; then the final hypothesis function of a neural network is:

\begin{equation} h_{\theta}\left(x\right) = W^{(r)} \phi_{\beta}\left(x\right) + b^{(r)} \end{equation}

In some sense, the entire damn neural network is a feature map for the final, linear output head. We can therefore think of training a neural network as simultaneously finding a feature map \phi_{\beta} and learning a linear classifier on top of it.
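This view can be made concrete by freezing \phi_{\beta} and fitting only the linear head. A sketch with a random (untrained) ReLU layer as the feature map and a least-squares head; all names, dimensions, and the sine target are illustrative, and in a real network \beta would of course be learned too:

```python
import numpy as np

# Feature-map view: frozen random hidden layer as phi_beta, linear head on top.
rng = np.random.default_rng(0)
d, m, n = 3, 32, 200

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)   # some non-linear target

W1 = rng.normal(size=(m, d))                     # frozen feature-map params
b1 = rng.normal(size=m)
Phi = np.maximum(X @ W1.T + b1, 0)               # phi_beta(x) for every row of X

# Fit only the linear head by least squares (ones column = bias b^{(r)}).
A = np.hstack([Phi, np.ones((n, 1))])
head, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ head
print(np.mean((pred - y) ** 2))                  # training MSE of the head
```

Even with random features the linear head fits the target far better than a constant predictor would, which illustrates the point: the hidden layers manufacture features, and the last layer is ordinary linear regression/classification on them.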
