Xavier initialization

A neural network initialization scheme that tries to avoid vanishing gradients.
Consider the Wx step in a neural network layer:

\begin{equation} o_{i} = \sum_{j=1}^{n_{\text{in}}} w_{ij} x_{j} \end{equation}

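Assuming the weights $w_{ij}$ and the inputs $x_{j}$ are independent and zero-mean, with variances $\sigma^{2}$ and $v^{2}$ respectively, the variance of this output is: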

\begin{equation} \text{Var}\left[o_{i}\right] = n_{\text{in}} \sigma^{2} v^{2} \end{equation}
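
The variance formula can be checked numerically. The following is a minimal sketch, not part of the original note. It assumes Gaussian weights and inputs that are independent and zero-mean, with `sigma` as the weight standard deviation and `v` as the input standard deviation. The function name `output_variance` and the sample sizes are illustrative choices.

```python
import numpy as np

def output_variance(n_in, sigma, v, n_samples=10000, seed=0):
    """Empirically estimate Var[o_i] for o_i = sum_j w_ij * x_j."""
    rng = np.random.default_rng(seed)
    # Assumption: zero-mean Gaussian weights with variance sigma^2.
    w = rng.normal(0.0, sigma, size=(n_samples, n_in))
    # Assumption: zero-mean Gaussian inputs with variance v^2,
    # drawn independently of the weights.
    x = rng.normal(0.0, v, size=(n_samples, n_in))
    # One output o_i per sampled (weights, inputs) pair.
    o = np.sum(w * x, axis=1)
    return o.var()

n_in, sigma, v = 128, 0.05, 2.0
empirical = output_variance(n_in, sigma, v)
predicted = n_in * sigma**2 * v**2  # n_in * sigma^2 * v^2
print(f"empirical={empirical:.4f} predicted={predicted:.4f}")
```

With these settings the empirical estimate should land close to the predicted value, up to sampling noise. The formula also shows why the weight scale matters: with a fixed $\sigma$ the output variance grows linearly in $n_{\text{in}}$, whereas choosing $\sigma^{2} = 1 / n_{\text{in}}$ would keep it equal to the input variance $v^{2}$.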
