The sigmoid (logistic) function squashes any real number into the open interval $(0, 1)$, so its output can be read as a probability. It is point-symmetric about $(0, \tfrac{1}{2})$, i.e. $\sigma(-z) = 1 - \sigma(z)$.
\begin{equation} \sigma(z) = \frac{1}{1+ e^{-z}} \end{equation}
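A minimal Python sketch of the definition above, assuming plain floats (the function name `sigmoid` is illustrative). Evaluating $1/(1+e^{-z})$ directly can overflow in `math.exp` for large negative $z$, so the negative branch uses the algebraically equivalent form $e^{z}/(1+e^{z})$. The loop prints $\sigma(z)$ next to $1 - \sigma(-z)$ to show the symmetry.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-z}), computed without overflow."""
    if z >= 0:
        # Here exp(-z) <= 1, so the direct formula is safe.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(-z) could overflow; use e^z / (1 + e^z) instead.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# sigma(z) and 1 - sigma(-z) should agree (point symmetry about (0, 1/2)).
for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(z, sigmoid(z), 1.0 - sigmoid(-z))
```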
Suppose you have a discrete variable $X$ and a continuous variable $Y$, and you want to express $P(x \mid y)$. The simplest way to do this is a hard threshold:
\begin{equation} P(x^{j} \mid y) = \begin{cases} 0, & y < \theta \\ 1, & y \geq \theta \end{cases} \end{equation}
so the probability of $x^{j}$ jumps abruptly from 0 to 1 as $y$ crosses the threshold $\theta$. Often we do not want such a hard cutoff. To soften it, we can use a sigmoid model:
\begin{equation} P(x^{1} \mid y) = \frac{1}{1 + \exp \left(-2 \frac{y-\theta_{1}}{\theta_{2}}\right)} \end{equation}
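where $\theta_{1}$ is the activation threshold (the value of $y$ at which $P(x^{1} \mid y) = \tfrac{1}{2}$) and $\theta_{2} > 0$ controls how soft the transition is: a larger $\theta_{2}$ gives a more gradual spread, and as $\theta_{2} \to 0$ the model approaches the hard threshold. The model is simply $\sigma$ evaluated at $z = 2(y - \theta_{1})/\theta_{2}$. The derivative of the sigmoid is also simple: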
\begin{equation} \dv{\sigma(z)}{z} = \sigma(z) (1-\sigma(z)) \end{equation}
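A minimal sketch tying these pieces together, assuming plain Python floats; all function names here (`hard_cpd`, `soft_cpd`, `sigmoid_grad`, `soft_cpd_grad`) are illustrative, not from any library. `hard_cpd` is the step model (it assigns $y = \theta$ to the upper branch, a convention choice), `soft_cpd` is the sigmoid model with $\theta_1, \theta_2$, and `sigmoid_grad` uses the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. By the chain rule, $\frac{d}{dy} P(x^{1} \mid y) = \frac{2}{\theta_2}\,\sigma(z)(1 - \sigma(z))$ with $z = 2(y - \theta_1)/\theta_2$. The script checks the identity against a central finite difference and shows the soft model sharpening toward the step as $\theta_2$ shrinks.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_grad(z: float) -> float:
    """d sigma / dz via the identity sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def hard_cpd(y: float, theta: float) -> float:
    """Step model: 0 below theta, 1 at or above it (tie convention assumed)."""
    return 1.0 if y >= theta else 0.0


def soft_cpd(y: float, theta1: float, theta2: float) -> float:
    """Sigmoid model P(x^1 | y) = sigma(2 (y - theta1) / theta2), theta2 > 0."""
    return sigmoid(2.0 * (y - theta1) / theta2)


def soft_cpd_grad(y: float, theta1: float, theta2: float) -> float:
    """dP/dy by the chain rule: (2 / theta2) * sigma'(z), z = 2 (y - theta1) / theta2."""
    z = 2.0 * (y - theta1) / theta2
    return (2.0 / theta2) * sigmoid_grad(z)


# Compare the analytic derivative with a central finite difference.
h = 1e-6
for z in (-3.0, 0.0, 1.5):
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
    print(f"z={z:+.1f}  analytic={sigmoid_grad(z):.6f}  numeric={numeric:.6f}")

# As theta2 shrinks, the soft model approaches the hard step at theta1 = 0.
for theta2 in (2.0, 0.5, 0.01):
    soft = [round(soft_cpd(y, 0.0, theta2), 4) for y in (-1.0, 1.0)]
    hard = [hard_cpd(y, 0.0) for y in (-1.0, 1.0)]
    print(f"theta2={theta2}: soft={soft} hard={hard}")
```

Note that the maximum slope, $\frac{2}{\theta_2} \cdot \frac{1}{4} = \frac{1}{2\theta_2}$, occurs at $y = \theta_1$, which is another way to see why a small $\theta_2$ gives a sharp transition.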