Types of Activation Functions

What is an Activation Function?

Activation functions play an important role in neural networks. The activation function determines the output of each neuron, and therefore of the whole deep learning network. It also affects the network's accuracy and the computational efficiency of training.

1. Sigmoid function

Sigmoid, also known as the logistic function, is a non-linear activation function defined as y = 1 / (1 + exp(-x)). It is continuous and monotonic, and its output is squashed into the range 0 to 1. It is differentiable and gives a smooth gradient curve. Sigmoid is mostly used in the output layer for binary classification, where its output can be read as a probability.

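Advantages:

  1. Smooth gradient, preventing “jumps” in output values.
  2. Output values bound between 0 and 1, normalizing the output of each neuron.

Disadvantages:

  1. Prone to vanishing gradients: the derivative is at most 0.25 and approaches 0 for large positive or negative inputs.
  2. Function output is not zero-centered.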
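A minimal sketch of sigmoid and its derivative in plain Python (standard library only; the function names are illustrative, not taken from any library). It shows why gradients vanish: the derivative peaks at 0.25 at x = 0 and is close to 0 far from the origin.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)); mathematically the output lies in (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument (avoids OverflowError).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_derivative(x: float) -> float:
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid(x), sigmoid_derivative(x))
```

At x = 10 the derivative is already below 0.0001, so almost no gradient flows back through a saturated sigmoid neuron.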

2. tanh function

The hyperbolic tangent (tanh) activation function outputs values in the range -1 to 1, and its derivative lies between 0 and 1. It is zero-centered. It often performs better than sigmoid in hidden layers, and it is commonly used in the hidden layers of binary classifiers.
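A short sketch of tanh and its derivative, assuming scalar inputs and using the standard library's `math.tanh` (the wrapper names are illustrative). The derivative 1 - tanh(x)² equals 1 at x = 0, four times sigmoid's peak, and tanh is an odd function, which is what makes its outputs zero-centered.

```python
import math

def tanh(x: float) -> float:
    """Hyperbolic tangent; output lies in (-1, 1)."""
    return math.tanh(x)

def tanh_derivative(x: float) -> float:
    """d/dx tanh(x) = 1 - tanh(x)^2, which lies in (0, 1] and peaks at x = 0."""
    t = math.tanh(x)
    return 1.0 - t * t

# tanh is odd, tanh(-x) == -tanh(x), so positive and negative inputs give symmetric outputs.
print(tanh(-2.0), tanh(0.0), tanh(2.0))
print(tanh_derivative(0.0))
```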

3. ReLU function

The Rectified Linear Unit (ReLU) is the most widely used activation function in the hidden layers of deep learning models. The formula is simple: if the input is positive, that value is returned; otherwise the output is 0, i.e. y = max(0, x). The derivative is equally simple: 1 for positive inputs and 0 otherwise (the function is constant at 0 there, so its slope is 0; at exactly x = 0 the derivative is undefined, and implementations pick a convention). Because the gradient does not shrink for positive inputs, ReLU mitigates the vanishing gradient problem. The range is 0 to infinity.

Advantages:

  1. When the input is positive, there is no gradient saturation problem.
  2. It is much faster to compute than sigmoid or tanh, since it involves no exponentials.
  3. The ReLU function is piecewise linear, so both the forward and the backward pass are cheap.

Disadvantages:

  1. ReLU is not a zero-centered function.
  2. When the input is negative, ReLU is completely inactive, so a neuron that only receives negative inputs always outputs 0 and effectively “dies”. This is not a problem during forward propagation, but during backpropagation the gradient for negative inputs is exactly zero, so the neuron's weights stop updating. This is similar to the saturation problem of the sigmoid and tanh functions.
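A minimal sketch of ReLU and its derivative for scalar inputs (illustrative names; using 0 as the derivative at exactly x = 0 is an assumed convention). The loop shows the “dying ReLU” case: a negative input produces both a zero output and a zero gradient.

```python
def relu(x: float) -> float:
    """ReLU: returns x for positive inputs and 0 otherwise, i.e. max(0, x)."""
    return max(0.0, x)

def relu_derivative(x: float) -> float:
    """1 for positive inputs, 0 otherwise; the value at exactly x = 0 is a convention (0 here)."""
    return 1.0 if x > 0 else 0.0

# Negative inputs give zero output AND zero gradient, so no weight update flows back.
for x in (-3.0, 0.0, 2.5):
    print(x, relu(x), relu_derivative(x))
```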

4. Leaky ReLU function

Leaky ReLU is a slight variation of ReLU. For positive inputs it behaves like ReLU and returns the input unchanged; for other inputs it returns the input multiplied by a small constant, 0.01. This small slope addresses the dying ReLU problem, because the gradient for negative inputs is no longer zero. The derivative is 1 for positive inputs and 0.01 otherwise.
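A minimal sketch of Leaky ReLU with the 0.01 slope from the text. The parameter name `negative_slope` is an illustrative choice that exposes the constant; with the default it reproduces the formula above.

```python
def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Returns x for positive inputs and negative_slope * x otherwise."""
    return x if x > 0 else negative_slope * x

def leaky_relu_derivative(x: float, negative_slope: float = 0.01) -> float:
    """1 for positive inputs, negative_slope otherwise, so the gradient is never exactly 0."""
    return 1.0 if x > 0 else negative_slope

for x in (-5.0, 3.0):
    print(x, leaky_relu(x), leaky_relu_derivative(x))
```

Unlike plain ReLU, a negative input still passes back a gradient of 0.01, so the neuron can recover during training.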

5. ELU (Exponential Linear Units) function

The Exponential Linear Unit (ELU) overcomes the dying ReLU problem. It is similar to ReLU except for negative inputs: it returns the input unchanged if it is positive; otherwise it returns alpha(exp(x) - 1), where alpha is a positive constant. The derivative is 1 for positive inputs and alpha * exp(x) for negative inputs. The range is -alpha to infinity. Because negative inputs produce negative outputs, the mean activation is pushed close to zero.

  • No dying ReLU problem
  • The mean of the output is close to 0, zero-centered
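A minimal sketch of ELU and its derivative for scalar inputs. The default alpha = 1.0 is an assumption (a common choice), not a value given in the text. Note that very negative inputs saturate at -alpha rather than at 0.

```python
import math

def elu(x: float, alpha: float = 1.0) -> float:
    """Returns x for positive inputs and alpha * (exp(x) - 1) otherwise; range is (-alpha, inf)."""
    # math.expm1(x) computes exp(x) - 1 accurately even when x is close to 0.
    return x if x > 0 else alpha * math.expm1(x)

def elu_derivative(x: float, alpha: float = 1.0) -> float:
    """1 for positive inputs, alpha * exp(x) otherwise (equal to elu(x) + alpha for x <= 0)."""
    return 1.0 if x > 0 else alpha * math.exp(x)

for x in (-1000.0, -1.0, 0.0, 2.0):
    print(x, elu(x), elu_derivative(x))
```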

6. PReLU (Parametric ReLU)

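The Parametric Rectified Linear Unit (PReLU) is another variation of ReLU and Leaky ReLU in which negative inputs are mapped to alpha * input. Unlike Leaky ReLU, where alpha is fixed at 0.01, PReLU learns alpha through backpropagation along with the network's other weights, so training can find the slope that works best for the data. Written for channel i with input yᵢ and coefficient aᵢ (the alpha above), PReLU is f(yᵢ) = yᵢ if yᵢ > 0 and f(yᵢ) = aᵢyᵢ otherwise, so: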

  • if aᵢ = 0, f becomes ReLU
  • if aᵢ is a small fixed positive value (e.g. 0.01), f becomes Leaky ReLU
  • if aᵢ is a learnable parameter, f becomes PReLU
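A toy sketch of how the PReLU coefficient can be learned by gradient descent. Everything here is an illustrative assumption: a single scalar coefficient `a`, one training example, a squared-error loss, and hand-picked values for the input, target, starting value and learning rate. The key step is the chain rule through ∂f/∂a, which is 0 for positive inputs and equals the input otherwise.

```python
def prelu(x: float, a: float) -> float:
    """PReLU: x for positive inputs, a * x otherwise, where a is learned."""
    return x if x > 0 else a * x

def prelu_grad_a(x: float) -> float:
    """Partial derivative of prelu(x, a) with respect to a: 0 for positive x, x otherwise."""
    return 0.0 if x > 0 else x

# Toy setup (illustrative values only): loss L = (prelu(x, a) - target)^2.
x, target = -2.0, -1.0
a, lr = 0.25, 0.1
for step in range(3):
    y = prelu(x, a)
    dL_dy = 2.0 * (y - target)          # derivative of the squared error w.r.t. the output
    a -= lr * dL_dy * prelu_grad_a(x)   # chain rule: dL/da = dL/dy * dy/da
    print(step, a)
```

Here a moves from 0.25 toward 0.5, the slope that maps the input -2.0 onto the target -1.0. In a real network the same update is applied to each aᵢ using the gradient summed over the batch.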

7. Swish (A Self-Gated) Function

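Swish is a ReLU-like function. It is self-gated, since it requires only the input and no other parameter. Its formula is y = x * sigmoid(x). It was proposed as a drop-in replacement for ReLU in deep networks. Unlike ReLU, it allows small negative outputs, and its gradient is non-zero for negative inputs, which avoids the dead activation problem. It is smooth (and non-monotonic), which helps in generalisation and optimisation.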
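A minimal sketch of Swish and its derivative, with a stable sigmoid included so the block is self-contained (function names are illustrative). The derivative follows from the product rule, and sampling a few points shows the small negative outputs and the dip below zero for moderately negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic sigmoid.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def swish(x: float) -> float:
    """Swish: x * sigmoid(x); it needs no parameter other than the input."""
    return x * sigmoid(x)

def swish_derivative(x: float) -> float:
    """Product rule: sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, round(swish(x), 4), round(swish_derivative(x), 4))
```

swish(-1.0) is lower than swish(-5.0), which is the non-monotonic “dip” that plain ReLU does not have.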

8. Softmax function

The softmax activation function converts a vector of raw scores into probabilities: softmax(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ). These probabilities are used to find the target class; the final output is the class with the highest probability. The probabilities are non-negative and always sum to 1. Softmax is mostly used in the output layer of classification models, especially for multiclass classification.
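A minimal sketch of softmax over a Python list (requires Python 3.9+ for the `list[float]` annotation; the scores are illustrative). Subtracting the maximum score first is a standard trick: it cancels in the ratio, so the probabilities are unchanged, but it keeps `math.exp` from overflowing.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Converts raw scores into probabilities that are non-negative and sum to 1."""
    # Shifting by the maximum leaves the result unchanged but prevents overflow in math.exp.
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # illustrative scores for three classes
probs = softmax(scores)
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # index of the highest probability
print([round(p, 3) for p in probs], predicted_class)
```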

9. Softplus function

ReLU is not differentiable at 0: its slope jumps abruptly from 0 to 1 there, and ReLU variants such as Leaky ReLU and PReLU have the same kink. The softplus activation function overcomes this. Its formula is y = ln(1 + exp(x)). It is similar to ReLU but smoother, and it is differentiable everywhere; its derivative is exactly the sigmoid function. Its range is 0 to infinity.
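A minimal sketch of softplus and its derivative (illustrative names). Computing ln(1 + exp(x)) literally overflows for large x, so the code uses the algebraically identical form max(x, 0) + ln(1 + exp(-|x|)); the derivative is the sigmoid, which is well defined at x = 0.

```python
import math

def softplus(x: float) -> float:
    """Softplus ln(1 + exp(x)): a smooth approximation of ReLU with range (0, inf)."""
    # Identical to ln(1 + exp(x)), rewritten so that math.exp never gets a large positive argument.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_derivative(x: float) -> float:
    """The derivative of softplus is the logistic sigmoid, defined for every x including 0."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

for x in (-5.0, 0.0, 5.0):
    print(x, softplus(x), softplus_derivative(x))
```

For large positive x, softplus(x) is nearly x, and for large negative x it is nearly 0, which is why it behaves like a smoothed ReLU.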

[Image: activation functions illustrated through dance moves]

Ayushi choudhary

Machine learning and data science enthusiast. Eager to learn new technology advances.