layout | title | mathjax | tags | categories | description |
---|---|---|---|---|---|
post | 15. Sigmoid Function | true | | Basic_Machine_Learning | Detailed information about the Sigmoid Function |
Sigmoid functions are mathematical functions whose graphs have a characteristic "S" shape. Their main advantage in machine learning is that their derivatives are easy to obtain, often directly from the function's own output, which reduces the cost of computing gradients during training. We will go through the three most common sigmoid functions: the logistic function, the hyperbolic tangent, and the arctangent.
The logistic function is a commonly used activation function in machine learning, especially in classification problems. Its characteristic S-shape gives a smooth, continuous function that maps any real-valued number to a value between 0 and 1, which makes it well suited to representing probabilities.
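The logistic function $\sigma(x)$ is defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where $x$ is any real-valued input and $e$ is Euler's number.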
Key Characteristics
1. Range: The output of the logistic function is always strictly between 0 and 1: as $x \to +\infty$, $\sigma(x) \to 1$, and as $x \to -\infty$, $\sigma(x) \to 0$.
2. S-Shaped Curve: The logistic function is often referred to as a "sigmoid" because of its S-shaped curve. This makes it particularly useful for probability estimation, as it maps large positive numbers close to 1, large negative numbers close to 0, and values near zero close to 0.5.
3. Derivative: The derivative of the sigmoid function can be written in terms of the function itself, which makes it efficient to compute in backpropagation for neural networks:

$$\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$
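The characteristics above can be checked numerically. The following is a minimal sketch, assuming the standard definition $\sigma(x) = 1/(1 + e^{-x})$ and the identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. The names `sigmoid` and `sigmoid_grad` are illustrative and do not come from any library. The two-branch form is one common way to avoid overflow in `math.exp` for large negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is at most 1, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic function, reusing the function value."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Outputs approach 0 and 1 at the extremes; the gradient peaks at x = 0.
for x in (-10.0, 0.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.6f}  grad={sigmoid_grad(x):.6f}")
```

Because the gradient only needs the already-computed output `s`, a backward pass does not have to evaluate another exponential.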
Applications
1. Logistic Regression: The logistic sigmoid function is foundational in logistic regression, where it is used to transform linear combinations of input features into probabilities. This makes it well-suited for binary classification tasks, where the output represents the probability of belonging to a particular class.
2. Neural Networks: In early neural networks, sigmoid functions were often used as activation functions in hidden layers. Because the function is smooth and differentiable everywhere, it supports gradient-based learning, and its non-linearity enables the network to capture non-linear relationships.
3. Probabilistic Interpretation: Because the sigmoid output is always between 0 and 1, it can be interpreted as the probability of an instance belonging to a particular class. This probabilistic interpretation is valuable in many applications, including spam detection, medical diagnosis, and sentiment analysis.
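As a rough illustration of the logistic regression use case, the sketch below turns a linear combination of input features into a probability and thresholds it at 0.5. The weights, bias, and feature values are made-up numbers chosen only for illustration, and `predict_proba` is a hypothetical helper rather than a library function.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """P(class = 1 | features) for a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical, hand-picked parameters (not fitted to any data).
weights = [0.8, -0.5]
bias = 0.1
features = [2.0, 1.0]

p = predict_proba(weights, bias, features)  # z = 0.1 + 1.6 - 0.5 = 1.2
label = 1 if p >= 0.5 else 0
print(f"probability={p:.3f}, label={label}")
```

The 0.5 threshold is a convention, not a requirement; applications such as medical diagnosis may choose a different cut-off depending on the cost of each kind of error.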
Advantages and Limitations
- Advantages: The logistic function is differentiable, allowing for gradient-based optimization methods. In addition, its output range from 0 to 1 makes it ideal for probabilistic interpretations in classification models.
- Limitations: For large positive or large negative input values, the logistic function saturates, meaning the gradient approaches zero. This can slow down or stop training in deep neural networks, particularly when many layers are stacked (the vanishing gradient problem). In addition, unlike functions such as tanh, which are centered around zero, the logistic sigmoid has an output range of (0, 1), so its outputs are always positive, which can lead to slower convergence in some neural networks.
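To see why saturation matters, note that $\sigma'(x)$ never exceeds 0.25, its value at $x = 0$. In a simplified picture that ignores the weight matrices, backpropagating through $n$ sigmoid layers multiplies the gradient by at most $0.25^n$. The sketch below prints both effects; the input values and depths are arbitrary choices, and `max_gradient_factor` is an illustrative name.

```python
import math

def sigmoid_grad(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# The gradient shrinks quickly as |x| grows (saturation).
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  grad={sigmoid_grad(x):.6f}")

def max_gradient_factor(num_layers: int) -> float:
    """Upper bound on the product of sigmoid derivatives across layers.

    Ignores the weight matrices, which is a deliberate simplification.
    """
    return 0.25 ** num_layers

for n in (1, 5, 10):
    print(f"layers={n:2d}  bound={max_gradient_factor(n):.2e}")
```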
The hyperbolic tangent (tanh) function is widely used in machine learning, especially in neural networks, due to its sigmoid shape and symmetry around the origin. It is defined mathematically as:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
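An alternative expression for the tanh function, in terms of the logistic sigmoid function $\sigma(x)$, is:

$$\tanh(x) = 2\,\sigma(2x) - 1$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the logistic function.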
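For readers who want the intermediate step, the relation between tanh and the logistic function follows in a few lines, assuming the standard definitions $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ and $\sigma(x) = \frac{1}{1 + e^{-x}}$. Multiplying the numerator and denominator by $e^{-x}$ and then splitting the fraction gives:

$$
\begin{aligned}
\tanh(x) &= \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
          = \frac{1 - e^{-2x}}{1 + e^{-2x}} \\
         &= \frac{2}{1 + e^{-2x}} - \frac{1 + e^{-2x}}{1 + e^{-2x}}
          = 2\,\sigma(2x) - 1
\end{aligned}
$$

In other words, tanh is a logistic function that has been stretched vertically to the range $(-1, 1)$ and compressed horizontally by a factor of 2.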
Key Characteristics
1. Range: The tanh function maps any real input to the range $(-1, 1)$.
2. Symmetry: The tanh function is an *odd function*, meaning it is symmetric about the origin: $\tanh(-x) = -\tanh(x)$.
3. Derivative: The derivative of the tanh function is given by:

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^{2}(x)$$
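These three properties are easy to confirm numerically. The following is a minimal sketch using Python's standard `math.tanh`, assuming the usual identities $\tanh(-x) = -\tanh(x)$ and $\tanh'(x) = 1 - \tanh^{2}(x)$. The helper names `tanh_grad` and `numeric_grad` are illustrative.

```python
import math

def tanh_grad(x: float) -> float:
    """Derivative of tanh, computed from the function value."""
    t = math.tanh(x)
    return 1.0 - t * t

def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    """Central finite difference, used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    y = math.tanh(x)
    assert -1.0 < y < 1.0                     # bounded range
    assert abs(math.tanh(-x) + y) < 1e-15     # odd symmetry
    assert abs(tanh_grad(x) - numeric_grad(math.tanh, x)) < 1e-8
    print(f"x={x:5.1f}  tanh={y: .6f}  grad={tanh_grad(x):.6f}")
```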
Applications
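1. Neural Networks: The tanh function is often used as an activation function in hidden layers. Since it maps inputs to a range between $-1$ and $1$, its outputs are centered around zero, which tends to help gradient-based training converge faster.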
2. Signal Processing and Control Systems: Tanh is suitable for modeling processes that require both positive and negative output ranges, making it applicable in signal processing and control systems.
3. Image Processing: In certain image processing tasks, tanh can help normalize pixel values or be used in transformations where a balanced range of pixel intensity is needed.
Advantages and Limitations
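- Advantages: The zero-centered output of tanh keeps activations balanced around zero, which can speed up convergence in gradient-based optimization. The range of $(-1, 1)$ also lets it represent both positive and negative activations.
- Limitations: For large positive or large negative input values, tanh outputs values close to $1$ or $-1$, where its gradient approaches zero. As with the logistic function, this saturation can slow down training in deep networks.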
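The zero-centering difference between tanh and the logistic function can be illustrated by averaging their outputs over inputs that are symmetric around zero. This is a small sketch with an arbitrary input grid, not a statement about any particular network.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Inputs from -3.0 to 3.0 in steps of 0.1, symmetric around zero.
inputs = [i / 10.0 for i in range(-30, 31)]

mean_tanh = sum(math.tanh(x) for x in inputs) / len(inputs)
mean_sigmoid = sum(sigmoid(x) for x in inputs) / len(inputs)

# tanh averages to roughly 0, the logistic function to roughly 0.5.
print(f"mean tanh    = {mean_tanh:.6f}")
print(f"mean sigmoid = {mean_sigmoid:.6f}")
```

Activations that average near zero pass a mix of positive and negative values to the next layer, whereas logistic outputs are always positive.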
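The arctangent function (often denoted as $\arctan(x)$ or $\tan^{-1}(x)$) is the inverse of the tangent function and is another example of a sigmoid-shaped curve.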
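The arctangent function is defined as:

$$y = \arctan(x) \iff \tan(y) = x, \quad y \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$$

where $x$ is any real number and $y$ is the angle, in radians, whose tangent is $x$.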
Key Characteristics
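1. Range: The arctangent function maps real numbers to a bounded range: $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. As $x \to +\infty$, $\arctan(x) \to \frac{\pi}{2}$, and as $x \to -\infty$, $\arctan(x) \to -\frac{\pi}{2}$.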
2. S-Shaped Curve: Similar to other sigmoid functions like the logistic and hyperbolic tangent functions, the arctangent has an S-shaped curve. It is smooth, continuous, and symmetric around the origin. This S-shape, coupled with the bounded range, makes it useful for applications where controlled, gradual changes in output are needed.
3. Derivative: The derivative of the arctangent function is:

$$\frac{d}{dx}\arctan(x) = \frac{1}{1 + x^{2}}$$
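A quick numerical check of the range and derivative, assuming the standard results $\arctan(x) \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and $\frac{d}{dx}\arctan(x) = \frac{1}{1 + x^{2}}$. It uses Python's standard `math.atan`; `arctan_grad` is an illustrative name.

```python
import math

def arctan_grad(x: float) -> float:
    """Derivative of arctan: 1 / (1 + x^2)."""
    return 1.0 / (1.0 + x * x)

def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    """Central finite difference, used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-1e6, -1.0, 0.0, 1.0, 1e6):
    y = math.atan(x)
    assert abs(y) < math.pi / 2               # output stays inside the range
    print(f"x={x:10.1f}  arctan={y: .6f}  grad={arctan_grad(x):.3e}")

# The closed-form derivative agrees with a finite-difference estimate.
assert abs(arctan_grad(2.0) - numeric_grad(math.atan, 2.0)) < 1e-8
```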
Applications
1. Machine Learning and Activation Functions: Although less common than logistic and tanh functions, the arctangent function can serve as an activation function in neural networks, particularly for specialized tasks requiring smooth, bounded output. It is valued for its gradual saturation (its slope decays as $\frac{1}{1 + x^{2}}$ rather than exponentially) and its inherent symmetry.
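2. Signal Processing: In signal processing, arctangent is used to calculate phase angles, especially in applications involving complex numbers. For example, the two-argument arctangent function, $\operatorname{atan2}(y, x)$, returns the phase of the complex number $x + iy$ over the full range $(-\pi, \pi]$, resolving the quadrant that $\arctan(y/x)$ alone cannot.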
3. Geometry and Robotics: The arctangent function plays a role in geometry and robotics for calculating angles, such as determining the orientation of a robot based on its coordinates or finding angles between points in space.
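4. Image Processing: Arctangent is used in edge detection and image gradient calculations to find the direction of intensity gradients in images. The angle calculated via $\theta = \operatorname{atan2}(G_y, G_x)$, where $G_x$ and $G_y$ are the horizontal and vertical intensity gradients, gives the gradient direction at each pixel, which is perpendicular to the edge.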
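The angle computations mentioned in these applications can be sketched with Python's standard `math.atan2`. The complex number and gradient values below are arbitrary examples, and `gradient_direction` is a hypothetical helper, not a function from an image-processing library.

```python
import cmath
import math

# Phase of a complex number in the second quadrant.
z = complex(-1.0, 1.0)
phase = math.atan2(z.imag, z.real)            # 3*pi/4
assert abs(phase - cmath.phase(z)) < 1e-12    # matches the library phase

# Single-argument arctan loses the quadrant: it returns -pi/4 here.
naive = math.atan(z.imag / z.real)

def gradient_direction(gx: float, gy: float) -> float:
    """Direction (radians) of an intensity gradient with components gx, gy."""
    return math.atan2(gy, gx)

print(f"phase={phase:.4f}  naive={naive:.4f}")
print(f"vertical gradient direction={gradient_direction(0.0, 2.0):.4f}")
```

Note that `gradient_direction(0.0, 2.0)` works even though `gy / gx` would divide by zero, which is one practical reason to prefer the two-argument form.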
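Advantages and Limitations
- Advantages: With a range between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$, the arctangent gives a smooth, bounded, zero-centered output. Its derivative decays polynomially rather than exponentially, so it saturates more slowly than the logistic and tanh functions.
- Limitations: In some applications, the limited output range of $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ does not match the scale that is needed, so the output must be rescaled (for example, multiplying by $\frac{2}{\pi}$ maps it to $(-1, 1)$).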
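To put the three functions side by side, the sketch below compares how fast their derivatives shrink as the input grows, using the standard closed forms $\sigma(x)(1 - \sigma(x))$, $1 - \tanh^{2}(x)$, and $\frac{1}{1 + x^{2}}$. It also shows one possible rescaling of arctangent to $(-1, 1)$. The function names and the sample inputs are illustrative choices.

```python
import math

def sigmoid_grad(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def tanh_grad(x: float) -> float:
    t = math.tanh(x)
    return 1.0 - t * t

def arctan_grad(x: float) -> float:
    return 1.0 / (1.0 + x * x)

def scaled_arctan(x: float) -> float:
    """Arctangent rescaled from (-pi/2, pi/2) to (-1, 1)."""
    return (2.0 / math.pi) * math.atan(x)

print("    x    logistic'      tanh'     arctan'")
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"{x:5.1f}  {sigmoid_grad(x):10.2e}  {tanh_grad(x):10.2e}  "
          f"{arctan_grad(x):10.2e}")
```

The logistic and tanh derivatives decay exponentially in $|x|$, while the arctangent derivative decays only like $1/x^{2}$, so it retains a larger gradient for large inputs.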