Software implementations of multivariate logistic regression can run into NaN problems. The main reason is that, although the sigmoid, softmax and hyperbolic tangent functions are bounded and infinitely differentiable, the exponentials from which they are built grow very quickly. Our study shows that a function based on arctan (the inverse tangent), which does not involve the exponential function, can work better for multivariate logistic regression.
In statistics, multivariate logistic regression is used to model the probability of occurrence of a certain class or event, such as pass/fail or win/lose/tie. Each class is assigned a probability between 0 and 1, and the probabilities sum to one. The exponential function on the real field is a strictly increasing function satisfying
x < exp(x), for all x ∈ ℝ .
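Thus the exponential function becomes very large even for moderate values of the abscissa. For example, exp(2000) exceeds the largest double-precision floating-point number (about 1.8 × 10^308) and therefore overflows.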
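The overflow described above can be checked directly. The following is a minimal sketch that assumes IEEE 754 double precision, which is what Python floats use on common platforms; the helper name exp_or_inf is hypothetical and only mimics numerical libraries that return infinity on overflow instead of raising an error.

```python
import math
import sys


def exp_or_inf(x: float) -> float:
    """Return exp(x), or +inf when the result does not fit in a double.

    Hypothetical helper: math.exp raises OverflowError on overflow, whereas
    many numerical libraries return inf. This wrapper mimics the latter.
    """
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


largest_double = sys.float_info.max            # about 1.8e308
overflow_threshold = math.log(largest_double)  # about 709.78

print("largest double:", largest_double)
print("exp overflows above x =", overflow_threshold)
print("exp(709)  =", exp_or_inf(709.0))    # still finite
print("exp(2000) =", exp_or_inf(2000.0))   # inf
```

Under this assumption, any argument above roughly 709.78 already overflows, so x = 2000 is far beyond the representable range.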
For multivariate logistic regression, determining the coefficients with the Newton-Raphson or Newton-CG method requires the Jacobian and the Hessian matrix. Both matrices are built from the first- and second-order partial derivatives of the probability density function (PDF). So if the PDF uses exponentials, the Jacobian and Hessian matrices will involve division of exponentials (compositions of the sigmoid and other exponentials), as shown below
Although the expression above is mathematically well defined, in floating-point code a very large value of x makes the exponentials overflow to infinity, and the division then returns NaN.
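To illustrate how a quotient of exponentials turns into NaN, the sketch below evaluates exp(x) / (1 + exp(x)) literally. This quotient is an assumed stand-in chosen for illustration (it is algebraically the sigmoid), not necessarily the expression used in the derivation above. The helper exp_or_inf is hypothetical and returns infinity on overflow, as many numerical libraries do.

```python
import math


def exp_or_inf(x: float) -> float:
    """Return exp(x), or +inf on overflow (hypothetical helper)."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def sigmoid_quotient(x: float) -> float:
    """Evaluate exp(x) / (1 + exp(x)) exactly as written.

    Illustrative assumption: for large x both numerator and denominator
    become inf, and inf / inf is NaN under IEEE 754 arithmetic.
    """
    e = exp_or_inf(x)
    return e / (1.0 + e)


for x in (0.0, 50.0, 2000.0):
    print(x, sigmoid_quotient(x))
```

The mathematical value at x = 2000 is indistinguishable from 1, yet the floating-point result is NaN because both operands of the division have already overflowed.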
Now consider the sigmoid function, which is defined as follows:
F(x) = 1 / (1 + exp(-x)), for all x ∈ ℝ .
F(x) is a bounded and infinitely differentiable function on ℝ. Refer to Exhibit 1 below.
Now let us consider the arctan function on the real field, as shown in Exhibit 2 below.
Note that arctan is a continuously differentiable function on the real field satisfying:
tan⁻¹(x) ∈ (-π/2, π/2), for all x ∈ ℝ .
It is therefore bounded on the reals: it approaches the values ±π/2 but never attains them.
So let us now modify the arctan function to fulfil the requirements of multivariate logistic regression. Let the new function f(x) on ℝ be defined by
Notice that f(x) is also a bounded continuously differentiable function satisfying
0 ≤ f(x) ≤ 1, for all x ∈ ℝ .
Thus f(x) satisfies the general conditions required for logistic regression, yet it never grows exponentially.
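As a concrete sketch of such a function, the code below assumes the form f(x) = 1/2 + arctan(x)/π. This particular shift and scaling is an assumption chosen because it maps the reals into (0, 1) with f(0) = 1/2; the definition used above may differ. The function names are hypothetical. Under this assumption the first and second derivatives are rational functions of x, so no exponential appears in them.

```python
import math


def arctan_link(x: float) -> float:
    """Assumed form f(x) = 1/2 + arctan(x)/pi, with values in (0, 1)."""
    return 0.5 + math.atan(x) / math.pi


def arctan_link_d1(x: float) -> float:
    """First derivative of the assumed form: 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))


def arctan_link_d2(x: float) -> float:
    """Second derivative of the assumed form: -2x / (pi * (1 + x^2)^2)."""
    q = 1.0 + x * x
    return -2.0 * x / (math.pi * q * q)


# Values that would overflow exp(x) stay finite here.
for x in (-2000.0, 0.0, 2000.0):
    print(x, arctan_link(x), arctan_link_d1(x), arctan_link_d2(x))
```

For x = ±2000 all three quantities remain finite, which is the property the argument relies on.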
Comparing Exhibit 1 and Exhibit 3, the graph of f(x) is nearly the same as that of F(x), but unlike the sigmoid, f(x) does not involve the exponential function. The division of exponentials therefore never occurs when the Jacobian and Hessian matrices are computed, so the computer's division routine does not produce NaN.
Note that the density of the arctan function is much greater than that of the sigmoid and other exponential approximating functions. Therefore, if f(x) is used instead of the sigmoid, Newton-Raphson or Newton-CG may converge more slowly, taking two or three times as long as with the sigmoid, or even longer.
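One possible reading of this difference, offered here as an assumption rather than as the intended definition of density, is that an arctan-based function approaches its limits 0 and 1 much more slowly than the sigmoid. The sketch below compares the distance from 1 for both functions, again assuming the hypothetical form f(x) = 1/2 + arctan(x)/π.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid written with exp(-x), so large positive x does not overflow."""
    return 1.0 / (1.0 + math.exp(-x))


def arctan_link(x: float) -> float:
    """Assumed form f(x) = 1/2 + arctan(x)/pi (hypothetical)."""
    return 0.5 + math.atan(x) / math.pi


# Distance from the upper limit 1 for a few positive inputs.
for x in (1.0, 5.0, 10.0, 50.0):
    gap_sigmoid = 1.0 - sigmoid(x)
    gap_arctan = 1.0 - arctan_link(x)
    print(f"x={x:5.1f}  1-F(x)={gap_sigmoid:.3e}  1-f(x)={gap_arctan:.3e}")
```

Under this assumption the sigmoid gap shrinks exponentially, while the arctan gap shrinks only like 1/(πx), which is consistent with an optimizer needing more iterations to separate the classes.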
If the training data contain numbers whose absolute values lie between 0 and 1 (which can be achieved through normalization), then the sigmoid function may give a better solution than the inverse tangent, but it may still suffer from the NaN issue.
So any function that behaves like the sigmoid but is independent of the exponential function, and is also less dense than the inverse tangent, should give a better solution in multivariate logistic regression without the NaN problem.