Here are a few reasons:
- When properly fitted over a bounded interval, piecewise rational functions can closely approximate the target activation function over the same interval.
- Evaluating piecewise rational functions may be much faster than evaluating transcendental activation functions such as the sigmoid, which requires an exponential.
- If training and convergence are not significantly worse for piecewise rational nets, then their overall training will be faster.
- For sequential implementations, the time to evaluate f (the
activation function) and its first derivative, f', may not
be very important in large networks; but parallel implementations can
be more sensitive to the effort required to evaluate f and
f'. Thus, we expect a faster parallel implementation when
piecewise rational functions are used as activation functions.
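As an illustration only (not necessarily the construction used here), the following Python sketch compares the logistic sigmoid with one simple rational approximation, the so-called "fast sigmoid" x/(2(1+|x|)) + 1/2, whose value and first derivative need only an absolute value, additions, and a division:

```python
import math

def sigmoid(x):
    # Standard logistic sigmoid: requires an exponential.
    return 1.0 / (1.0 + math.exp(-x))

def rational_sigmoid(x):
    # A simple rational approximation ("fast sigmoid"):
    # only |x|, additions, a multiplication, and a division.
    return 0.5 * x / (1.0 + abs(x)) + 0.5

def rational_sigmoid_deriv(x):
    # Its first derivative is also rational: 0.5 / (1 + |x|)^2,
    # so no exponential is needed for backpropagation either.
    return 0.5 / (1.0 + abs(x)) ** 2

# Compare the two functions on a bounded interval, here [-8, 8].
xs = [i / 10.0 for i in range(-80, 81)]
max_err = max(abs(sigmoid(x) - rational_sigmoid(x)) for x in xs)
print(f"max |sigmoid - rational| on [-8, 8]: {max_err:.3f}")
```

Like the sigmoid, this rational function is monotone and bounded in (0, 1); the pointwise error stays below 0.1 on the interval above, and a piecewise rational fit could reduce it further at the cost of a branch per piece.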