

Hello,
I've been experimenting with exponential functions as activation functions for the neurons in my neural network.
Previously I created and trained an MLP network that predicts the consumption of natural gas reasonably well, using the sigmoid as the activation function. The output was a value in the range (0, 1), and I then used that value as the input to a (nonlinear) function that equals 0 when the input is 0 and goes to infinity as the input approaches 1. The result of the transformation was the prediction divided by the average of my training set data, so typical network outputs fell in the range [0.3, 0.7].
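As an illustration of such a transformation (a sketch only: the exact function is not stated above, so y / (1 - y) is an assumed stand-in with the same end points, and the names to_ratio and to_prediction are made up for the example):

```python
def to_ratio(y):
    """Map a sigmoid output y in [0, 1) to [0, inf); assumed form y / (1 - y)."""
    if not 0.0 <= y < 1.0:
        raise ValueError("y must lie in [0, 1)")
    return y / (1.0 - y)


def to_prediction(y, training_mean):
    """Ratio times the training-set average gives the consumption prediction."""
    return to_ratio(y) * training_mean


# The same small output error costs more near the upper end of the range.
for y in (0.3, 0.5, 0.7):
    shift = to_ratio(y + 0.01) - to_ratio(y)
    print(f"y={y:.1f}  ratio={to_ratio(y):.3f}  change for +0.01: {shift:.4f}")
```

With this stand-in, an output of 0.5 corresponds exactly to the training average, and a fixed error in the network output is stretched more and more as the output moves toward 1.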
This transformation introduced a prediction error of its own, so I decided that the neural network should return the exact result, except perhaps scaled linearly. In the Neural Network FAQ (ftp://ftp.sas.com/pub/neural/FAQ.html) I read that if I need a positive, unbounded output, I should use the exponential function as the activation function, so I used f(x) = e^x.
The derivative of this function is d(e^x) / dx = e^x again, so I updated my weight-correction formulas accordingly. They are now of the form deltaw = LR * o * (t - o) for the output layer and deltaw = LR * o * sum for hidden layers, where LR is the learning rate, o is the actual output, and t is the desired output. The problem is that even with a tiny learning rate, the weight values and neuron outputs explode and end up infinite after just a few examples have been shown to the network.
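For reference, a small sketch of the output-layer term in that formula, assuming a squared-error cost and a single output neuron. The names exp_delta and sigmoid_delta are illustrative, not taken from the actual network code. It compares the exponential activation with the sigmoid for the same net input:

```python
import math


def exp_delta(net, t):
    """Output-layer term (t - o) * f'(net) for f(net) = e^net, where f'(net) = o."""
    o = math.exp(net)
    return (t - o) * o


def sigmoid_delta(net, t):
    """Same term for the logistic sigmoid, where f'(net) = o * (1 - o)."""
    o = 1.0 / (1.0 + math.exp(-net))
    return (t - o) * o * (1.0 - o)


# Assumed target t = 1.0; only the net input varies.
for net in (0.0, 2.0, 5.0, 10.0):
    print(f"net={net:5.1f}  exp: {exp_delta(net, 1.0):.3e}  "
          f"sigmoid: {sigmoid_delta(net, 1.0):.3e}")
```

For targets in [0, 1] the sigmoid term stays below 0.25 in magnitude, while the exponential term grows roughly like o squared once o is well above t, so a single large net input can produce a very large weight change.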
Am I correct in assuming that the problem is the slope of the exponential function (a small change in input results in a drastic change in the output)? Would I be any better off using the square root as the activation function, since the problem there is the opposite: a large change in input results in a small change in output?
Is there, perhaps, a better way to scale my prediction properly when using the sigmoid as the activation function?
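For concreteness, the plain linear scaling mentioned above could look like the sketch below. The 0.1 to 0.9 target band, the example consumption bounds, and the names make_scaler, to_unit and from_unit are all assumptions chosen for illustration:

```python
def make_scaler(t_min, t_max, lo=0.1, hi=0.9):
    """Build a linear map from [t_min, t_max] onto [lo, hi] and its inverse."""
    if t_max <= t_min:
        raise ValueError("t_max must be greater than t_min")
    span = t_max - t_min

    def to_unit(t):
        # Raw target -> value the sigmoid output is trained to match.
        return lo + (hi - lo) * (t - t_min) / span

    def from_unit(y):
        # Sigmoid output -> prediction in the original units.
        return t_min + (y - lo) * span / (hi - lo)

    return to_unit, from_unit


# Hypothetical bounds taken from the training data.
to_unit, from_unit = make_scaler(100.0, 500.0)
print(to_unit(300.0), from_unit(0.5))
```

Keeping the targets away from 0 and 1 means the sigmoid never has to reach its asymptotes, and because the inverse map is linear, an error in the network output is scaled by the same constant everywhere in the range.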
Please feel free to share your thoughts if you think I'm doing something wrong. This is the first neural network I've built (actually, the first 24 neural networks, one for each hour of the day), and I'm open to suggestions.
Thanks, Nikola 