Topic: ROC area under the curve decreases when the hidden neuron number is increased 

gdenunzio 
Posted: 02Feb11 08:51 



Hi all, my experience with neural networks is mostly practical and not really very deep, so I am surprised by the behavior of my ANN.
It is a feedforward backprop MLP used for binary classification: 14 inputs, 1 output, 1 hidden layer, sigmoid transfer functions. The data set for learning and testing contains very few positive (say class = 1) feature vectors (a few hundred) and tens of thousands of negatives (class = 0). A very unbalanced (and noisy) data set.
During learning, I am using the early-stopping technique (90% of the training set for real training, 10% for validation). The ROC curve is then calculated on the test set and the area under the curve (AUC) is computed.
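(Not from the original post, just for reference: the AUC can be computed without tracing the ROC curve at all, since it equals the probability that a randomly chosen positive scores above a randomly chosen negative. A minimal pure-Python sketch, with made-up scores:)

```python
def auc(scores_pos, scores_neg):
    # AUC = P(score of a random positive > score of a random negative),
    # ties counted as 0.5 (the Mann-Whitney U formulation).
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Perfectly separated scores give AUC = 1.0; identical scores give 0.5.
print(auc([0.9, 0.8], [0.1, 0.2]))  # -> 1.0
print(auc([0.5], [0.5]))            # -> 0.5
```

The quadratic loop is fine for a few hundred positives; for large sets one would use the rank-based formula instead.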
In the training set, I am inserting about twice as many negatives (randomly chosen) as positives.
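(For the record, the random undersampling described above could look like this; a hypothetical sketch, not the poster's actual code, assuming labels are 0/1 and a 2:1 negative-to-positive ratio:)

```python
import random

def undersample_negatives(X, y, ratio=2, seed=0):
    # Keep all positives; draw ratio-times-as-many negatives at random.
    rng = random.Random(seed)
    pos = [(xi, yi) for xi, yi in zip(X, y) if yi == 1]
    neg = [(xi, yi) for xi, yi in zip(X, y) if yi == 0]
    neg = rng.sample(neg, min(len(neg), ratio * len(pos)))
    rows = pos + neg
    rng.shuffle(rows)
    return [r[0] for r in rows], [r[1] for r in rows]

X = [[i] for i in range(100)]
y = [1] * 5 + [0] * 95              # 5 positives, 95 negatives
Xs, ys = undersample_negatives(X, y)
print(sum(ys), len(ys) - sum(ys))   # -> 5 10
```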
I have been working with 10 hidden neurons (hn), obtaining an AUC of about 0.70: not great, but enough to start with.
Now I decided to explore the "best" number of hn, and varied it from 1 to 100, calculating the AUC each time. I was expecting the AUC to increase for small hn numbers (initial underfitting), followed by saturation or even a decrease (risk of overfitting and loss of generalization).
I was surprised to observe that instead the AUC decreased immediately as the hn number increased: just 1 hn was better than 2, 10, and so on (the dependence of the AUC on the number of hn was more or less linearly decreasing).
I was even more surprised when I removed the hidden layer entirely and the AUC still (slightly) increased, making me think that my noisy problem is actually a trivial linear one.
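(For comparison with the no-hidden-layer case: an MLP with no hidden layer and a sigmoid output is just logistic regression. A minimal pure-Python sketch of that baseline, trained by stochastic gradient descent on toy two-feature data where only the first feature matters; everything here is illustrative, not the poster's 14-input setup:)

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    # Plain logistic regression: the "zero hidden neurons" baseline.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                  # gradient of log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy linearly separable data: class follows the sign of feature 1.
X = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.3], [-0.8, -0.2]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
print(predict(w, b, [1.0, 0.0]) > 0.5)   # -> True
print(predict(w, b, [-1.0, 0.0]) > 0.5)  # -> False
```

If this linear model matches the MLP's AUC on the test set, that is evidence the decision boundary really is close to linear (or that the data are too noisy to estimate anything more).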
Moreover, in the FAQ of comp.ai.neural-nets, I find: "You may not need any hidden layers at all. Linear and generalized linear models are useful in a wide variety of applications (McCullagh and Nelder 1989). And even if the function you want to learn is mildly nonlinear, you may get better generalization with a simple linear model than with a complicated nonlinear model if there is too little data or too much noise to estimate the nonlinearities accurately."
What is the experts' opinion? Am I in an almost-linear situation, or are other problems playing a role, say the large number of negatives compared with the positives, so that the subset of negatives I insert in my training set is not representative enough?
Thank you for your patience.
Sincerely Giorgio 


pejman 
Posted: 03Feb11 18:25 



Having a low number of hidden neurons is not a bad thing, especially if you are dealing with a linear problem, as you noted. But an extreme imbalance in the training data could negatively impact the learning.
Consider ways to offset the imbalance, e.g. artificial rows or even duplicates.
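(The duplication idea could be sketched like this; a hypothetical pure-Python example of random oversampling, assuming 0/1 labels, not part of the original reply:)

```python
import random

def oversample_minority(X, y, minority_label=1, seed=0):
    # Duplicate randomly chosen minority rows until the classes balance.
    rng = random.Random(seed)
    minority = [(xi, yi) for xi, yi in zip(X, y) if yi == minority_label]
    majority = [(xi, yi) for xi, yi in zip(X, y) if yi != minority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    rows = minority + majority + extra
    rng.shuffle(rows)
    return [r[0] for r in rows], [r[1] for r in rows]

X = [[i] for i in range(10)]
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # 2 positives, 8 negatives
Xb, yb = oversample_minority(X, y)
print(sum(yb), len(yb) - sum(yb))    # -> 8 8
```

Plain duplicates are the simplest option; "artificial rows" would instead perturb or interpolate minority examples (as in SMOTE-style methods) rather than copy them verbatim.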

