example of cross validation
Guest







Posted: Wed Nov 19, 2008 12:10 am    Post subject: example of cross validation

Greetings,

I have gone through the FAQ as well as some other literature, including
contributions from Greg on various questions. I would like to know
whether my understanding of cross-validation and its application is
correct. I appreciate your help on this.

Situation: I have a very small data set, say fewer than 70 cases, and predict 2
outputs from 6 inputs with an ANN.
Rejected choice: Split test with division of the data into 3 parts, with the
validation set guiding early stopping to avoid overfitting.
Preferred choices: bootstrapping, leave-k-out (considering the data
size, leave-one-out). I am thinking of opting for leave-one-out.

Method:
- I set the ANN architecture (no. of hidden nodes, no. of hidden layers (=1),
etc.) and the no. of epochs/iterations for training.
- I develop the ANN 70 times, each time leaving one case out for
cross-validation.
- I sum the cross-validation errors of all the ANN models, or take the
average over all models and outputs (2 in my case).
- I present the results in terms of R-square values for the training and
cross-validation phases.

Ambiguities/Question:
- How do I decide the no. of epochs for training? As a regular user of the
split test, I find the stopping point there pretty
straightforward: stop based on the performance on the validation data
set.
My assumption: run the model with different numbers of epochs/iterations;
the one with the lowest average error should guide the
selection of the no. of epochs/iterations.

Any guidance on the practical use of the leave-one-out approach for ANN
applications would be highly appreciated.

Thanks
Soms
Greg Heath
Guest






Posted: Thu Nov 20, 2008 11:30 am    Post subject: Re: example of cross validation

On Nov 18, 7:10 pm, soms.sha...@gmail.com wrote:
Quote:
Greetings,

I have gone through the FAQ as well as some other literature including
contributions from Greg on various questions . I would like to know
whether my understanding about cross-validation and application is
correct . I appreciate your help on this.

Situation: I have a very small data set say less than 70 and predict 2
outputs using 6 inputs with ANN.
Rejected Choice: Split test with division of data into 3 parts with
validation set guiding the early stopping to avoid overfitting.
Prefered choices: bootstrapping, leave-k-out (considering the data
size, leave-one-out). I have thought of opting for leave-one-out

I am not a fan of LOO. You can use M trials of 10-fold XVAL with
Ntrn/Nval/Ntst ~ 56/7/7. For each fold you can use both Errtst
at the minimum of Errval and Errval at the minimum of Errtst.
Therefore, you will average 20*M error rates instead of 10*M.
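
A minimal sketch of how the M trials of 10-fold XVAL with a 56/7/7 split could be generated. The function name, the use of NumPy, and the rule that the validation subset is the one cyclically following the test subset are all assumptions made for illustration; the post does not say how the validation subset is picked.

```python
import numpy as np

def repeated_kfold_splits(n_cases=70, n_folds=10, n_trials=3, seed=0):
    """Yield (trial, fold, trn_idx, val_idx, tst_idx) for M trials of k-fold XVAL."""
    rng = np.random.default_rng(seed)
    for trial in range(n_trials):
        # Each trial uses a fresh random partition into n_folds subsets.
        subsets = np.array_split(rng.permutation(n_cases), n_folds)
        for fold in range(n_folds):
            # Assumption: the next subset (cyclically) acts as validation set.
            val_fold = (fold + 1) % n_folds
            tst = subsets[fold]
            val = subsets[val_fold]
            trn = np.concatenate(
                [s for i, s in enumerate(subsets) if i not in (fold, val_fold)]
            )
            yield trial, fold, trn, val, tst

splits = list(repeated_kfold_splits())
```

With N = 70 and 10 folds, every split has 56 training, 7 validation and 7 test cases, and M trials give 10*M splits.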

A modification by Tony Plate considers the 45 (10*9/2) different
val/tst pairs obtained from the 10 subsets in each trial.
Therefore, you will average 90*M error rates.

If you plot the cumulative mean and mean+/-stdv vs fold, it
will become obvious when M is sufficiently large.
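
The cumulative mean and standard deviation mentioned above could be computed along these lines. This is only a sketch: the function name is hypothetical, the population (divide-by-k) standard deviation is an assumed choice, and the error values are made-up placeholders, not results.

```python
import numpy as np

def cumulative_mean_std(errors):
    """Return the running mean and running (population) std of a sequence of errors."""
    errors = np.asarray(errors, dtype=float)
    counts = np.arange(1, errors.size + 1)
    cum_mean = np.cumsum(errors) / counts
    cum_mean_sq = np.cumsum(errors ** 2) / counts
    # Clip tiny negative values caused by floating-point round-off.
    cum_std = np.sqrt(np.maximum(cum_mean_sq - cum_mean ** 2, 0.0))
    return cum_mean, cum_std

# Placeholder error values, purely illustrative.
errors = [0.30, 0.10, 0.20, 0.40]
cum_mean, cum_std = cumulative_mean_std(errors)
```

Plotting cum_mean and cum_mean +/- cum_std against the running count shows where the curves flatten out.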

Quote:
Method:-
- I set ANN architecture (no. of hidden nodes/layer(=1) etc.). No of
epoch/iteration for training
- I develop ANN for 70 times each time leaving one data for cross-
validation.
- I sum the error for cross validation of all the ANN models/take
average over the whole  models and outputs(2 nos in my case).
- I present the result in terms of R-square values for training and
cross-validation phases

I use adjusted R-square for the training MSE.
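
For reference, one common form of adjusted R-square replaces the raw sums of squares by degree-of-freedom-corrected ones. The sketch below assumes a single output column and takes the number of estimated parameters (for a net, the number of weights) as an argument; whether this matches the exact definition used in the post is an assumption.

```python
import numpy as np

def adjusted_r2(y_true, y_pred, n_params):
    """Adjusted R-square: 1 - (SSE / (n - n_params)) / (SST / (n - 1))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    if n <= n_params:
        raise ValueError("need more training cases than estimated parameters")
    sse = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - (sse / (n - n_params)) / (sst / (n - 1))

# Illustrative values only.
example = adjusted_r2([1, 2, 3, 4, 5], [1, 2, 3, 4, 6], n_params=2)
```

The correction matters here because with roughly 56 training cases even a small net has a number of weights that is not negligible compared with the number of training equations.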

Quote:
Ambiguities/Question:
- How to decide No. of epoch for training? As a usual practitioner of
split-test, I find therein the stopping point is pretty
straightforward: stop based on the performance of validation data
set.
My assumption: run the model with different number of epoch size/
iteration: the one with lowest average error should guide the
selection of epoch size/no of iterations.

Enlightenment on the practical use of Leave-one-out approach for ANN
application problem would be highly appreciated.

Since you have rejected Early-Stopping, you have to train
to convergence. This typically requires setting thresholds
for one or more of the following stopping criteria:

1. No. of epochs
2. MSE
3. DMSE (MSE(k+1)-MSE(k))
4. DMSE/MSE

When not using Early-Stopping it is often useful to use
Weight-Decay.
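
The four stopping criteria listed above could be checked once per epoch roughly as follows. The function name and all threshold values are hypothetical defaults chosen for illustration, not recommendations from the post.

```python
def should_stop(mse_history, max_epochs=1000, mse_goal=1e-4,
                dmse_tol=1e-8, rel_dmse_tol=1e-6):
    """Return the name of the first stopping criterion that is met, else None."""
    epoch = len(mse_history)
    if epoch == 0:
        return None
    mse = mse_history[-1]
    if epoch >= max_epochs:            # 1. No. of epochs
        return "max_epochs"
    if mse <= mse_goal:                # 2. MSE
        return "mse"
    if epoch >= 2:
        dmse = abs(mse_history[-1] - mse_history[-2])
        if dmse <= dmse_tol:           # 3. DMSE = MSE(k+1) - MSE(k)
            return "dmse"
        if mse > 0 and dmse / mse <= rel_dmse_tol:   # 4. DMSE/MSE
            return "dmse/mse"
    return None
```

A training loop would append the training MSE after each epoch and break as soon as the function returns something other than None.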

Hope this helps.

Greg
Guest







Posted: Thu Nov 20, 2008 4:13 pm    Post subject: Re: example of cross validation

Thanks Greg for your enlightenment!

I will go with k-fold x-validation. Nevertheless, I wish to have my
fundamentals on the loo straight.

loo:
Since in loo we do not have an early stopping criterion based on a
validation data set, as in other approaches, we go for some sort of
thresholds as you suggested. Clearly, we may be able to select the number
of hidden nodes based on the error of the validation set. But since our
stopping criteria allow the network to train until the threshold is
reached, does this prevent overfitting? Or does it mean we opt for
the one with the lowest level of overfitting (if any)? Of course, we might
go for MacKay's regularized ANN to avoid overfitting.

kfold:
From your suggestion of dividing the 70 data points into 56-7-7
(ntrain,nval,ntst), I assume that for each fold we have a situation
essentially identical to that of static validation (split test). We go
for early stopping based on the validation data set. But since we are
using validation data from different parts of the data set, the
generalization of the ANN will be better. Am I right?

bootstrap: Is there any preferred choice between k-fold/loo and
bootstrap validation?

Thanks
Soms




Greg Heath
Guest






Posted: Fri Nov 21, 2008 5:23 am    Post subject: Re: example of cross validation

CORRECTED FOR THE HEINOUS SIN OF TOP-POSTING!

On Nov 20, 11:13 am, soms.sha...@gmail.com wrote:
Quote:
Thanks Greg for your enlightenment!

I will go with k-fold x-validation. Nevertheless, I wish to have my
fundamentals on the loo straight.

loo:
Since in loo we do not have early stopping criteria based on
validation data set as in other approaches,

Come to think of it, there is no reason why you couldn't have M
trials of LOO with Early-Stopping. Splits would be of the form
Ntrn/Nval/Ntst with Ntst = 1. Different trials would have different
random draws of Ntrn and Nval for the same test case.

Obviously, however, it would be highly impractical because you
would have to design N*M nets.
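
To make the N*M count concrete, the splits for M trials of LOO with Early-Stopping could be enumerated as in this sketch. The function name and the validation size of 7 are assumptions; the post only fixes Ntst = 1.

```python
import numpy as np

def loo_early_stopping_splits(n_cases=70, n_val=7, n_trials=2, seed=0):
    """Yield (tst_case, trial, trn_idx, val_idx) with Ntst = 1 for every split."""
    rng = np.random.default_rng(seed)
    for tst in range(n_cases):
        design = np.delete(np.arange(n_cases), tst)   # all cases but the test case
        for trial in range(n_trials):
            # A different random trn/val draw for the same test case.
            shuffled = rng.permutation(design)
            yield tst, trial, shuffled[n_val:], shuffled[:n_val]

loo_splits = list(loo_early_stopping_splits())
```

With N = 70 and M = 2 this already yields 140 splits, i.e. 140 nets to design.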

Quote:
we go for some sort of
thresholds as you suggested. Clearly, we may be able to select number
of hidden nodes based on the error of validation set.

What validation set? Are you suggesting first using

Ntrn, Nval = N-Ntrn, Ntst = 0

to determine H. Then repartition the data and use

Ntrn, Nval = 0, Ntst = 1

to estimate generalization error?

Quote:
But since our
stopping criteria allows the network to go until the threshold is
reached, does this prevent overfitting?

If you did use validation subsets to determine H, I would
suggest simultaneously using them to estimate a stopping
epoch.

Quote:
Or, does it mean we opt for
the one with lowest level of overfitting (if any). Of course, we might
go for MacKay's regularized ANN for not allowing the overfitting.

See my suggestion on weight-decay

Quote:
kfold:
From your suggestion as dividing the 70 points data into 56-7-7
(ntrain,nval,ntst), I assume that for each fold we have kind of
identical situation as that of static validation (split test). We go
for early stopping based on the validation data set. But since we are
using validation data from different parts of the dataset, the
generalization of ANN will be better. Am I right?

Better than what?

Quote:
bootstrap: Is there any preferred choice between k-fold/loo and
bootstrap validation?

I prefer 10-fold XVAL and do not recommend LOO (has
a high variance).

Sometimes, in the spirit of bootstrapping, I add duplicate
copies of the training cases to the training set to obtain

Ntrn = N, Nval = 0.1*N, Ntst = 0.1*N

and, as explained previously, sometimes I obtain two
simultaneous error estimates: Errtst at the min of Errval
AND Errval at the min of Errtst.
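
One way the duplicate-copy idea could look in code is sketched below. Drawing the duplicates at random with replacement is an assumption (the post does not say how the copies are chosen), and the function name is hypothetical.

```python
import numpy as np

def pad_training_set(trn_idx, target_size, seed=0):
    """Append randomly chosen duplicates of training cases until target_size is reached."""
    rng = np.random.default_rng(seed)
    trn_idx = np.asarray(trn_idx)
    n_extra = target_size - trn_idx.size
    if n_extra < 0:
        raise ValueError("target_size is smaller than the training set")
    # Assumption: duplicates are sampled with replacement from the training cases only.
    extra = rng.choice(trn_idx, size=n_extra, replace=True)
    return np.concatenate([trn_idx, extra])

trn = np.arange(56)                  # 0.8*N training cases for N = 70
padded = pad_training_set(trn, 70)   # Ntrn = N after adding duplicates
```

Only training cases are duplicated, so the validation and test subsets stay disjoint from the training set.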

Hope this helps.

Greg
Guest







Posted: Fri Nov 21, 2008 6:20 pm    Post subject: Re: example of cross validation

On Nov 20, 9:23 pm, Greg Heath <he...@alumni.brown.edu> wrote:
Quote:
CORRECTED FOR THE HEINOUS SIN OF TOP-POSTING!

Thanks. I have learned this rule of news posting and will not repeat the mistake.


Quote:

we go for some sort of
thresholds as you suggested. Clearly, we may be able to select number
of hidden nodes based on the error of validation set.

What validation set? Are you suggesting first using

Ntrn, Nval = N-Ntrn, Ntst = 0

to determine H. Then repartition the data and use

Ntrn, Nval = 0, Ntst = 1

to estimate generalization error?

- For each of the 10 folds:
  Divide N into Ntrn and Ntst.
  Divide Ntrn further into Ntrn1/Nval.
  Decide H based on early stopping, using the evolution of the Nval error
  with epochs. The ANN parameters are estimated from Ntrn1, while Nval
  helps to avoid overfitting.
  Apply the resulting net to Ntst.

- Collect the Ntst results of all the folds to estimate the generalization
  error.

Quote:
kfold:
From your suggestion as dividing the 70 points data into  56-7-7
(ntrain,nval,ntst), I assume that  for each fold we have kind of
identical situation as that of static validation (split test). We go
for early stopping based on the validation data set. But since we are
using validation data from different parts of the dataset, the
generalization of ANN will be better. Am I right?

Better than what?

Better than the usual static split test, where a single ANN is trained
with a fixed data set for each group.

Greg Heath
Guest






Posted: Sat Nov 22, 2008 8:25 pm    Post subject: Re: example of cross validation

On Nov 21, 1:20 pm, soms.sha...@gmail.com wrote:
Quote:
On Nov 20, 9:23 pm, Greg Heath <he...@alumni.brown.edu> wrote:
CORRECTED FOR THE HEINOUS SIN OF TOP-POSTING!

Thanks. I learned this rule of newsposting. Will not repeat the same..

Well, you just did it again! (;>)

Quote:
-For each fold of 10 folds
Divide N into Ntrn and Ntst.

Randomly divide N into 10 subsets of size ~0.1*N

Randomly choose 1 subset for testing (Ntst = 0.1*N)
leaving the remaining 9 subsets for design
(Ndes = 0.9*N)

Quote:
Divide Ntrn further into Ntrn1/Nval

From Ndes, randomly choose 1 subset for validation
(Nval = 0.1*N) leaving the remaining 8 for training
(Ndes = Ntrn+Nval, Ntrn = 0.8*N)

Note that there are 10*9/2 = 45 ways to choose
pairs of subsets for validation and testing. Traditional
XVAL uses only 10 of them ... each subset being used once
as a test set.

Now, which 10 should be chosen for validation sets?
Theoretically, since all subsets are randomly chosen,
it is sufficient that the 10 val/tst pairs are unique.
However, practice always strays from theory. Therefore,
as long as the data set is not huge, it is no big deal
to use more than 10 ... even all 45!
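
The pair counts can be checked with a short enumeration. This is only an illustrative sketch; the subsets are represented by their indices 0..9 and the helper name is hypothetical.

```python
from itertools import combinations, permutations

n_subsets = 10

# Unordered val/tst pairs: 10*9/2 of them.
unordered_pairs = list(combinations(range(n_subsets), 2))

# Ordered (val, tst) pairs: each unordered pair can be used both ways round.
ordered_pairs = list(permutations(range(n_subsets), 2))

def training_subsets(val, tst):
    """Indices of the 8 subsets left for training once val and tst are chosen."""
    return [i for i in range(n_subsets) if i not in (val, tst)]
```

Traditional XVAL would pick 10 of these pairs, with each subset serving exactly once as the test set.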

Quote:
Decide H based on early stopping from Nval's error evolution with
epochs. The ANN parameters are based on Ntrn1 but Nval helps avoiding
overfitting.

Once H and epochmax are determined,

Quote:
Apply that to Ntst

-Get the Ntst results of all the folds to estimate generalization
error.

Average the 10 Errtst values obtained at the
mins of the corresponding Errvals.

Note that if H is known a priori, i.e., not determined from
val sets, then the val/tst pairs can be interchanged.
Consequently, you could use Errval at the mins of Errtst
and add 10 more terms to the average!
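
Reading off the two estimates from per-epoch error curves could look like the sketch below. The function name is hypothetical and the error curves are made-up placeholder numbers; as noted above, the swapped estimate only makes sense when H is fixed beforehand.

```python
import numpy as np

def paired_error_estimates(err_val, err_tst):
    """Return (Errtst at the min of Errval, Errval at the min of Errtst)."""
    err_val = np.asarray(err_val, dtype=float)
    err_tst = np.asarray(err_tst, dtype=float)
    tst_at_val_min = err_tst[np.argmin(err_val)]   # the usual early-stopping estimate
    val_at_tst_min = err_val[np.argmin(err_tst)]   # roles of val and tst interchanged
    return tst_at_val_min, val_at_tst_min

# Placeholder per-epoch errors for one fold, purely illustrative.
err_val = [0.9, 0.5, 0.4, 0.6]
err_tst = [1.0, 0.6, 0.7, 0.8]
estimates = paired_error_estimates(err_val, err_tst)
```

Averaging both values over the 10 folds gives the 20 terms per trial mentioned earlier in the thread.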

Quote:
Better than usual static split test where a single ANN with fixed
dataset for each group is used.

Yes, XVAL should be superior for smaller data sets where simple
splitting into trn/val/tst yields Ntrn that is too small for
accurate weight estimates and/or Nval and Ntst that are too
small for precise error estimates.

Hope this helps.

Greg