Tim Frink Guest
Posted: Wed Nov 12, 2008 6:49 pm Post subject: Performance estimation for classification algorithm
Hi,
I'm using supervised machine learning to classify my data. The
classifier I use is a decision tree (but it could be any
other). After constructing an appropriate decision tree, I would like
to measure the model's performance. What are the standard measures in
statistics and artificial intelligence for estimating the
performance of a classification algorithm?
So far, I've used a leave-one-out cross validation (due to the small
number of examples in the learning set which is about 400) to evaluate
the accuracy (classification error), i.e. how many examples in the test set
were incorrectly predicted. However, I don't think that this is sufficient
for a reliable performance evaluation. What else should I measure?
I'm not sure if a significance test would provide helpful information.
In my textbook, they use a significance test to compare two
different classification algorithms w.r.t. their absolute error
(which they determine by cross validation). Can a significance test
also be used to make claims about the performance of a single classifier?
If so, what hypothesis should be tested?
Thank you.
Regards,
Tim
[ comp.ai is moderated ... your article may take a while to appear. ]
Ted Dunning Guest
Posted: Fri Nov 14, 2008 2:31 am Post subject: Re: Performance estimation for classification algorithm
On Nov 12, 4:49 am, Tim Frink <plfr...@yahoo.de> wrote:
| Quote: | ... constructing an appropriate decision tree, I would like
to measure the model's performance.
|
You are pretty much on the right track with leave-one-out.
Most people would be happier with 10x1 cross validation, but leave-one-out
done well can be as good.
What you have to worry about, though, is duplicates or near duplicates
in your data set. Those can give you a very unrealistic estimate.
I have seen this problem, for instance, in news wire classification
tasks where many of the documents were small revisions.
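One way to guard against this is to keep duplicates together, so that a copy of a held-out example never ends up in the training data. The sketch below is only an illustration under assumptions: `normalize` and `grouped_folds` are hypothetical helper names, and the duplicate key (lowercased, whitespace-collapsed text) is a crude stand-in that catches exact and trivially reformatted copies, not genuine small revisions, which would need a real similarity measure.

```python
import random
from collections import defaultdict

def normalize(text):
    """Crude duplicate key (an assumption): lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def grouped_folds(texts, n_folds=10, seed=0):
    """Assign each example a fold so that examples sharing a key share a fold."""
    groups = defaultdict(list)
    for index, text in enumerate(texts):
        groups[normalize(text)].append(index)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)  # shuffle groups, not individual examples
    fold_of = [None] * len(texts)
    for position, key in enumerate(keys):
        for index in groups[key]:
            fold_of[index] = position % n_folds
    return fold_of

# Made-up documents: the first two differ only in case and spacing.
docs = ["Stocks rise on news", "stocks  rise on NEWS", "Rain expected", "Stocks fall"]
fold_of = grouped_folds(docs, n_folds=3)
```

With leave-one-out, the same idea amounts to holding out the whole group of duplicates at once rather than a single example.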
| Quote: | So far, I've used a leave-one-out cross validation (due to the small
number of examples in the learning set which is about 400) to evaluate
the accuracy (classification error), i.e. how many examples in the test set
were incorrectly predicted.
|
You should not only estimate performance; you should also
estimate the error bars on that estimate.
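As a rough illustration of error bars on an accuracy estimate, the sketch below computes a Wilson score interval. It assumes the leave-one-out outcomes can be treated as independent correct/incorrect trials, which is only approximately true because the training sets overlap almost completely. The function name `wilson_interval` and the count of 340 correct out of 400 are hypothetical, chosen only to match the data set size mentioned above.

```python
import math

def wilson_interval(n_correct, n_total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_correct / n_total
    denom = 1 + z * z / n_total
    centre = (p + z * z / (2 * n_total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n_total
                               + z * z / (4 * n_total ** 2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical result: 340 of 400 held-out examples classified correctly.
low, high = wilson_interval(340, 400)
print(f"accuracy 0.85, approximate 95% interval [{low:.3f}, {high:.3f}]")
```

With about 400 examples the interval is several percentage points wide, which is worth keeping in mind before reading much into small differences between models.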
Something that is often not mentioned is that you may have some
symmetries or scaling invariance properties in your problem that will
allow you to inflate your data set by replicating data points using
these invariants. This can dramatically improve your classification
process.
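A toy sketch of this kind of inflation, assuming a made-up mirror symmetry in which reversing the feature vector does not change the class. Whether any such invariance holds is entirely problem-specific, and `augment_with_mirror` is a hypothetical helper.

```python
def augment_with_mirror(examples):
    """Add a mirrored copy of each (features, label) pair.

    Assumes, purely for illustration, that reversing the feature order
    leaves the class label unchanged.
    """
    augmented = list(examples)
    for features, label in examples:
        mirrored = tuple(reversed(features))
        if mirrored != tuple(features):  # skip palindromic vectors
            augmented.append((mirrored, label))
    return augmented

training = [((1, 2, 3), "a"), ((2, 2, 2), "b")]
inflated = augment_with_mirror(training)
```

The replicas should be generated from the training portion of each cross-validation split only. Otherwise a transformed copy of the held-out example sits in the training data, which is the same near-duplicate problem described earlier.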
With your small data set, I think you can also benefit from
alternative classifiers that are inherently robust against over-
training. Take a look at random forests or SVM or Bayesian logistic
regression, for instance.
| Quote: | I'm not sure if a significance test would provide helpful information.
In my text book, they use the significance test to compare two
different classification algorithm w.r.t. to their absolute error
(they determine by a cross validation).
|
You can do this, but I think it is better to just get good estimates
of the distribution of your performance estimate. Then you can do all
kinds of Monte-Carlo estimates about things like how likely one model
is to outperform all others by sampling from the performance
estimates. This is generally simpler than getting a non-controversial
significance test, especially since you are doing lots of exploratory
analysis in a data mining setting. In fact, you can even use the
raw cross-validation results to get this estimate, and that can take into
account the correlation of the learning algorithm performance.
So, in my book, it is critical to take the point of significance tests
seriously (you don't quite know the performance you will get on unseen
data), but the assumptions of significance tests are all about
frequentist sampling arguments and you are inherently violating those
assumptions with data mining. Also, interpretation of a significance
test can be difficult when you want to take actions such as selecting
one model of many. I prefer direct estimates of probabilities like
P(model 1 is at least 5% better than model 2). That kind of estimate
makes it much easier to explain results and motivate action.
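One possible way to get such a probability, not necessarily the procedure meant above, is a paired bootstrap over the per-example cross-validation outcomes of two models. The sketch assumes "5% better" means five percentage points of accuracy. The function name `prob_better_by` and the outcome vectors are hypothetical.

```python
import random

def prob_better_by(correct_a, correct_b, margin=0.05, n_resamples=2000, seed=0):
    """Estimate P(accuracy_a - accuracy_b >= margin) by paired bootstrap.

    correct_a[i] and correct_b[i] are 1 if the model classified held-out
    example i correctly and 0 otherwise. Resampling the same indices for
    both models keeps their outcomes paired, so the correlation between
    the two models on the same examples is preserved.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("outcome lists must have equal length")
    rng = random.Random(seed)
    n = len(correct_a)
    hits = 0
    for _ in range(n_resamples):
        sample = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(correct_a[i] for i in sample) / n
        acc_b = sum(correct_b[i] for i in sample) / n
        if acc_a - acc_b >= margin:
            hits += 1
    return hits / n_resamples

# Made-up outcomes for 400 examples: model A gets 360 right, model B 320.
model_a = [1] * 360 + [0] * 40
model_b = [1] * 320 + [0] * 80
p_better = prob_better_by(model_a, model_b)
```

The result is only as trustworthy as the assumption that the 400 examples are representative of unseen data, and it inherits any optimism from duplicates in the data set.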
Good luck. Post a summary of your results!
Milind Joshi Guest
Posted: Wed Dec 03, 2008 12:57 am Post subject: Re: Performance estimation for classification algorithm
Tim,
The two most common measures used to measure (not estimate) classification
performance are precision and recall.
http://en.wikipedia.org/wiki/Precision_and_recall
Precision and recall are a tad more complex to compute for
problems where you have more than 2 classes, but they are a pretty good
starting point. Of course, this assumes that you know the
classification categories and have labeled the examples manually beforehand, so
that you can compute those measures by comparing the results of your
classification algorithm against the correctly classified set. You
could also build precision-recall curves and look for the best set of
parameters.
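A minimal sketch of both measures, treating one class at a time as the "positive" class and macro-averaging over classes for the multi-class case. The helper names, the toy labels, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0.0 if undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [precision_recall(y_true, y_pred, label) for label in labels]
    mean_precision = sum(p for p, _ in scores) / len(labels)
    mean_recall = sum(r for _, r in scores) / len(labels)
    return mean_precision, mean_recall

# Toy three-class example with manually assigned true labels.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
macro_p, macro_r = macro_average(y_true, y_pred)
```

Macro-averaging weights every class equally regardless of how many examples it has. Weighting by class frequency is a common alternative when classes are unbalanced.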
The problem is that precision and recall won't tell you much about
what to expect on an unseen test set. Also, tuning your
classifier to get the best possible values for precision or recall
may result in over-training, that is, a system that is unable to deal with
unseen, different cases well enough. Think of over-training as
"hard-coding" your engine to produce the best possible results on your known
test set.
Generally, over-training is seen to be a bad thing, but I have seen
examples where an over-trained classification engine was the best bet.
So, the general idea is to get a "representative" training set, which
could be a challenge in many cases.
From my knowledge of the industry, the way it is done in the "field"
is to mostly rely on current measures as somehow related to
reliability, and then go through continuous rounds of sample
collection and classifier re-training.
There are a few resources I have come across for estimation.
Reliability estimation of a statistical classifier
Pandu Ranga Rao Devarakota, Bruno Mirbach, Bjorn Ottersten
http://portal.acm.org/citation.cfm?id=1326360.1326437&jmp=abstract&coll=GUIDE&dl=GUIDE&CFID=12919427&CFTOKEN=32122383#abstract
Estimation of Classifier Performance
K. Fukunaga, R. R. Hayes
http://portal.acm.org/citation.cfm?id=69121.69127&coll=GUIDE&dl=GUIDE&CFID=12919427&CFTOKEN=32122383
Estimating the predictive accuracy of a classifier
Hilan Bensusan, Alexandros Kalousis
http://cat.inist.fr/?aModele=afficheN&cpsidt=1020184
Classification performance prediction using parametric scattering feature models
Hung-Chih Chiang, Randolph L. Moses, Lee C. Potter
http://cat.inist.fr/?aModele=afficheN&cpsidt=781221
Improving the Practice of Classifier Performance Assessment
N. M. Adams, D. J. Hand
http://neco.mitpress.org/cgi/content/abstract/12/2/305
Kernel Methods for Classification and Signal Separation
Arthur Gretton
http://www.kyb.mpg.de/publication.html?publ=2224
Hopefully this helps.
Best Regards,
Milind Joshi
IDEA TECHNOSOFT INC.
http://www.ideatechnosoft.com