Well-Trained PETs: Improving Probability Estimation Trees

  • Pedro Domingos
  • Foster Provost

Decision trees are one of the most effective and widely used classification methods.  However, many applications require class probability estimates, and probability estimation trees (PETs) have the same attractive features as classification trees (e.g., comprehensibility, accuracy, and efficiency in high dimensions and on large data sets).  Unfortunately, decision trees have been found to provide poor probability estimates.  Several techniques have been proposed to build more accurate PETs, but, to our knowledge, there has not been a systematic experimental analysis of which techniques actually improve the probability estimates, and by how much.  In this paper we first discuss why the decision-tree representation is not intrinsically inadequate for probability estimation.  Inaccurate probabilities are partially the result of decision-tree induction algorithms that focus on maximizing classification accuracy and minimizing tree size (for example, via reduced-error pruning).  Larger trees can be better for probability estimation, even if the extra size is superfluous for accuracy maximization.  We then present the results of a comprehensive set of experiments, testing a variety of different methods for improving PETs.  The results show, somewhat surprisingly, that alternative pruning methods do not improve the probabilities.  In contrast, the experiments show that using a simple, common smoothing method, the Laplace correction, uniformly improves probability estimates.  In addition, bagging substantially improves probability estimates and is even more effective for this purpose than for improving accuracy.  We conclude that PETs, with these simple modifications, should be considered when class probability estimates are required.
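
To illustrate the smoothing method named in the abstract, here is the standard formulation of the Laplace correction (a sketch of the usual definition, not a quotation from the paper): at a leaf containing $n$ training examples, $k$ of which belong to class $c$, with $C$ classes in total, the raw frequency estimate $k/n$ is replaced by

$$\hat{p}(c \mid \text{leaf}) = \frac{k + 1}{n + C}.$$

This pulls estimates from small leaves toward the uniform distribution $1/C$, so a leaf with very few examples no longer produces an extreme probability of 0 or 1.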