Active Feature-Value Acquisition

  • Prem Melville
  • Foster Provost
  • Maytal Saar-Tsechansky

Most induction algorithms for building predictive models take training data in the form of feature vectors as input. Acquiring feature values may be costly, and simply acquiring all of them may be wasteful or prohibitively expensive. Active feature-value acquisition (AFA) acquires feature values incrementally, aiming to improve the predictive model as cost-effectively as possible. This paper presents a framework for AFA based on estimating information value. Although the framework is straightforward in principle, applying it in practice requires estimations and approximations. We present an acquisition policy, sampled expected utility (SEU), that employs particular estimations to rank potential acquisitions effectively in settings where relatively little information is available about the underlying domain. We then present experimental results showing that, compared with a policy that uses representative sampling for feature acquisition, SEU reduces the cost of producing a model of a desired accuracy and performs consistently across domains. We also extend the framework to a more general modeling setting in which both feature values and class labels are missing and costly to acquire.
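
To make the idea of ranking acquisitions by estimated information value concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that the expected utility of a candidate acquisition is the probability-weighted improvement in model quality divided by the acquisition cost, and that only a random sample of candidates is scored to keep the computation tractable. All names (`expected_utility`, `select_acquisitions`, `value_probs_for`, `gain_if_value`, `cost_of`) are hypothetical, and the estimators for value probabilities and model improvement are left as caller-supplied functions.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

# A candidate acquisition: (instance index, feature name) of one missing value.
Query = Tuple[int, str]


def expected_utility(
    query: Query,
    value_probs: Dict[Hashable, float],
    gain_if_value: Callable[[Query, Hashable], float],
    cost: float,
) -> float:
    """Estimated model improvement per unit cost of acquiring one missing value.

    ASSUMPTION: utility = sum over possible values of
    P(value) * (estimated improvement if that value is observed), divided by cost.
    """
    expected_gain = sum(p * gain_if_value(query, v) for v, p in value_probs.items())
    return expected_gain / cost


def select_acquisitions(
    candidates: List[Query],
    value_probs_for: Callable[[Query], Dict[Hashable, float]],
    gain_if_value: Callable[[Query, Hashable], float],
    cost_of: Callable[[Query], float],
    sample_size: int,
    batch_size: int,
    rng: random.Random,
) -> List[Query]:
    """Score a random sample of candidates and return the top-ranked batch."""
    # Scoring every missing value can be expensive, so only a sample is ranked.
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    scored = [
        (expected_utility(q, value_probs_for(q), gain_if_value, cost_of(q)), q)
        for q in sample
    ]
    # Highest expected utility first; only the score is used as the sort key.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:batch_size]]
```

In this sketch the estimators are deliberately left abstract: `value_probs_for` could be any model of the missing value's distribution, and `gain_if_value` any estimate of how much the predictive model would improve if that value were observed. Setting `sample_size` to the number of candidates scores every candidate, which corresponds to ranking without sampling.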