Active Inference and Learning for Classifying Streams

  • Josh Attenberg
  • Foster Provost

In this position paper we introduce Active Inference, a paradigm for intelligently requesting human labels for inference and learning when there is a finite budget of human resources for labeling cases.  Many machine learning systems are applied to a stream of instances that can repeat, such as queries entered in a search engine or web pages considered for potential ad impressions.  When a particular instance x can be subject to classification more than once, the budgeted learning setting acquires an additional complication.  In such applications the distribution over instances is frequently non-uniform; for instance, in the applications above the distribution p(x) over examples is highly skewed, and thus a few x’s account for a large percentage of the actual cases for prediction.  In such settings, it may be beneficial to allocate part of a human “labeling” budget to selectively perform direct inference: requesting human labels on a selected subset of the instances and providing them to an end system, in an effort to reduce misclassification cost on the x’s with the highest expected utility.  In estimating the utility of labeling a particular x, one must consider three factors: the misclassification cost, the probability p(x) of encountering x, and the value that x and its associated label may bring for (active) learning.  We will discuss the illustrative application of machine learning for safe advertising, where there is a limited budget for acquiring ground-truth labels for web pages.
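As a rough illustration of how the three factors might be combined, the following minimal sketch scores each candidate x and spends the budget greedily on the highest-scoring ones. The additive scoring rule, the `horizon` and `learning_weight` parameters, and all names (`Candidate`, `labeling_utility`, `select_for_labeling`) are assumptions made for this example; the abstract does not prescribe a particular formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    x: str                 # identifier of the instance (e.g. a URL or query)
    p_x: float             # estimated probability of encountering x in the stream
    p_error: float         # model's estimated probability of misclassifying x
    cost: float            # cost incurred by one misclassification of x
    learning_value: float  # hypothetical estimate of the label's value for learning


def labeling_utility(c, horizon, learning_weight=1.0):
    """Assumed utility: expected misclassification cost avoided over the next
    `horizon` stream events, plus a weighted learning value."""
    expected_occurrences = c.p_x * horizon
    inference_gain = expected_occurrences * c.p_error * c.cost
    return inference_gain + learning_weight * c.learning_value


def select_for_labeling(candidates, budget, horizon, learning_weight=1.0):
    """Greedily pick the `budget` candidates with the highest assumed utility."""
    ranked = sorted(
        candidates,
        key=lambda c: labeling_utility(c, horizon, learning_weight),
        reverse=True,
    )
    return [c.x for c in ranked[:budget]]


# Illustrative, made-up numbers: one frequent page, two rare ones.
pages = [
    Candidate("frequent", p_x=0.5, p_error=0.1, cost=1.0, learning_value=0.0),
    Candidate("rare", p_x=0.01, p_error=0.5, cost=1.0, learning_value=0.0),
    Candidate("rare_informative", p_x=0.01, p_error=0.5, cost=1.0, learning_value=10.0),
]
chosen = select_for_labeling(pages, budget=2, horizon=100)
```

With these made-up numbers, the rare but informative page and the frequent page are selected ahead of the rare, uninformative one, which reflects the trade-off between direct inference on high-p(x) instances and labels that are valuable for learning.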