On Applied Research in Machine Learning

  • Ron Kohavi
  • Foster Provost

Common arguments for including applications papers in the Machine Learning literature are often based on the papers’ value for advertising success stories and for morale boosting.  For example, high-profile applications can help to secure funding for future research and can help to attract high caliber students.  However, there is another reason why such papers are of value to the field, which is, arguably, even more vital.  Application papers are essential in order for Machine Learning to remain a viable science.  They focus research on important unsolved problems that currently restrict the practical applicability of machine learning methods.

Much of the "science" of Machine Learning is a science of engineering.  By this we mean that it is dedicated to creating and compiling verifiable knowledge related to the design and construction of artifacts.  The scientific knowledge comprises theoretical arguments, observational categorizations, empirical studies, and practical demonstrations.  The artifacts are computer programs that use data to build models that are practically or theoretically useful.  Because the objects of study are intended to have practical utility, it is essential for research activities to be focused (in part) on the elimination of obstacles that impede their practical application.

Most often these obstacles take the form of restrictive simplifying assumptions commonly made in research.  Consider as an example the assumption, common in classifier learning research, that misclassification errors have equal costs.  The vast majority of classifier learning research in Machine Learning has been conducted under this assumption, through the use of classification accuracy as the primary (or sole) evaluation metric.  Is this a reasonable assumption under which we should be operating?  The answer is unclear.  It is difficult to imagine a real-world classification problem where error costs are equal, and researchers returning from the field cite, time after time, problems involving unequal misclassification costs.  Nevertheless, we continue to press on with research on increasing classification accuracy.  Isolated studies in the Machine Learning literature suggest that it is possible to weaken this assumption and still learn effectively (Turney 1997), but there have been no comprehensive studies.
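To make the concern concrete, consider a small illustration (a hypothetical sketch with invented numbers, not drawn from any cited study): when error costs are unequal, ranking classifiers by accuracy can reverse their ranking by expected misclassification cost.

```python
# Hypothetical illustration: accuracy can disagree with expected cost
# when misclassification costs are unequal.  All labels, predictions,
# and costs below are invented for the sake of the example.

def accuracy(y_true, y_pred):
    """Fraction of correctly classified examples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def expected_cost(y_true, y_pred, cost):
    """Average cost per example, where cost[true][pred] is the cost
    of predicting `pred` when the true class is `true`."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)

# Cost matrix for a two-class problem (0 = negative, 1 = positive):
# a missed positive (false negative) is ten times as costly as a
# false alarm (false positive); correct predictions cost nothing.
COST = {0: {0: 0.0, 1: 1.0},
        1: {0: 10.0, 1: 0.0}}

# Invented test set: 10 examples, 3 of them positive.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

# Classifier A is more accurate but misses two costly positives.
y_pred_a = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
# Classifier B raises more false alarms but catches every positive.
y_pred_b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

for name, y_pred in [("A", y_pred_a), ("B", y_pred_b)]:
    print(f"Classifier {name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"expected cost={expected_cost(y_true, y_pred, COST):.2f}")

# A: accuracy 0.80, expected cost 2.00
# B: accuracy 0.70, expected cost 0.30
```

Here the more accurate classifier is the worse choice; it is the evaluation metric, not the learner, that fails to reflect the problem.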

This is but one small example of a common simplifying assumption that may be too strong.  Of course, it is not clear that even a very solid applications paper pointing out the inapplicability of this assumption would be sufficient to convince the field to shift its scientific paradigm (Kuhn 1970).  In fact, with respect to this particular example, it seems that research trails practice: commercial tools are now available that can be trained with sensitivity to error costs, even though the Machine Learning literature has not addressed how to do so well.  However, if application-oriented papers were common in the Machine Learning literature, and many of them cited a particular assumption as being too strong, then one would hope that there would be sufficient pressure to study its applicability in greater detail.