Relational Learning Problems and Simple Models

  • Claudia Perlich
  • Sofus Macskassy
  • Foster Provost
In recent years, we have seen remarkable advances in algorithms for relational learning, especially statistically based algorithms.  These algorithms have been developed in a wide variety of research fields and problem settings.  It is scientifically important to understand the strengths, weaknesses, and applicability of the various methods.  However, we are stymied by the lack of a common framework for characterizing relational learning.
What are the dimensions along which relational learning problems and potential solutions should be characterized?  Jensen (1998) outlined dimensions that are applicable to relational learning, including various measures of size, interconnectivity, and variety; items to be characterized include the data, the (true) model, the background knowledge, and so on.  Additionally, individual research papers typically characterize the aspects of relational learning that they consider and those that they ignore.  However, few studies or even position papers examine various methods and contrast them along common dimensions (one notable exception is the paper by Jensen and Neville (2002b)).
It is also not clear whether straightforward measures of size, interconnectivity, or variety will be the best dimensions.  In this paper we argue that other sorts of dimensions are at least as important.  In particular, the aforementioned dimensions characterize the learning problem (i.e., the training data and the true model).  Equally important are characteristics of the context for using the learned model, which have important implications for learning.  For illustration, let us discuss three context characteristics and their implications for studying relational learning algorithms.