Aggregation-Based Feature Invention and Relational Concept Classes

  • Claudia Perlich
  • Foster Provost

Model induction from relational data requires aggregation of the values of attributes of related entities.  This paper makes three contributions to the study of relational learning.  (1) It presents a hierarchy of relational concepts of increasing complexity, using relational schema characteristics such as cardinality, and derives classes of aggregation operators that are needed to learn these concepts.  (2) Expanding one level of the hierarchy, it introduces new aggregation operators that model the distributions of the values to be aggregated and (for classification problems) the differences in these distributions by class.  (3) It demonstrates empirically on a noisy business domain that more-complex aggregation methods can increase generalization performance.  Constructing features using target-dependent aggregations can transform relational prediction tasks so that well-understood feature-vector-based modeling algorithms can be applied successfully.
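To make the contrast between simple and target-dependent aggregation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical one-to-many schema in which each customer is linked to a bag of purchase categories. It also assumes cosine similarity to class-conditional reference distributions as one possible distribution-based aggregate. All data, names, and functions are illustrative.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each customer (target entity) has a bag of
# related purchase categories (a one-to-many relationship).
purchases = {
    "c1": ["book", "book", "music"],
    "c2": ["book", "music", "music"],
    "c3": ["garden", "garden", "tools"],
    "c4": ["tools", "garden"],
    "c5": ["book", "tools"],  # unlabeled entity to be scored
}
labels = {"c1": 1, "c2": 1, "c3": 0, "c4": 0}  # training labels only


def simple_aggregates(bag):
    """Target-independent aggregates: bag size and most common value."""
    counts = Counter(bag)
    return {"count": len(bag), "mode": counts.most_common(1)[0][0]}


def class_reference(purchases, labels, cls):
    """Pool the bags of all training entities of one class into a
    normalized distribution over values."""
    pooled = Counter()
    for entity, label in labels.items():
        if label == cls:
            pooled.update(purchases[entity])
    total = sum(pooled.values())
    return {value: n / total for value, n in pooled.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def target_dependent_features(bag, ref_pos, ref_neg):
    """Compare an entity's value counts with each class's reference
    distribution; the difference reflects how the distributions differ
    by class."""
    counts = Counter(bag)
    sim_pos = cosine(counts, ref_pos)
    sim_neg = cosine(counts, ref_neg)
    return {"sim_pos": sim_pos, "sim_neg": sim_neg,
            "sim_diff": sim_pos - sim_neg}


ref_pos = class_reference(purchases, labels, 1)
ref_neg = class_reference(purchases, labels, 0)

# One flat feature vector per entity, usable by any feature-vector learner.
features = {
    entity: {**simple_aggregates(bag),
             **target_dependent_features(bag, ref_pos, ref_neg)}
    for entity, bag in purchases.items()
}
```

In this sketch each training entity's own bag contributes to its class reference, which leaks label information into its features. A careful experiment would estimate the reference distributions on data held out from the entities being scored.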