Aggregation for Predictive Modeling with Relational Data

  • Claudia Perlich
  • Foster Provost

Most data mining and modeling techniques have been developed for data represented as a single table, in which every row is a feature vector that captures the characteristics of an observation. However, data in most domains are not of this form: they consist of multiple tables with several types of entities. Such relational data are ubiquitous, both because of the large number of multi-table relational databases kept by businesses and government organizations and because of the naturally linked nature of people, organizations, computers, etc. Relational data pose new challenges for modeling and data mining, including the exploration of related entities and the aggregation of information from multisets (“bags”) of related entities.
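To make the notion of aggregating a bag concrete, here is a minimal Python sketch under stated assumptions. The `customers` and `transactions` tables, the helper names, and the choice of count, sum, mean, and max as aggregates are all hypothetical illustrations, not the authors' method. The sketch groups a one-to-many related table into one bag of values per target entity, then maps each variable-size bag to a fixed-length set of features so that the result fits the single-table, one-feature-vector-per-row representation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical target table: one row per customer (the entity being modeled).
customers = {"c1": {"age": 34}, "c2": {"age": 51}, "c3": {"age": 27}}

# Hypothetical related table: zero or more transactions per customer (one-to-many).
transactions = [
    ("c1", 20.0), ("c1", 35.5), ("c1", 4.5),
    ("c2", 120.0),
]

def build_bags(rows):
    """Group related rows by key, giving a bag (multiset) of values per entity."""
    bags = defaultdict(list)
    for key, amount in rows:
        bags[key].append(amount)
    return bags

def aggregate(bag):
    """Map a variable-size bag to a fixed-length dict of simple aggregates."""
    if not bag:
        # Entities with an empty bag still need a well-defined feature vector.
        return {"n": 0, "total": 0.0, "mean": 0.0, "max": 0.0}
    return {"n": len(bag), "total": sum(bag), "mean": mean(bag), "max": max(bag)}

def flatten(customers, transactions):
    """Produce a single table: one feature vector per customer."""
    bags = build_bags(transactions)
    table = {}
    for cid, attrs in customers.items():
        # dict.get does not invoke the defaultdict factory, so missing keys yield [].
        table[cid] = {**attrs, **aggregate(bags.get(cid, []))}
    return table

for cid, row in flatten(customers, transactions).items():
    print(cid, row)
```

Simple aggregates like these discard information about the bag (for example, which specific values occur), which is one reason aggregation is a modeling challenge rather than a mere preprocessing step.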