• Venkateswarlu Kolluri
• Foster Provost

One of the defining challenges for the KDD research community is scaling up data mining algorithms to mine very large collections of data.  This article summarizes, categorizes, and compares existing work on scaling up data mining algorithms.  To provide focus and specific detail, we concentrate on algorithms that build decision trees and rule sets, although the issues and techniques generalize to other types of data mining.  We discuss the important issues related to scaling up and highlight similarities among scaling techniques by categorizing them into three main approaches.  We describe in detail the characteristic features of each category, using specific examples as needed, and we compare and contrast the constituent techniques within each.