Because the knowledge discovery process is ill-defined, iterative, and intensely interactive, algorithm flexibility is crucial. In this paper, we present a straightforward, heuristic generate-and-test search algorithm for knowledge discovery. An analysis of the literature shows that this basic algorithm underlies many of the systems that have had practical success in data mining and knowledge discovery over the past twenty years. We argue that this search algorithm has persevered because it is flexible and well behaved as background knowledge is introduced in various forms, which is exactly what is needed to support the ill-defined knowledge discovery process. We illustrate this by showing how the basic algorithm can incorporate background knowledge implicitly, via a variety of "interestingness" criteria. We then show that the same basic algorithm applies to an extended representation that includes explicit background knowledge. We discuss the trade-off between efficiency and expressiveness, and show how to speed up mining in the presence of explicit background knowledge. We conclude that this rule-space search algorithm is a good choice for supporting research into the rest of the knowledge discovery process, and argue that it sets the stage well for increased involvement of information systems researchers.
© Copyright 2024 Foster Provost. All Rights Reserved.