Suspicion Scoring Based on Guilt-by-Association, Collective Inference, and Focused Data Access

  • Sofus Macskassy
  • Foster Provost

We describe a guilt-by-association system that can be used to rank entities by their suspiciousness. We demonstrate the algorithm on a suite of data sets generated by a terrorist world simulator developed under a DoD program. The data sets consist of thousands of people and some known links between them. We show that the system ranks truly malicious individuals highly, even if only relatively few are known to be malicious ex ante. When used as a tool for identifying promising data-gathering opportunities, the system focuses on gathering more information about the most suspicious people and thereby increases the density of linkage in appropriate parts of the network. We assess performance under conditions of noisy prior knowledge (score quality varies by data set under moderate noise), and we test whether augmenting the network with prior scores based on profiling information improves the scoring (it does not). Although the level of performance reported here would not support direct action on all data sets, it does recommend the consideration of network-scoring techniques as a new source of evidence in decision making. For example, the system can operate on networks far larger and more complex than could be processed by a human analyst.