Intelligent Information Triage

  • Vasant Dhar
  • Haym Hirsh
  • Sofus Macskassy
  • Foster Provost
  • Ramesh Sankaranarayanan

In many applications, large volumes of time-sensitive textual information require triage: rapid, approximate prioritization for subsequent action.  In this paper, we explore the use of prospective indications of the importance of a time-sensitive document to produce better document filtering or ranking.  By prospective, we mean importance that could be assessed from events that occur in the future.  For example, a news story may be assessed (retrospectively) as being important, based on events that occurred after the story appeared, such as a stock price plummeting or the issuance of many follow-up stories.  If a system could anticipate (prospectively) such occurrences, it could provide a timely indication of importance.  Clearly, perfect prescience is impossible.  However, the content of an information item is sometimes correlated strongly enough with the events that follow it to support useful prediction.  We describe a process for creating and evaluating approximate information-triage procedures that are based on prospective indications.  Unlike many information retrieval applications for which document labeling is a laborious, manual process, for many prospective criteria it is possible to build very large labeled training corpora automatically.  Such corpora can be used to train text classification procedures that will predict the (prospective) importance of each document.  This paper illustrates the process with two case studies, demonstrating the ability to predict whether a news story will be followed by many very similar news stories, and also whether the stock price of one or more companies associated with a news story will move significantly following the appearance of that story.  We conclude by discussing how the comprehensibility of the learned classifiers can be critical to success.
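
The automatic labeling step mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the case studies. It assumes a hypothetical Story record, a hypothetical table of daily closing prices per ticker, and an arbitrary 5% threshold over a one-day horizon. All names, prices, and parameter values below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """Hypothetical record for one archived news story."""
    text: str
    ticker: str  # company associated with the story
    day: int     # index of the trading day on which the story appeared


def label_by_price_move(stories, closes, horizon=1, threshold=0.05):
    """Label each story by what happened to the stock price afterwards.

    A story is labeled True when the closing price moved by at least
    `threshold` (as a fraction, up or down) between the story's day and
    `horizon` trading days later. Stories with no future price are skipped.
    The horizon and threshold defaults are illustrative assumptions.
    """
    labeled = []
    for story in stories:
        series = closes.get(story.ticker)
        if series is None or story.day + horizon >= len(series):
            continue  # the outcome is not observable, so no label exists
        before = series[story.day]
        after = series[story.day + horizon]
        move = abs(after - before) / before
        labeled.append((story.text, move >= threshold))
    return labeled


# Invented example data: closing prices indexed by trading day.
closes = {
    "ACME": [100.0, 90.0, 91.0],
    "BOLT": [50.0, 50.5, 50.4],
}
stories = [
    Story("ACME recalls flagship product", "ACME", 0),
    Story("BOLT opens new office", "BOLT", 0),
    Story("ACME names new director", "ACME", 2),  # no later price yet
]

labeled = label_by_price_move(stories, closes)
for text, important in labeled:
    print(important, text)
```

Because the labels come from archived outcomes rather than human judgment, the same function can be run over an entire historical archive to produce training pairs for a text classifier. Each label depends only on data that appeared after the story, while the classifier sees only the story text.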