Predicting Citation Rates for Physics Papers: Constructing Features for an Ordered Probit Model

  • Sofus Macskassy
  • Claudia Perlich
  • Foster Provost

Gehrke et al. introduce the citation prediction task in their paper “Overview of the KDD Cup 2003” (in this issue). The objective was to predict the change in the number of citations a paper will receive, not the absolute number of citations. Several factors obviously affect the number of citations, including the quality and topic of the paper and the reputation of the authors. However, it is not clear which factors influence the change in citations between quarters, which makes the construction of predictive features challenging. A high-quality, timely paper will be cited more often than a lower-quality paper, but that says little about how its citation count will change from one quarter to the next. The selection of training data was critical, as the evaluation would consider only papers that received more than 5 citations in the quarter following the submission of results. After considering several modeling approaches, we used a modified version of an ordered probit model. We describe each of these steps in turn.
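
For readers unfamiliar with the model class, the following is a minimal sketch of a standard ordered probit, not the modified version used in this work. It assumes a latent score with standard normal noise and a sorted list of cutpoints that split the latent scale into ordered categories (for example, binned changes in citation count). The function names, the score, and the cutpoint values are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(score, cutpoints):
    """Return P(y = k) for k = 0..len(cutpoints).

    score     -- latent mean for one paper (hypothetical: e.g. a linear
                 combination of its features)
    cutpoints -- sorted thresholds separating the ordered categories
    """
    # Pad with -inf / +inf so every category has a lower and upper bound.
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [normal_cdf(b - score) for b in bounds]
    # The probability of category k is the normal mass between its bounds.
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]


# Illustrative values only: three ordered categories (decrease, no change,
# increase) separated by two assumed cutpoints.
probs = ordered_probit_probs(score=0.0, cutpoints=[-1.0, 1.0])
```

The returned probabilities sum to one, and raising the score shifts probability mass toward the higher categories, which is what makes the model suitable for an ordered target.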