Scalable Supervised Dimensionality Reduction using Clustering

  • Brian Dalessandro
  • Claudia Perlich
  • Foster Provost
  • Troy Raeder
  • Ori Stitelman

The automated targeting of online display ads at scale requires the simultaneous evaluation of a single prospect against many independent models.  When deciding which ad to show to a user, one must calculate likelihood-to-convert scores for that user across all potential advertisers in the system.  For modern machine-learning-based targeting, as conducted by Media6Degrees (m6d), this can mean scoring against thousands of models in a large, sparse feature space.  Dimensionality reduction within this space is useful because it decreases scoring time and model storage requirements.  To meet this need, we develop a novel algorithm for scalable supervised dimensionality reduction across hundreds of simultaneous classification tasks.  The algorithm performs hierarchical clustering in the space of model parameters from historical models in order to collapse related features into a single dimension.  This allows us to implicitly incorporate feature and label data across all tasks without operating directly in a massive space.  We present experimental results showing that, for this task, our algorithm outperforms other popular dimensionality-reduction algorithms across a wide variety of ad campaigns, as well as production results that showcase its performance in practice.
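The abstract does not specify the distance measure, the linkage criterion, or how the features in a cluster are combined into one dimension. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each feature is represented by its vector of coefficients across historical per-campaign models, clusters those vectors with average-linkage agglomerative clustering under Euclidean distance, and collapses a sparse binary instance by counting its active features in each cluster. The function names (`agglomerative_cluster`, `reduce_instance`) and the feature names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_cluster(feature_params, n_clusters):
    """Average-linkage agglomerative clustering of features (assumed criterion).

    feature_params: dict mapping a feature name to its list of coefficients,
    one per historical model (a hypothetical representation).
    Returns a dict mapping each feature name to a cluster id in 0..n_clusters-1.
    """
    # Start with every feature in its own cluster.
    clusters = [[f] for f in sorted(feature_params)]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            d = sum(euclidean(feature_params[a], feature_params[b])
                    for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i; j > i, so deleting j leaves i valid.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return {f: cid for cid, members in enumerate(clusters) for f in members}


def reduce_instance(active_features, mapping, n_clusters):
    """Collapse a sparse binary instance into one count per cluster (assumed)."""
    reduced = [0] * n_clusters
    for f in active_features:
        if f in mapping:  # Features unseen in historical models are dropped.
            reduced[mapping[f]] += 1
    return reduced


# Hypothetical coefficients for four features across three historical models.
params = {
    "site_a": [0.9, 0.1, 0.0],
    "site_b": [0.8, 0.2, 0.1],
    "site_c": [-0.5, 0.7, 0.3],
    "site_d": [-0.4, 0.8, 0.2],
}
mapping = agglomerative_cluster(params, 2)
print(reduce_instance({"site_a", "site_c", "site_d"}, mapping, 2))  # [1, 2]
```

Features whose coefficients behave similarly across historical campaigns end up sharing a dimension, so a new campaign model needs only one weight per cluster rather than one per raw feature. The naive pairwise search here is cubic in the number of features; it only illustrates the clustering step and would not scale to a production feature space.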