Data-Driven Investment Strategies for Peer-to-Peer Lending

  • Maxime Cohen
  • Daniel Guetta
  • Kevin Jiao
  • Foster Provost

We develop several data-driven investment strategies that demonstrate how machine learning and data analytics can guide investments in peer-to-peer loans. We detail the full process, from acquiring real data from a peer-to-peer lending platform to developing and evaluating investment strategies based on a variety of approaches. We focus heavily on how to apply and evaluate the data science methods, and the resulting strategies, in a real-world business setting. The material presented in this article can be used by instructors who teach data science courses at the undergraduate or graduate level. Importantly, we go beyond evaluating the predictive performance of models to assess how well the strategies would actually perform, using real, publicly available data. Our treatment is comprehensive and ranges from qualitative to technical, but it is also modular, which gives instructors the flexibility to focus on specific parts of the case, depending on the topics they want to cover. The learning concepts include data cleaning and ingestion, classification/probability estimation modeling, regression modeling, analytical engineering, calibration curves, data leakage, evaluation of model performance, basic portfolio optimization, evaluation of investment strategies, and using Python for data science.
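
To illustrate the distinction between evaluating a model and evaluating the strategy built on it, the following is a minimal sketch in Python. It is not the article's method: the loan data are synthetic, and the names `make_toy_loans`, `select_lowest_risk`, and `mean_return`, the payoff rule, and the noise level are all assumptions made up for this example. The sketch ranks loans by a hypothetical predicted default probability, invests equally in the lowest-risk ones, and compares the realized return of that portfolio with the return of the whole pool.

```python
import random


def make_toy_loans(n, seed=0):
    """Generate synthetic loans (toy data, not real platform data)."""
    rng = random.Random(seed)
    loans = []
    for _ in range(n):
        true_p = rng.uniform(0.02, 0.40)  # assumed true default risk
        # Hypothetical model score: the true risk plus noise, clipped to [0, 1].
        p_hat = min(max(true_p + rng.gauss(0, 0.05), 0.0), 1.0)
        defaulted = rng.random() < true_p
        # Assumed payoff rule: riskier loans carry higher interest,
        # and a default loses 60% of the amount invested.
        realized = -0.60 if defaulted else 0.05 + 0.30 * true_p
        loans.append({"p_default": p_hat, "realized_return": realized})
    return loans


def select_lowest_risk(loans, k):
    """Return the k loans with the lowest predicted default probability."""
    return sorted(loans, key=lambda loan: loan["p_default"])[:k]


def mean_return(loans):
    """Equal-weighted average realized return of a set of loans."""
    return sum(loan["realized_return"] for loan in loans) / len(loans)


loans = make_toy_loans(2000)
picked = select_lowest_risk(loans, 200)
print(f"whole pool return:       {mean_return(loans):.4f}")
print(f"selected portfolio (200): {mean_return(picked):.4f}")
```

The point of the comparison is that a model with good predictive accuracy is only useful here if the loans it selects actually earn more than the alternatives, which depends on interest rates and loss severity as well as on default risk.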