Data Science and its Relationship to Big Data and Data-Driven Decision Making

  • Tom Fawcett
  • Foster Provost

Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data science programs, and publications are touting data science as a hot, even “sexy,” career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what data science is. One reason is that data science is intricately intertwined with other concepts of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner’s field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other closely related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all of these points. We close by offering, as examples, a partial list of fundamental principles underlying data science.