Machine Learning with Big Data: Challenges and Approaches

Author(s): Alexandra L'Heureux ; Katarina Grolinger ; Hany F. ElYamany ; Miriam Capretz
Publisher: IEEE - Institute of Electrical and Electronics Engineers, Inc.
Volume: PP
Page(s): 1
ISSN (Online): 2169-3536
DOI: 10.1109/ACCESS.2017.2696365



The Big Data revolution promises to transform how we live, work, and think by enabling process optimization, empowering insight discovery, and improving decision-making. Realizing this potential depends on the ability to extract value from such massive data through data analytics. Machine learning is at the core of data analytics because it can learn from data and provide data-driven insights, decisions, and predictions. However, traditional machine learning approaches were developed in a different era and are therefore based on multiple assumptions, such as the dataset fitting entirely into memory, which unfortunately no longer holds true in this new context. These broken assumptions, together with the characteristics of Big Data, create obstacles for traditional techniques. Consequently, this paper compiles, summarizes, and organizes the challenges of machine learning with Big Data. In contrast to other research that discusses challenges, this work highlights the cause-effect relationship by organizing challenges according to the Big Data V, or dimension, that instigated each issue: Volume, Velocity, Variety, or Veracity. Moreover, emerging machine learning approaches and techniques are discussed in terms of how they handle the various challenges, with the ultimate objective of helping practitioners select appropriate solutions for their use cases. Finally, a matrix relating the challenges and approaches is presented. Through this process, this study provides a perspective on the domain, identifies research gaps and opportunities, and provides a strong foundation and encouragement for further research in the field of machine learning with Big Data.