Fig. 1: Word Cloud from the answers to “Main developments in Data Science and Analytics in 2018 and key trends in 2019”
In order to answer this question, we must look at how Data Science began:
How Did Data Science Begin?
The term “Data Science” gained wide currency around O’Reilly’s Strata Conference in 2011, although the phrase itself is older. For the next few years it was unclear where the field was headed, but over the past five years it has become far more structured. When Kaggle, the popular platform for data scientists, started in 2010, people from several fields, including analytics, bioinformatics, machine learning, statistics, and econometrics, joined the community to solve business problems and help companies derive value and increase revenue in the long term. As more problems were solved this way, businesses began to see the value of having data science teams in their organizations, and the field has grown rapidly ever since.
What is the Data Science Process?
Data Science has been so successful that when a business approaches a problem, one of its first steps will most likely be to look at the data and ask, “What does data science say?” One of the biggest accomplishments in the field has been the successful implementation of an end-to-end Data Science Process. There have been several attempts to standardize such a process, e.g., CRISP-DM (Cross-Industry Standard Process for Data Mining) and TDSP (Team Data Science Process). In general, they begin with a business question and end with the deployment of a model. The stages typically look like this:
Business Understanding – Business component whose objective is to determine, understand, and map the problem
Data Preparation – Business component with the goal of identifying, collecting, assessing, and vectorizing data
Data Munging – Statistical component using descriptive statistics and correlation analysis (among other tools) along with feature reduction techniques
Model Training – Statistical component that can involve trying more than one machine learning technique
Model Evaluation – Statistical component assessing and presenting model performance (e.g., with a tri-fold partition of the data)
Model Deployment – Computer Science component covering reproducibility, model documentation, and publishing the model as a web service
Model Tracking – Computer Science component monitoring, maintaining, and testing the end product
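To make the training and evaluation stages above a little more concrete, here is a minimal sketch in plain Python. It reads “tri-fold partition” as a train/validation/test split, which is an assumption on our part, and everything in it is hypothetical and for illustration only: the synthetic data, the 60/20/20 proportions, the two candidate models, and the function names. Two techniques are fitted on the training set, the better one is chosen on the validation set, and its error is reported on the untouched test set.

```python
import random
from statistics import mean


def tri_fold_partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows and split them into train / validation / test lists.

    The 60/20/20 default is an illustrative assumption, not a standard.
    """
    if train <= 0 or validation <= 0 or train + validation >= 1:
        raise ValueError("train and validation must be positive and sum to < 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def fit_mean(train_rows):
    """Baseline technique: always predict the average training target."""
    average = mean(y for _, y in train_rows)
    return lambda x: average


def fit_line(train_rows):
    """Second technique: ordinary least-squares line through (x, y) pairs."""
    x_bar = mean(x for x, _ in train_rows)
    y_bar = mean(y for _, y in train_rows)
    sxx = sum((x - x_bar) ** 2 for x, _ in train_rows)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train_rows)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_squared_error(model, rows):
    """Average squared difference between predictions and targets."""
    return mean((model(x) - y) ** 2 for x, y in rows)


# Synthetic stand-in for prepared data: y is roughly 3x + 2 plus noise.
rng = random.Random(42)
data = [(x, 3.0 * x + 2.0 + rng.gauss(0, 1)) for x in range(100)]

# Model Training: try more than one technique on the training partition.
train, validation, test = tri_fold_partition(data)
candidates = {
    "mean baseline": fit_mean(train),
    "least-squares line": fit_line(train),
}

# Model Evaluation: select on validation, report on the held-out test set.
best_name = min(
    candidates, key=lambda name: mean_squared_error(candidates[name], validation)
)
test_error = mean_squared_error(candidates[best_name], test)
print(f"selected: {best_name}, test MSE: {test_error:.3f}")
```

Keeping the test partition out of model selection is the point of the three-way split: the reported error then reflects data that neither the fitting nor the choice between techniques has seen. In practice a team would likely reach for an established library rather than hand-written fitting code.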
Some Insights and Predictions on the Future of Data Science:
1. Sub-disciplines in Data Science Will Evolve:
With the overwhelming growth of data in the world, data science will need to branch into clear sub-disciplines to keep up with emerging technologies and problems that are yet to come. Sub-disciplines of Data Science include Data Visualization, Data Storytelling, Data Mining, Data Engineering, Machine Learning, Deep Learning, Artificial Intelligence, etc.
2. Diving Deeper into Artificial Intelligence:
2018 saw an explosion of activity in artificial intelligence, along with much curiosity about and interest in neural networks and machine learning. Data science was also applied more widely in the health industry.
3. Big Data and Data Science Driven by Business:
Business leaders will become proficient at identifying when big data and data science initiatives can drive business outcomes.
4. Data Ethics & Privacy:
As organizations work with copious amounts of data, there will be an increased focus on handling that data ethically (especially after last year’s several data breaches and cases of mishandled personal information). As more information becomes available and globally integrated, this will become a concern for everyone, including individuals, companies, and governments.