
Powering Data Science: How DataOps Can Enrich Your Activities and Deliver More For the Business

#TrueDataOps is a valuable catalyst in helping organizations take their use of data science, artificial intelligence and machine learning to the next level, expanding the scope of analytics way beyond business intelligence.

Any company carrying out sophisticated analytics and data science is most likely already a step ahead of its competition. As data science becomes more widespread, however, you can take further steps to ensure that the data science work you do to meet today's business needs continues to evolve in scalable, predictable and stable ways.

In my experience, the organizations that are more mature in data science terms already use tools like Dataiku and DataRobot, while the less mature ones do not. Those tools automate and streamline work you are already doing. The projects I've been involved in, on the other hand, are about moving the organization itself to a place where it can start using data science, and then develop it to adapt to changing business requirements.

I use a number of levels to describe the different stages in an organization’s analytics evolution. You may recognize your own organization somewhere along the line.

Level 0

Level Zero, from a data science perspective, is about achieving some type of BI environment: an enriched data platform that business analysts use to see, for example, key performance indicators, and that helps the business make better decisions.

Level 1

Level One takes things a step further: the process of ‘proving’ that data science works for you. This means asking, ‘Does data science work for our business, and how would we apply the predictions we gain?’, given that you are not, say, a Facebook or a Google. You may hire one or more data scientists, you have a reasonable amount of data to play with, and you want to know what data science can deliver. At this stage you can draw on basic Python tools like Pandas, NumPy, SciPy or sklearn, which should be enough to solve basic problems and deliver fundamental use cases.


Level 2

Level Two is when things start getting sophisticated. More technical people are performing data science, and they make it easier by building ‘pipelines’ to handle the different stages of the process. These organizations will probably build their own Docker containers with the trained models built in. What goes into the production pipeline is the actual predictive work, with its output written to a Snowflake table.


Level 3

Level Three is when you want to go further still, and so look to the aforementioned tools like DataRobot and Dataiku. However, there is a way to include true data science at every level: DataOps.

Conclusion

On my travels, I often come across data scientists making predictions on data sets they've built themselves. The problem is that not all data scientists are strong in SQL. So on one side you have analysts using data sets and KPIs in BI to look back historically, and on the other side you have data scientists projecting forward, but the data sets are not exactly the same. As a result, once the predictions get feedback, they don't necessarily tie in with the historical reporting. Less DataOps, more data oops.

By contrast, a massive benefit of the #TrueDataOps approach is that it keeps data entirely consistent across the organization, managing and orchestrating that process on your behalf. That includes feeding whatever data science tool you're using, whether homegrown, R, basic Python tools or more sophisticated approaches. #TrueDataOps provides that consistent pipeline all the way from Level Zero BI through to full-on data science.

This means you gain consistency from a business reporting perspective as well as a predictive reporting perspective, which is extremely important. One thing missing from a lot of conversations around data science is ensuring your data science predictions are consistent with your BI reporting. This is a way to achieve that, and you don’t need to start from scratch. 
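The consistency point can be illustrated with a small reconciliation check. This is a minimal sketch, not how DataOps.live implements it. The order records, the monthly_revenue KPI definition and the reconcile helper are all hypothetical. The idea is that BI and data science both consume one shared KPI definition. An automated check then flags any month where a data science input drifts from the BI figure, such as a hand-built data set that forgot to exclude refunds.

```python
from datetime import date

# Hypothetical order records: (order_id, order_date, amount, status).
ORDERS = [
    (1, date(2021, 1, 5), 120.0, "complete"),
    (2, date(2021, 1, 9), 80.0, "refunded"),
    (3, date(2021, 1, 20), 200.0, "complete"),
    (4, date(2021, 2, 2), 150.0, "complete"),
]

def monthly_revenue(orders):
    """Shared KPI definition: completed orders only, summed per (year, month)."""
    totals = {}
    for _, order_date, amount, status in orders:
        if status != "complete":
            continue
        key = (order_date.year, order_date.month)
        totals[key] = totals.get(key, 0.0) + amount
    return totals

def reconcile(bi_kpi, ds_input, tolerance=1e-6):
    """Return the months where the data science input disagrees with the BI figure."""
    months = set(bi_kpi) | set(ds_input)
    return sorted(
        m for m in months
        if abs(bi_kpi.get(m, 0.0) - ds_input.get(m, 0.0)) > tolerance
    )

bi_report = monthly_revenue(ORDERS)

# A hand-built data set that forgot to exclude refunds: the "data oops" case.
ad_hoc = {}
for _, d, amount, _ in ORDERS:
    ad_hoc[(d.year, d.month)] = ad_hoc.get((d.year, d.month), 0.0) + amount

print(reconcile(bi_report, monthly_revenue(ORDERS)))  # → []
print(reconcile(bi_report, ad_hoc))                   # → [(2021, 1)]
```

In a pipeline, a check like this could run as a test step before predictions are published. A mismatch would then stop the run instead of reaching the business.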

The #TrueDataOps concept ties it all together. It brings consistent environments and visibility of data across production, development, QA, staging and test, and ultimately into your data science models, giving your people full access to all the data they need. In short, you gain the power to see more, predict better and take the business forward in smarter ways.

A hugely experienced data scientist and author of The Enrichment Game: A Story About Making Data More Powerful, Cincinnati-based Doug ‘The Data Guy’ Needham is a Senior Solutions Architect with DataOps.live.