Breaking down the barriers to data science

Florian Douetteau, CEO of AI and machine learning platform Dataiku, discusses how code-free environments are paving the way for innovation business-wide.

Tuesday, 21st December 2021, by Phil Alsop

With the increasing adoption of data science, machine learning, and AI platforms, what often goes unreported is the uptake of these platforms across the broader business. It’s a misconception that they are just for data scientists and analytics teams. In fact, some of the transformative analytics-driven business results that come from these platforms are driven by marketing teams or finance departments.

 

There are many approaches to helping companies manage and make use of their data so that business users can access it just as easily as data scientists. Perhaps the biggest and most successful move data platform providers have made is providing code-free tools that average non-coding employees are comfortable with. People in all positions across organisations are leveraging data (via data exploration, visualisation, and more) to answer business questions in a way that doesn’t need to involve code. 

 

Some organisations call these people citizen data scientists, but the reality is that they are simply data- and business-savvy people who are not formally trained data scientists, but who recognise the power and potential in breaking down data silos and democratising data access across teams.

 

This is great news for the global data science platform market. But how can we keep up the momentum and continue to grow the number of business people working with data every day, especially in more traditional (non-digital-native) companies that don’t necessarily have the same resources to hire data talent as the Fortune 500? The answer lies both in understanding how technology can help and in implementing a truly inclusive AI strategy.

 

A Common Ground for Data Experts and Explorers

Smart data ingestion, processing dates and times, cleaning complex text fields, combining datasets, even creating new machine learning models: these are all examples of tasks that can be done code-free on many platforms.
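
For readers who have never seen these tasks written out by hand, the sketch below shows roughly what three of them (processing dates, cleaning a text field, and combining two datasets) involve in plain Python. It is only an illustration of the work a code-free tool hides, not how any particular platform implements it; the sample records, field names, and helper functions are all invented for this example.

```python
from datetime import datetime

# Hypothetical sample data, invented for illustration only.
orders = [
    {"customer_id": "17", "ordered_at": "21/12/2021 09:30", "note": "  Late   DELIVERY\n"},
    {"customer_id": "42", "ordered_at": "03/01/2022 14:05", "note": "ok"},
]
customers = {"17": "Acme Ltd", "42": "Globex"}


def parse_timestamp(raw: str) -> datetime:
    """Turn a day-first date string into a datetime object."""
    return datetime.strptime(raw, "%d/%m/%Y %H:%M")


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace and normalise case."""
    return " ".join(raw.split()).lower()


def prepare(order_rows, customer_names):
    """Parse dates, clean text, and join each order to its customer name."""
    prepared = []
    for order in order_rows:
        prepared.append({
            "customer": customer_names.get(order["customer_id"], "unknown"),
            "ordered_at": parse_timestamp(order["ordered_at"]),
            "note": clean_text(order["note"]),
        })
    return prepared


rows = prepare(orders, customers)
```

Each helper corresponds to a step that a business user would typically pick from a menu in a visual tool rather than write by hand.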

 

Does it mean that no one will learn to code anymore? Of course not, but it does mean that the work of non-data scientists can be incorporated in data science projects in meaningful ways. 

 

It’s the beginning of a fundamental shift in mindset around data tooling, and with code-free tooling evolving rapidly, we’ll continue to see a broader range of people with access to data working with it on a day-to-day basis.

 

AutoML and Augmented Analytics 

Low-code or code-free tooling also means that data scientists can focus on building cool things, rather than spending hours on maintenance and days of work just to keep everything running. Most data science teams will have a real mix of people: some will want to use a more code-free approach, and others will want to dive into code. Today’s platforms should offer enough flexibility to let people whip up a quick model, or get very involved ‘under the hood.’

 

AutoML, which has long been touted as the ‘future of AI’, is helping to provide this flexibility. In fact, a few years ago Gartner estimated that by 2025, 50% of data scientist activities would be automated by AI, easing the acute talent shortage.

 

At a very high level, AutoML is about using machine learning techniques to automatically do machine learning. Yet AutoML can have a broader scope. Its rapid acceleration has spurred the application of automation to the whole data-to-insights pipeline, from cleaning the data to tuning algorithms through feature selection and feature creation, and even to operationalisation. At this larger scale, paired with an increasing volume of data, AutoML is producing more insights in less time.
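
The idea of automating part of the machine learning work can be made concrete with a deliberately tiny sketch. The code below tries several settings of a simple nearest-neighbour classifier and keeps whichever scores best on held-out data, which is one small piece of what AutoML tools automate. It is an assumption-laden toy, not the method of any specific product: the data, the single mislabelled point, the candidate values, and all function names are invented here, and real AutoML systems also search over algorithms, features, and data preparation steps.

```python
from collections import Counter

# Toy data invented for illustration: (feature, label) pairs.
data = [(x, int(x >= 10)) for x in range(20)]
data[4] = (4, 1)         # one deliberately mislabelled point to act as noise
train = data[0::2]       # even positions used for fitting
validation = data[1::2]  # odd positions held out for scoring


def knn_predict(train_rows, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_rows, validation_rows, k):
    """Share of held-out rows that a k-nearest-neighbour model labels correctly."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in validation_rows)
    return hits / len(validation_rows)


def auto_select(train_rows, validation_rows, candidates):
    """Try every candidate setting and keep the one with the best held-out score."""
    scores = {k: accuracy(train_rows, validation_rows, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


best_k, scores = auto_select(train, validation, candidates=[1, 3, 5])
```

On this toy data the search settles on `k = 3`: a single neighbour is misled by the noisy point, while five neighbours blur the boundary between the two classes. The user never has to reason about that trade-off; the loop finds it by measurement.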

 

It’s also expanding the capabilities of citizen data scientists, allowing entire companies to design and implement machine learning models and to scale them out in production transparently, without failures or interruptions.

 

Inclusive AI

One of the most important parts of data democratisation in an organisation is ensuring an inclusive AI strategy is in place. 

 

Inclusive AI means not restricting the use of data or AI systems to specific teams or roles, but rather equipping and empowering everyone at the company to make day-to-day decisions, as well as larger process changes, with data at the core. More often than not, the more people are involved in AI processes, the better the outcome.

It also means integrated documentation and knowledge sharing for increased communication around bias, responsibility, interpretability, and model fairness.

Ultimately, crafting an AI strategy that is inclusive allows for the democratisation of data use across entire companies, lines of business, and profiles – technical or non-technical. This is the key to unlocking everyday AI.