Six common mistakes in data preparation

By Adam Wilson, CEO, Trifacta.

Posted by Phil Alsop on Friday, 28th September 2018.

Data and analytics continue to be a number one investment priority for CIOs. Time and time again, research has shown that big data, advanced analytics and artificial intelligence have the potential to dramatically improve an organisation’s performance.

According to Gartner, businesses that embrace data and analytics at a transformational level enjoy increased agility, better integration with partners and suppliers, and easier use of advanced predictive and prescriptive forms of analytics. This all translates to competitive advantage and differentiation.

But according to recent research among data professionals, and my own observations from working closely with global organisations trying to use data to their best advantage, many fundamental mistakes happen very early in the data preparation stage and hold progress back. Fortunately, these issues can be easily overcome.

1. Spending excess time on data preparation – New figures suggest that 60 per cent of IT professionals spend more than half of their time at work on data quality assurance, clean-up or preparation. Based on Glassdoor salary estimates and IDC's estimate that there are 18 million IT operations and management professionals globally, this means organisations are spending more than $450 billion on data preparation. That is a lot of resource to spend before proving the value of a project. Organisations should explore more cost-effective and efficient ways to reverse this pattern; intelligent data preparation platforms are one option.

2. Over-reliance on IT departments – Many data and analytics teams rely heavily on their IT department to source the data they need to run their projects. To be exact, 59 per cent of data analysts say they are dependent on IT resources to prepare or access data. Given that IT departments are often focused on everything from keeping day-to-day operations running, through complying with legislation, to launching new products and services, this dependency can cause significant delays to what should be fast-moving projects. Eighty-two per cent of analysts believe they would be able to drive more value from their analysis projects if they were less dependent on IT. To overcome this issue, big data solutions must become more ‘people friendly’ so that they can be used by non-IT experts across different lines of business.

3. Preparing data without context of the use case – An in-depth understanding of the business use case at the preparation stage of any analytics initiative is crucial. This is another challenge when outsourcing data requirements to IT: while IT teams have the technical capabilities, they often lack the context and detail needed to identify which information in the data is relevant. Without that context, an organisation can spend untold cycles trying to arrive at the best iteration of the data required to make a project successful. Knowing upfront what is important to a particular use case means a business can maximise the outcome of the analysis.

4. Involving data scientists in preparation – We should keep front of mind that data scientists are highly trained powerhouses doing complicated work which generates real value. The average salary for a data scientist in London is around £65,000 per annum, which makes them a precious commodity to be used strategically. However, they can spend the bulk of their time preparing data instead of doing the complex work they were hired to do. Sixty per cent of IT professionals rightly consider themselves overqualified to be spending a large proportion of their time preparing data. Many of them go on to explain that their time would be better spent modelling, finding insights or designing programmes, but until their time is freed up from the pain of data preparation, how can this be possible?

 

5. Running preparation processes manually – Manual data preparation tools like Microsoft Excel can hinder collaboration and efficiency, but remain popular among analysts and IT professionals alike: 37 per cent of data analysts and 30 per cent of IT professionals use Excel more than any other tool to prepare data. This reliance on manually driven data preparation tools will continue to delay data initiatives and deter new insights, so moving away from them is a no-brainer. Organisations like financial services provider Deutsche Börse have explored how data preparation platforms accelerate these processes to fast-track new data-led product development.

6. Overlooking data quality issues – When preparing data, it’s essential to ensure that the outcome is consistent, conforming, complete and current. Teams should check every data set against these 4 C’s of data quality and identify any issues early and often. Remediating data quality issues can have a significant impact on the end analysis. For example, marketing lead data is far more valuable when it has been enriched with external data to complete missing values. Or consider the difference between outdated and up-to-date data when predicting sales and calculating margins. In both instances, improperly prepared data can have a huge impact.
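
As a rough illustration of what checking a data set against the 4 C's might look like in practice, here is a minimal sketch in plain Python. The sample records, field names, allowed country codes, email pattern and one-year freshness threshold are all assumptions invented for this example; they do not come from the research cited above or from any particular data preparation platform.

```python
import re
from datetime import date

# Hypothetical marketing-lead records; field names are illustrative only.
LEADS = [
    {"email": "ana@example.com", "country": "UK",
     "revenue": "1200", "updated": date(2018, 9, 20)},
    {"email": "bob@example", "country": "United Kingdom",
     "revenue": "", "updated": date(2017, 1, 5)},
]

# Assumed rules for this sketch: a deliberately simple email pattern
# and a small set of canonical country codes.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ALLOWED_COUNTRIES = {"UK", "DE", "US"}


def check_record(rec, today, max_age_days=365):
    """Return a list of 4 C's issues found in one record."""
    issues = []

    # Complete: every field holds a non-empty value.
    missing = sorted(k for k, v in rec.items() if v in ("", None))
    if missing:
        issues.append("incomplete: " + ", ".join(missing))

    # Conforming: values match the expected format.
    if not EMAIL_RE.match(rec["email"] or ""):
        issues.append("non-conforming: email")

    # Consistent: the same thing is represented the same way everywhere.
    if rec["country"] not in ALLOWED_COUNTRIES:
        issues.append("inconsistent: country")

    # Current: the record was updated recently enough to be trusted.
    if (today - rec["updated"]).days > max_age_days:
        issues.append("stale: updated")

    return issues


if __name__ == "__main__":
    for lead in LEADS:
        print(lead["email"], check_record(lead, today=date(2018, 9, 28)))
```

With these made-up records, the first lead passes all four checks, while the second is flagged on each one: a missing revenue value, a malformed email address, a country name that does not match the agreed codes, and a last-updated date more than a year old. Real rules would of course depend on the business use case, which is why the context described in point 3 matters.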

Data preparation still proves to be the biggest bottleneck for the majority of organisations, costing billions of dollars and slowing down time to insight. Many of the simple mistakes holding businesses back can be overcome quickly by exploring more cost-effective, efficient and insightful ways to approach data preparation.