The invention of the light bulb has a powerful efficiency story. Long before Edison patented his commercial light bulb, scientists were demonstrating that electricity could create light. The early prototypes consumed a lot of power and produced very little light. Today's bulbs are roughly thirty times more efficient, and that efficiency is approaching its natural limits. By 2035, this improvement will help the US save twice as much energy as solar power will produce. Significant savings, right? So increased technological efficiency has the potential to help humanity. What about the efficiency of our business data? (1)
We are creating a significant amount of data every year (it is expected to reach 175.8 ZB in 2025 (2)). That is enormous but not surprising; the bigger question is how we will benefit from its potential.
The Rethink Data survey in Seagate's report indicates that 39% of enterprises see "making collected data usable" as a barrier to leveraging data's full potential. (3) The same study shows that enterprises capture only 56% of all their potentially available data, and only 57% of that captured data is actually used. So we have already lost almost half of the available data because we could not capture it, and we lose almost half of the remainder because we do not use it: enterprises use just about 32% of their total data potential. Can we assume that this 32% is being used correctly? Are we using it with 100% efficiency?
Based on the degree of control they exercise over their available data, organizations can be grouped into different types.
Some organizations store their raw data and wait to use it someday.
Others transform their data and convert it into a structured format.
Some use their data to learn what happened yesterday or month to date. Some ask "why" of their data and try to extract reasons from it.
Some apply the old-but-gold technique of regression analysis, or the new and shiny machine learning algorithms, to their data to predict what will happen at the end of the year.
And very few organizations try to work out how to build an autonomous analytics platform that performs all of this analysis to support the decision-making process.
Why does your organization collect data? My organization would collect data to learn patterns in the business, find the reasons for unexpected changes, or check whether its actual numbers match the financial plans. To use your data in your decision-making process, you need to be sure that your data is accurate, right? Accurate, and also complete, consistent, unique, and timely (IIBA – Guide to Business Data Analytics).
An autonomous analytics platform can detect an opportunity as soon as it sees it (4). But to see it, the relevant details must be readily available in the dataset. An autonomous system studies patterns and trends. Let's say there is a new campaign, or a new carrier for your logistics division: the system can tell you whether the difference it observes is statistically significant. But what if there is a problem hiding behind that particular metric? Let's look at an example situation below.
Let's say you have two marketing campaigns and want to test which one performs better. You published both campaigns on the three most popular social media platforms. You checked your executive dashboard in the morning and saw that Campaign B was more attractive than Campaign A: it had a higher click-through rate (7.10% versus 6.20%). You might feel that Campaign B is a clear winner, but let's take a deeper look. Drilling down to the individual platform level, you see the data points below:
| | Impression A | Impression B | Click A | Click B | CTR A | CTR B |
| --- | --- | --- | --- | --- | --- | --- |
| Social Media 1 | 8,000 | 1,000 | 450 | 50 | 5.63% | 5.00% |
| Social Media 2 | 1,000 | 950 | 70 | 60 | 7.00% | 6.32% |
| Social Media 3 | 1,000 | 8,050 | 100 | 600 | 10.00% | 7.45% |
| Total | 10,000 | 10,000 | 620 | 710 | 6.20% | 7.10% |
After drilling down to the social media platform level, you probably wouldn't push for Campaign B anymore, because you would notice that even though the aggregated CTR is higher for Campaign B, Campaign A performs better on every individual platform.
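This reversal is easy to verify in a few lines of Python. The sketch below just recomputes the table's click-through rates (platform names are shortened to SM1–SM3 for brevity):

```python
# Simpson's paradox with the campaign data above:
# Campaign A wins on every platform, yet loses in aggregate.
impressions = {
    "A": {"SM1": 8000, "SM2": 1000, "SM3": 1000},
    "B": {"SM1": 1000, "SM2": 950, "SM3": 8050},
}
clicks = {
    "A": {"SM1": 450, "SM2": 70, "SM3": 100},
    "B": {"SM1": 50, "SM2": 60, "SM3": 600},
}

for platform in ["SM1", "SM2", "SM3"]:
    a = clicks["A"][platform] / impressions["A"][platform]
    b = clicks["B"][platform] / impressions["B"][platform]
    print(f"{platform}: A={a:.2%}  B={b:.2%}  winner={'A' if a > b else 'B'}")

total_a = sum(clicks["A"].values()) / sum(impressions["A"].values())
total_b = sum(clicks["B"].values()) / sum(impressions["B"].values())
print(f"Total: A={total_a:.2%}  B={total_b:.2%}  "
      f"winner={'A' if total_a > total_b else 'B'}")
```

Running it shows A winning each platform comparison while B wins the total, exactly as in the table.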
In my opinion, you should demand the most granular breakdown available from your reports. For example, to apply a better marketing strategy, you might need geographic, demographic, psychographic, and behavioural segmentation.
Social Media 1/Age Groups Drill-Down
| | Impression A | Impression B | Click A | Click B | CTR A | CTR B |
| --- | --- | --- | --- | --- | --- | --- |
| 18-24 | 7,450 | 50 | 400 | 3 | 5.37% | 6.00% |
| 25-45 | 500 | 400 | 47 | 10 | 9.40% | 2.50% |
| 46-70 | 50 | 550 | 3 | 37 | 6.00% | 6.73% |
| Total | 8,000 | 1,000 | 450 | 50 | 5.63% | 5.00% |
The first drill-down showed that the aggregated values could mislead you. What about a second level of drill-down? As you can see here, aggregated data can still mislead us at this level too. There is no single truth in this campaign comparison; perhaps you should target each campaign at the age group where it works well. And be prepared: a further geographic drill-down could surprise you again. Of course, the numbers here are exaggerated, but similar situations occur in the real world. A campaign really can work well only in a particular province and a specific age group, and sometimes for reasons entirely outside our plans and assumptions.
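Whether a CTR gap like the ones in these tables is statistically meaningful can be checked with a standard two-proportion z-test, the kind of test an autonomous system would run before alerting you. A minimal sketch using only the standard library (the function name is my own, not from any particular package):

```python
from math import sqrt

def two_proportion_z(clicks1, n1, clicks2, n2):
    """z-statistic for the difference between two click-through rates."""
    p1, p2 = clicks1 / n1, clicks2 / n2
    pooled = (clicks1 + clicks2) / (n1 + n2)  # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Aggregate totals from the first table: A = 620/10,000, B = 710/10,000
z = two_proportion_z(620, 10_000, 710, 10_000)
print(f"z = {z:.2f}")  # |z| > 1.96 means significant at the 5% level
```

The aggregate gap is significant, which is exactly why the aggregate view feels so convincing until you drill down.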
This pattern is not specific to marketing data. We can generate more examples of the same paradox (Simpson's paradox) for logistics data (damage rate by Carrier A/B, route, product type, packaging type), merchandising data (average lead time by Supplier A/B, warehouse location, product, season), and so on.
Suppose your organization is lucky enough to have this granular level of detail. Let's say you have 100 products and just two KPIs, Revenue and Order Count. Using the example above, with two campaigns and data broken down by age group and social media channel, a basic calculation gives: 100 products × 2 KPIs × 3 age groups × 3 social media channels × 2 campaigns = 3,600 variations. And don't forget that more KPIs and more products mean even more combinations; Anodot's article gives an example with more criteria.
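The combination count above is just a Cartesian product, which makes it easy to sanity-check (the dimension labels here are illustrative placeholders):

```python
from itertools import product

# Illustrative dimensions matching the example in the text
products = [f"product_{i}" for i in range(100)]
kpis = ["revenue", "order_count"]
age_groups = ["18-24", "25-45", "46-70"]
channels = ["social_media_1", "social_media_2", "social_media_3"]
campaigns = ["A", "B"]

# Every metric series an autonomous system would have to watch
variations = list(product(products, kpis, age_groups, channels, campaigns))
print(len(variations))  # 100 * 2 * 3 * 3 * 2 = 3,600
```

Each element of `variations` is one metric series that would need its own monitoring, which is why the number of series explodes so quickly as dimensions are added.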
With this much detail, monitoring, detecting problems, and identifying opportunities becomes very challenging. No human has the capacity to process all these details in real time, but an autonomous analysis system can. However, an autonomous system is not a magic wand: its fuel is granular, high-quality data. In your industry, one or more of your competitors might already be collecting more granular data and feeding it into their systems. Every little opportunity they catch will bring them increased market share.
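As a toy sketch of the kind of per-series check such a system might run over each of those 3,600 metric series: compare today's value against the historical mean and flag large deviations. The threshold and the data here are illustrative assumptions, not a description of any real product's algorithm.

```python
from statistics import mean, stdev

def flag_anomaly(history, latest, threshold=3.0):
    """Flag `latest` if it deviates more than `threshold` standard
    deviations from the historical mean of this metric series."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Hypothetical daily CTRs for one of the 3,600 series
history = [0.061, 0.064, 0.059, 0.063, 0.060, 0.062, 0.061]
print(flag_anomaly(history, 0.035))  # a sudden drop gets flagged
print(flag_anomaly(history, 0.062))  # an ordinary day does not
```

Real systems use far more robust methods (seasonality, changing baselines, correlated metrics), but the principle of automatically screening every series is the same.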
An enterprise has to implement its data strategy with its long-term goals in mind. Setting up and maintaining sophisticated data analytics systems like autonomous analytics requires a well-planned data strategy. Carefully planning the initial data strategy also answers whether the organization needs a centralized, decentralized, or hybrid data analytics department. (5)
Talent requirements become apparent once the long-term goals are defined. This planning prevents organizations from failing in their first data science projects, which often happens when they hire a data scientist for the machine learning and AI parts of the project but do not consider other roles such as data engineers, data analysts, business analysts, BI analysts, and BI developers. The next step is planning the data architecture, governance, metadata management, and data security. Even if your organization already has all of these in practice, making sure they are ready for advanced analytics and data science implementations will keep your organization a step ahead.
Acknowledgements
I would like to thank Satrujeet Rath for proofreading this post and for his revision suggestions.
Sources: