Is a weak data strategy sabotaging your analytics initiatives?
Artificial Intelligence and its applications are growing in visibility as early adopters publicize their successes. This surge in popularity is highlighting a major problem: leaky and inefficient data strategies.
80% of the world’s created data is stored by enterprises, but only 1% of that information is used to inform business decisions. The rest languishes in data silos or ages out of usefulness before it can be processed.
While these shortcomings aren’t new, the true extent of the problem becomes visible only when companies begin to integrate AI into their workflows. Suddenly they see conflicting data streams for one domain, a single unreliable source for another, and no way to quickly distribute information to the departments that need it.
A solid data strategy is the tool needed to smooth out these wrinkles.
However, 34% of marketers surveyed by the CMO Council last year admitted that their data strategy isn’t embraced throughout the leadership team. 24% have no data strategy at all. They may appoint a CDO but don’t take the additional step of outlining clear directives.
This oversight is the first step to failure for analytics initiatives.
Data science is a growing field with a lot of moving parts. There is a clear benefit to integrating it into enterprise workflows, but it can be difficult to know which practices are enterprise-ready and which will add needless complexity to your organization.
This guide will help you evaluate your current data strategy, assess how well it aligns with your corporate goals, and construct a data science plan that will propel your company to the head of its field.
Audit Your Data Strategy
Before anything else happens, conduct an audit of your current data strategy. The more thorough the audit, the more opportunity there is to spot wasted effort and lost opportunities. Use these guided questions to make sure you don’t miss anything vital.
What data do you have and how do you access it?
Identify every method you currently have of collecting information. This includes marketing data, website metrics, and feedback collection and analysis procedures: essentially, anything that measures how your company is performing and how consumers are interacting with your brand.
The more detail you use here, the better. Don’t forget to include data from embedded analytics. Even if you haven’t begun adding analytics on your own, most websites that were built within the last decade will have at least some tools that track site activity.
A growing number of social media and marketing apps also offer embedded analytics. Take a look at exactly what is available.
- Are you using analytics tools to track your users’ activity? Which ones?
- Can you track an individual’s path through your website, or does your software only collect aggregate data from all users?
- Do you have defined Key Performance Indicators (KPIs)?
- How much say do you have in what metrics are tracked? Are you able to customize collection streams to highlight KPIs?
- Do you have a central dashboard for viewing and manipulating your data? If not, how many programs does a user have to log into when creating reports?
How is data stored?
Cloud storage is the future of data warehousing, but not everyone has made the switch yet. Some companies have in-house systems they’ve invested heavily in, and uploading everything to the cloud doesn’t make sense for them.
There are also “hybrid warehouses” where some incoming data flows get tagged for in-house storage while others are directed to the cloud. Figure out which system your organization uses.
- How much data do you save? Do you save everything or only data related to KPIs?
- Are your storage limits imposed by internal decisions or available space?
- What are your backup procedures? How can you recover your data in the event of a loss?
- Do you have an employee or contractor assigned to oversee your data storage?
- How will you know if your data storage fails or becomes corrupted?
- Who can see your data? What are the security procedures used to keep unauthorized users from using/altering data?
How is your data being used?
What are you doing with your collected data? This isn’t the place to gloss over gaps in current procedures.
Businesses that don’t effectively use their data will be losing $1.2 trillion to their competitors every year by 2020, so if there is room for improvement here, it only benefits your organization to point it out.
- What is your data telling you about clients, market conditions, and workflow efficiency?
- Are you using AI techniques such as machine learning?
- Does your data generate actionable insights?
- Do you act on the insights generated by your data?
- How are you using data to optimize marketing, increase customer satisfaction, and improve internal processes?
Conduct a Needs Assessment
The needs assessment consists of two stages: planning solutions for errors found during the strategy audit and determining what emerging technologies can be most effectively integrated.
Some problems with the current data strategy will surface immediately during the audit. It will be obvious if there is no backup, for example, or if huge amounts of data are going unused.
Other issues take longer to recognize. These tend to be workflow problems: workarounds staff have created in order to function in a dysfunctional data environment.
The two most common are data silos and Shadow IT.
A data silo is a collection or storage system that is only accessible to one group within an organization. Drawing data from these systems for use in other places adds several extra layers of work for employees.
Data silos can form accidentally when senior leadership is unaware of a resource one department has and how it could be applied to the company at large.
For this reason, CDOs and CIOs need to cultivate reciprocal relationships with their subordinate managers.
There should be a climate where those managers feel empowered to share their ideas and processes without being accused of “getting bogged down in details”.
This gives C-level execs the chance to see opportunities for applying those processes in other departments.
Not every meeting needs to be a class on how a department runs; an in-depth update once or twice a month is generally sufficient to stay on top of developing procedures.
Sometimes, data silos are intentionally created by IT departments in the interest of security. Security versus flexibility is one of the greatest conflicts in data science.
It’s critical to keep information (especially protected customer information and strategy-sensitive data) away from unauthorized use, but at the same time too many silos make completing even basic tasks complicated.
Difficulty in accessing the data needed to operate leads to the second most common “data dysfunction”: Shadow IT. This consists of any systems, applications, and procedures adopted by non-IT staff without IT consultation.
Despite its ominous name, Shadow IT isn’t caused by a desire to hurt the company.
Employees become frustrated with inefficient workflows or limited capabilities and act to “fix” those problems.
They install enterprise software (or sometimes write their own) that automates as much of their “housekeeping” tasks as possible in order to give themselves more time to focus on their primary jobs.
Allowing key decision makers to champion new technology has the benefit of increasing flexibility and offloading some of the IT workload. It isn’t without risk, however.
Unapproved software can expose the organization’s data to the very security risks the IT manager created the data silo to avoid.
Also, without central coordination, resources are wasted on redundant or conflicting software.
For more information about the hazards IT teams face, read last month’s blog post: The Dangers of Shadow IT in Mobile App Development.
Evaluating new data science applications
How well is your current data strategy delivering results? If you aren’t using Artificial Intelligence-based analytics software, there’s room for improvement.
AI allows for faster analysis of unstructured data, which makes up an estimated 80% of the world’s data.
Including AI in your data strategy is the first step to introducing it into your business strategy. For inspiration, here are some of the technologies being used by top level enterprises today.
Deep learning
Just as machine learning is an application of Artificial Intelligence, deep learning is a sub-discipline of machine learning.
Some industry publications describe it as an evolution, but this is misleading, as machine learning is still a vibrant and growing field.
Both machine and deep learning teach software to make increasingly accurate choices about data based on past experience, with little to no human input.
Deep learning focuses on the creation of deep neural networks: layered networks of artificial neurons trained on vast collections of data, which refine the program’s definitions of categories.
Imagine a company wants to design a program to screen customer images posted to a social media site for inappropriate content.
A series of initial algorithms is written to define what “inappropriate” means. The program uses these algorithms to approve or flag incoming content.
In the beginning, though, the software will make a lot of mistakes while it attempts to understand the provided instructions. Patterns of color and unusual body positions could cause false positives.
Deep learning shortens the training period by feeding an enormous amount of prepared data through the algorithm during the preparation phase.
Because the program can access a deep pool of pre-screened exemplars to check its results, it doesn’t have the same learning curve as machine learning processes that must construct their own models.
Results returned by deep learning algorithms reach usable levels of accuracy faster.
On an enterprise level, deep learning is most beneficial in cases where a company already has a store of sorted data to apply.
Current applications of the technology include predicting the outcome of legal cases, navigating self-driving cars and guidance systems for the visually impaired, automatically generating reports in response to unstructured triggers (such as the text of a complaint email), and providing more challenging virtual opponents for computer and video games.
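The benefit of pre-screened exemplars can be illustrated with a deliberately simplified sketch. Here, each image is assumed to have already been reduced to a small feature vector (a made-up representation for this example), and new content is labeled by comparing it to its nearest labeled exemplar. A real deep learning system learns its own features through a neural network, but the underlying principle is the same: a pool of pre-labeled data supplies the model’s initial judgment instead of pure trial and error.

```python
from math import dist

# Hypothetical pre-screened exemplars: (feature_vector, label) pairs.
# The feature values are invented for illustration only.
EXEMPLARS = [
    ((0.9, 0.1), "inappropriate"),
    ((0.8, 0.2), "inappropriate"),
    ((0.1, 0.9), "approved"),
    ((0.2, 0.8), "approved"),
]

def screen(image_features):
    """Label an image by its closest pre-screened exemplar."""
    _, label = min(EXEMPLARS, key=lambda ex: dist(ex[0], image_features))
    return label

print(screen((0.85, 0.15)))  # close to the "inappropriate" exemplars
print(screen((0.15, 0.85)))  # close to the "approved" exemplars
```

The more exemplars the pool contains, the finer the distinctions the screen can draw, which is why this approach pays off for companies that already hold a store of sorted data.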
Data mining and predictive analytics
The terms data mining and predictive analytics are often used interchangeably, though in reality data mining is a process that powers predictive analytics.
It’s one of the techniques that creates the framework predictive analytics uses to generate its predictions. Modern applications of data mining use machine learning to refine their output.
There are different categories of data mining depending on the desired end state.
- Association finds connections between events (e.g., customers look at these two websites during the same visit).
- Path analysis is the logical continuation of that process in which the typical order of events is defined (customers look at these FAQ pages before choosing the “Contact” page).
- Data clustering also groups data by proximity but without assuming causation (for unknown reasons, most customers come from these six cities).
- Classification sorts data into classes based on differentiating factors (customers who have made a purchase in the past vs customers who only browse the site).
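Two of these categories can be sketched on toy data. The session records below are invented for illustration: association counts which pages co-occur in the same visit, and classification splits visitors into buyers and browsers based on a differentiating factor (whether a checkout page appears).

```python
from collections import Counter
from itertools import combinations

# Made-up visit logs: each record lists the pages one visitor viewed.
sessions = [
    ["home", "faq", "contact"],
    ["home", "pricing", "checkout"],
    ["faq", "contact"],
    ["home", "pricing"],
]

# Association: count which page pairs appear together in a visit.
pairs = Counter()
for visit in sessions:
    pairs.update(combinations(sorted(visit), 2))
print(pairs.most_common(2))

# Classification: sort visitors by a differentiating factor.
buyers = [v for v in sessions if "checkout" in v]
browsers = [v for v in sessions if "checkout" not in v]
print(len(buyers), "buyers,", len(browsers), "browsers")
```

Path analysis would extend the association step by keeping the order of pages within each visit rather than sorting them away.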
Data mining helps companies find previously unknown patterns in their data. Opportunities for growth are often overlooked because the data obscures them.
Marketing is the highest profile enterprise application of data mining (determining when and how to implement marketing campaigns) but other departments can take advantage of it as well.
For instance, data mining can serve as a virtual feedback panel for product designers.
Knowing what features of an app users interact with most and which are ignored helps plan updates and refine upcoming products to more accurately align with customer needs.
In a market where 86% of customers will pay more for a better experience, improving responsiveness is a significant competitive edge.
Streaming analytics
Gone are the days when companies could afford to wait for the quarterly sales report to evaluate their performance.
The growth of the online economy has created a constantly shifting environment in which opportunities disappear as quickly as they arise, and organizations without the ability to continually visualize their operational data will lose ground to better-prepared competitors.
For this reason, streaming analytics is one of the most critical analytics technologies to include when building a data strategy.
Data from multiple sources are analyzed mid-stream, before being directed to data warehouses. Executives can check a central dashboard of live graphs and charts that provide a real-time snapshot of operations.
Streaming analytics can be used to set off alerts when certain KPIs pass relevant levels.
If response to a certain advertising campaign suddenly spikes in an area, management can assess whether the activity is positive or negative and react accordingly.
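The alerting pattern can be sketched in a few lines. The KPI, threshold, and event values below are invented for illustration; a real deployment would consume events from a streaming platform rather than a list.

```python
THRESHOLD = 100  # hypothetical "campaign responses per minute" alert level

def watch_stream(events, threshold=THRESHOLD):
    """Yield an alert whenever the KPI crosses the threshold upward."""
    previous = 0
    for value in events:
        if value >= threshold > previous:
            yield f"ALERT: campaign response spiked to {value}/min"
        previous = value

# Simulated metric feed: the jump from 95 to 180 triggers one alert.
feed = [12, 40, 95, 180, 150, 60]
for alert in watch_stream(feed):
    print(alert)
```

Because the check runs as each event arrives, management learns about the spike while there is still time to assess whether the activity is positive or negative.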
Fast and satisfying reactions are key to reducing the impact of errors on public relations.
Examples of this concept are all over the news.
United Airlines is still a source of ridicule for their weak response to misguided staff while American Airlines, who had a similarly publicized incident between a flight attendant and a passenger, was able to react in time to minimize the damage to their reputation.
The difference in media reporting on these two events makes a strong case for being able to quickly evaluate public response based on unstructured data.
In-house data management team vs. Outsourcing
How much of your analytics will be conducted in-house and what will be outsourced?
If you have unusual or very complex requirements, you will need a dedicated team including engineers, programmers, and at least one statistician.
With the rise of embedded analytics in enterprise software, however, investing in data science doesn’t necessarily require engineers and programmers.
Most businesses won’t need a full-scale data science team. A contractor can streamline and coordinate your analytics programs and recommend new software that is a good fit for your company.
Structuring a Good Data Strategy
With all this information at hand, it’s time to create the data strategy itself. This can be the most contentious part of the process.
C-level executives with different areas of responsibility have different ideas about what the plan should look like and who should be responsible for its adoption and upkeep.
43% of enterprise leaders feel that getting everyone to agree to the same data strategy is too hard.
In truth, good data strategy prevents more problems than it causes. It removes the uncertainty around data management by outlining expectations of all involved parties.
After the data strategy is adopted, departments will find it much easier to rely on data coming from other branches, since they can trust the collection and management procedures behind it.
There is no standard template for enterprise data strategy. Your business is unique, even among its competitors, and what works for another company might not complement your existing operations.
Still, there are certain elements every data strategy should address in order to be considered complete.
Storage
Before any data is collected, there needs to be a storage system in place. The main decision here is whether to build local storage or contract for cloud storage.
Local storage will often have faster connection speeds, and you will have complete control over functions such as backups and access control.
You can also manually disconnect local storage from the internet in case of a network attack. Setting up local storage comes with a large up-front investment, though. Also, you will have to arrange for maintenance and security personnel to protect that investment.
Cloud storage side-steps the costs of building and managing local servers. The provider handles maintenance and improvements as part of the cost, which is typically structured as a subscription.
It’s possible to purchase more storage as your business grows without being delayed by construction. Keeping data stored on the cloud protects it against on-site accidents, too.
These advantages come at the cost of less control over the details of data storage and a slightly slower connection speed.
The vast majority of businesses won’t notice an appreciable difference in connectivity between local and cloud storage, so for most people cloud storage is the best solution.
Collection and Exploitation
Although collection and exploitation are different domains, the rise of embedded analytics has tied them together.
An increasing number of products that used to simply collect data are now processing it as well, and few companies are willing to invest in new software that doesn’t include some form of analytics.
Planning for collection includes deciding what information you need to track. Be specific, but don’t feel the need to ration your KPIs. Data science needs data to work.
The more relevant information you have, the more ROI you can realize from data science programs. Don’t forget to address gaps in your data infrastructure revealed during the audit stage.
Exploitation covers everything from which embedded analytics programs will be utilized to the new data science applications you plan to adopt.
What do you want your data to do? What goals should your CDO be working towards?
While these will by nature be loosely defined, try to narrow them down beyond “growth”. A better exploitation goal would be “increase growth in X market” or “improve the customer acquisition funnel”.
Describe your executive expectations for adoption of data science enterprise-wide. This section should include a detailed plan for how data should be disseminated throughout the company.
Incorporating data science into existing workflows is most efficiently done on a rolling phased basis.
That is, identify the first few steps to improving your data usage, then periodically reassess and add new steps as the old ones are completed.
Provide metrics to help managers assess their data science integration.
A data strategy should be dynamic as well as specific. Make sure there are guidelines for adjusting plans to fit new information, but don’t change requirements on a weekly basis.
Alterations to your strategy should always be data-driven and push towards well-defined goals.
Governance and security
Determine who owns each data asset within the company. Who is responsible for overseeing it? Who can make changes? Who can retrieve data? How will your data be protected from external malicious actors and internal negligence? What measures are in place to comply with relevant privacy laws or HIPAA regulations?
There’s an executive trend towards democratizing data so that it’s accessible by every department.
That provides an incredible amount of flexibility and encourages innovation on an individual level, but there are some security concerns involved.
Decide what level of access each category of employee will have based on what you deem an acceptable balance of risk.
Resist the urge to centralize your entire data governance to the CDO. Assign key data governors at each level of authority, all the way from the CDO to individual departments.
In fact, this is a good way to balance freedom of data against security.
Each governor can assess their section’s need for specific data more easily than the CDO, and having everything flow through that local governor provides a measure of accountability.
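A tiered governance scheme like this can be expressed as a simple access map. The asset names, governor roles, and department lists below are entirely hypothetical; the sketch only shows the shape of the idea: each asset records its local governor and which departments may read or write it.

```python
# Hypothetical governance registry: asset -> governor and access rights.
ASSETS = {
    "customer_pii": {"governor": "marketing_data_lead",
                     "read": {"marketing", "support"},
                     "write": {"marketing"}},
    "sales_metrics": {"governor": "sales_ops_lead",
                      "read": {"sales", "marketing", "executive"},
                      "write": {"sales"}},
}

def can_access(department, asset, action="read"):
    """Check a department's rights; edge cases go to the asset's governor."""
    entry = ASSETS.get(asset)
    return bool(entry) and department in entry[action]

print(can_access("support", "customer_pii"))           # read allowed
print(can_access("support", "customer_pii", "write"))  # write denied
```

Requests that fall outside the map are denied by default and escalate to the listed governor, which keeps the accountability trail the section describes.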
Sound data strategy and the resulting increase in data utilization translate into profit: Fortune 1000 companies that increase their data utilization by a mere 10% can add up to $65 million to their net income.
By assessing and improving your company’s data strategy, you can position yourself to take advantage of new AI technologies and win a share of that increased profit.