Separating Machine Learning from Data Science

The enterprise applications of machine learning are weaving themselves into the fabric of everyday business.

Still, the concept itself is hazily understood.

Over the last month we have shared posts intended to clear up the confusion between machine learning and other related topics like predictive analytics.

This article continues that trend by tackling one of the least helpful misapplications: when machine learning and data science are mistaken for each other.

Laying the Groundwork

Machine learning is a branch of artificial intelligence where, instead of writing a specific formula to produce a desired outcome, an algorithm “learns” the model through trial and error.

It uses what it learns to refine itself as new data becomes available.
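
As a rough illustration of the difference, the sketch below contrasts a hand-written formula with a model that adjusts its own parameters by trial and error (here, plain gradient descent on a straight line). The data, function names, step count, and learning rate are all invented for the example; real machine learning systems are far more elaborate.

```python
# Hypothetical data: each pair is (input, observed outcome), following y = 2x + 1.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

def handwritten_formula(x):
    """Traditional approach: a person works out the formula in advance."""
    return 2.0 * x + 1.0

def learn_line(points, steps=5000, learning_rate=0.01):
    """Learning approach: start with a guess, measure the error, adjust, repeat."""
    slope, intercept = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each parameter.
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in points) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in points) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

slope, intercept = learn_line(data)
print(round(slope, 2), round(intercept, 2))  # lands close to 2.0 and 1.0
```

Nobody told `learn_line` that the answer was "multiply by two and add one". It found those values by repeatedly shrinking its own error, and it could be rerun as new data arrives.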

Data science is an umbrella term that includes everything needed to extract meaningful insights from data (gathering, scrubbing, preparing, and analyzing it) in order to answer questions or make predictions.

It includes areas like:

  • Data mining: The process of examining large amounts of data to find meaningful patterns
  • Data scrubbing: Finding and correcting incomplete, unformatted, or otherwise flawed data within a database
  • ETL (Extract, Transform, Load): A collective term for the process of pulling data from one database, converting it into a compatible format, and importing it into another
  • Statistics: Collecting and analyzing large amounts of numerical data, particularly to establish the quantifiable likelihood of a given occurrence
  • Data visualization: Presenting data in a visual format (charts, graphs, etc.) to make it easier to understand and spot patterns
  • Analytics: A multidisciplinary field that revolves around the systematic analysis of data
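
To make one of these terms concrete, here is a minimal sketch of data scrubbing in Python. The records, field names, and cleaning rules are invented for illustration; real scrubbing rules depend entirely on the data at hand.

```python
# Hypothetical raw customer records with typical flaws.
raw_records = [
    {"name": "  Alice Smith ", "email": "ALICE@EXAMPLE.COM", "age": "34"},
    {"name": "Bob Jones", "email": "", "age": "41"},
    {"name": "Carol Diaz", "email": "carol@example.com", "age": "n/a"},
]

def scrub(record):
    """Return a cleaned copy of the record, or None if it is unusable."""
    email = record["email"].strip().lower()
    if not email:
        return None  # incomplete: this sketch treats a missing email as unusable
    age_text = record["age"].strip()
    # A flawed age value is kept as "missing" rather than guessed.
    age = int(age_text) if age_text.isdigit() else None
    return {"name": record["name"].strip(), "email": email, "age": age}

clean_records = [r for r in map(scrub, raw_records) if r is not None]
print(clean_records)
```

Whether an incomplete record should be dropped, repaired, or flagged for review is a judgment call that depends on the question being asked.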

What Falls Under the “Data Science Umbrella?”

“Data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.”

Those words from John Foreman, MailChimp’s VP of Product Management, sum up the problem with trying to draw the boundaries of data science.

It’s a vast concept, describing intent more than a specific discipline.

There are, however, four fields whose intersection is generally agreed to cover the majority of data science: mathematics, computer science, domain expertise, and communications.

  • Mathematics: Mathematics forms the core of data science. Data scientists need to know enough math to choose and refine the models they use in analysis, especially if they plan to work in machine learning. Understanding the math behind their formulas gives them the ability to spot errors and weigh the significance of results. Also, while there are some data points that can be easily read without a heavy math background (conversions, website views, engagement rates, etc.), others require specialized knowledge to understand. For example, time series data is very common in business intelligence but hard for casual users to interpret. Mathematical subdisciplines often studied by data scientists include:
    • Statistics (including multivariate testing, cross-validation, probability)
    • Linear Algebra
    • Calculus
  • Computer science: Data science may be older than computers, but the powerful effect of the digital revolution can’t be denied. Computers let data scientists process vast amounts of data and perform incredibly complex calculations at a speed that allows data to be used within a reasonable timeframe. Some of the areas where computer science intersects with data science:
    • System design optimization
    • Cleaning/scrubbing data
    • Graph theory and distributed architectures
    • Programming databases
    • Artificial Intelligence and machine learning
  • Domain knowledge: Data science is a targeted practice. It’s used to generate insights about some specific topic. The data has to be contextualized before it can be put to use, and doing so effectively requires an in-depth knowledge of that topic. Today data science is being applied in nearly every domain. Perhaps some of the most interesting uses can be found in fields like business and health care.
    • Health care
      • Data-driven preventative health care
      • Disease modeling and predicting outbreaks
      • Improving diagnostic techniques
      • DNA sequencing and genomic technologies
    • Business intelligence
  • Communications: Communications is often forgotten when discussing data science, but communication is relevant at nearly every stage of the data science process. It’s a critical link between theory and practice. Data has little value unless it can be applied to solve problems or answer questions, and it can’t be applied until someone other than the data scientist understands it. On the flip side of that statement, data scientists need to know what questions they’re trying to answer in order to choose the best analytical strategies. Though communications are often grouped with domain knowledge, it’s helpful to separate them to emphasize their importance. Here are a few data science-oriented applications of communications:
    • Data science evangelism (spreading awareness about the uses of data science)
    • Clarifying what is needed/desired from data
    • Presenting results in a useful way
    • Data visualization (graphs, charts, models)

The Data Science Process

If separating data science into the above disciplines were easy, though, it wouldn’t be its own field.

In reality each discipline is woven throughout the process with a large degree of flexibility in the combination of techniques used.

Here’s a general, very broad-scope view of the data science process and the disciplines that affect each stage.

  1. Data is collected and stored. [Computer science]
  2. Questions are asked. (What is needed from the data? What problems does the user hope to solve?) [Communications, domain knowledge]
  3. Data is cleaned and prepared for analysis. [Math, computer science]
  4. Data enrichment takes place. (Do you have enough data? How can it be improved?) [Computer science, math, communications, domain knowledge]
  5. A data scientist decides which algorithms and methods of analysis will best answer the question or solve the problem. [Math, computer science]
  6. Data is analyzed via Artificial Intelligence/machine learning, statistical modeling, or another method. [Math, computer science]
  7. The results are measured and evaluated for value/merit. [Math]
  8. The validated results are brought to the end user. [Communications, possibly computer science]
  9. The end user applies the results of data science to real-world business problems. [Business, communications]

This list is mainly intended to demonstrate how inextricably combined the component disciplines of data science are in practice.

The data science process is never as straightforward as this; rather, it’s highly iterative. Some of these steps may be repeated many times.

Depending on the results, the scientist might even return to an earlier step and start over.

Where the Confusion Lies

After reading this far, the reasons for the confusion between data science and machine learning have likely become clear.

Machine learning is a method for doing data science more efficiently, so it’s misunderstood to be a direct subdiscipline of data science.

In fact, a list of things data science can accomplish reads like a pitch list for adopting machine learning.

Here are a few common data science applications to illustrate the point:

  • Forecasting/predicting future values
  • Classification and segmentation
  • Scoring and ranking
  • Making recommendations
  • Pattern detection and grouping
  • Detecting anomalies
  • Recognition (image, text, audio, video, facial, …)
  • Generating actionable insights
  • Automation
  • Optimization

The reason for this overlap is that machine learning algorithms are very effective tools for sorting and classifying data.

That makes machine learning popular among data scientists, but it doesn’t have the inherent direction and sense of purpose of data science as a whole.

In simple terms: machine learning is a tool; data science is a field of practice.

Machine Learning Isn’t Necessary for Data Science…

While machine learning is an efficient way of performing data science, it’s not always the best solution. Sometimes it isn’t needed at all. Here are two notable cases where machine learning is the wrong tool for the job:

  • The problem can be solved using set formulas or rules. If there’s no interpretation needed and context doesn’t change the data, a mathematical model alone can handle the matter. There’s no point in spending resources on machine learning. It might lead to faster results if there’s a large amount of data, but it won’t produce “better” results.
  • There isn’t a massive amount of data involved. This is a case where machine learning does more harm than good. Machine learning requires data, the more the better. Without a store of prepared data to train the algorithm, it can produce unreliable results. Worse, training on a small or unrepresentative sample yields biased results. When there isn’t enough relevant data on a subject to fuel machine learning, other methods of data science are better options for finding answers.

But It Is a Game-changing Advantage.

Despite these limitations, machine learning offers such a distinct advantage that it’s easy to see why data scientists are adopting it in such large numbers.

There are three main situations where it’s generally the best data science method:

  • There’s too much data for a human expert to process. Some data is perishable. By the time a team of human analysts works through it (even using standard computing methods) it’s aged out of usefulness. Other times data is flowing into a system faster than it can be processed. Machine learning algorithms thrive on massive amounts of data. They improve by processing data, so results actually become more accurate over time.
  • There is ambiguity in the ruleset. Machine learning has a long way to go before it can match the human potential for coping with uncertainty and inconsistency, but it’s made huge strides in drawing meaningful results from ambiguous data.
  • Programming a specific solution isn’t practical. Sometimes the code needed to program a solution is so big that doing so would be inefficient. In these cases, machine learning can be used to streamline the analysis process.

The Bottom Line

It’s definitely possible to do data science without incorporating machine learning.

However, the pace of data production is growing every day.

By 2020, an estimated 1.7 megabytes of data will be created every second for every living person.

Most of that will be unstructured data.

Machine learning is the best tool for dealing with that volume and quality of data, so it’s likely to be used in data science for the foreseeable future.

How well is your company taking advantage of its data? Contact Concepta to learn how we can turn your data into actionable insights!

Request a Consultation

Download FREE AI White Paper

How Digital Transformation Is Impacting Small and Medium Businesses

The digital revolution is disrupting the traditional business model for small and medium businesses (SMBs).

On the one hand, it makes it possible for them to compete with much larger companies, but on the other, the investment required can be daunting.

Before setting out to create a digital strategy, it helps to work through what digital transformation actually means and how that affects SMBs.

Changing the Pace of Business

Digital transformation is just that: a total restructuring of operations to function more efficiently in the digital era.

Its effects reach into every area of business.

Here are a few of the core components involved in a successful transformation:

Mobile presence: SMBs used to be fine with a regular website, but now consumers expect a fast, fluid mobile experience.

This is doubly urgent for small businesses.

40% of all mobile searches are for local businesses, and 88% of people who search for a local business will visit either it or a competitor within 24 hours.

60% won’t visit or recommend a business after having trouble with a poorly designed mobile site.

Optimizing for mobile is obviously important, yet 47% of SMBs still don’t have a mobile-friendly website or app.

Enterprise apps: Enterprise apps rework everyday functions (ordering, reporting, marketing, planning) into streamlined processes that can be managed via user-friendly apps.

They cut down on internal confusion since everyone has the most up-to-date information in the palms of their hands.

Implementing enterprise apps at the operational level eliminates redundant operations, increases efficiency, and improves employee morale.

A study by Adobe found that investing in enterprise apps netted companies a 35% ROI on average.

Chatbots: Chatbots are the answer to a major customer service dilemma: how can a company provide twenty-four-hour customer service at scale without the expense of hiring and training human agents?

Natural language processing technology has advanced enough that chatbots can handle the majority of routine customer queries.

Technology experts predict that customers will conduct 85% of their brand interactions without speaking to a human at all!

Automation: The digital revolution creates a lot of work, but fortunately much of it can be automated.

Automation involves creating a list of processes that trigger other actions or processes without needing to check with a human first.

One example would be generating order confirmation emails after a customer completes a purchase online.

Automation can also walk a customer through basic troubleshooting after an error is reported.

It’s often confused with artificial intelligence, though automation uses set rules to complete its tasks instead of analytical reasoning.
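
The distinction is easier to see in code. Below is a minimal sketch of rule-based automation, in which each kind of event triggers a fixed action. The event names, fields, and message text are all hypothetical, and messages are collected in a list rather than actually emailed.

```python
# Stand-in for an outbox; a real system would hand these to an email service.
sent_messages = []

def send_order_confirmation(event):
    sent_messages.append(f"To {event['email']}: order {event['order_id']} confirmed.")

def send_troubleshooting_steps(event):
    sent_messages.append(f"To {event['email']}: try restarting the app (error {event['code']}).")

# Each rule maps an event type to the action it should trigger.
RULES = {
    "purchase_completed": send_order_confirmation,
    "error_reported": send_troubleshooting_steps,
}

def handle(event):
    """Look up the rule for this event and run it; no human check, no learning."""
    action = RULES.get(event["type"])
    if action is not None:
        action(event)

handle({"type": "purchase_completed", "email": "pat@example.com", "order_id": 1042})
handle({"type": "error_reported", "email": "pat@example.com", "code": "E17"})
handle({"type": "page_viewed", "email": "pat@example.com"})  # no rule, so nothing happens
print(sent_messages)
```

Every outcome here was decided in advance by whoever wrote the rules. Nothing in the program analyzes data or changes its behavior over time, which is what separates it from artificial intelligence.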

AI-powered customer management: There’s a marketing adage that 80% of business comes from 20% of customers.

With artificial intelligence, that’s changing forever.

Intelligent profiles formed by automatic customer classification and segmentation help companies identify their best customers.

These profiles also help provide the kind of personalized experience that keeps customers happy and loyal.
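
As a starting point, the 80/20 adage is easy to check against a company’s own numbers. The sketch below is simple arithmetic rather than AI, and the customer names and revenue figures are invented. It ranks customers by revenue and reports the share generated by the top 20%.

```python
import math

# Hypothetical annual revenue per customer.
revenue_by_customer = {
    "A": 5000, "B": 3000, "C": 400, "D": 300, "E": 300,
    "F": 250, "G": 250, "H": 200, "I": 200, "J": 100,
}

def top_customer_share(revenue, fraction=0.2):
    """Return the top customers and the share of total revenue they generate."""
    ranked = sorted(revenue, key=revenue.get, reverse=True)
    top_count = max(1, math.ceil(len(ranked) * fraction))
    top = ranked[:top_count]
    share = sum(revenue[c] for c in top) / sum(revenue.values())
    return top, share

best_customers, share = top_customer_share(revenue_by_customer)
print(best_customers, f"{share:.0%}")
```

AI-driven segmentation goes further by grouping customers on many attributes at once, but a ranking like this is often the first sanity check.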

Data science and analytics: In many ways, data science is the cornerstone of the digital revolution.

Data has been called the new oil, and for good reason.

Increasing data utilization is one of the fastest ways to grow a company, resulting in lower operating expenses and higher revenue.

For more about data science and its impact on the business world, read our white paper: “How Businesses Can Use Data Science and AI to Gain a Competitive Edge.”

The Looming Threat to SMBs

When unchallenged, large companies that refine their digital strategy can satisfy needs once met only by an SMB.

The traditional advantages of small businesses over corporations are personalized service and an inventory of niche products tailored to their local market.

Techniques like intelligent customer profiling give companies that would ordinarily be too large to customize their offerings the insight to do so.

If SMBs aren’t pushing digital transformation themselves, these large companies could steal their client base and push them out of a local market.

The problem is the investment in time and resources required for traditional digital solutions.

Getting maximum efficiency from daily operations like reporting, inventory, or accounting is easy to do with modern software.

However, the types of programs used by corporations aren’t practical for SMBs.

They’re complicated to operate, needing trained staff to maintain their databases, and the typical SMB will only use a fraction of their capabilities.

To complicate the issue, large-scale software is expensive enough that recouping an investment would take too long to merit the expense.

SMB managers often make do by stretching the capabilities of programs like Excel, but that can cause more problems than it solves.

Analytics software inspires a similar dilemma since the rewards of data science and analytics for SMBs are hard to see.

Unsure how to translate their data into actionable results, owners hesitate to invest the time and money to join the digital revolution.

They worry that the potential damages from project failure are much higher for smaller businesses that can’t absorb losses like huge companies.

The Path to Digital Adoption

Fortunately, software developers are beginning to cater to the digital transformation needs of SMBs.

Solutions aimed at SMBs are more widely available.

Owners no longer have to buy management software meant for global corporations in order to digitize their operational needs.

Instead they can choose software with only the features they will use.

For example, a landscaping company can have a unified dispatch and reporting app built that lets managers assign jobs and receive completion reports.

The app costs less than a solution meant for larger companies and meets the company’s requirements more closely.

Analytics is easier now, too.

Rather than hiring an in-house data science team, SMBs can take advantage of off-the-shelf enterprise analytics software.

This is a popular option among SMB owners.

Last year SMBs used an average of 4.8 apps to manage their operations, up from 3.8 in 2015.

46% are tracking their social media metrics through analytics programs, and 47% use some level of business intelligence software.

SMBs do run into trouble with premade software that doesn’t quite meet their needs.

Some solve the problem by stringing together a collection of apps that each solve a different problem.

The resulting technological complexity can give the impression that data science is too complicated for SMBs, but there are good alternatives to the “patchwork app” system.

Custom programming on an SMB level is surprisingly affordable.

Instead of using a handful of apps to manage outcall scheduling and reporting, for example, a customized business app could combine those functions in one easy-to-navigate place.

Cloud technologies are making many of the same analytics favored by large companies available to SMBs, too.

By tracking and predicting customer needs, SMBs can implement smart inventory systems that make the most of limited shelf space.

Targeted marketing is another convenient tool for reducing operating costs.

It lowers the price of customer acquisition while raising the value of individual clients through repeat business.

Speaking of lowering costs, data science can reduce overhead in general.

Artificial intelligence and automation take tedious or repetitive tasks out of human hands, leaving employees free for more skilled projects.

The boost in efficiency makes up for SMBs’ comparatively smaller staffs.

Leveling the Playing Field

In a very real sense, SMBs are better positioned to benefit from digital transformation than large companies.

41% of SMBs feel their size is an advantage when overcoming institutional resistance to adopting new technology.

They have less bureaucracy surrounding the decision to change, and they have more to gain by going digital.

When they do commit to digitization, their efforts have a high success rate.

Three quarters of SMBs feel that gains from investing in data science technology met or exceeded their original expectations.

Despite the challenges, digital strategy should be a priority for SMBs.

It’s a game-changer.

Half of industry leaders believe that technology levels the playing field between small businesses and large corporations.

Digital transformation is the most reliable path to maximizing an SMB’s resources to gain an edge against their bigger competitors.

How has the digital revolution affected your business? For advice on fine-tuning your digital strategy or to explore how to begin, get your free consultation with a Concepta expert today.

The Executive’s Guide to Assessing and Improving Your Data Strategy

Is a weak data strategy sabotaging your analytics initiatives?

Artificial Intelligence and its applications are growing in visibility as early adopters publicize their successes. This surge in popularity is highlighting a major problem: leaky and inefficient data strategies.

80% of the world’s created data is stored by enterprises, but only 1% of that information is used to inform business decisions. The rest languishes in data silos or ages out of usefulness before it can be processed.

While these shortcomings aren’t new, the true extent of the problem becomes visible only when companies begin to integrate AI into their workflows. Suddenly they see conflicting data streams for one domain, a single unreliable source for another, and no way to quickly disperse information to the departments who need it.

A solid data strategy is the tool needed to smooth out these wrinkles.

However, 34% of marketers surveyed by the CMO Council last year admitted that their data strategy isn’t embraced throughout the leadership team. 24% have no data strategy at all. They may appoint a CDO but don’t take the additional step of outlining clear directives.

[Figure: Companies that have a formal customer data strategy. Source: CMO Council]

This oversight is the first step to failure for analytics initiatives.

Data science is a growing field with a lot of moving parts. There is a clear benefit to integrating it into enterprise workflows, but it can be difficult to know which practices are enterprise-ready and which will add needless complexity to your organization.

This guide will help you evaluate your current data strategy, assess how well it aligns with your corporate goals, and construct a data science plan that will propel your company to the head of its field.

Audit Your Data Strategy

Before anything else happens, conduct an audit of your current data strategy. The more thorough the audit, the more opportunity there is to spot wasted effort and lost opportunities. Use these guided questions to make sure you don’t miss anything vital.

What data do you have and how do you access it?

Identify every method you currently have of collecting information. This includes marketing data, website metrics, and feedback collection and analysis procedures: essentially, anything that measures how your company is performing and how consumers are interacting with your brand.

The more detail you use here, the better. Don’t forget to include data from embedded analytics. Even if you haven’t begun adding analytics on your own, most websites that were built within the last decade will have at least some tools that track site activity.

A growing number of social media and marketing apps also offer embedded analytics. Take a look at exactly what is available.

  • Are you using analytics tools to track your users’ activity? Which ones?
  • Can you track an individual’s path through your website, or does your software only collect aggregate data from all users?
  • Do you have defined Key Performance Indicators (KPIs)?
  • How much say do you have in what metrics are tracked? Are you able to customize collection streams to highlight KPIs?
  • Do you have a central dashboard for viewing and manipulating your data? If not, how many programs does a user have to log into when creating reports?

How is data stored?

Cloud storage is the future of data warehousing, but not everyone has made the switch yet. Some companies have in-house systems they’ve invested heavily in, and uploading everything to the cloud doesn’t make sense for them.

There are also “hybrid warehouses” where some incoming data flows get tagged for in-house storage while others are directed to the cloud. Figure out which system your organization uses.

  • How much data do you save? Do you save everything or only data related to KPIs?
  • Are your storage limits imposed by internal decisions or available space?
  • What are your backup procedures? How can you recover your data in the event of a loss?
  • Do you have an employee or contractor assigned to oversee your data storage?
  • How will you know if your data storage fails or becomes corrupted?
  • Who can see your data? What are the security procedures used to keep unauthorized users from using/altering data?
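
On the question of detecting corruption, one common technique is to record a checksum when data is backed up and compare it later. The sketch below uses Python’s standard `hashlib` module; the sample data and helper names are hypothetical, and a production system would usually rely on checks built into its storage platform.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the data."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, recorded_checksum: str) -> bool:
    """Compare the current fingerprint with the one recorded at backup time."""
    return checksum(data) == recorded_checksum

original = b"customer_id,total\n1001,250.00\n1002,99.50\n"
recorded = checksum(original)  # stored alongside the backup

# Simulate a silent change to one value.
corrupted = original.replace(b"250.00", b"950.00")

print(is_intact(original, recorded))   # True
print(is_intact(corrupted, recorded))  # False
```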

How is your data being used?

What are you doing with your collected data? This isn’t the place to gloss over gaps in current procedures.

Businesses that don’t effectively use their data will be losing $1.2 trillion to their competitors every year by 2020, so if there is room for improvement here it only benefits your organization to point it out.

  • What is your data telling you about clients, market conditions, and workflow efficiency?
  • Are you using AI techniques such as machine learning?
  • Does your data generate actionable insights?
  • Do you act on the insights generated by your data?
  • How are you using data to optimize marketing, increase customer satisfaction, and improve internal processes?

Conduct a Needs Assessment

The needs assessment consists of two stages: planning solutions for errors found during the strategy audit and determining what emerging technologies can be most effectively integrated.

Identifying weaknesses

During the assessment some problems with the current data strategy should become apparent. It will be obvious if there is no backup, for example, or if huge amounts of data are currently going unused.

Other issues may take longer to realize. These tend to be workflow problems, workarounds that staff has created in order to function in a dysfunctional data environment.

The two most common are data silos and Shadow IT.

Data silos

A data silo is a collection or storage system that is only accessible to one group within an organization. Drawing data from these systems for use in other places adds several extra layers of work for employees.

Data silos can be formed accidentally when the higher leadership is unaware of a resource one department has and how it can be applied to the company at large.

For this reason, CDOs and CIOs need to cultivate reciprocal relationships with their subordinate managers.

There should be a climate where those managers feel empowered to share their ideas and processes without being accused of “getting bogged down in details”.

This gives C-level execs the chance to see opportunities for applying those processes in other departments.

Not every meeting needs to be a class on how a department runs; an in-depth update once or twice a month is generally sufficient to stay on top of developing procedures.

Sometimes, data silos are intentionally created by IT departments in the interest of security. Security versus flexibility is one of the greatest conflicts in data science.

It’s critical to ensure information (especially protected customer information and strategy-sensitive data) is kept from unauthorized use, but at the same time too many silos make completing even basic tasks complicated.

Shadow IT

Difficulties in accessing the data needed to operate lead to the second most common “data dysfunction”: Shadow IT. This consists of any systems, applications, and procedures adopted by non-IT staff without IT consultation.

Despite its ominous name, Shadow IT isn’t caused by a desire to hurt the company.

Employees become frustrated with inefficient workflows or limited capabilities and act to “fix” those problems.

They install enterprise software (or sometimes write their own) that automates as many of their “housekeeping” tasks as possible in order to give themselves more time to focus on their primary jobs.

Allowing key decision makers to champion new technology has the benefit of increasing flexibility and offloading some of the IT workload. It isn’t without risk, however.

Unapproved software can expose the organization’s data to the security risks the IT manager created the data silo to avoid.

Also, without central coordination, resources are wasted on redundant or conflicting software.

For more information about these hazards that IT departments face, read last month’s blog post: The Dangers of Shadow IT in Mobile App Development.

Evaluating new data science applications

How well is your current data strategy delivering results? If you aren’t using Artificial Intelligence-based analytics software, there’s room for improvement.

AI allows for faster analysis of unstructured data, which makes up an estimated 80% of the world’s data.

Including AI in your data strategy is the first step to introducing it into your business strategy. For inspiration, here are some of the technologies being used by top level enterprises today.

Deep learning

Just as machine learning is an application of Artificial Intelligence, deep learning is a sub-discipline of machine learning.

Some industry publications describe it as an evolution, but this is misleading as machine learning is still a vibrant and growing field.

Both machine and deep learning teach software to make increasingly accurate choices about data based on past experience with little to no human input.

Deep learning focuses more on the creation of deep neural networks: models built from many layers of interconnected nodes that are trained on vast collections of data to refine the program’s definitions of categories.

Imagine a company wants to design a program to screen customer images posted to a social media site for inappropriate content.

A series of initial algorithms is written to define what “inappropriate” means. The program uses these algorithms to approve or flag incoming content.

In the beginning, though, the software will make a lot of mistakes while it attempts to understand the provided instructions. Patterns of color and unusual body positions could cause false positives.

Deep learning shortens the training period by feeding an enormous amount of prepared data through the algorithm during the preparation phase.

Because the program can access a deep pool of pre-screened exemplars to check its results, it doesn’t have the same learning curve as machine learning processes that must construct their own models.

Results returned by deep learning algorithms reach usable levels of accuracy faster.

On an enterprise level, deep learning is most beneficial in cases where a company already has a store of sorted data to apply.

Current applications of the technology include predicting the outcome of legal cases, navigating self-driving cars and guidance systems for the visually impaired, automatically generating reports in response to unstructured triggers (such as the text of a complaint email), and providing more challenging virtual opponents for computer and video games.

Data mining

Data mining and predictive analytics are often used interchangeably, though in reality data mining is a process that powers predictive analytics.

It’s one of the techniques that creates the framework that predictive analytics uses to generate its predictions. Modern applications of data mining use machine learning to refine their output.

There are different categories of data mining depending on the desired end state.

  • Association is used to find connections between events (i.e., customers look at these two websites during the same visit).
  • Path analysis is the logical continuation of that process in which the typical order of events is defined (customers look at these FAQ pages before choosing the “Contact” page).
  • Data clustering also groups data by proximity but without assuming causation (for unknown reasons, most customers come from these six cities).
  • Classification sorts data into classes based on differentiating factors (customers who have made a purchase in the past vs customers who only browse the site).
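
To give a feel for the first category, here is a minimal sketch of association analysis: counting how often pairs of pages show up in the same visit. The visit logs and page names are made up for the example, and real data mining tools use more sophisticated measures than this raw co-occurrence rate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical visit logs: the set of pages each customer viewed in one visit.
visits = [
    {"pricing", "faq", "contact"},
    {"pricing", "faq"},
    {"blog", "faq"},
    {"pricing", "faq", "blog"},
    {"blog"},
]

def pair_support(visits):
    """Return the fraction of visits in which each pair of pages appears together."""
    counts = Counter()
    for pages in visits:
        # Sorting keeps each pair in a consistent order, e.g. ("faq", "pricing").
        for pair in combinations(sorted(pages), 2):
            counts[pair] += 1
    return {pair: n / len(visits) for pair, n in counts.items()}

support = pair_support(visits)
strongest = max(support, key=support.get)
print(strongest, support[strongest])
```

In this toy data the pricing and FAQ pages appear together in three of five visits, the kind of connection a marketing team might act on by linking the two pages more prominently.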

[Figure: Oracle data mining. Source: Oracle]

Data mining helps companies find previously unknown patterns in their data. Opportunities for growth are often overlooked because the data obscures them.

Marketing is the highest-profile enterprise application of data mining (determining when and how to implement marketing campaigns), but other departments can take advantage of it as well.

For instance, data mining can serve as a virtual feedback panel for product designers.

Knowing what features of an app users interact with most and which are ignored helps plan updates and refine upcoming products to more accurately align with customer needs.

In a market where 86% of customers will pay more for a better experience, improving responsiveness is a significant competitive edge.

Streaming analytics

Gone are the days when companies could afford to wait for the quarterly sales report to evaluate their performance.

The growth of the online economy has created a constantly shifting environment with opportunities disappearing as quickly as they arise, and organizations without the ability to continually visualize their operational data will lose ground to their better-prepared competitors.

For this reason, streaming analytics is one of the most critical analytics technologies to include when building a data strategy.

Data from multiple sources are analyzed mid-stream before being directed to data warehouses. Executives can check a central dashboard with live graphs and charts that provide a real-time snapshot of operations.

Streaming analytics can be used to set off alerts when certain KPIs pass relevant levels.

If response to a certain advertising campaign suddenly spikes in an area, management can assess whether the activity is positive or negative and react accordingly.
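
As a rough illustration of a KPI alert, the Python sketch below flags any reading that jumps well above the average of the readings just before it. The function name `spike_alerts`, the window size, the threshold factor, and the sample numbers are all assumptions made for this example. Real streaming platforms have their own alerting rules.

```python
from collections import deque


def spike_alerts(readings, window=3, factor=2.0):
    """Yield (index, value) whenever a reading exceeds `factor` times
    the average of the previous `window` readings."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        # Only compare once a full window of earlier readings exists.
        if len(recent) == window:
            baseline = sum(recent) / window
            if value > factor * baseline:
                yield i, value
        recent.append(value)


# Hypothetical hourly response counts for one ad campaign in one region.
responses = [10, 12, 11, 13, 40, 12, 11]
alerts = list(spike_alerts(responses))
print(alerts)
```

Because the function consumes readings one at a time, the same logic could sit in the middle of a data stream rather than waiting for a finished report. Whether the spike is good or bad news is still a call for management to make.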

Fast and satisfying reactions are key to reducing the impact of errors on public relations.

Examples of this concept are all over the news.

United Airlines is still a target of ridicule for its weak response to misguided staff, while American Airlines, which had a similarly publicized incident between a flight attendant and a passenger, was able to react in time to minimize the damage to its reputation.

Look at the difference in search results on each of these events:

[Figure: United Airlines vs American Airlines search results]

The difference in media reporting on these two events makes a strong case for being able to quickly evaluate public response based on unstructured data.

In-house data management team vs. Outsourcing

How much of your analytics will be conducted in-house and what will be outsourced?

If you have unusual or very complex requirements, you will need a dedicated team including engineers, programmers, and at least one statistician.

With the rise of embedded analytics in enterprise software, however, investing in data science doesn’t necessarily require engineers and programmers.

Most businesses won’t need a high-science analytics team. A contractor can streamline and coordinate your analytics programs and recommend new software that is a good fit for your company.

Structuring Good Data Strategy

With all this information at hand, it’s time to create the data strategy itself. This can be the most contentious part of the process.

C-level executives with different areas of responsibility have different ideas about what the plan should look like and who should be responsible for its adoption and upkeep.

43% of enterprise leaders feel that getting everyone to agree to the same data strategy is too hard.

In truth, good data strategy prevents more problems than it causes. It removes the uncertainty around data management by outlining expectations of all involved parties.

After the data strategy is adopted, different departments of the organization will find it much easier to rely on data coming from other branches, since they have more trust in the collection and management procedures.

There is no standard template for enterprise data strategy. Your business is unique, even among your competitors, and what works for another company might not complement your existing operations.

Even so, there are certain elements that every data strategy should address in order to be considered complete.

Data Storage

Before any data is collected, there needs to be a storage system in place. The main decision here is whether to build local storage or contract for cloud storage.

Local storage will often have faster connection speeds, and you will have complete control over functions such as backups and access control.

You can also manually disconnect local storage from the internet in case of a network attack. Setting up local storage comes with a large up-front investment, though. Also, you will have to arrange for maintenance and security personnel to protect that investment.

Cloud storage side-steps the costs of building and managing local servers. The provider handles maintenance and improvements as part of the cost, which is typically structured as a subscription.

It’s possible to purchase more storage as your business grows without being delayed by construction. Keeping data stored in the cloud protects it against on-site accidents, too.

These advantages come at the cost of less control over the details of data storage and a slightly slower connection speed.

The vast majority of businesses won’t notice an appreciable difference in connectivity between local and cloud storage, so for most people cloud storage is the best solution.

Collection and Exploitation

Although collection and exploitation are different domains, the rise of embedded analytics has tied them together.

An increasing number of products that used to simply collect data are now processing it as well, and few companies are willing to invest in new software that doesn’t include some form of analytics.

Planning for collection includes deciding what information you need to track. Be specific, but don’t feel the need to ration your KPIs. Data science needs data to work.

The more relevant information you have, the more ROI you can realize from data science programs. Don’t forget to address gaps in your data infrastructure revealed during the audit stage.

Exploitation covers everything from which embedded analytics programs will be utilized to the new data science applications you plan to adopt.

What do you want your data to do? What goals should your CDO be working towards?

While these will by nature be loosely defined, try to narrow it down more than “growth”. A better exploitation goal would be “increase growth in X market” or “improve the customer acquisition funnel”.

Data Integration

Describe your executive expectations for adoption of data science enterprise-wide. This section should include a detailed plan for how data should be disseminated throughout the company.

Incorporating data science into existing workflows is most efficiently done on a rolling, phased basis.

That is, identify the first few steps to improving your data usage, then periodically reassess and add new steps as the old ones are completed.

Provide metrics to help managers assess their data science integration.

A data strategy should be dynamic as well as specific. Make sure there are guidelines for adjusting plans to fit new information, but don’t change requirements on a weekly basis.

Alterations to your strategy should always be data-driven and push towards well-defined goals.

Governance and security

Determine who owns each data asset within the company. Who is responsible for overseeing it? Who can make changes? Who can retrieve data? How will your data be protected from external malicious actors and internal negligence? What measures are in place to comply with relevant privacy laws and regulations such as HIPAA?

There’s an executive trend towards democratizing data so that it’s accessible by every department.

That provides an incredible amount of flexibility and encourages innovation on an individual level, but there are some security concerns involved.

Decide what level of access each category of employee will have based on what you deem an acceptable balance of risk.
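
One simple way to write such a decision down is a lookup table that maps each employee category to an access level per data asset. The Python sketch below assumes three made-up roles, three made-up data assets, and a hypothetical `can_access` helper. It illustrates the idea and does not recommend any particular policy.

```python
# Hypothetical employee categories and data assets; names are illustrative only.
ACCESS_POLICY = {
    "executive": {"sales": "read", "customer_pii": "read", "finance": "read"},
    "analyst": {"sales": "write", "customer_pii": "none", "finance": "read"},
    "support": {"sales": "none", "customer_pii": "read", "finance": "none"},
}

# Access levels ordered from least to most privileged.
LEVELS = {"none": 0, "read": 1, "write": 2}


def can_access(role, asset, needed="read"):
    """Return True if `role` holds at least the `needed` level on `asset`.

    Unknown roles or assets default to no access.
    """
    granted = ACCESS_POLICY.get(role, {}).get(asset, "none")
    return LEVELS[granted] >= LEVELS[needed]


print(can_access("analyst", "sales", "write"))
print(can_access("support", "finance"))
```

Defaulting unknown roles and assets to no access is a deliberately cautious choice. It means a gap in the table denies access instead of granting it.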

Resist the urge to centralize your entire data governance under the CDO. Assign key data governors at each level of authority, all the way from the CDO to individual departments.

In fact, this is a good way to balance freedom of data against security.

Each governor can assess their section’s need for specific data more easily than the CDO, and having everything flow through that local governor provides a measure of accountability.

[Figure: Information governance landscape]

Sound data strategy and the resulting increase in data utilization translate into profit: Fortune 1000 companies that increase their data utilization by a mere 10% can add up to $65 million to their net income.

By assessing and improving your company’s data strategy, you can position yourself to take advantage of new AI technologies and win a share of that increased profit.

Concepta can help assess your data strategy. Contact us today for a consultation!