Over the past few years machine learning has continued to prove its worth to enterprise.
Over 70% of CIOs are pushing digital transformation efforts, with the majority of those focusing specifically on machine learning.
Almost the same number (69%) believe decisions powered by data are more accurate and reliable than those made by humans.
Still, some companies struggle to get value from their machine learning processes. They have trouble finding talent, and their projects are slow to reach ROI.
The problem isn’t with machine learning – it’s with the company’s approach.
The Pitfalls of Reinventing the Wheel
Sometimes companies get so caught up in new technology that they forget what business they’re in.
They don’t need to build complex data science systems or experiment with new types of algorithms or push machine learning as a science forward.
What they need is to extract actionable insights from their data. Companies should be aware of and maintain their data infrastructure, but that isn’t their primary focus. Their focus is running their core business.
However, the majority of companies approach machine learning with a misguided idea of what makes it work.
They assume their specific business needs mean they have to start from scratch, to build a machine learning solution from the ground up.
As a result, they wind up building the wrong kind of infrastructure for their machine learning project. One common place this flawed infrastructure shows is in the type of talent chosen.
Companies go straight for high level data engineers who build machine learning software.
That’s a large – and often costly – mistake. In an enterprise context, data engineers aren’t as useful as applied machine learning experts with experience in turning data into decisions.
Imagine a business traveler looking for the fastest route to a meeting in a new town. Would they have better luck getting directions from a civil engineer or a taxi driver?
The civil engineer knows how to build functional roads, but they don’t necessarily know a specific city’s streets or layout.
The taxi driver knows how to use the streets to get results: arriving at the meeting in time despite traffic, construction, and seasonal issues.
This might sound like a silly example, but it’s exactly what businesses do when setting up machine learning programs.
They focus too much on the “how” (building data systems) and not enough on the “why” (what business goals the system needs to fulfill).
In other words, they think they need civil engineers when what they really need is a good seasoned taxi driver.
The result is wasted resources and higher program failure rates. A big enough failure can also risk future projects when leaders blame the technology rather than the flawed execution.
Why Companies Get Stuck in a Rut
There’s a very good reason why otherwise smart people make mistakes with machine learning: it’s complicated.
Artificial intelligence and machine learning are incredibly complex topics with thousands of subdisciplines and applications.
There is no “catch all” job description for someone who can do all kinds of machine learning.
Those few people with experience in several phases of the data-to-decisions pipeline are high-level, in-demand experts who very probably won’t take an average enterprise position.
On top of this, executives aren’t always sure what type of talent they need because they aren’t clear on what their data science needs are.
They hire data engineers, give them vague directions to “increase efficiency”, then get frustrated when they don’t see results.
Even the best machine learning system can’t create value without working towards a goal.
Getting More by Doing Less
Laying the groundwork for successful machine learning is a case of “less is more”.
Don’t get caught up in high-level, experimental machine learning which seeks to advance the science unless there’s a good business reason (and for enterprise purposes, there almost never is).
A PhD in artificial intelligence and experimental mathematics is not necessary to run a productive enterprise machine learning program.
Instead, find the right experts: statisticians, data intelligence experts, applied machine learning engineers, and software developers with experience in machine learning software.
The truth is, most businesses won’t need to build a machine learning program from scratch. There are many tried and tested solutions available that can be customized to fit a specific company’s needs.
Better yet, they’ve been tested by others at their expense. These tools remove the need for those high-level machine learning construction experts.
Practical talent choices and existing machine learning tools can make the difference between project success and failure.
Using them helps companies get to data quality assurance and usable results faster, meaning the project reaches ROI sooner. The project is more likely to succeed, and future projects will have an easier time winning support within the company.
In short, don’t hire the civil engineer to build roads when there are several existing routes to get where the company is going. The taxi driver is usually the better choice for the job.
Staying on Target
Most importantly, remember the core business and focus on tools that support that instead of distracting from it.
Always build machine learning systems around business objectives. Have specific issues or opportunities to address with each tool, and be sure everyone on the team understands the goal.
When machine learning is treated as a tool rather than a goal, companies are much more likely to see value from their investment.
There’s a wealth of machine learning tools out there to use, but sometimes it’s hard to manage incoming data from different software. Concepta can help design a solution to put your data in one place. Schedule a free consultation to find out how!
Read on to learn more about predictive analytics, what it offers for enterprise, and how it can drive a measurable increase in revenue.
What Does “Predictive Analytics” Mean?
Predictive analytics is the practice of analyzing past and present data to make predictions about future events.
Technically it can describe any process that seeks to identify the most likely scenario by drawing parallels between historical and current conditions, then placing those conclusions in a modern context.
Analysts look at what happened in the past when conditions were similar and how certain events played out. They assign more weight to factors which have tended to be more influential or which have greater potential for an extreme outcome.
Most people assume predictive analytics is a recent invention, something that arose after computers established their role in enterprise.
In reality, businesses have been relying on it since the late 1600s, when Lloyd’s of London began applying the practice to their shipping insurance estimates.
In pre-artificial intelligence days, people used statistical modelling and complex mathematical equations to perform predictive analytics.
Many updated versions of those models are still in use for industrial shipping and route planning.
Non-intelligent methods have limitations, though. They only consider factors users decide to include, so there’s a heavy likelihood of human error and bias.
It also takes time to perform the calculations. By the time they’re done and put into use the data is already becoming outdated.
The modern style of predictive analytics incorporates artificial intelligence and machine learning. It allows users to include many more variables for a broader, more comprehensive analysis.
The process highlights unexpected connections between data and weighs factors with greater accuracy. All of this can be done with a short enough timespan to create timely, reliable insights.
The Intersection of Science and Enterprise
AI and machine learning have exciting potential, but like all emerging technology they require investment. Global brands like Google might have the resources to shrug off a failed project.
Successful enterprise leaders aren’t reckless with their IT budgets. They focus on actual business goals and source technology that addresses those instead of trying out popular new tools.
A smart executive’s first question about a new tool should be, “How does this benefit the company?”
Predictive analytics has an answer for that question: growing revenue through refined opportunity scoring.
The Value of Opportunity Scoring
Opportunity scoring involves assigning a value to a sales lead, project, goal, or other potential course of action in order to determine how much relative effort to spend in pursuit of that opportunity.
Scoring opportunities allows a company to get a greater return on their time and money. No company can put the same investment into every customer or chance that crosses their path.
They shouldn’t even try. As a rule of thumb, about 80% of revenue comes from 20% of clients, so it makes sense to prioritize that 20%.
It’s something every business does, even if there isn’t a standardized process in place. High value sales leads are called frequently, given more flexibility with concessions, and assigned better sales representatives.
High value projects get bigger teams, more access to resources, larger budgets, and scheduling priority.
The trick is deciding which opportunities have the most potential, and that’s where predictive analytics comes into play.
Manual scoring is where people assign scores, either personally or using a statistical model, based on their own set of influential characteristics.
There are a number of problems with this method that can slow businesses down or leave them with unreliable scores.
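As a rough illustration, a manual scoring model can be sketched in a few lines of Python. The field names, weights, and leads below are hypothetical and hand-picked for the example. That is the point: the model only knows about the factors someone chose to include.

```python
# Hypothetical hand-picked weights: a person decides which factors matter.
MANUAL_WEIGHTS = {
    "annual_budget_k": 0.5,     # budget in thousands of dollars
    "past_purchases": 10.0,     # number of previous orders
    "opened_last_email": 15.0,  # 1 if the lead opened the last email, else 0
}

def manual_score(lead):
    """Weighted sum of the factors the scorer chose to include."""
    return sum(weight * lead.get(field, 0) for field, weight in MANUAL_WEIGHTS.items())

def rank_leads(leads):
    """Return leads sorted from highest to lowest manual score."""
    return sorted(leads, key=manual_score, reverse=True)

# Hypothetical leads, invented for this example.
leads = [
    {"name": "Acme", "annual_budget_k": 40, "past_purchases": 3, "opened_last_email": 1},
    {"name": "Globex", "annual_budget_k": 120, "past_purchases": 0, "opened_last_email": 0},
    {"name": "Initech", "annual_budget_k": 10, "past_purchases": 1, "opened_last_email": 1},
]

for lead in rank_leads(leads):
    print(lead["name"], manual_score(lead))
```

Any factor missing from the weights table is ignored entirely, and changing the ranking means editing the weights by hand.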
Opportunity scoring is incredibly valuable when it’s reliable. Manual scores have a wide margin of error. They depend on direct user input with no ability to suggest other relevant factors.
Unlike intelligent scoring, manual methods can’t easily be used to find unexpected commonalities among high-return accounts.
The problem with this approach is that executives can’t realistically imagine or keep track of everything that might influence their company.
There’s too much data to consider, and it changes constantly. All manual opportunity scoring is therefore based on aging, less useful data. On top of that, it’s easy to make mathematical mistakes even with a computer program’s assistance.
Because users choose and weigh the contributing factors, manual scoring is highly susceptible to human bias.
Preconceptions about social categories, personal characteristics, industrial domains, and other identifying factors can be given too much (or too little) weight. It allows for unhelpful biases to be introduced into the sales cycle.
Most of the time the result is simply less helpful scores, but it does occasionally create a public relations issue if the scoring system leads to operational discrimination.
For example, some real estate brokers have run into a problem where one racial group wasn’t shown houses in a certain area. The company’s scoring suggested those groups were less likely to buy there.
The realtors thought they were following good business practices by relying on their internal customer scoring, but those scores were skewed by biases about economic stability rather than actual data.
When the situation came to light it created a public impression that the realtors had the same bias. Suddenly they had bigger worries than opportunity scoring.
Creating and maintaining a manual scoring process takes a significant chunk of time. It’s not a system that responds well or quickly to change.
That’s a problem in today’s hyper-connected world, where an event in the morning could start influencing sales across the country that afternoon. Opportunities wind up passing before they’re even recognized.
Not everyone remembers to consider the “hidden costs” of ineffective processes, like higher labor costs.
There’s human nature to consider, as well. Sales teams can tell these scoring systems aren’t effective.
It’s not uncommon for teams to resist spending time on scoring that they could spend working leads. They don’t like putting their commissions at risk with bad information, but still need some kind of guidance to help manage their lead interactions.
More often than not, they operate with outdated scores or “gut instincts”. That in turn frustrates managers who have invested in the inefficient manual scoring process.
The conflicting pressures create an uncomfortable working environment that drives unwanted turnover among the most valuable sales agents.
Even the most sophisticated manual scoring program can only account for things that have been specifically input into the equations.
They require humans to think of and assign value to every possible factor. This tends to enforce a “business as usual” mindset over more profitable responsive operations.
On the flip side, predictive opportunity scoring is among the leading AI-based drivers of revenue in an enterprise context. It has the edge over other methods in several areas.
There are two central reasons behind the higher reliability of intelligent scoring. First and foremost, it reduces the impact of human error.
Machine learning algorithms perform calculations the same way every time. Even as they adjust themselves in response to new data, the underlying math is more reliable than calculations done by humans.
It’s also important that predictive analytics is purely data-driven rather than focusing on “traditional knowledge” about what makes an opportunity valuable.
Artificial intelligence expands a computer’s capacity to judge the relevance of seemingly unconnected events.
Predictive analytics leverages that capacity to identify characteristics shared by highly productive courses of action. Those commonalities are then used to weigh future opportunities with a greater degree of accuracy.
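One way to picture weights learned from data rather than chosen by hand is a small logistic regression, sketched here in plain Python. This is an illustrative stand-in, not the method any particular scoring product uses, and the history, feature names, and training settings are all assumptions made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def win_probability(x, weights, bias):
    """Estimated chance that an opportunity with features x closes."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit weights by gradient descent on log loss; returns (weights, bias)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = win_probability(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

# Hypothetical history: [past_purchases, opened_last_email]; label 1 = deal closed.
history = [[0, 0], [1, 0], [0, 1], [3, 1], [4, 1], [5, 0], [2, 1], [0, 0]]
closed = [0, 0, 0, 1, 1, 1, 1, 0]

weights, bias = train_logistic(history, closed)
print(win_probability([4, 1], weights, bias))  # a strong repeat customer
print(win_probability([0, 0], weights, bias))  # a cold lead
```

Here the importance of each factor comes out of the historical outcomes, so retraining on fresh data updates the weights without anyone re-deciding them.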
When predictive analytics is fed by a constant stream of new information, predictive opportunity scoring tools can update scores in real time (or at least near-real time).
They highlight opportunities in time for companies to take action. Advance warning also leads to better advance planning when it comes to inventory, staffing, and marketing.
Intelligent scoring tools react to actual circumstances based on data instead of presumptions. They consider all data available for a situation from an impartial standpoint (providing, of course, that any developer bias is accounted for during design).
The majority of enterprise analytics software is designed to be user-friendly and easy to operate. It connects to a company’s data sources, meaning there’s usually very little data entry required. This frees the sales team to focus on their accounts and other high value activities.
Optimized Opportunity Scoring = More Revenue
Predictive opportunity scoring is backed by sound theory. The practical benefits are just as solid, with several applications proving their enterprise value today:
There’s a saying that 80% of revenue comes from 20% of clients. Lead scoring helps identify that 20% so sales teams can focus on the most valuable customers.
Agents close more contracts, and higher value ones, when they know where to profitably spend their time rather than chasing a weak prospect for a low bid.
Intelligent lead scoring is a dynamic process, meaning it regularly reevaluates leads based on changing circumstances.
A client marked as a low priority lead moves up as indicators show rising interest. Agents then step in at the right point in the purchase cycle with a targeted incentive to encourage a sale (or even an upsell).
Customized attention has the added benefit of increasing repeat business, because former clients aren’t bombarded with high-pressure closing tactics when they aren’t ready to buy again.
Enterprise benefit: Better closing ratios, greater PLV, higher average order value
Specific demographics can be targeted with tailored campaigns to increase their lifetime value, keep them from falling out of the sales cycle, or meet another specific business goal.
Enterprise benefit: Higher ROI on advertising campaigns
Information can travel around the world near-instantly, but inventory still has to be physically moved.
That puts a natural drag on rearranging supply levels in different regions – and with it, a cap on how much companies can exploit regional fluctuations in demand.
This causes lower sales in general, but the effect is most striking during an unexpected surge in demand.
Predictive analytics can spot early indicators of a potential spike while excitement is still building.
Executives have the warning necessary to shift goods and team members where they’re needed most. Amazon uses this method to stage their merchandise for faster shipping of commonly ordered items.
Enterprise benefit: Higher overall sales revenue
Opportunities for growth
Opportunities aren’t always about marketing campaigns or scoring leads. Sometimes executives want guidance when choosing between growth strategies.
Predictive analytics and opportunity scoring are useful here as well, answering questions like:
Which stores will be most valuable over the next year?
Where should the company expand?
Should more franchises be sold in a specific area?
What ventures are most likely to succeed?
Is a specific project worth the investment?
There’s no guarantee that any course of action is the best, but incorporating data cuts out many of the risk factors that lead to failure (such as hype over new technology or personal bias).
Enterprise benefit: Faster, more sustainable growth
Putting Data to Work
At the end of the day, data is only valuable when it serves real-world business goals.
Opportunity scoring is one of the most proven ways to extract value from data. It’s also one of the most accessible, since embedded analytics are built into the majority of modern enterprise software.
With so much to gain at a relatively low investment point, those who haven’t adopted yet should be giving predictive analytics a closer look.
Are you frustrated by trying to navigate multiple streams of data? One of the most common pain points of data intelligence initiatives is reconciling data from the enterprise software used by different departments. Concepta can unite data from programs like Salesforce, MailChimp, and other analytics programs to put data where it’s needed most: in your hands!
Artificial intelligence, machine learning, neural networks – these terms are used together so often that they run together. Differentiating them from each other can be confusing for executives who are just trying to explore the business applications of AI. So what’s the difference? Machine learning and neural networks are separated by both application and scale.
Fundamentals of Machine Learning
Machine learning is a field of Artificial Intelligence concerned with training algorithms to interpret large amounts of data with as little human guidance as possible. The algorithms process data, then use what they “learned” from those calculations to adjust themselves in order to make better decisions on the next batch of data.
Two of the most common types of machine learning are supervised and unsupervised learning. In supervised learning, an algorithm is fed a selection of labelled data for training purposes. It’s used to find known patterns within unknown data. One application would be identifying human faces in a batch of photos.
Under unsupervised learning, an algorithm gets no labelled training data and no feedback about which outcomes are desirable; it has to find structure in the data on its own. This method is best used to spot unknown patterns in known data. For example, unsupervised learning can find common characteristics in customers of a certain store that would guide marketing decisions.
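The contrast can be sketched with two tiny Python routines. The first learns from labelled examples, while the second groups unlabelled values on its own. The customer spend figures and segment names are hypothetical, and both algorithms (a nearest-centroid classifier and a two-cluster k-means) are deliberately simple stand-ins for what real tools do.

```python
def centroid(points):
    return sum(points) / len(points)

# Supervised: labelled examples (monthly spend, segment) train a nearest-centroid classifier.
def train_supervised(examples):
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: centroid(values) for label, values in by_label.items()}

def predict(model, value):
    """Assign the label whose average training value is closest."""
    return min(model, key=lambda label: abs(model[label] - value))

# Unsupervised: two-cluster k-means on unlabelled values.
# Assumes the input contains at least two distinct values.
def two_means(values, iterations=10):
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = centroid(near_low), centroid(near_high)
    return near_low, near_high

# Hypothetical data, invented for this example.
labelled = [(20, "casual"), (25, "casual"), (30, "casual"),
            (200, "loyal"), (220, "loyal"), (240, "loyal")]
unlabelled = [18, 22, 27, 35, 190, 210, 250]

model = train_supervised(labelled)
print(predict(model, 180))    # classified using the known labels
print(two_means(unlabelled))  # groups discovered without any labels
```

The supervised model can only answer with the labels it was shown, while the clustering routine returns groups that a person still has to interpret and name.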
Neural networks are arranged in layers of computational units (like neurons). Each connects with the layers before and after. Because they’re influenced by neighboring layers neural networks can learn in a non-linear fashion.
In practice neural networks can be used for many of the same sorts of tasks where machine learning is a good fit. That includes:
Natural Language Processing
Intelligent opponents in games
Anatomy of Neural Networks
A typical “feed forward” neural network, where data flows in one direction, has three main parts.
Input Layer: Signals arrive to be processed.
Hidden Layer: This layer applies a specific rule which changes as the network “learns”. It isn’t an input or output layer; rather, it’s a filter or distillation layer. Most neural networks have multiple hidden layers. Each has its own distinct rule which it applies before sending the signal on, and each serves as the “input layer” for the next layer in line.
Output Layer: Here’s where the “results” or conclusions from the collective calculations are provided.
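The three parts above can be sketched as a forward pass in plain Python. The weights here are arbitrary, hypothetical numbers rather than learned values, and the sigmoid activation is one common choice among several; training (the part where the rules change as the network “learns”) is left out.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of all inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]

def feed_forward(inputs, layers):
    """Pass the signal through each layer in order; data only flows forward."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Hypothetical fixed weights: 2 inputs -> hidden layer of 3 units -> 1 output unit.
hidden = ([[0.5, -0.2], [0.8, 0.1], [-0.3, 0.9]], [0.0, -0.1, 0.2])
output = ([[1.0, -1.0, 0.5]], [0.0])

result = feed_forward([1.0, 2.0], [hidden, output])
print(result)
```

Each hidden layer's output becomes the next layer's input, which is why adding layers only means adding entries to the list.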
Where the Difference Lies
As mentioned earlier, the difference between machine learning and neural networks is one of application and scale. Machine learning is a process while a neural network is a construct. Simply put, neural networks are used to do machine learning, but since there are other methods of machine learning the terms can’t be used interchangeably.
Because neural networks are used in deep learning, they can be considered an evolution of machine learning. It is something of a simplistic view, but neural networks operate on a larger and more complex scale than basic machine learning. Thousands of algorithms can be layered together into a single network for more complex calculations.
Neural networks do have advantages over other machine learning methods. They’re well-suited for generalization and finding unseen patterns. There are no preconceptions about the input and how it should be arranged, which is a problem with supervised learning. Also, neural networks excel at learning non-linear rules and defining complex relationships between data.
Machine learning has only recently established its value to business, and neural networks are still being developed. That said, they’re the closest machines have come to mimicking the level of flexible thinking humans possess. It should be exciting to see how far this technology goes and where it leads the business intelligence world.
Machine learning-powered enterprise analytics are becoming a core facet of business intelligence. Sometimes, though, juggling data feeds from various software can be more distracting than helpful. Concepta specializes in building customizable dashboards that unite analytics streams to put your data where you need it, when you want it. Set up your free consultation to find out how!
Last year, Google DeepMind took a giant step forward in proving the value of deep learning when the latest version of their Go-playing computer program, AlphaGo Zero, beat an earlier version after only three days of self-training.
This is an impressive feat by itself. The implications for business and enterprise analytics, however, are more exciting.
AlphaGo Zero learned through reinforcement learning, and there are two parts to this. First, the algorithm weighs the value of as many possible future states as it has time and power to consider.
Next, it selects the best next action based on its current state.
Systems undergoing reinforcement learning aren’t given training sets.
They have no human advice about whether each move is good or bad, just whether the end state is ideal.
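A toy version of this loop can be sketched in Python with tabular Q-learning, a classic reinforcement learning method. It is far simpler than what AlphaGo Zero uses, and the corridor world, reward, and settings below are assumptions invented for the example. The agent is never told which moves are good; it only receives a reward when it reaches the end state.

```python
import random

N_STATES = 5        # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, 1)   # step left or step right

def step(state, action):
    """Move within the corridor; a reward arrives only at the end state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def best_action(q, state, rng=None):
    """Pick the highest-valued action, breaking ties randomly if rng is given."""
    top = max(q[(state, a)] for a in ACTIONS)
    candidates = [a for a in ACTIONS if q[(state, a)] == top]
    return rng.choice(candidates) if rng else candidates[0]

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore occasionally, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = best_action(q, state, rng)
            next_state, reward = step(state, action)
            # Weigh the value of the future state when updating this move.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
learned_policy = [best_action(q_table, s) for s in range(N_STATES - 1)]
print(learned_policy)
```

After training, the learned values favor stepping toward the goal from every position, even though no individual step was ever labelled as right or wrong.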
Reinforcement learning is useful for creating generic decision processes which can theoretically be applied to different domains.
This is actually the goal of Google DeepMind: they’re trying to create generic deep learning algorithms that can be set to analyze any type of situation.
AlphaGo Zero Takes AI To The Next Level
Computers have been beating humans at chess since IBM’s Deep Blue defeated chess great Garry Kasparov in 1997.
Go is considered harder for computers since there are many more moves to be considered even from the start.
Until the original AlphaGo defeated European Go champion Fan Hui in 2015, no AI had yet beaten a highly-ranked human player on a standard-sized board.
AlphaGo Zero pushed Google’s success further. It defeated earlier AlphaGo versions, including AlphaGo Master, the system that beat the number one human player in the world, Ke Jie. Zero made several large technological leaps forward to achieve this.
Scientists were most intrigued by Zero’s self training. Unlike earlier systems which were given recorded games to study, Zero started only with the rules of Go. It reached a world champion level of play entirely through reinforcement learning.
Though perhaps less exciting, it’s important to note that Zero was also less resource-intensive than other AlphaGo systems.
The original AlphaGo used a “policy network” to select the next move and a “value network” to predict the winner of the game from each position.
Zero combined these two, using a single network. It was able to do this with only 4 tensor processing units (TPUs).
For comparison, the first AlphaGo used 176 GPUs and a later version used 48 TPUs.
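The idea of one network doing both jobs can be sketched in Python as a shared body feeding two output “heads”. This is only a toy illustration with arbitrary, hypothetical weights and a three-cell “board”; it is not DeepMind’s architecture, which is vastly larger.

```python
import math

def dense(inputs, weights, biases):
    """Weighted sums of the inputs, one per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def policy_and_value(board, params):
    """Shared body feeds two heads: move probabilities and a predicted outcome."""
    shared = [math.tanh(v) for v in dense(board, params["body_w"], params["body_b"])]
    policy = softmax(dense(shared, params["policy_w"], params["policy_b"]))
    value = math.tanh(dense(shared, params["value_w"], params["value_b"])[0])
    return policy, value

# Hypothetical tiny board of 3 cells and 3 candidate moves; weights are arbitrary.
params = {
    "body_w": [[0.2, -0.1, 0.4], [0.3, 0.3, -0.2]], "body_b": [0.0, 0.1],
    "policy_w": [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4]], "policy_b": [0.0, 0.0, 0.0],
    "value_w": [[0.7, -0.6]], "value_b": [0.0],
}

policy, value = policy_and_value([1.0, 0.0, -1.0], params)
print(policy)  # one probability per candidate move
print(value)   # predicted outcome between -1 (loss) and 1 (win)
```

Because both heads reuse the same shared computation, the position only has to be processed once to get a move suggestion and a win estimate.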
Despite using fewer resources and having to teach itself Go strategy from scratch, AlphaGo Zero matured incredibly fast.
It took only three days of self-play to reach world champion level.
The Implications For Enterprise
What does this mean for business? The minds behind AlphaGo Zero put it most eloquently: “Our results comprehensively demonstrate that a pure reinforcement learning approach is fully feasible, even in the most challenging of domains.”
Machine learning algorithms need to be trained to work. Traditionally human scientists supply labelled datasets to guide algorithms in their development.
It’s often tedious, expensive, or even impossible to supply training data for a specific situation, though.
Pure reinforcement learning could open a whole new universe of AI applications.
Marketing: Right now humans rate, code, and train marketing algorithms, but with reinforcement learning those algorithms could game out marketing strategies alone and supply targeted direction.
Healthcare: Machine learning algorithms show promise in detecting disease and risk factors using eye and skin scans. So many factors are involved that humans would have a hard time building a training set. Reinforcement learning could improve algorithms as well as increasing human understanding of how diseases present.
Manufacturing: No single corporate operating policy can be ideal for every situation. Reinforcement learning algorithms could be set loose on each individual factory or plant to optimize routines, achieving the maximum output for each set of circumstances.
AlphaGo Zero is evidence that working towards general-use deep learning algorithms using reinforcement learning is a realistic approach.
Both the Chinese Go program Fine Art and its Japanese counterpart Zen have been able to duplicate Zero’s results (though neither has surpassed them), proving that Zero isn’t a fluke.
Google is shifting focus from AlphaGo Zero to putting lessons learned from its development into practical use.
It will be exciting to see what they make of this incredible breakthrough for artificial intelligence.
While pure reinforcement learning remains in development, there are hundreds of sophisticated enterprise analytics programs on the market. Schedule a free consultation to discover the right business intelligence solutions to expand your business and explore how to unify them in one easily accessible place.
Artificial Intelligence is finding its way into every corner of data science. It’s an incredible asset for handling large amounts of data.
There is still room for other methods, though.
Data scientists were working long before artificial intelligence became a functional option, and there are applications that don’t especially benefit from contemporary AI.
Data Science vs Artificial Intelligence
Artificial Intelligence describes both the theory and the practice of creating advanced computer systems capable of simulating human intelligence.
The eventual goal is to design a system that can react reasonably to unexpected events without having to be programmed for each possible circumstance.
While artificial intelligence hasn’t quite reached that level yet, it has produced some impressive results in areas like:
Natural Language Processing
Data Science is the general practice of using scientific methods to find patterns in or draw insights from data.
It covers everything from preparing raw data for use to presenting results from analytics.
The relationship between the two is that artificial intelligence powers more efficient and accurate data science.
Data Science Without AI
While artificial intelligence has exciting potential, in relation to data science it’s seen only as a very promising tool.
There are areas of data science where scientists can get usable results from other methods.
The more popular non-AI data science practices include:
Regression explains relationships between variables.
Linear regression is the most commonly practiced statistical technique in the data science industry.
It finds the best linear relationship between a dependent and an independent variable in order to predict a target variable. Regression is used to:
Determine previously unknown correlation between factors
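For a single independent variable, the best-fit line has a closed-form solution, sketched below in Python. The ad spend and sales figures are hypothetical and chosen to fall exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend (in thousands) vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, units_sold)
forecast = slope * 6 + intercept  # predicted units sold at a spend of 6
print(slope, intercept, forecast)
```

Real data rarely sits on a perfect line, so in practice the fitted line is the one that minimizes the squared gaps between predictions and observations.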
Decision Trees are hierarchical graphs of every possible option and the cascading consequences of choosing each option.
They’re a widely used method of inductive inference, straightforward in design and simple for non-data scientists to understand.
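A small decision tree of options and their consequences can be written as nested Python data and “rolled back” to compare choices. The options, probabilities, and payoffs below are hypothetical, and this sketch covers the hand-built style of tree described above rather than trees learned automatically from data.

```python
def expected_value(node):
    """Roll a decision tree back from its leaves to a single number."""
    if "payoff" in node:              # leaf: a final outcome
        return node["payoff"]
    if node["type"] == "chance":      # chance node: probability-weighted average
        return sum(p * expected_value(child) for p, child in node["branches"])
    # decision node: assume the decision maker picks the best option
    return max(expected_value(child) for _, child in node["options"])

def best_option(node):
    """Name of the option with the highest expected value at a decision node."""
    return max(node["options"], key=lambda option: expected_value(option[1]))[0]

# Hypothetical choice: launch now (uncertain result) or wait (small, certain gain).
tree = {
    "type": "decision",
    "options": [
        ("launch", {"type": "chance", "branches": [
            (0.6, {"payoff": 100}),
            (0.4, {"payoff": -50}),
        ]}),
        ("wait", {"payoff": 20}),
    ],
}

print(best_option(tree), expected_value(tree))
```

Because every branch is spelled out, a non-specialist can trace exactly why one option scored higher than another.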
Besides predicting chance outcomes, decision trees are helpful in:
Time Series Analysis
Time Series Analysis involves analyzing time series data in order to find meaningful statistics.
A time series is a set of data points arranged by time or frequency, usually taken at regular intervals.
It’s a practical way to study events over time; for example, tracking seasonal shopping patterns in relation to control factors in order to prepare for surges in demand.
Time series analysis is often used for:
Stock market analysis
Long-term climate predictions
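One of the simplest time series techniques is a moving average, which smooths out short-term spikes so the underlying trend is easier to see. The sketch below uses hypothetical monthly sales figures with one unusual spike.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive points; smooths short-term noise."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical monthly sales, taken at regular intervals.
monthly_sales = [10, 12, 14, 30, 16, 18, 20]

trend = moving_average(monthly_sales, window=3)
print(trend)
```

A wider window gives a smoother trend but reacts more slowly to genuine changes, so the window size is a judgment call.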
Visualization is the communications branch of data science, where data is translated into a visual context.
Making sense of highly complex data is complicated; having it displayed visually gives users the bigger-picture view needed to inform decisions.
Visualization isn’t one specific technique. There are a number of different visualization tools that mainly fall into two categories:
Static visualization doesn’t change with time. It must be specifically edited to include new information. Some examples are printed charts or graphs.
Dynamic visualizations are based on a dataset that may change, like a graph that displays the results of streaming analytics on a dashboard.
Visualization is enormously useful for finding patterns, discovering unexpected correlation between features that may lead to other lines of inquiry, and explaining complex data to non-data scientists.
Data Science in the Future
The problem with the traditional approach to data science is that there’s so much data to be analyzed and so few data scientists to do it.
Non-intelligent methods are very talent- and time-intensive. They can take too long to produce results when answers are time-sensitive.
Artificial intelligence does require data scientists to set up and monitor, but the algorithms can power real-time analytics usable by non-technicians.
That shortens the gap between when data is produced and when it’s interpreted into actionable insights.
For that reason artificial intelligence will become more, not less, connected to data science as the field matures.
Is your data science strategy producing the results you need? Concepta can help you implement an intelligent system to turn your data into decisions. Contact us for your free consultation!
The enterprise applications of machine learning are weaving themselves into the fabric of everyday business.
Still, the concept itself is hazily understood.
Over the last month we have shared posts intended to clear up the confusion between machine learning and other related topics like predictive analytics.
This article continues that trend by tackling one of the least helpful misapplications: when machine learning and data science are mistaken for each other.
Laying the Groundwork
Machine learning is a branch of artificial intelligence where, instead of writing a specific formula to produce a desired outcome, an algorithm “learns” the model through trial and error.
It uses what it learns to refine itself as new data becomes available.
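As a rough illustration of “learning” a model instead of writing the formula by hand, the sketch below recovers a simple rule from examples alone. The data, the function name `train_linear`, and the learning rate are assumptions made for this example, and gradient descent is only one of many ways an algorithm can learn.

```python
# Hypothetical example: the "hidden" rule is y = 2x + 1, but the
# algorithm is only shown input/output pairs, never the formula.
def train_linear(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge the parameters in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]   # observations produced by the hidden rule
w, b = train_linear(xs, ys)
print(f"learned rule: y = {w:.2f}x + {b:.2f}")
```

When new observations arrive, calling `train_linear` again on the larger dataset is the simplest form of the refinement described above.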
Data science is an umbrella term that includes everything needed to extract meaningful insights from data (gathering, scrubbing and preparing, analyzing) in order to answer questions or make predictions.
It includes areas like:
Data mining: The process of examining large amounts of data to find meaningful patterns
Data scrubbing: Finding and correcting incomplete, unformatted, or otherwise flawed data within a database
ETL (Extract, Transform, Load): A collective term for the process of pulling data from one database, transforming it into the required format, and loading it into another
Statistics: Collecting and analyzing large amounts of numerical data, particularly to establish the quantifiable likelihood of a given occurrence
Data visualization: Presenting data in a visual format (charts, graphs, etc.) to make it easier to understand and spot patterns
Analytics: A multidisciplinary field that revolves around the systematic analysis of data
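To make two of these areas more concrete, here is a minimal sketch of data scrubbing inside a small ETL flow, using only the Python standard library. The CSV contents, the `sales` table, and the choice to drop incomplete rows (rather than repair them) are all assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent spacing/casing and a missing value.
RAW_CSV = """customer,region,amount
 Alice ,north,120.50
bob,NORTH,
Carol,south,80
"""

def extract(text):
    """Extract: read rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform (scrub): normalize formatting and drop incomplete rows."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # incomplete record: nothing to analyze
            continue
        clean.append((row["customer"].strip().title(),
                      row["region"].strip().lower(),
                      float(amount)))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the destination database."""
    conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```

In a real project the source and destination would be separate systems; an in-memory database just keeps the sketch self-contained.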
What Falls Under the “Data Science Umbrella”?
“Data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.”
Those words from John Foreman, MailChimp’s VP of Product Management, sum up the problem with trying to draw the boundaries of data science.
It’s a vast concept, describing intent more than a specific discipline.
There are, however, four fields generally agreed to cover the majority of data science where they intersect: mathematics, computer science, domain expertise, and communications.
Mathematics: Mathematics forms the core of data science. Data scientists need to know enough math to choose and refine the models they use in analysis, especially if they plan to work in machine learning. Understanding the math behind their formulas gives them the ability to spot errors and weigh the significance of results.
Also, while there are some data points that can be easily read without a heavy math background (conversions, website views, engagement rates, etc.), others require specialized knowledge to understand. For example, time series data is very common in business intelligence but hard for casual users to interpret.
Mathematical subdisciplines often studied by data scientists include:
Computer science: Data science may be older than computers, but the powerful effect of the digital revolution can’t be denied. Computers let data scientists process vast amounts of data and perform incredibly complex calculations at a speed that allows data to be used within a reasonable timeframe.
Some of the areas where computer science intersects with data science:
System design optimization
Graph theory and distributed architectures
Artificial Intelligence and machine learning
Domain knowledge: Data science is a targeted practice. It’s used to generate insights about some specific topic. The data has to be contextualized before it can be put to use, and doing so effectively requires an in-depth knowledge of that topic.
Today data science is being applied in nearly every domain. Perhaps some of the most interesting uses can be found in fields like business and health care.
Data-driven preventative health care
Disease modeling and predicting outbreaks
Improving diagnostic techniques
DNA sequencing and genomic technologies
Identifying and quantifying business problems to make data-driven decisions
Communications: Communications is often forgotten when discussing data science, but communication is relevant at nearly every stage of the data science process. It’s a critical link between theory and practice. Data has little value unless it can be applied to solve problems or answer questions, and it can’t be applied until someone other than the data scientist understands it. On the flip side of that statement, data scientists need to know what questions they’re trying to answer in order to choose the best analytical strategies.
Though communications are often grouped with domain knowledge, it’s helpful to separate them to emphasize their importance. Here are a few data science-oriented applications of communications:
Data science evangelism (spreading awareness about the uses of data science)
Clarifying what is needed/desired from data
Presenting results in a useful way
Data visualization (graphs, charts, models)
The Data Science Process
If separating data science into the above disciplines were easy, though, it wouldn’t be its own field.
In reality each discipline is woven throughout the process with a large degree of flexibility in the combination of techniques used.
Here’s a general, very broad-scope view of the data science process and the disciplines that affect each stage.
Data is collected and stored. (Disciplines: computer science)
Questions are asked. What is needed from the data? What problems does the user hope to solve? (Disciplines: communications, domain knowledge)
Data is cleaned and prepared for analysis. (Disciplines: math, computer science)
Data enrichment takes place. Do you have enough data? How can it be improved? (Disciplines: computer science, math, communications, domain knowledge)
A data scientist decides which algorithms and methods of analysis will best answer the question or solve the problem. (Disciplines: math, computer science)
Data is analyzed via artificial intelligence/machine learning, statistical modeling, or another method. (Disciplines: math, computer science)
The results are measured and evaluated for value/merit. (Disciplines: math)
The validated results are brought to the end user. (Disciplines: communications, possibly computer science)
The end user applies the results of data science to real-world business problems. (Disciplines: business, communications)
This list is mainly intended to demonstrate how inextricably combined the component disciplines of data science are in practice.
The data science process is never as straightforward as this; rather, it’s highly iterative. Some of these steps may be repeated many times.
Depending on the results, the scientist might even return to an earlier step and start over.
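The middle of that process (clean, analyze, evaluate, decide whether to loop back) can be sketched in a few lines. Everything here is hypothetical: the ad-spend records, the choice of a least-squares line as the analysis method, and the error threshold of 5.0 are assumptions picked to keep the example small.

```python
# Hypothetical records: (ad_spend, sales); None marks a missing reading.
raw = [(1, 12.0), (2, 19.5), (3, None), (4, 41.0),
       (5, 50.5), (6, 59.0), (7, 71.5), (8, 80.0)]

def clean(records):
    """Cleaning step: drop records with missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def fit_line(points):
    """Analysis step: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, points):
    """Evaluation step: average absolute gap between prediction and reality."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)

data = clean(raw)
train, test = data[:5], data[5:]      # hold back the last records for evaluation
model = fit_line(train)
error = mean_abs_error(model, test)

# Iteration: if the results are not good enough, return to an earlier step
# (gather more data, clean differently, or pick another method).
needs_another_pass = error > 5.0      # assumed acceptance threshold
print(f"held-out error: {error:.2f}, repeat earlier steps: {needs_another_pass}")
```

The steps on either side of this sketch (asking the right questions, presenting results to the end user) are where communications and domain knowledge come in, and they do not reduce to code as neatly.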
Where the Confusion Lies
After reading this far, the reasons for the confusion between data science and machine learning have likely become clear.
Machine learning is a method for doing data science more efficiently, so it’s misunderstood to be a direct subdiscipline of data science.
In fact, a list of things data science can accomplish reads like a pitch for adopting machine learning.
Here are a few common data science applications to illustrate the point:
The reason for this overlap is that machine learning algorithms are very effective tools for sorting and classifying data.
That makes machine learning popular among data scientists, but it doesn’t have the inherent direction and sense of purpose of data science as a whole.
In simple terms: machine learning is a tool; data science is a field of practice.
Machine Learning Isn’t Necessary for Data Science…
While ML is an efficient way of performing data science, it’s not always the best solution. Sometimes it isn’t needed at all. Two notable cases when machine learning is the wrong tool for a job:
The problem can be solved using set formulas or rules. If there’s no interpretation needed and context doesn’t change the data, a mathematical model alone can handle the matter. There’s no point in spending resources on machine learning. It might lead to faster results if there’s a large amount of data, but it won’t produce “better” results.
There isn’t a massive amount of data involved. This is a case where machine learning does more harm than good. Machine learning requires data, the more the better. Without a store of prepared data to train the algorithm, it can produce unreliable results. Worse, training on a small or unrepresentative sample yields biased results. When there isn’t enough relevant data on a subject to fuel machine learning, other methods of data science are better options for finding answers.
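The small-data problem can be seen even without a machine learning model. The sketch below is a plain statistics illustration, not a training run: it draws repeated samples from a hypothetical population of order values and measures how much the resulting estimates disagree with each other. The population, sample sizes, and function name are assumptions for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 order values centered around 100.
population = [random.gauss(100, 20) for _ in range(10_000)]

def spread_of_estimates(sample_size, trials=200):
    """Estimate the population mean from many random samples and report
    how much those estimates vary (their standard deviation)."""
    estimates = [statistics.mean(random.sample(population, sample_size))
                 for _ in range(trials)]
    return statistics.stdev(estimates)

small = spread_of_estimates(5)     # very little data per estimate
large = spread_of_estimates(500)   # plenty of data per estimate
print(f"spread with 5 records:   {small:.2f}")
print(f"spread with 500 records: {large:.2f}")
```

Estimates built on a handful of records swing much more widely than those built on hundreds, and an algorithm trained on the same handful of records inherits that instability.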
But It Is a Game-changing Advantage.
Despite these limitations, machine learning offers such a distinct advantage that it’s easy to see why data scientists are adopting it in such large numbers.
There are three main situations where it’s generally the best data science method:
There’s too much data for a human expert to process. Some data is perishable. By the time a team of human analysts works through it (even using standard computing methods) it’s aged out of usefulness. Other times data is flowing into a system faster than it can be processed. Machine learning algorithms thrive on massive amounts of data. They improve by processing data, so results actually become more accurate over time.
There is ambiguity in the ruleset. Machine learning has a long way to go before it can match the human potential for coping with uncertainty and inconsistency, but it’s made huge strides in drawing meaningful results from ambiguous data.
Programming a specific solution isn’t practical. Sometimes the code needed to program a solution would be so large that writing it would be inefficient. In these cases, machine learning can be used to streamline the analysis process.
The Bottom Line
It’s definitely possible to do data science without incorporating machine learning.
However, the pace of data production is growing every day.
By 2020, 1.7 megabytes of data will be created every second for every living person.
Most of that will be unstructured data.
Machine learning is the best tool for dealing with that volume and quality of data, so it’s likely to be used in data science for the foreseeable future.
How well is your company taking advantage of its data? Contact Concepta to learn how we can turn your data into actionable insights!