Looking into Luigi: A Workflow Management System Review

Data intelligence relies on a strong, functional data pipeline. However, the workflows that feed those pipelines can grow arbitrarily complex.

Building, connecting, and maintaining complex workflows by hand adds unnecessary work for data engineers.

It’s not an arrangement that works in the fast-paced world of enterprise software.

Fortunately, developers can lower their workload with tools like Luigi.

What Is Luigi?

Spotify created and maintains Luigi, a workflow engine whose philosophy and concepts were inspired by GNU Make.

It’s a Python module that provides a framework for building and running complex pipelines of batch jobs.

What problem does Luigi solve?

Luigi’s main function is to take care of workflow management so developers can focus on other concerns.

It helps developers build data pipelines by declaring dependencies between tasks and defining the inputs and outputs of each task.

On top of creating data pipeline tasks, Luigi helps run them. It’s a good tool for handling dependencies, providing visualization tools, and handling and reporting failures.

When used with a central scheduler it can also enable distributed execution.

Benefits of Luigi

  • Smoothly resume data workflow after a failure.
  • Parametrize and re-run tasks on a schedule (daily, hourly, or as needed) with the help of an external trigger.
  • Organize code with shared patterns.
  • Command line integration.
  • Small overhead for a task (about 4 lines: class, def requires, def output, def run; see the sketch after this list).
  • Everything is done by inheriting Python classes.
  • Can be extended with other tasks such as Spark jobs, Hive queries, and more.
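
To make that small per-task overhead concrete, here’s a rough sketch of a two-task pipeline. The task names, file paths, and CSV layout are invented for illustration; the structure (a luigi.Task subclass with requires, output, and run) is the pattern described above.

```python
import datetime

import luigi


class ExtractOrders(luigi.Task):
    """Hypothetical upstream task that dumps raw order data."""

    date = luigi.DateParameter()

    def output(self):
        # Luigi checks this target to decide whether the task already ran.
        return luigi.LocalTarget(f"data/orders_{self.date}.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("order_id,total\n1,19.99\n2,5.50\n")


class SummarizeOrders(luigi.Task):
    """Depends on ExtractOrders; Luigi runs the upstream task first."""

    date = luigi.DateParameter()

    def requires(self):
        return ExtractOrders(date=self.date)

    def output(self):
        return luigi.LocalTarget(f"data/summary_{self.date}.txt")

    def run(self):
        with self.input().open() as infile, self.output().open("w") as outfile:
            rows = infile.readlines()[1:]  # skip the header row
            revenue = sum(float(row.split(",")[1]) for row in rows)
            outfile.write(f"orders={len(rows)} revenue={revenue:.2f}\n")


if __name__ == "__main__":
    # local_scheduler=True skips the central scheduler for a quick local test.
    luigi.build([SummarizeOrders(date=datetime.date.today())], local_scheduler=True)
```

Because each task declares an output target, re-running the build skips any task whose output already exists; that is the idempotent restart behavior covered under Strengths below.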

Strengths of Luigi

Modular code makes software more reliable and easier to maintain and update.

With Luigi, writing modular code is simple. Developers can easily create complicated dependencies between tasks.

Better yet, managing those dependencies is equally straightforward.

Luigi’s simple API lets users build a highly complex tree of dependencies without making it too difficult to understand.

Other team members or outside maintainers can easily interpret the code.

Luigi is highly flexible. It relies on Python, which allows developers the freedom to create tasks that do anything needed.

Connecting components is easy and intuitive.

There’s no external or static configuration for the pipelines, only Python scripts, so everything is dynamic.

Last – but not least – is idempotency. Completed tasks are not run twice, so a failed workflow can be restarted from the point of failure.

It picks up right where it left off, and re-running it produces the same output every time.

Weaknesses of Luigi

One of Luigi’s main weaknesses is the flip side of one of its biggest strengths.

Specifically, it can’t re-run partial or old pipelines since it picks up where it left off.

It also has no native support for distributed execution.

Developers need to use the central scheduler to gain that functionality.

Some have found Luigi’s user interface to be hard to navigate.

This is one of the biggest reasons users move to Airflow, though with some practice the UI issue becomes less noticeable.

The biggest complaints of developers who’ve worked with Luigi revolve around issues with scaling.

There are two reasons for the tool’s scalability issues:

  • The number of Luigi worker processes is limited by the number of cron worker processes currently assigned to the job.
  • The web UI and scheduler run on a single-threaded process. If the scheduler is busy or someone else is using the UI, the web UI suffers from frustratingly slow performance.

Comparison

Airflow (Airbnb)

Airbnb uses a lot of data-heavy features: price optimization for hosts, property recommendations for guests, and internal tracking features to guide business decisions.

They created Airflow to meet their specific data needs, then decided to open source it in 2015.

It’s flexible and scalable, but users have experienced some problems with time zones, managing the scheduler, and unexpected backfills.

Pinball (Pinterest)

Pinterest created Pinball when they found none of the existing workflow management solutions met their requirements for customizability.

It has a lot of features and scales horizontally very well.

The community is small, though, and it doesn’t have good documentation.

Real-life Application

In practice Luigi is used for ETL (extract, transform, load) operations that feed data intelligence operations.

Luigi handles batch jobs, not streaming, continuous processes.

It’s not data integration software, but it can be used to orchestrate custom data integration tasks.

Future outlook

Right now, Airflow is a more popular tool for workflow management.

Luigi still has its supporters and there are areas where it has the edge over Airflow and Pinball, but unless it can address its scalability issues it may not be able to maintain its user base going forward.

Every development project has unique needs. At Concepta, we build with tools chosen for each project to create a custom solution for every client. Claim your free consultation to see what we can do for your company!

Request a Consultation

Is JSON Schema the Tool of the Future?

JSON Schema is a specification for describing and validating JSON data. It generates clear, easy-to-understand documentation, making validation and testing easier.

JSON Schema is used to describe the structure and validation constraints of JSON documents.

Some have called it “the future for well-developed systems that have nested structures”.

There’s some weight to those claims; it’s definitely become a go-to tool for those who get past its steep learning curve.

Reviewing the Basics

JSON, which is the acronym for JavaScript Object Notation, is a lightweight data-interchange format.

It’s easy for humans to read and write, and equally easy for machines to parse and generate.

JSON Schema is a declarative language for validating the format and structure of a JSON Object.

It describes how data should look for a specific application and how it can be modified.

There are three main parts to JSON Schema:

JSON Schema Core

This is the specification where the terminology for a schema is defined.

Schema Validation

The JSON Schema Validation specification is a document that explains how validation constraints may be defined. It lists and defines the set of keywords that can be used to specify validation rules for a JSON API.

Hyper-schema

This is where keywords associated with hyperlinks and hypermedia are defined.

What Problem Does JSON Schema Solve?

Schemas in general are used to validate files before use to prevent (or at least lower the risk of) software failing in unexpected ways.

If there’s an error in the data, the schema fails immediately. Schemas can serve as an extra quality filter for client-supplied data.

Using JSON Schema solves most of the communication problems between the front-end and the back-end, as well as between ETL (Extract, Transform and Load) and data consumption flows.

It creates a process for detailing the format of JSON messages in a language both humans and machines understand. This is especially useful in test automation.
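
As a rough illustration of that shared contract, here’s a hypothetical order-message schema validated with the third-party Python jsonschema package (the field names are invented for this example):

```python
from jsonschema import ValidationError, validate

order_schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "integer"},
        "email": {"type": "string"},
        "items": {"type": "array", "minItems": 1, "items": {"type": "string"}},
    },
    "required": ["order_id", "items"],
}

good_message = {"order_id": 42, "items": ["widget"]}
bad_message = {"order_id": "forty-two", "items": []}

validate(instance=good_message, schema=order_schema)  # passes silently

try:
    validate(instance=bad_message, schema=order_schema)
except ValidationError as err:
    print(err.message)  # pinpoints exactly which constraint failed
```

The same schema document doubles as documentation for the front-end team and as a fixture for automated tests.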

Strengths of JSON Schema

The primary strength of JSON Schema is that it generates clear, human- and machine-readable documentation.

It’s easy to accurately describe the structure of data in a way that developers can use for automating validation.

This makes work easier for developers and testers, but the benefits go beyond productivity.

Clearer language allows developers to spot potential problems faster, and good documentation leads to more economical maintenance over time.

Weaknesses of JSON Schema

JSON Schema has a surprisingly sharp learning curve.

Some developers feel it’s hard to work with, dismissing it as “too verbose”. Partly because of that criticism, it isn’t as widely known as it could be.

Using JSON Schema can make a project’s schema files grow quickly. For example, every nested level of JSON adds two levels of JSON Schema to the project.

This is a weakness common to schemas, though, and depending on the project it may be outweighed by the benefits. It’s also worth considering that JSON Schema has features which keep the size expansion down.

For example, objects can be described in the “definitions section” and simply referenced later.
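
As a sketch of that reuse pattern (using draft-07 style keywords; newer drafts spell the section "$defs"), the hypothetical address object below is defined once and referenced twice, so the schema doesn’t balloon as the JSON nests:

```python
customer_schema = {
    "definitions": {
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
            },
            "required": ["street", "city"],
        }
    },
    "type": "object",
    "properties": {
        # Both fields point at the single shared definition.
        "billing_address": {"$ref": "#/definitions/address"},
        "shipping_address": {"$ref": "#/definitions/address"},
    },
}
```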

What Else Is There?

Some developers prefer to use Mongoose, an Object Document Mapper (ODM) that allows them to define schemas, then create models based on those schemas.

The obvious drawback is that an extra abstraction layer delivers a hit to performance.

Another option is Joi, a validation library used to create schemas for controlling JavaScript objects. The syntax is completely different, though, and Joi works best for small projects.

Sometimes developers jump into a new MongoDB project with a very flexible schema. This inevitably dooms them to “schema hell”, where they lose control of the data model as the project grows.

When JSON Schema Is the Right Choice

Performance is undeniably important. However, there are times when the cost of recovering from mistakes is far higher than the cost of taking the speed hit that comes with schema validation.

In those cases, avoiding the performance drop isn’t worth the risk of bad data entering the system, and that’s where JSON Schema comes into play.

JSON Schema is proving itself as a development option, but there’s no single “best tool” for every project. Concepta takes pride in designing a business-oriented solution that focuses on delivering value for our clients. To see what that solution might look like for your company, reserve your free consultation today!

Request a Consultation

Predicting Profit: Growing Revenue by Scoring Opportunities with Data Intelligence

Of all the artificial intelligence applications making their way into enterprise, one of the most effective is predictive analytics.

It has the potential to transform the decision-making process from something based on “gut instincts” and historical trends to a data-driven system that reflects actual current events.

Those who have adopted it are outperforming their competitors at every turn.

Some organizations still hesitate to launch their own predictive initiatives. Artificial intelligence has been a buzzword for decades, and many times in the past it failed to deliver on its promises.

Executives with an eye on the bottom line might avoid any kind of AI technology out of understandable skepticism.

That hesitation could be holding these companies back. Predictive analytics has matured into a viable enterprise tool. It’s being used all over the world to find opportunities for growth.

The impact is so striking that 71% of businesses are increasing their use of enterprise analytics over the next three years.

Read on to learn more about predictive analytics, what it offers for enterprise, and how it can drive a measurable increase in revenue.

What Does “Predictive Analytics” Mean?

Predictive analytics is the practice of analyzing past and present data to make predictions about future events.

Technically it can describe any process that seeks to identify the most likely scenario by drawing parallels between historical and current conditions, then placing those conclusions in a modern context.

Analysts look at what happened in the past when conditions were similar and how certain events played out. They assign more weight to factors which have tended to be more influential or which have greater potential for an extreme outcome.

Most people assume predictive analytics is a recent invention, something that arose after computers established their role in enterprise.

In reality, businesses have been relying on it since the late 1600s, when Lloyd’s of London began applying the practice to their shipping insurance estimates.

In pre-artificial intelligence days, people used statistical modelling and complex mathematical equations to perform predictive analytics.

Many updated versions of those models are still in use for industrial shipping and route planning.

Non-intelligent methods have limitations, though. They only consider factors users decide to include, so there’s a high likelihood of human error and bias.

It also takes time to perform the calculations. By the time they’re done and put into use the data is already becoming outdated.

The modern style of predictive analytics incorporates artificial intelligence and machine learning. It allows users to include many more variables for a broader, more comprehensive analysis.

The process highlights unexpected connections between data and weighs factors with greater accuracy. All of this can be done with a short enough timespan to create timely, reliable insights.

The Intersection of Science and Enterprise

AI and machine learning have exciting potential, but like all emerging technology they require investment. Global brands like Google might have the resources to shrug off a failed project.

For the majority of organizations, though, there has to be a reasonable expectation of profit to consider launching a technology initiative.

Successful enterprise leaders aren’t reckless with their IT budgets. They focus on actual business goals and source technology that addresses those instead of trying out popular new tools.

A smart executive’s first question about a new tool should be, “How does this benefit the company?”

Predictive analytics has an answer for that question: growing revenue through refined opportunity scoring.

The Value of Opportunity Scoring

Opportunity scoring involves assigning a value to a sales lead, project, goal, or other potential course of action in order to determine how much relative effort to spend in pursuit of that opportunity.

Scoring opportunities allows a company to get a greater return on their time and money. No company can put the same investment into every customer or chance that crosses their path.

They shouldn’t even try. 80% of revenue comes from 20% of clients, so it makes sense to prioritize that 20%.

It’s something every business does, even if there isn’t a standardized process in place. High value sales leads are called frequently, given more flexibility with concessions, and assigned better sales representatives.

High value projects get bigger teams, more access to resources, larger budgets, and scheduling priority.

The trick is deciding which opportunities have the most potential, and that’s where predictive analytics comes into play.

Manual Versus Machine Learning

Two main types of opportunity scoring methods are in widespread use today: those that are done manually and those that take advantage of machine learning.

Manual scoring is where people assign scores, either personally or using a statistical model, based on their own set of influential characteristics.

There are a number of problems with this method that can slow businesses down or leave them with unreliable scores.

Inaccurate

Opportunity scoring is incredibly valuable when it’s reliable. Manual scores have a wide margin of error: they depend on direct user input, with no ability to suggest other relevant factors.

Unlike intelligent scoring, manual methods can’t easily be used to find unexpected commonalities among high-return accounts.

The problem with this approach is that executives can’t realistically imagine or keep track of everything that might influence their company.

There’s too much data to consider, and it changes constantly. All manual opportunity scoring is therefore based on aging, less useful data. On top of that, it’s easy to make mathematical mistakes even with a computer program’s assistance.

Subjective

Because users choose and weigh the contributing factors, manual scoring is highly susceptible to human bias.

Preconceptions about social categories, personal characteristics, industrial domains, and other identifying factors can be given too much (or too little) weight. It allows for unhelpful biases to be introduced into the sales cycle.

Most of the time the result is simply less helpful scores, but it does occasionally create a public relations issue if the scoring system leads to operational discrimination.

For example, some real estate brokers have run into a problem where one racial group wasn’t shown houses in a certain area. The company’s scoring suggested those groups were less likely to buy there.

The realtors thought they were following good business practices by relying on their internal customer scoring, but those scores were skewed by biases about economic stability rather than actual data.

When the situation came to light it created a public impression that the realtors had the same bias. Suddenly they had bigger worries than opportunity scoring.

Inefficient

Creating and maintaining a manual scoring process takes a significant chunk of time. It’s not a system that responds well or quickly to change.

That’s a problem in today’s hyper-connected world, where an event in the morning could start influencing sales across the country that afternoon. Opportunities wind up passing before they’re even recognized.

Not everyone remembers to consider the “hidden costs” of ineffective processes, like higher labor costs.

Tedious data entry takes time and focus away from more productive sales activities without a corresponding return on value due to the lower accuracy.

There’s human nature to consider, as well. Sales teams can tell these scoring systems aren’t effective.

It’s not uncommon for teams to resist a process they see as wasting time they could spend working leads. They don’t like putting their commissions at risk with bad information, but they still need some kind of guidance to help manage their lead interactions.

More often than not, they operate with outdated scores or “gut instincts”. That in turn frustrates managers who have invested in the inefficient manual scoring process.

The conflicting pressures create an uncomfortable working environment that drives unwanted turnover among the most valuable sales agents.

Rigid

Even the most sophisticated manual scoring program can only account for things that have been specifically input into the equations.

They require humans to think of and assign value to every possible factor. This tends to enforce a “business as usual” mindset over more profitable, responsive operations.

On the flip side, predictive opportunity scoring is among the leading AI-based drivers of revenue in an enterprise context. It has the edge over other methods in several areas.

Reliable

There are two central reasons behind the higher reliability of intelligent scoring. First and foremost, it reduces the impact of human error.

Machine learning algorithms perform calculations the same way every time. Even as they adjust themselves in response to new data, the underlying math is more reliable than calculations done by humans.

It’s also important that predictive analytics is purely data-driven rather than focusing on “traditional knowledge” about what makes an opportunity valuable.

Artificial intelligence expands a computer’s capacity to judge the relevance of seemingly unconnected events.

Predictive analytics leverages that capacity to identify characteristics shared by highly productive courses of action. Those commonalities are then used to weigh future opportunities with a greater degree of accuracy.
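
As a minimal sketch of how that scoring might look in code, assuming scikit-learn and an invented table of historical opportunities (deal size, days in pipeline, prior purchases, and whether the deal closed), a model trained on past outcomes can score new opportunities as win probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical history: [deal_size_usd, days_in_pipeline, prior_purchases]
X_history = np.array([
    [50_000, 30, 2],
    [5_000, 90, 0],
    [120_000, 14, 5],
    [8_000, 60, 1],
    [75_000, 21, 3],
    [3_000, 120, 0],
])
y_won = np.array([1, 0, 1, 0, 1, 0])  # 1 = closed-won, 0 = closed-lost

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_history, y_won)

# Score incoming opportunities; a higher probability means higher priority.
new_opportunities = np.array([
    [60_000, 25, 2],
    [4_000, 100, 0],
])
print(model.predict_proba(new_opportunities)[:, 1].round(2))
```

A production system would retrain on fresh data and use far more features, but the principle is the same: let historical outcomes, not intuition, decide the weights.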

Responsive

Predictive analytics fed by a constant stream of new information allow predictive opportunity scoring tools to update scores in real time (or at least near-real time).

They highlight opportunities in time for companies to take action. Advance warning also leads to better advance planning when it comes to inventory, staffing, and marketing.

Objective

Intelligent scoring tools react to actual circumstances based on data instead of presumptions. They consider all data available for a situation from an impartial standpoint (providing, of course, that any developer bias is accounted for during design).

Efficient

The majority of enterprise analytics software is designed to be user-friendly and easy to operate. It connects to a company’s data sources, meaning there’s usually very little data entry required. This frees the sales team to focus on their accounts and other high value activities.

Optimized Opportunity Scoring = More Revenue

Predictive opportunity scoring is backed by sound theory. The practical benefits are just as solid, with several applications proving their enterprise value today:

Lead Scoring

There’s a saying that 80% of revenue comes from 20% of clients. Lead scoring helps identify that 20% so sales teams can focus on the most valuable customers.

Agents close more contracts, and higher-value ones, when they know where to spend their time profitably rather than chasing a weak prospect for a low bid.

Intelligent lead scoring is a dynamic process, meaning it regularly reevaluates leads based on changing circumstances.

A client marked as a low priority lead moves up as indicators show rising interest. Agents then step in at the right point in the purchase cycle with a targeted incentive to encourage a sale (or even an upsell).

Customized attention has the added benefit of increasing repeat business, because former clients aren’t bombarded with high-pressure closing tactics when they aren’t ready to buy again.

Enterprise benefit: Better closing ratios, greater PLV, higher average order value

Targeted marketing campaigns

When executives know where advertising has the most potential impact, they have more options for strategic spending.

Specific demographics can be targeted with tailored campaigns to increase their lifetime value, keep them from falling out of the sales cycle, or meet another specific business goal.

Enterprise benefit: Higher ROI on advertising campaigns

Inventory management

Information can travel around the world near-instantly, but inventory still has to be physically moved.

That puts a natural drag on rearranging supply levels in different regions – and with it, a cap on how much companies can exploit regional fluctuations in demand.

This causes lower sales in general, but the effect is most striking during an unexpected surge in demand.

Predictive analytics can spot early indicators of a potential spike while excitement is still building.

Executives have the warning necessary to shift goods and team members where they’re needed most. Amazon uses this method to stage their merchandise for faster shipping of commonly ordered items.

Enterprise benefit: Higher overall sales revenue

Opportunities for growth

Opportunities aren’t always about marketing campaigns or scoring leads. Sometimes executives want guidance when choosing between growth strategies.

Predictive analytics and opportunity scoring are useful here as well, answering questions like:

    • Which stores will be most valuable over the next year?
    • Where should the company expand?
    • Should more franchises be sold in a specific area?
    • What ventures are most likely to succeed?
    • Is a specific project worth the investment?

There’s no guarantee that any course of action is the best, but incorporating data cuts out many of the risk factors that lead to failure (such as hype over new technology or personal bias).

Enterprise benefit: Faster, more sustainable growth

Putting Data to Work

At the end of the day, data is only valuable when it serves real-world business goals.

Opportunity scoring is one of the most proven ways to extract value from data. It’s also one of the most accessible, since embedded analytics are built into the majority of modern enterprise software.

With so much to gain at a relatively low investment point, those who haven’t adopted yet should be giving predictive analytics a closer look.

Are you frustrated by trying to navigate multiple streams of data? One of the most common pain points of data intelligence initiatives is reconciling data from the enterprise software used by different departments. Concepta can unite data from programs like Salesforce, MailChimp, and other analytics programs to put data where it’s needed most- in your hands!

Request a Consultation

Nissan’s Layered Approach to Data Science: Cutting Costs While Maximizing Sales

The most convenient thing about data intelligence is that the same resources gathered in one part of enterprise can also be used by another.

Data on sales patterns can be applied to supply chain optimization or marketing efforts, and nearly everything informs intelligent customer profiles. To get the most out of their data, companies need to maximize its usage across departments.

Consider auto manufacturer Nissan. They’ve created an intuitive, futuristic experience for drivers while lowering their operational costs.

How? By implementing a layered approach to data science that spreads data utilization across the operational structure from sales to manufacturing and maintenance.

Pulling Sources Together

Nissan is emerging as a leader in turning data into actionable business insight. They use a large percentage of their available data, which comes from sources like:

  • Regional sales data (sorted by vehicle model, color, and type)
  • Website activity
  • Consumer interactions with online “vehicle design” features
  • Marketing campaigns
  • Social media
  • Dealer feedback
  • Warranty information
  • Vehicle status reports from GPS/system monitoring functions
  • Driving data

To avoid privacy issues and protect drivers, Nissan anonymizes most vehicle-generated data. For example, instead of noting “this specific vehicle had a computer fault” they track the percentage of vehicles which throw the same fault.

Putting Data to Work

Data is at the heart of Nissan’s growth strategy. Asako Hoshino, Senior Vice President of their Japan Marketing and Sales Division, put it best in a speech at a 2016 Ad Week conference:

“You can’t just be bold, because your success rate will not increase. You have to couple boldness with science. It has to be grounded in science, and it has to be a data set that will underline and support the big decisions you make.”

Nissan uses their data to increase sales in carefully targeted ways. They run the usual sales tracking by region and vehicle, but they also seek out additional details.

Potential customers looking for a test drive fill out an online request form that gives Nissan location-specific data about popular colors, models, and features. This feeds into a tailored inventory for the region and guides dealership placement. It also helps to create highly targeted advertising.

Advertising is another area where Nissan excels. They use advanced visualization tools to make real-time performance metrics on their marketing campaigns accessible to senior leadership.

The data builds a dynamic profile of customers, suggesting which incentives might work best in certain markets and which tend to fall flat.

Like much of Nissan’s data structure, marketing data has wider applications. It’s used to create research and design initiatives that deliver features customers actually want.

Some features matter more to consumers than others, but there’s room to show off new technology while still keeping the features that drive sales. Data highlights these opportunities for technological distinction.

Technology is a big pull for today’s drivers, especially when it saves them time and money. Nissan pushes data-centric “connected car” features like predictive maintenance, advanced navigation software, remote monitoring of features, and over-the-air updates that take a lot of the guesswork out of vehicle ownership.

Increasing sales is only half the benefit of data science. Nissan has reduced their operational costs as well. Predictive maintenance (using data to service equipment before it breaks down) keeps their manufacturing process working smoothly.

That’s essential in a market where cars need to be more customized but still built to high standards on a short timeline.

Drivers have busy lives as well, which is why Nissan has a customer-facing application of their predictive maintenance data. They track aggregated vehicle data to detect potential flaws and plan repairs before they become expensive recalls (or worse, cause accidents).

When a vehicle does come in to a dealership for repairs, technicians can use the onboard data to quickly and easily verify warranty claims. This saves the driver time while lowering investigation costs and preventing unwarranted repairs.

Measurable Results

In 2011 Nissan set a goal to achieve 10% market share in North America. Nissan North America reached 10.2% market share in February of 2017.

They relied heavily on data science for guidance, specifically in providing targeted inventory and marketing to smaller regions while giving local leaders the right analytics tools to plan their own sales campaigns.

What Nissan Does Right (And What Others Can Learn)

Breaking down data silos

Data silos had been a major hindrance to Nissan’s data science efforts. In late 2016 to early 2017 the company began to address this by employing Apache Hadoop to create a “data lake”. The data lake holds 500TB of data, all potentially accessible for analytics.

Using data in multiple ways

Data is usable by key leaders throughout the company and can be referenced wherever needed. This leads to data-driven decision making at every level. It has the side effect of lowering the individual “cost” of data since it’s reused multiple times.

Encouraging internal adoption throughout the business

Data can be transformative – but only if it’s used. Nissan North America invited key data users from a variety of business areas to an educational internal event on data. They held workshops on their data platform and visualization tools, encouraged networking between IT and end users, and provided resources for further training.

As a result, active users of the analytics platform went from 250 to 1,500 by the end of its first year. IT saw fewer data requests, and most of those that remained were requests to add verified sources rather than to look up information.

Creating a layered approach to data science looks intimidating, but it can be as simple as uniting reporting streams in a single place. Concepta’s developers can design a dashboard solution tailored to your company’s unique needs, presenting real-time streaming data through dynamic visualizations. Set up a free consultation to find out more!

Request a Consultation

 

How Data Science Can Help Your Enterprise Generate More Revenue

Data science is a dry term for a surprisingly cool field. When used right it acts like a team of digital detectives, sifting through a company’s data to ferret out inefficiencies and spot opportunities in time to act.

“Used right” is the key phrase here. Data science is a complex field, and finding a path to revenue presents a challenge for companies trying to modernize their digital strategy. Sometimes it’s hard to see past the hype to the actual business value of investing in data science.

To help cut through the noise, here’s a clear, results-focused look at exactly how data science generates revenue for enterprise.

Laser-focused marketing campaigns

When it comes to marketing campaigns, there’s no such thing as too much data. Over 80% of senior executives want detailed analysis of every campaign, but they often lack the time or data to gain real insight into campaign performance.

Data science addresses both those concerns. Artificial intelligence and machine learning methods cut down on the time necessary to process data, while better data management provides the fuel to feed analytics.

The right combination of data science techniques can help track how campaigns are doing by market and by demographic within that market. This ranges from information as general as click-through rates to details like time spent on a company’s page, sorted by originating site.

Armed with this information, marketers can refine the ads they push to each market based on what works, not what should work based on broad demographics. They can even identify customers who failed to convert late in the process. About 70% of these customers will convert after being retargeted.

The results are impressive. Using data to guide marketing campaigns leads to a 6% increase in profitability over companies that are reluctant to adopt data science.

Better e-mail follow-through

E-mail optimization is probably the most direct example of data science driving revenue.

E-mail is a major source of revenue for enterprise, especially for B2B companies and those that focus on e-commerce. A full 86% of professionals prefer to use e-mail for business correspondence.

The same percentage are happy to receive e-mail from their favorite businesses (providing it doesn’t get excessive).

More than half of CMOs say increasing engagement is their main concern about e-mail marketing this year, but three quarters of them don’t track what happens after e-mails are sent.

Only 23% use data science tools to track e-mail activity. A mere 4% use layered targeting, and 42% use no targeting at all. (Four out of five do perform at least some customer segmentation, though.)

This oversight has a serious effect on the bottom line. 51% of marketers say a lack of quality data is holding their e-mail campaigns back. Without data to guide them, they struggle to evaluate customer satisfaction with the frequency and quality of the company’s e-mails.

Increasing e-mail quality using data science has measurable benefits. When customers make a purchase through links in an e-mail they spend about 38% more than other customers.

80% of retail leaders list e-mail newsletters as the most effective tool in keeping customer retention rates high.

On a smaller scale, personalizing e-mail subject lines increases the open rate by 5%. Triggered messages such as abandoned cart e-mails have an astounding 41% open rate (and remember that 70% retargeting conversion rate from earlier).

Lead management

Sales staff only have so much time, and analog lead assessment methods yield questionable results. Artificial intelligence-powered data science tools can analyze a company’s past sales and customer data to effectively score leads, letting sales staff make the most of their business days. These tools consider factors like:

  • Actual interest in product as demonstrated by events like site visits and social media discussion
  • Position in purchase cycle based on time spent on specific areas of a website
  • Demonstrated potential purchasing power and authority to enter contracts

Using AI in lead management results in 50% more appointments than other methods. Those appointments are shorter and more productive, too, since businesses can target customers who are ready to buy.

The overall reduction in call time averages around 60% without damaging customer satisfaction rates. That’s why 79% of the top sales teams use data science to power their lead management.

Intelligent customer profiling

Knowing who the customer is and what they want is key to both marketing and customer service. Data science removes the potential for human biases about customers. Specifically, it looks for what customers have in common and groups them by that instead of imposing arbitrary demographic boundaries.

Profiling software analyzes all available data on a company and its customers to find previously unnoticed similarities. These hidden connections can be then used to drive revenue in different ways.

They’re particularly good at identifying customers with the highest potential lifetime value or highlighting potential extra services current customers might enjoy.

A great success story in this arena comes from video distributor Giant Media. After using data science to build data-driven customer profiles, they found 10,000 new leads across the United States.

Of those, 500 brands were in their desired New York City market. The software even isolated 118 businesses that matched Giant Media’s ideal profile and provided contact information fast enough to enable effective sales calls.

Improving customer experience

One theme keeps popping up in sales and marketing discussions: customer experience is king. It’s predicted to be the primary brand differentiator by 2020. 86% of customers value a good buying experience over cost and will pay more for better service. Once they’ve had that positive experience they’re 15 times more likely to purchase from the same vendor again.

What counts as a “good” customer experience? Besides obvious factors like reliable customer service and solid product quality, personalized service seems to be the key to winning over customers.

Data science provides insights that allow for that personalized service on a large scale. It can offer tailored interactions such as:

  • Suggesting products based on past purchases
  • Retargeting customers at appropriate intervals (for instance, reorder reminders for pet food or garage coupons as a customer’s vehicle hits certain milestones)
  • Reminders around holidays (like Mother’s Day or family birthdays)

In short, creating an outstanding customer experience requires knowing what the customer values and being able to offer it on demand. Data science is invaluable here. Chatbots in particular are useful for providing assistance that customers need, when they need it, and in an accessible format.

Timely sales forecasting

Sales forecasting without modern data science methods takes far too much time. Reports are huge, hard to get through, and don’t arrive in time to help sales staff. As a practical compromise, sales staff often rely on wider-scope numbers that are more readily available instead of targeted data on local customers.

Data science – specifically predictive analytics – can provide near-real time information on what’s selling, where it’s selling, and who’s buying it. This prepares companies on a structural level to spot opportunities and make the most of them.

It increases overall enterprise flexibility. Plus, sales staff can use the information to build better pitches, improve relationships with their customers, and generally make better use of their time.

Supply chain management

Managing the supply chain feeds directly into revenue. After all, companies can’t sell what they don’t have. Data science provides insights that enable more efficient internal operations, which leads to better margins. To get specific, insights gained from data science can be used to:

  • Keep enough inventory on hand to meet demand, regardless of season
  • Make deliveries on time despite potential delays
  • Schedule services more accurately so customers can plan their day

Pitt Ohio Freight Company saw a major boost in sales after applying data science to their supply chain problems. They trained algorithms to consider factors like freight weight, driving distance, and historical traffic to estimate the time a driver will arrive at their delivery destination with a 99 percent accuracy rate.

Their customers were highly impressed. Pitt Ohio now enjoys $50,000 more in repeat orders annually, and they’ve reduced the risk of lost customers as well.

Price optimization

Pricing is tricky. The goal is to find a profitable price that the customer is happy to pay so as to ensure repeat business. An enormous number of factors affect pricing, and it’s hard for humans to tell what’s important and what isn’t.

Data science has no such handicap. It can be applied to customer, sales, inventory, and other market data to uncover what actually influences a customer’s willingness to buy at a specific price. Based on that, companies can find the ideal price to make everyone feel satisfied with the purchase.
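
A toy sketch of the idea, using made-up price and demand history: fit a simple demand curve, then pick the candidate price that maximizes expected revenue.

```python
import numpy as np

# Hypothetical history: price charged and units sold at that price.
prices = np.array([80, 90, 100, 110, 120, 130], dtype=float)
units_sold = np.array([300, 290, 275, 255, 230, 200], dtype=float)

# Fit a simple linear demand curve: units = slope * price + intercept.
slope, intercept = np.polyfit(prices, units_sold, deg=1)

# Evaluate expected revenue across a grid of candidate prices.
candidates = np.linspace(prices.min(), prices.max(), 101)
expected_revenue = candidates * (slope * candidates + intercept)

best_price = candidates[expected_revenue.argmax()]
print(f"Revenue-maximizing price: ${best_price:.2f}")  # around $117 for this toy data
```

Real pricing engines fold in far more signals (seasonality, competitor rates, customer segment), but the core step is the same: model demand from data, then search for the price that maximizes revenue or margin.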

Airbnb uses a dynamic system based on this concept. The company tracks local events, hotel trends, and other factors to suggest the best price to its hosts.

This is a major part of their business strategy since hosts aren’t usually professional hoteliers; that guidance is necessary to keep hosts happy and listing with Airbnb.

Some hotels have a more complex system for setting prices. Rates used to be uniform across the board, with changes only triggered by season or maybe rewards club status. Data science opened the door to a more individualized pricing strategy.

Now each user can be shown a customized price based on a number of objective factors.

  • Is the trip for business or pleasure?
  • What rates did the customer receive for past stays?
  • How valuable is the customer as a client?
  • Does the customer have a booking at a competitor which they might be willing to change?
  • Will the customer be using cash or points?
  • Does the customer have past incidents of bad behavior at the family of hotels?

Interestingly, customers who caused expensive trouble during previous stays may be shown higher rates to discourage a booking.

Marriott, an early adopter of data science in the hospitality industry, is an interesting case. The hotel chain was generating $150-200 million per year in the 1990s by intelligently managing its Revenue Per Available Room, or RevPAR, and that figure is still growing at a rate of 3% a year.

As a general trend, applying data science to price optimization increases revenue by 5-10%. The most benefit is seen in season-dependent industries such as hospitality.

Looking to the future

Industry leaders are taking note of these benefits. As a result, data science is fast becoming the preferred way to fuel digital transformation efforts. Global revenues for big data and business analytics were up 12.4 percent last year, and commercial purchases of hardware, software and services to support data science and analytics exceeded $210 billion.

Companies who hesitate to adopt data science will soon be left in the dust by their better-prepared competitors. Now is the time to make the business case for integrating data science- before it’s too late.

Not sure where to start? Concepta can advise on and customize powerful data science systems to meet your specific needs. Schedule a free, no hassle consultation to find out how!

Request a Consultation

 

Python Vs R: What Language Should You Use For Data Science?

R and Python are the two most popular programming languages for data scientists, and choosing which to focus on is one of the most formative, career-shaping decisions young analysts make.

R and Python have a lot in common: both are free, open source languages developed around the same time (in the 90s) and favored by the data science crowd.

Read on for a quick look at both languages, an overview of the debate, and when each would be the better choice for a specific data science project.

What is Python?

Python is an interpreted programming language used mainly for web applications. It’s high level, robust, and object-oriented.

Python features integrated dynamic semantics and dynamic typing and binding.

Applications written with Python have lower maintenance costs because of the focus on readable syntax.

The language has a fast edit-test-debug cycle which makes it useful for Rapid Application Development.

It supports modules and packages, allowing for modular design and code reuse across projects.

Debugging Python is simple, too. Instead of causing segmentation faults, bad input and bugs raise exceptions.

Data scientists have several good reasons to like Python, including:

  • Simplicity: Python is easy to learn and use, letting data scientists focus on their work rather than wrestling with arcane code.
  • Productivity: A Swiss study of programming languages found Python to be among the most productive languages around.
  • Readability: Python was explicitly designed to be both terse and readable.
  • Support: There are huge support libraries and third-party packages available.

What is R?

R is an open source programming language and software environment with a focus on statistical computing and numerical analysis. It is procedural as opposed to object-oriented.

What sets R apart is its wide variety of statistical and graphic techniques, including clustering, time-series analysis, linear and nonlinear modeling, classical statistical tests, and classification.

It supports matrix arithmetic, with packages that collect R functions into one place.

In addition, users find creating polished plots with scientific symbols and formulae easy with R.

R has a lot to offer data scientists, specifically:

  • Granularity: R offers deep insights into large data sets.
  • Flexibility: There are numerous ways to accomplish a specific goal with R.
  • Visualization: R features superior data visualization tools that help make data approachable by scientists and non-scientists alike.

Common Criticisms on Both Sides

There are drawbacks to each language. Python isn’t the best tool for mobile applications, for example. That doesn’t impact its use for data science much, but there are other considerations.

Python suffers from the slower nature of interpreted languages, which must be executed through an interpreter instead of a compiler.

Also, like all dynamically-typed languages it requires more testing to avoid runtime errors.

Some data scientists have criticized Python’s weak database layer for making it hard to interact with complex legacy data.

The language can be weak with multiprocessing or multicore workloads. There’s also the fact that data analysis functions have to be added through packages.

R has its own critics. Many of them relate to complexity. R is harder to learn, and its syntax isn’t as clean as Python’s.

It can’t be embedded in a web browser. R does have more statistical analysis tools than Python, but otherwise it has far fewer libraries.

At scale R’s complexity only grows. Maintenance becomes difficult, and poor memory management causes it to slow down when too many variables are stored.

It’s sometimes slower than Python, though neither is known for speed.

Finally, R is considered less secure than Python.

The risk can be mitigated using container options on Amazon Web Services (AWS) and similar, but developers need to pay special attention to this potential weakness to avoid costly breaches.

Which to Choose and When

Using both languages will give the best results, but that isn’t always practical or even sensible.

To choose the right programming language, data scientists should consider their primary interests and purpose.

R has superior data visualization. It was specifically built with statistics and data analysis in mind. Users have created packages that cover an impressive amount of specialized statistical work.

There are more R packages available for tasks like machine learning and statistical analysis. In contrast, Python’s package options in these specialized areas are more limited.

Python is a general-purpose programming language, so it’s more robust than R. It excels at building analytics tools and services: automating data mining, data munging, and scraping websites.

R has packages for machine learning, but in general Python better supports machine learning and deep neural networks.
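
To make that concrete, here’s a small sketch of the munge-then-model workflow Python handles well, assuming pandas and scikit-learn and an invented customer table:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical raw export with a missing value and a categorical column.
df = pd.DataFrame({
    "age": [34, None, 45, 29, 52, 41],
    "visits": [3, 1, 8, 2, 6, 5],
    "plan": ["basic", "pro", "pro", "basic", "pro", "basic"],
    "churned": [0, 1, 0, 1, 0, 0],
})

# Data munging: fill the gap and one-hot encode the categorical column.
df["age"] = df["age"].fillna(df["age"].median())
df = pd.get_dummies(df, columns=["plan"], drop_first=True)

# Modeling: the cleaned frame drops straight into scikit-learn.
X, y = df.drop(columns="churned"), df["churned"]
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

print(dict(zip(X.columns, model.feature_importances_.round(2))))
```

The equivalent R workflow is perfectly doable; the point is that Python keeps data preparation, modeling, and the surrounding tooling (APIs, scrapers, schedulers) in one general-purpose language.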

So which language should data scientists use? When the task ahead is mainly mathematical and leans heavily towards statistics, use R.

When the task is engineering-heavy or involves experimenting with new methods, use Python.

There is a generous overlap between languages, but following this guideline will steer data scientists in the right direction nine times out of ten.

Are you having trouble interpreting results from your data science software? Does your company have trouble reconciling data from one program with another? Concepta’s developers can build a custom dashboard to put your data where it can do the most good. Schedule a free consultation today!

Request a Consultation