What’s a Convolutional Neural Network? A Look at AI’s Promising Tool

The world creates 16.3 zettabytes of data every year. Considering that a single zettabyte is one trillion gigabytes, that’s more than a pool of data. It’s an ocean, and it’s about to get deeper. Research group IDC predicts that the world will be creating 163 zettabytes a year by 2025. Some of the rise will come from the Internet of Things (IoT), but a large portion will be photos, comments, blog posts, videos, and other forms of unstructured data.

Data scientists are continually developing Artificial Intelligence techniques to analyze this flood of data while it’s still fresh enough to be useful.

One tool showing promise right now is the convolutional neural network (CNN).

Understanding convolutional neural networks requires a little familiarity with the mathematical definitions of the base terms “neural network” and “convolution”.

What’s an Artificial Neural Network?

An artificial neural network (ANN) is a construct used in machine learning that is loosely modeled on the networks of neurons in the human brain. It’s built from layers of nodes (called “neurons”), each of which performs an operation on the data it receives and passes the result on to neurons in the next layer.

Data passes through the layers of the network to produce a final result. The network “learns” to produce increasingly accurate results by passing adjustments back in a process called back-propagation.
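As a minimal sketch of this learning loop, the example below trains a single neuron (one weight, one bias) to fit y = 2x + 1. The data, the learning rate, and the name `train_neuron` are illustrative assumptions, not part of any library. A real network repeats the same "forward pass, measure error, pass adjustments back" cycle across many layers, which is what back-propagation does with the chain rule.

```python
def train_neuron(samples, lr=0.05, epochs=2000):
    """Fit y = w * x + b with gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = w * x + b       # forward pass
            error = prediction - target  # how wrong the output was
            # Backward pass: gradient of 0.5 * error**2 with respect to w and b
            w -= lr * error * x
            b -= lr * error
    return w, b


# Hypothetical training data drawn from y = 2x + 1
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_neuron(samples)
print(round(w, 2), round(b, 2))
```

After training, `w` and `b` should sit very close to 2 and 1, the values used to generate the data.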

What’s a Convolution?

A convolution measures how much one function overlaps another as the first passes over the second. It’s a mathematical way of “blending” the functions and studying the result.
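For discrete data, this "passing over" reduces to multiplying and summing. The sketch below is a plain-Python version of a full one-dimensional convolution; the function name `convolve` and the sample values are illustrative assumptions.

```python
def convolve(signal, kernel):
    """Full discrete convolution: out[n] = sum over k of signal[k] * kernel[n - k]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


signal = [1, 2, 3, 4]
kernel = [0.5, 0.5]  # averages each pair of neighbouring values
result = convolve(signal, kernel)
print(result)  # [0.5, 1.5, 2.5, 3.5, 2.0]
```

The middle values are the averages of neighbouring pairs in the signal, which is the "blend" of the two sequences; the first and last values come from positions where the kernel only partly overlaps the signal.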

What’s a Convolutional Neural Network?

A convolutional neural network is a specific type of ANN that applies convolutions to the data passing through. CNNs are composed of a convolutional layer (more commonly multiple layers) followed by fully-connected layers, often with intermediate subsampling layers as well.

There are four main types of layer in a convolutional neural network between the input and output layers.

  • Convolutional Layer (CONV): The convolution is applied to data from the input layer. This layer’s main purpose is to extract features from the input.
  • Activation Layer (usually ReLU, but could also be tanh): This layer determines the final value of a neuron by applying a nonlinear activation function to the results of the convolutional layer. With ReLU, all negative values in the matrix are set to 0 while all other values are left unchanged. ReLU is cheap to compute and tends to speed up training compared with saturating functions such as tanh.
  • Pooling Layer (POOL): This is the subsampling layer. In max pooling, the most common form, it keeps only the largest value in each small window of the previous layer’s output. The result indicates whether a feature was present in that window, but not exactly where, and allows later CONV layers to work on a larger section of the image or data.
  • Fully-Connected Layer (FC; also called the affine layer): Each neuron in a fully-connected layer is connected to all neurons in the previous layer, as in a regular neural network. Using an FC layer isn’t mandatory, but it is an easy way to learn a linear function of the feature space created by the previous layers.
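The four layer types can be sketched end to end in plain Python. This is a toy forward pass under stated assumptions, not a trainable network: the 5x5 image, the hand-picked edge kernel, the FC weights, and all function names are hypothetical. Note that the "convolution" in CNN layers is usually implemented as a sliding dot product without flipping the kernel, which is what `conv2d` does here.

```python
def conv2d(image, kernel):
    """CONV: slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """RELU: set negative values to 0, leave the rest unchanged."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """POOL: keep the largest value in each non-overlapping size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def fully_connected(fmap, weights, bias):
    """FC: weighted sum over every value from the previous layer."""
    flat = [v for row in fmap for v in row]
    return sum(x * w for x, w in zip(flat, weights)) + bias


# Toy 5x5 image: dark on the left, bright on the right
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

features = conv2d(image, kernel)  # 4x4 feature map
activated = relu(features)
pooled = max_pool(activated)      # 2x2 after pooling
score = fully_connected(pooled, weights=[0.25] * 4, bias=0.0)
print(pooled, score)  # [[2, 0], [2, 0]] 1.0
```

The pooled map records that an edge was found in the left half of the image, but no longer which exact column it was in.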

There are very often multiple convolution and subsampling layers, each set looking for a different thing such as:

  • Edges (computer vision)
  • Sound ranges (audio classification)
  • Word grouping (text classification)

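Because each filter reuses the same small set of weights across the whole input, convolutional neural networks need far fewer weights than a fully-connected network of comparable size. That makes them efficient to run, relatively fast to train, and simpler to set up.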
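The difference in weight count is easy to check with arithmetic. The sketch below compares a fully-connected layer with a convolutional layer on a 32x32 RGB image; the layer sizes and the helper names `fc_params` and `conv_params` are assumptions chosen for illustration.

```python
def fc_params(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


n_inputs = 32 * 32 * 3  # 3072 values in a 32x32 RGB image

print(fc_params(n_inputs, 100))   # 307300
print(conv_params(3, 3, 3, 100))  # 2800
```

With these sizes, 100 fully-connected neurons need over 300,000 weights, while 100 convolutional filters of size 3x3 need under 3,000, because each filter is shared across every position in the image.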

Evolving Applications

Computer Vision

Two factors make CNNs especially well suited for image recognition: location invariance and local compositionality.

Location invariance means that where an object appears matters less than whether it is present and recognized.

For example, when sorting pictures of dogs from a group of Facebook images, it’s not necessary to know where the dog is within the picture, just that it’s there.
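A one-dimensional toy example shows where this invariance comes from: sliding a detector over the input and then keeping only the strongest response discards the position. The `detect` function, the pattern, and the two signals below are illustrative assumptions.

```python
def detect(signal, pattern):
    """Slide `pattern` over `signal` and return the strongest response."""
    n = len(pattern)
    responses = [sum(signal[i + j] * pattern[j] for j in range(n))
                 for i in range(len(signal) - n + 1)]
    return max(responses)  # global max pooling: keeps "how strong", drops "where"


pattern = [1, -1, 1]
left = [1, -1, 1, 0, 0, 0, 0, 0]   # pattern at the start
right = [0, 0, 0, 0, 0, 1, -1, 1]  # same pattern at the end
empty = [0] * 8                    # no pattern at all

print(detect(left, pattern), detect(right, pattern), detect(empty, pattern))  # 3 3 0
```

The detector gives the same score whether the pattern sits at the start or the end of the signal, and a lower score when it is absent.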

Local compositionality means that features near each other are usually related and combine into larger ones: edges form shapes, and shapes form objects.

Humans have a very easy time with this, but machines do not. By using convolutions to blend features, a CNN can associate features with nearby features for more accurate identification.

Natural Language Processing

It may seem less intuitive to apply convolutional neural networks to natural language processing (NLP), but data scientists are seeing good results using the “Bag of Words” model to represent the text.

In a Bag of Words, the text is split into individual words and word order is discarded. Each distinct word is stored with a count of how often it appears, for example as a key-value pair in a JSON object: the key is the word, the value is the frequency.
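A minimal sketch of building such a word-to-frequency mapping, using only the Python standard library, is shown below. The `bag_of_words` name, the simple lowercase-letters tokenizer, and the sample sentence are assumptions; production pipelines usually use a more careful tokenizer.

```python
import json
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs, ignoring case and word order."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


bag = bag_of_words("The dog chased the ball. The ball rolled away.")
print(json.dumps(bag, sort_keys=True))
# {"away": 1, "ball": 2, "chased": 1, "dog": 1, "rolled": 1, "the": 3}
```

The output is a JSON object in which each key is a word and each value is its frequency.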

Using this model won’t help the convolutional neural network translate a document (other tools such as Recurrent Neural Networks are more useful there).

What CNNs can do is provide a fast solution to document classification and language modeling problems.

CNNs are incredibly fast; they can quickly sort documents into types which can then be processed by slower, more accurate networks.

The Future of Convolutional Neural Networks

CNNs do have some flaws. They need a huge amount of data to perform well and tend to learn more slowly than other types of neural networks.

There’s also an issue with translation invariance; convolutional neural networks don’t particularly care if a feature is in the right place, only that it’s present. (There are ways to augment the data to work around this last problem, however.)

Despite the issues, convolutional neural networks are growing in popularity. Their limitations don’t outweigh their value as a fast, reliable way to make sense of unstructured data.

As data scientists develop new ways to draw meaning from spatial relationships, CNNs should become more widely applicable in fields like audio processing and NLP.
