ML represents a major breakthrough that has enabled AI to go mainstream. It is a technique for teaching machines how to learn to do something, rather than the traditional approach of programming computers with exact instructions for how to do it. To recognize animals in images, instead of identifying the unique visual characteristics of each animal and programming the recognition logic by hand, ML feeds images of those animals into a learning framework and lets the machine discover on its own the visual patterns that distinguish one animal from another. The machine learns by itself instead of being explicitly programmed for a specific task. The core architecture used in ML is the artificial neural network. Data inputs are fed to the neural network (the pixels of the animal images in the previous example), mathematical transformations are performed within the network, and a decision emerges as its output (which animal). The output is, in essence, a metamorphosis of the input. A butterfly is a good analogy: a caterpillar spins a cocoon around itself and, at some stage, a butterfly emerges. The cocoon is the “neural network” that transforms the caterpillar into a butterfly.
Neural networks are inspired by the neurons in our brain and the way they are interconnected. The brain has about 100 billion neurons, each interconnected with many others. In an AI neural network, a neuron is modeled by a “perceptron” (graphic on the right side). The comparison between a real neuron and a perceptron is only an aid to understanding; in reality, the neural network in our brain is far more complex than any model we can create. A detailed understanding of how all this works requires a strong grounding in mathematics and neurobiology and is beyond this book’s scope. Our objective is to provide an insight into the principles by which a machine is able to learn to make decisions. What follows is an explanation of how these neural networks are built and how they work. It is semi-technical, and some readers may want to skip it.
On the following page is a model of a neuron alongside its mathematical analog, the perceptron. Artificial neural networks are vast networks of perceptrons, each implemented by a processor running the perceptron’s mathematical equations.
Each neuron has several impulse inputs, called dendrites, and one branching output, called an axon. Between the inputs and the output, a transformation happens to the impulses, both in the neuron and in the perceptron. In the neuron, the transformation is electrochemical; propagation occurs by means of a voltage difference called the action potential. In the perceptron, the transformation is a mathematical function. The contribution of each input to the output is determined by the “weight” (w) associated with that input. The output also carries a “bias” (b) and passes through a non-linearity (the activation function). The perceptron with weights, bias, and activation function is represented as follows:
The perceptron body transforms the inputs into a single output. This transformation depends on the weights, the bias, and the activation function.
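This transformation can be sketched in a few lines of code. The sketch below assumes a sigmoid activation function; other activations (and the specific weights and bias) are illustrative choices, not part of any particular network.

```python
import math

def perceptron(inputs, weights, bias):
    """Compute a perceptron's output: weighted sum of the inputs,
    plus the bias, passed through a sigmoid activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid squashes the result into (0, 1)

# Two inputs, each scaled by its weight, shifted by the bias:
output = perceptron([1.0, 0.5], weights=[0.4, -0.2], bias=0.1)
```

Here the weighted sum is 0.4 × 1.0 + (−0.2) × 0.5 + 0.1 = 0.4, and the sigmoid maps it to a value between 0 and 1.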
In a neural network, thousands of perceptrons are networked together in multiple layers (depth), creating a deep learning network.