Series: Basic Intuitions of Machine Learning & Deep Learning for beginners
Chapter 1: Machine Learning Algorithm Paradigm
Originally published: 16 February 2021, by Michio Suginoo
What is the Machine Learning algorithm paradigm? How does it differ from the traditional algorithm paradigm? That is the theme of this chapter.
Footnote Remark: Basic Terminology
Before getting into the Machine Learning topic, here is a footnote remark: Machine Learning has its own unique terminology.
As an example, here in the figure below, we have a simple equation: a dependent variable, Y, on the left is a function of an independent variable, X, on the right.
Following the convention of Machine Learning, throughout this series I will call X 'Features' instead of 'Independent Variable', and Y 'Labels' or 'Targets' instead of 'Dependent Variable'.
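To make the naming convention concrete, here is a tiny hypothetical dataset laid out the way Machine Learning code usually arranges it (the variable names below follow a common community convention, not any specific library requirement):

```python
# Features (X): the inputs; Labels (y): the outputs we want to predict.
# Convention: uppercase X for a 2-D feature matrix, lowercase y for labels.
X = [[1.0], [2.0], [3.0]]   # one feature per observation
y = [2.0, 4.0, 6.0]         # one label per observation

# y is a function of X; here the hidden rule happens to be y = 2 * x.
for features, label in zip(X, y):
    print(features, "->", label)
```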
The Limitation of the Traditional Algorithm Paradigm
Now, in order to illustrate the mechanism of Machine Learning, let’s contrast it with the traditional algorithm paradigm. In the traditional algorithm paradigm, programmers explicitly pre-determine rules that map the input data into the answer. Naturally, in the workflow of the traditional algorithm paradigm, the rules come first. Overall, you have to have a good idea about the rules in advance. It is an intuitive approach.
The figure below illustrates this notion.
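As a minimal sketch of this notion (the temperature-conversion example is my own, not from the figure): the programmer writes the rule explicitly, and the program merely applies that fixed rule to the input data.

```python
# Traditional paradigm: the programmer pre-determines the rule.
def celsius_to_fahrenheit(c):
    # The rule is hand-written and fixed: F = C * 1.8 + 32
    return c * 1.8 + 32

# Rules + Data -> Answers
data = [0, 100, 37]
answers = [celsius_to_fahrenheit(c) for c in data]
print(answers)
```

Here the rule came first: the programmer had to know the conversion formula in advance before writing a single line of code.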
Now, here is a question. How can you set rules to detect the cat in the picture below?
In the traditional algorithm paradigm, you would have to explicitly hand-engineer appropriate rules to capture details such as eyes, ears, mouth, and so on. As the complexity of the task increases, it becomes progressively more difficult, or even impossible, to predetermine the rules.
Such a limitation of the traditional algorithm paradigm set the stage for the emergence of Machine Learning Paradigm.
Then, what is the Machine Learning Paradigm? How does it address the limitation of the traditional algorithm paradigm?
Machine Learning Paradigm
Here, for the sake of simplicity, we focus only on Supervised Machine Learning, where we have actual labels in the dataset.
The next figure contrasts the fundamental difference between the traditional algorithm paradigm and Supervised Machine Learning Paradigm.
As you see at the bottom: in contrast to the traditional paradigm, Supervised Machine Learning has the rules at the end and the answers at the beginning. Its logic is the exact reverse of the traditional one.
The underlying idea here is that: “the sample dataset” supervises the machine to discover “the rules that map the Features into the given Labels.”
Thus, the name ‘Supervised’ comes from this notion that “a labelled dataset” supervises the machine.
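The reversal can be sketched in a few lines (a generic illustration of my own, not the article's code): we hand the machine paired Features and Labels, and it discovers the mapping rule, here with an ordinary least-squares fit via NumPy.

```python
import numpy as np

# Data + Answers -> Rules: the machine infers the mapping from examples.
X = np.array([0.0, 10.0, 20.0, 37.0, 100.0])   # Features (Celsius)
y = X * 1.8 + 32                               # Labels (Fahrenheit)

# Fit y = slope * x + intercept by least squares; the "rule" is learned,
# not hand-written by the programmer.
A = np.stack([X, np.ones_like(X)], axis=1)
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(slope, 3), round(intercept, 3))  # recovers ~1.8 and ~32.0
```

Note the contrast with the traditional paradigm: nowhere did we write the formula F = C * 1.8 + 32; the labelled examples supervised the machine into discovering it.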
Now, suppose that we do not have actual labels in the dataset. Then there is nothing to supervise the machine.
What shall we do?
In such a case, we have to rely on the 'Unsupervised architecture'. The next figure contrasts the Supervised and Unsupervised architectures of Machine Learning.
To repeat: Supervised architectures discover the rules that map the Features to the Labels. In contrast, Unsupervised architectures, in the absence of Labels, can only discover the underlying structure within the Feature dataset. Now, let's take an overview of the Machine Learning family.
Overview of Machine Learning Family Tree
This family tree below organizes a variety of Machine Learning models in a structured way.
The first division separates Conventional Machine Learning models and Deep Learning based on whether a model is inspired by neuroscience or not. This was already explained earlier.
Then, within the Conventional Machine Learning lineage, the second division separates Supervised and Unsupervised architectures based on whether the dataset is labelled or not.
In the Supervised space, the third division separates Regression and Classification based on the datatype of the output: whether continuous or discrete.
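The distinction is simply about the datatype of the output. The hypothetical rules below are made up for illustration (they are not learned from any real data):

```python
# Regression: the predicted Label is continuous (any real number).
def predict_price(size_m2):
    # hypothetical learned rule: price grows linearly with size
    return 3000.0 * size_m2 + 50_000.0

# Classification: the predicted Label is discrete (one of a fixed set).
def predict_spam(num_exclamation_marks):
    # hypothetical learned rule: a threshold on a single feature
    return "spam" if num_exclamation_marks > 3 else "not spam"

print(predict_price(85))   # a continuous value
print(predict_spam(5))     # a discrete class
```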
In the Unsupervised space, I just put two popular types: clustering and dimension reduction.
Clustering focuses on sorting observations into groups (clusters) based on the similarities and differences among datapoints.
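The clustering idea can be sketched with a toy k-means loop written from scratch (my own illustration, not the article's code; real work would use a library implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two obvious groups of 2-D points, with no labels attached.
points = np.vstack([rng.normal(0, 0.3, (20, 2)),
                    rng.normal(5, 0.3, (20, 2))])

# Tiny k-means: alternate between assigning each point to its nearest
# centroid and moving each centroid to the mean of its assigned points.
centroids = points[rng.choice(len(points), 2, replace=False)]
for _ in range(10):
    dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
    assign = dists.argmin(axis=1)
    centroids = np.array([points[assign == k].mean(axis=0)
                          if np.any(assign == k) else centroids[k]
                          for k in range(2)])

print(np.round(sorted(centroids[:, 0]), 1))  # centroids near x ≈ 0 and x ≈ 5
```

Notice that the algorithm never saw a label; it recovered the two groups purely from the similarity structure of the Features.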
Dimension Reduction focuses on compressing the number of features in a dataset while retaining the variation across observations, so as to preserve the information carried by that variation. It does so by removing less significant or highly correlated features. This reduces the complexity of the model and improves its computational efficiency.
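Dimension reduction can likewise be sketched in a few lines (a generic PCA via NumPy's SVD, my own illustration rather than the article's): we project 3-D data that mostly varies along a single direction down to one feature, keeping almost all of the variation.

```python
import numpy as np

rng = np.random.default_rng(1)
# 3 features, but nearly all variation lies along one direction.
t = rng.normal(0, 1, 200)
data = np.stack([t, 2 * t, -t], axis=1) + rng.normal(0, 0.05, (200, 3))

# PCA via SVD: center the data, then project onto the top principal component.
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ Vt[0]          # 200 x 3  ->  200 values (1 feature)

# Fraction of total variance retained by the single kept component.
explained = S[0] ** 2 / (S ** 2).sum()
print(reduced.shape, round(float(explained), 3))
```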
When we look at the history of Machine Learning—especially Deep Learning—most successful applications have evolved from Supervised architectures rather than Unsupervised ones. Nevertheless, in order to run Supervised models, programmers used to have to label datasets manually.
Nowadays, in order to alleviate this tedious manual labelling work, there are data augmentation techniques that generate realistic synthetic labelled data out of a limited volume of actual samples. I personally call this 'Good Fake', in contrast to 'Deep Fake', which can be harmful to society.
The reality of Unsupervised architecture today
All that said, there have been some successful examples in the Unsupervised Deep Learning space. As an example, in the context of Deep Learning, the Reinforcement Learning architecture has played a significant role in some seminal breakthrough applications, especially in the world of games.
Nevertheless, there is a critical limitation in the Reinforcement Learning architecture. By design, it has to learn from its own actions. In order to learn what constitutes a mistake, a Reinforcement Learning model has to repeat thousands of mistakes. In addition, it demands an enormous amount of training.
The big limitation of Reinforcement Learning is that it requires many trials to learn anything. If you used a standard form of Reinforcement Learning to train a car to drive itself, it would have to drive millions of hours, cause thousands of accidents, if not tens of thousands, and kill many pedestrians before it learned how to drive. How is it, then, that humans can learn to drive a car with only about 20 hours of training? It is a big mystery.
First, it would be risky to deploy crude Reinforcement Learning applications in self-driving cars on the street.
Second, it would be computationally very inefficient.
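The trial-and-error character can be seen in a toy example (a minimal tabular Q-learning loop on a made-up 5-cell corridor; purely my own illustration, not from the article): the agent only discovers the route to the goal by repeating the episode many times.

```python
import random

random.seed(0)
N = 5                                 # states 0..4; entering state 4 rewards 1
Q = [[0.0, 0.0] for _ in range(N)]    # Q[state][action]; 0 = left, 1 = right

for episode in range(500):            # many trials are needed before it learns
    s = 0
    while s != N - 1:
        # epsilon-greedy: explore 20% of the time, otherwise act greedily
        if random.random() < 0.2:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == N - 1 else 0.0
        # Q-learning update toward reward plus discounted future value
        Q[s][a] += 0.5 * (r + 0.9 * max(Q[s2]) - Q[s][a])
        s = s2

# After training, the greedy policy should move right in every state.
policy = [max((0, 1), key=lambda x: Q[s][x]) for s in range(N - 1)]
print(policy)
```

Even on this trivially small problem, the agent wanders through hundreds of wasted steps before the first reward ever reaches it; scaling that trial-and-error process up to driving on real streets is exactly the concern raised above.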
Despite its successful applications in games, Reinforcement Learning is not a promising model for the long-term future.
Given the reality of the Unsupervised architecture today, LeCun stresses the need for a new form of Unsupervised architecture and articulates the potential of the Self-Supervised model.