

Series:
Basic Intuitions of Machine Learning & Deep Learning for beginners


Chapter 1: Machine Learning Algorithm Paradigm

Originally published: 16 February, 2021
By Michio Suginoo

What is the Machine Learning Algorithm Paradigm? How does it differ from the Traditional Algorithm Paradigm? That is the theme of this chapter.

Footnote Remark: Basic Terminology

Before getting into the Machine Learning topic, here is a footnote remark: Machine Learning has its own unique terminology.

As an example, here in the figure below, we have a simple equation: a dependent variable, Y, on the left is a function of an independent variable, X, on the right.
[Figure: Y = f(X), a dependent variable Y as a function of an independent variable X]
Following the convention of Machine Learning, I will call X the Features (instead of the independent variable) and Y the Labels or Targets (instead of the dependent variable) throughout this series.
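
To make the terminology concrete, here is a minimal sketch in Python; the variable names and the toy numbers are my own illustration, not part of the original figure:

```python
import numpy as np

# Features (X): the independent variables, one row per observation.
X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Labels / Targets (y): the dependent variable we want to predict.
# Here the toy Labels happen to follow the rule y = 2x + 1.
y = np.array([3.0, 5.0, 7.0, 9.0])
```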

The Limitation of the Traditional Algorithm Paradigm

Now, in order to illustrate the mechanism of Machine Learning, let’s contrast it with the traditional algorithm paradigm. In the traditional algorithm paradigm, programmers explicitly pre-determine rules that map the input data into the answer. Naturally, in the workflow of the traditional algorithm paradigm, the rules come first. Overall, you have to have a good idea about the rules in advance. It is an intuitive approach.

The figure below illustrates this notion.
[Figure: the traditional algorithm paradigm, where rules and input data produce answers]
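
As a toy illustration (my own example, not from the original post), a traditional program hard-codes the rule up front and then applies it to the input data:

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Traditional paradigm: the programmer writes the rule explicitly."""
    return (f - 32.0) * 5.0 / 9.0

# Rules + input data -> answers
print(fahrenheit_to_celsius(212.0))  # 100.0
```

This works because the programmer already knows the mapping in advance.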
Now, here is a question.
How can you set rules to detect the cat in the picture below?
[Figure: a photograph of a cat]
In the traditional algorithm paradigm, you have to explicitly hand-engineer appropriate rules in order to capture details such as eyes, ears, mouth, and so on.
As the complexity of tasks increases, it becomes progressively more difficult, or even impossible, to predetermine the rules.

Such a limitation of the traditional algorithm paradigm set the stage for the emergence of the Machine Learning Paradigm.

Then, what is the Machine Learning Paradigm? How does it address the limitation of the traditional algorithm paradigm?

Machine Learning Paradigm

Here, for the sake of simplicity, we focus only on Supervised Machine Learning, where we have actual labels in the dataset.

The next figure illustrates the fundamental difference between the traditional algorithm paradigm and the Supervised Machine Learning Paradigm.

As you see at the bottom, in contrast to the traditional paradigm, Supervised Machine Learning produces the rules at the end and takes the answers at the beginning. Its logic is the exact opposite of the traditional logic.
[Figure: the traditional algorithm paradigm versus the Supervised Machine Learning Paradigm]
The underlying idea here is that "the sample dataset" supervises the machine to discover "the rules that map the Features into the given Labels."

Thus, the name ‘Supervised’ comes from this notion that “a labelled dataset” supervises the machine.
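
Here is a minimal sketch of that reversed workflow, assuming scikit-learn is available (the toy data reuses the y = 2x + 1 rule from the terminology example above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The answers (Labels) come first: we supply X and the known y together.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# The machine discovers the rule that maps the Features into the Labels.
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # approximately 2.0 and 1.0
```

The learned coefficients are the "rules" that, under the traditional paradigm, the programmer would have had to write by hand.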

Now, suppose that we do not have actual sample labels in the dataset; then there is nothing to supervise the machine.

What shall we do?

In such a case, we have to rely on an 'Unsupervised architecture'. The next figure contrasts the Supervised and Unsupervised architectures of Machine Learning.
[Figure: Supervised versus Unsupervised architectures of Machine Learning]

To reiterate, Supervised architectures discover the rules that map the Feature datasets to the Labels. In contrast, Unsupervised architectures, in the absence of Labels, can only discover the underlying structure within the Feature dataset.
Now, let's take an overview of the Machine Learning Family.

Overview of Machine Learning Family Tree
 
This family tree below organizes a variety of Machine Learning models in a structured way.
[Figure: the Machine Learning family tree]
The first division separates Conventional Machine Learning Models from Deep Learning, based on whether or not a model is inspired by neuroscience. This was already explained earlier.

Then, within the Conventional Machine Learning lineage, the second division separates Supervised from Unsupervised architectures, based on whether the dataset is labelled or not.

In the Supervised space, the third division separates Regression and Classification based on the datatype of the output: whether continuous or discrete.

In the Unsupervised space, I just put two popular types: clustering and dimension reduction.

Clustering focuses on sorting observations into groups (clusters) based on the similarities and differences among datapoints.
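
For instance, here is a minimal clustering sketch with scikit-learn's KMeans; the toy data is my own illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled Features: two loose groups of 2-D points, no Labels supplied.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

# KMeans sorts the observations into 2 clusters by similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g., [0 0 0 1 1 1]
```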

Dimension Reduction focuses on compressing the number of features in a dataset while retaining the variation across observations, in order to preserve the information contained in that variation. It does so by removing less significant or highly correlated data. It reduces the complexity of the model and improves its computational efficiency.
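
Likewise, here is a minimal dimension-reduction sketch with PCA (again, my own toy data): two highly correlated features are compressed into a single component while most of the variation is retained.

```python
import numpy as np
from sklearn.decomposition import PCA

# Two highly correlated features (the second is almost a copy of the first).
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=100)])

# Compress 2 features down to 1 while keeping most of the variance.
pca = PCA(n_components=1).fit(X)
print(pca.explained_variance_ratio_)  # close to [1.0]
```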

When we look at the history of Machine Learning, especially Deep Learning, most successful applications evolved from Supervised architectures rather than Unsupervised ones. Nevertheless, in order to run Supervised models, programmers had to label datasets manually in the past.

Nowadays, in order to alleviate tedious manual labelling work, there are data augmentation techniques that generate realistic but fake labelled datasets out of a limited volume of actual samples. I personally call this 'Good Fake', in contrast to 'Deep Fake', which can be harmful to society.
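
As a minimal sketch of one common label-preserving augmentation technique (my own illustration, not a method the post names): flipping an image horizontally yields a new "realistic fake" sample that keeps the original label.

```python
import numpy as np

def augment_flip(image: np.ndarray, label: str):
    """Generate an extra labelled sample by mirroring the image.
    A horizontal flip does not change what the image depicts,
    so the original label still applies to the fake sample."""
    return np.fliplr(image), label

image = np.arange(9).reshape(3, 3)  # stand-in for a real photo
fake_image, fake_label = augment_flip(image, "cat")
print(fake_label)  # "cat" -- the label is reused for free
```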

The reality of Unsupervised architecture today

All that said, there have been some successful examples from the Unsupervised Deep Learning space. As an example, in the context of Deep Learning, the Reinforcement Learning architecture has played a significant role in some seminal breakthrough applications, especially in the world of games.

Nevertheless, there is a critical limitation in the Reinforcement Learning architecture. By design, it has to learn from its own actions. In order to learn what constitutes a mistake, a Reinforcement Learning model has to repeat thousands of mistakes. In addition, it demands an enormous amount of training.

Here is an illustrative remark about the limitation of Reinforcement Learning by Yann LeCun, one of the prominent pioneers of Deep Learning applications in Computer Vision and a co-recipient of the 2018 Turing Award (often regarded as the Nobel Prize equivalent in the field of computing).

"The big limitation of Reinforcement Learning is that it requires many trials for it to learn anything. If you want to use a kind of standard form of Reinforcement Learning to train a car to drive itself, it will have to drive millions of hours, cause thousands of accidents, if not tens of thousands, and kill many pedestrians before it learns how to drive. How is it that humans can drive a car with only 20 hours of training? It is a big mystery." (LeCun, 2019, 26:25-27:10)

Two implications follow from this remark:
First, it would be risky to deploy a crude Reinforcement Learning application in a self-driving car on the street.
Second, it would be computationally very inefficient.
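
To make the trial-and-error point concrete, here is a minimal sketch (my own toy illustration, not LeCun's example) of an epsilon-greedy bandit, one of the simplest Reinforcement Learning settings: the agent can only discover which action is better by repeatedly choosing the worse one.

```python
import random

# Two slot-machine arms with unknown payout rates; arm 1 is the better one.
TRUE_REWARD = [0.3, 0.7]
estimates, counts = [0.0, 0.0], [0, 0]
mistakes = 0

random.seed(0)
for trial in range(10_000):
    # Explore 10% of the time; otherwise exploit the current best estimate.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = 0 if estimates[0] >= estimates[1] else 1
    reward = 1.0 if random.random() < TRUE_REWARD[action] else 0.0
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]
    if action != 1:
        mistakes += 1  # every pull of the worse arm counts as a mistake

print(mistakes, [round(e, 2) for e in estimates])
# hundreds of wasted pulls, even in this two-action toy problem
```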

Despite its successful applications in games, Reinforcement Learning is not a promising model for the long-term future.

Given the reality of the Unsupervised architecture today, LeCun stresses the need for a new form of Unsupervised architecture and articulates the potential of the Self-Supervised model.

Now, let’s move on to the next chapter to see the process of Machine Learning model development.

Donation:
Please feel free to click the button below to donate and support
the activities of www.reversalpoint.com

Copyright © by Michio Suginoo. All rights reserved.
