Series: Basic Intuitions of Machine Learning & Deep Learning for beginners
Chapter 2: Generalization: Ultimate Goal of Machine Learning Project
Originally published 16 February, 2021 By Michio Suginoo
Revised: 29 September, 2023
So, in Chapter 1, we got an intuition about the Machine Learning algorithm paradigm. It is about training a machine on a given dataset: in the presence of actual labels, a Supervised architecture can discover the hidden rules that map the Feature dataset to the Labels; in the absence of actual labels, an Unsupervised architecture can discover the underlying structure within the Feature dataset.
All that said, there is an important caveat: training the machine well in the lab is not good enough.
Ultimate Objective of Machine Learning: Generalization
The ultimate objective of Machine Learning is generalization.
Imagine that the training dataset contained some anomalies or biases: the trained model might then fail to perform well on other datasets drawn from the same distribution. This sort of risk is called Overfitting Risk.
In order to address overfitting risk, we need to incorporate two steps into the model development workflow: validation during the training process and generalization during the testing process. In this spirit, in the lab, we divide the dataset into at least three subsets.
The figure below illustrates an example of such a three-way division.
Use “Train Dataset” and “Validation Dataset” during the training process.
Use “Train Dataset” to avoid underfitting
Use “Validation Dataset” to alleviate overfitting during the training process
Use “Test Dataset” for generalization to further alleviate overfitting risk during the test process
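To make the three-way division concrete, here is a minimal sketch using scikit-learn’s `train_test_split`; the 60/20/20 ratio and the toy data are purely illustrative assumptions, not prescriptions.

```python
# A minimal sketch of a three-way split via two calls to
# train_test_split; 60/20/20 here is illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)  # toy Feature dataset
y = np.arange(100)                 # toy Labels

# First, split off the Test Dataset (20% of the data).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Then carve the Validation Dataset out of the remainder
# (0.25 of the remaining 80% = 20% of the original data).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```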
Now, let’s take a look at the next figure.
Here, we have three charts on top that capture three scenarios:
First, underfitting on the left: the straight line fails to represent the distribution of the given datapoints.
Then, overfitting on the right: the line connecting all the datapoints is a symptom of overfitting and would likely fail to represent the general distribution of the entire population.
And Robust Fit in the middle: this middle way strikes an optimal fit, avoiding both extremes.
The chart at the bottom shows two error curves:
Training Error and
Generalization Error (Test Error).
As we fit the model to the training dataset, the training error declines. After a certain level of fitting, however, the generalization error starts rising.
While we want to fit the model well on the training dataset, we do not want the model to fail to generalize. So, in the lab, we aim for a middle way at an optimal capacity, Robust Fit, to avoid both underfitting and overfitting risks.
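To make the two error curves tangible, here is a hedged toy sketch in which polynomial degree stands in for “capacity”: training error keeps falling as the degree grows, while validation error behaves differently. The sine function, noise level, and degrees are illustrative assumptions of mine, not values from the chart.

```python
# Toy illustration of the two error curves: polynomial degree is a
# simple stand-in for model capacity. Training MSE falls monotonically
# as degree grows; validation MSE need not.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.shape)

# A separate noisy sample drawn from the same distribution.
x_val = np.linspace(0.02, 0.98, 30)
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0, 0.3, size=x_val.shape)

train_mse, val_mse = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)  # least-squares fit at this capacity
    train_mse[degree] = np.mean((np.polyval(coeffs, x) - y) ** 2)
    val_mse[degree] = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree={degree}  train MSE={train_mse[degree]:.3f}  "
          f"val MSE={val_mse[degree]:.3f}")
```

Because the lower-degree models are nested inside the higher-degree ones, the training error can only go down as capacity increases; the validation error is what tells us when we have gone too far.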
In the chart, the X-axis label reads ‘Capacity’. Here is an excerpt from the famous deep learning textbook (Goodfellow, Bengio, & Courville, 2016): “Informally, a model’s capacity is its ability to fit a wide variety of functions.
Models with low capacity may struggle to fit the training set.
Models with high capacity can overfit by memorizing properties of the training set that do not serve them well on the test set.”
Addressing Overfitting & Underfitting
Now, how can we address the issues of underfitting and/or overfitting?
When your model is suffering from ‘underfitting’, the model has not been fitted enough to the given dataset. In such a case, we need to improve the architecture of the learning algorithm rather than feed it more data. There are at least the following options:
to train longer by running the 3-step iteration cycle for more repetitions (epochs)
to adjust the operating components of the learning algorithm: neurons, layers, activation functions, and other hyperparameters
to improve optimization techniques, which are beyond the scope of this series.
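To illustrate the first option, “train longer”, here is a toy sketch of my own (not from the chapter): plain gradient descent on a one-parameter model. Running more epochs drives the training error down, which is the basic remedy for underfitting.

```python
# "Train longer" in miniature: gradient descent on the one-parameter
# model y = w * x. More epochs -> lower training error.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x  # hidden ground-truth rule: w = 2

def train(epochs, lr=0.01):
    """Fit w by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(epochs):
        grad = np.mean(2 * (w * x - y) * x)  # d(MSE)/dw
        w -= lr * grad
    return w

for epochs in (1, 10, 100):
    w = train(epochs)
    mse = np.mean((w * x - y) ** 2)
    print(f"epochs={epochs:3d}  w={w:.3f}  MSE={mse:.4f}")
```

After a single epoch the model clearly underfits; after 100 epochs the weight has essentially converged to the true value of 2.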
When your model is suffering from ‘overfitting’, the model has been fitted too closely to the Train Dataset alone. In that case, you want to explore the following options:
to feed in more data to improve the generalization of the model
to penalize fitting on Train Dataset: this is beyond the scope of this series.
to adjust Deep Learning architecture
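As a small preview of the “penalize” option (the details remain beyond the scope of this series), here is a hedged sketch using ridge regression from scikit-learn: an L2 penalty shrinks the weights relative to plain least squares, damping the wiggly overfit that a high-capacity model produces. The degree-9 polynomial and the `alpha` value are illustrative assumptions.

```python
# Penalizing the fit: ridge regression adds an L2 penalty on the
# weights, shrinking them compared with ordinary least squares (OLS).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 30)

# High-capacity feature set: a degree-9 polynomial expansion.
X_poly = PolynomialFeatures(degree=9, include_bias=False).fit_transform(x)

ols = LinearRegression().fit(X_poly, y)      # no penalty
ridge = Ridge(alpha=1.0).fit(X_poly, y)      # L2 penalty on weights

print("OLS   weight norm:", np.linalg.norm(ols.coef_))
print("Ridge weight norm:", np.linalg.norm(ridge.coef_))
```

The penalized model’s smaller weight norm is exactly the point: it trades a little training error for a smoother curve that generalizes better.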
For now, this is a preview to cultivate a high-level understanding: you do not need to understand the details of these techniques at this stage. You will gain a better understanding of them in later stages of your Machine Learning journey.
IID Violation Issue
Next, suppose that we fully addressed both underfitting and overfitting risks in the lab. Now, the model is validated in the lab. Is that “a happy ending”?
As a matter of fact, there is no guarantee, especially if there is a violation of the fundamental IID Assumption, the “Independent and Identically Distributed” Assumption.
In an extreme situation, if these three datasets altogether fail to represent the real-world ground-truth distribution of the subject, “the model validated in the lab” could fail miserably in real-world settings.
So, “the validated model in the lab” is only as good as the quality of the given dataset.
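One simple, hedged way to probe the IID Assumption in practice is to compare the distribution of a feature in the lab dataset against incoming real-world data, for example with SciPy’s two-sample Kolmogorov-Smirnov test. The feature and the simulated drift below are illustrative assumptions of mine, not part of the chapter.

```python
# A sanity check on the IID Assumption: does a feature seen in the lab
# follow the same distribution as the feature arriving from the real
# world? A two-sample KS test gives a simple first answer.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
lab_feature = rng.normal(loc=0.0, scale=1.0, size=1000)

# Simulated real-world data whose mean has drifted by 0.5.
real_world_feature = rng.normal(loc=0.5, scale=1.0, size=1000)

stat, p_value = ks_2samp(lab_feature, real_world_feature)
if p_value < 0.01:
    print("Warning: feature may violate the IID Assumption "
          "(distribution drift detected).")
```

A tiny p-value, as in this drifted example, signals that the lab and real-world samples likely do not come from the same distribution, so the lab validation may not carry over.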
In order to meet the IID Assumption, we need a massive amount of data. In addition, remember that Deep Learning is more scalable than “Traditional Machine Learning Models”.
The IID Assumption, together with this scalability context, reinforces the data-hungry nature of Deep Learning more than that of Traditional Machine Learning Models.