
Series:
Basic Intuitions of Machine Learning & Deep Learning for beginners


Chapter 8: Deep Learning’s 2 Basic Models
Convolutional Neural Networks and Sequence Models.

Originally published 16 February, 2021
By Michio Suginoo

Although Deep Learning, as a member of the Machine Learning family, does not explicitly specify the rules that map features to labels, it does require particular model specifications for handling specific tasks. In addition, each subject of interest, such as a digital image or natural language, calls for a specific Data Representation. This creates variations in model specifications within the Deep Learning family tree.

Today, many of the sensational success stories of Machine Learning come from the Deep Learning domain. New variants of Deep Learning emerge continuously and expand the frontier of the Deep Learning family tree.

It would be impossible to cover all the variants at this stage of our journey. So, in this chapter, I want to give you a flavour of the two most popular and basic variants: Convolutional Neural Networks and Sequence Models.
[Figure: Convolutional Neural Networks (left) for Computer Vision and Sequence Models (right) for Natural Language]
In the figure above, on the left, the CNN, or Convolutional Neural Network, evolved in the context of Computer Vision tasks such as object detection and classification. CNNs have been used for a variety of applications, such as Medical Image Detection, Face Recognition, and Autonomous Navigation. They are also used in combination with other models.

Next, on the right: Sequence Models are often used in Natural Language applications such as Machine Translation and Speech Transcription. They are also used to edit music. Recently, medical researchers have been applying these models to Genomic Sequencing.
 
Let’s take a look at them one by one to get some intuition about what each of these models does.

CNN (Convolutional Neural Networks)

First, let’s take a look at Convolutional Neural Networks, a.k.a. CNNs.

Our eyes can detect objects in our surroundings. We take our vision for granted today. Nevertheless, at an early stage of the evolution of life, primitive creatures were not endowed with vision. Our vision is a product of our evolution: it evolved in order to capture the spatial structure of our surroundings.

At an early stage of computer vision, AI experts realized that standard Deep Learning, a fully connected neural network, failed to preserve the spatial structure of images while training on visual datasets. Yes, analogous to our evolution, Deep Learning needed to evolve in order to acquire vision. To address this issue, Convolutional Neural Networks (CNNs) came into being.
Engineers realized the necessity of customizing the neural network architecture in a specific manner to preserve the spatial structure of digital images.

The image below illustrates the way CNNs preserve the spatial structure of visual data (pixels).

A CNN slides a small filter window, called a kernel, along the spatial order of an image, from left to right and from top to bottom, to scan, extract, and memorize small local features one by one. In this manner, the CNN manages to preserve the spatial structure of the image during its learning process.
[Figure: a small filter (kernel) sliding across an image from left to right, top to bottom]
This specific feature of CNNs is an example of the model specification.

There are many techniques for effectively extracting the features of digital images within the CNN architecture. Nevertheless, this intuition gives us a flavour of the most essential model specification of CNNs and paints a sketch that guides you to the gateway of modern Computer Vision.
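To make the sliding-window intuition concrete, here is a minimal sketch in plain Python with NumPy. The 5x5 "image", the 3x3 kernel, and the stride-1, valid-padding convention are illustrative assumptions for this sketch, not a full CNN.

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a small kernel over the image (left to right, top to bottom)
    and record one local response per position, preserving spatial order."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):            # top to bottom
        for j in range(out_w):        # left to right
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Illustrative 5x5 "image" and a 3x3 vertical-edge kernel
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(convolve2d(image, kernel).shape)  # (3, 3): a smaller map of local features
```

In a real CNN, the kernel values are not fixed by hand like this; they are learned during training, and many kernels are stacked in layers.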

Sequence Models

Next, let’s move to the world of Natural Language Processing.

Sequence Model Specification

Here is a simple question. Can you see the difference between these two sentences?
  • A man ate a crocodile.
  • A crocodile ate a man.

Yes, you can: of course!

Both sentences are composed of exactly the same combination of words. Nevertheless, our brain processes a sentence as a sequence and can therefore distinguish the difference in meaning between the two.
[Figure: the same words arranged in two different orders carry two different meanings]
Simply put, Sequence matters more than Combination in our Natural Language.
This sequential feature of our natural language defines the model specification of Deep Learning applications in Natural Language Processing. Engineers have customized the deep learning architecture to process a sequential representation of our language for Natural Language applications.
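As a rough sketch of what processing words "as a sequence" looks like, the toy recurrent step below folds one word vector at a time into a running hidden state, so feeding the same words in a different order ends in a different state. The word vectors and weight matrices here are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"a": 0, "man": 1, "ate": 2, "crocodile": 3}
word_vectors = rng.normal(size=(len(vocab), 4))   # illustrative word vectors
W_h = rng.normal(size=(4, 4))                     # hidden-to-hidden weights
W_x = rng.normal(size=(4, 4))                     # input-to-hidden weights

def encode(sentence: list[str]) -> np.ndarray:
    """Fold the words into a hidden state one by one, in order."""
    h = np.zeros(4)
    for word in sentence:
        x = word_vectors[vocab[word]]
        h = np.tanh(W_h @ h + W_x @ x)   # simple recurrent update
    return h

h1 = encode(["a", "man", "ate", "a", "crocodile"])
h2 = encode(["a", "crocodile", "ate", "a", "man"])
print(np.allclose(h1, h2))  # False: same words, different order, different state
```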
 
Vocabulary as Featurized Representation and Embedding

Now, here is another question, strictly for non-Japanese speakers.

Do you understand the next word?
  • ディープ・ラーニング

No, you can’t.

It is simply because you do not have the word in your vocabulary (it is, in fact, the Japanese rendering of "Deep Learning"). To non-Japanese speakers, it is no more than a sequence of abstract symbols called letters.

In daily life, when we use a word, we subconsciously access the vocabulary in our brain to map its features, such as meanings and connotations, onto the word.
So, a word is more than a sequence of abstract symbols. It is a Featurized Representation. It is important to describe words as Featurized Representations.
[Table: Featurized Representation of 6 words across 4 features]
The table here illustrates the Featurized Representation of 6 words.
  • In the columns, we have 6 words: man, woman, king, queen, apple, and orange.
  • In the rows, we have 4 features: gender, royal, age, and food.

The table does not specify the detailed meaning of each word. Instead, it specifies each word's degree of relevance to these 4 features. In this way, the machine maps features onto every single word. This process is called "Embedding".
 
Embedding is a process that projects a word into a geometrical feature space. In general, a feature space has a high dimension, so we cannot see it. Here, in order to make the notion of the Featurized Representation appeal to our visual sense, I decided to reduce the representation to a 3-dimensional space.
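As a sketch of what such a Featurized Representation could look like in code, the toy table below maps each of the six words to four feature scores (gender, royal, age, food). The numbers are made up for intuition only; a trained model would learn them from data and use far more dimensions.

```python
import numpy as np

features = ["gender", "royal", "age", "food"]
# One illustrative feature vector per word (values invented for intuition only)
embedding = {
    "man":    np.array([-1.00, 0.00, 0.10, 0.00]),
    "woman":  np.array([ 1.00, 0.00, 0.10, 0.00]),
    "king":   np.array([-0.95, 0.95, 0.70, 0.00]),
    "queen":  np.array([ 0.97, 0.95, 0.70, 0.00]),
    "apple":  np.array([ 0.00, 0.00, 0.00, 0.95]),
    "orange": np.array([ 0.00, 0.00, 0.00, 0.97]),
}

# "Embedding" a word = looking up its point in this feature space
print({f: float(v) for f, v in zip(features, embedding["queen"])})
```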

Now, we have 4 words: Elephant, Tiger, Cat, and Lion. When the machine embeds the features onto those words, an interesting thing happens.

Here is an illustration. You can click the video below to see the embedding effect. We see two contrasting moves:
  • Similar words—Cat, Tiger, & Lion—gather in a neighbourhood to shape a cluster.
  • And, Elephant is segregated remotely from the rest.

These characteristics can be used to segment words into distinctive clusters.
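The clustering effect can be sketched with cosine similarity on illustrative 3-dimensional vectors; the numbers below are made up to stand in for a learned embedding.

```python
import numpy as np

# Illustrative 3-D embeddings: the big cats sit near each other, the elephant apart
vectors = {
    "cat":      np.array([0.90, 0.80, 0.10]),
    "tiger":    np.array([0.85, 0.90, 0.20]),
    "lion":     np.array([0.80, 0.85, 0.15]),
    "elephant": np.array([0.10, 0.20, 0.95]),
}

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: 1 means pointing the same way, 0 means unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(round(cosine(vectors["cat"], vectors["lion"]), 2))      # close to 1: same cluster
print(round(cosine(vectors["cat"], vectors["elephant"]), 2))  # much smaller: segregated
```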

In a way, users with malicious intentions could manipulate this mechanism to intentionally mis-categorize and discriminate against certain target words. It could be used to exacerbate social inequality.

On the other hand, in principle we can "reverse-engineer" the same technique to correct such distortions.

Long Term Dependency


Here is another question.

Can you guess what could be the missing word in the sentence in this next image?
[Figure: a sentence with a missing word at its end]
Yes, you can.

Words in a sentence have interdependencies among themselves, and such interdependencies shape the meaning of the sentence.

Somehow, our brains can guess the missing word at the end of the sentence by discovering its ‘long-term dependency’ with an earlier key word, 'rainy’.
[Figure: the long-term dependency between the key word 'rainy' and the missing word]
So, Natural Language Processing demands that Deep Learning applications be able to discover Long Term Dependencies as well as Short Term Dependencies among words in a sentence. Nevertheless, capturing Long Term Dependency turns out to be an extremely difficult task.
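The difficulty can be hinted at with a toy calculation. In a simple recurrent update, the influence of an early word has to survive repeated multiplications as the model steps through the sentence, and it tends to shrink toward zero as the gap grows. This is only a stand-in for the well-known vanishing-gradient problem, and the 0.5 factor below is purely illustrative.

```python
# How much of an early word's signal survives after a gap of n words,
# if each recurrent step scales it by an illustrative factor of 0.5?
factor = 0.5
for gap in [2, 5, 10, 20]:
    surviving_signal = factor ** gap
    print(f"gap of {gap:2d} words -> {surviving_signal:.6f}")

# The signal from 'rainy' fades quickly, which is why plain sequence models
# struggle with Long Term Dependency and why gated models (LSTM, GRU) and,
# later, the Transformer's attention mechanism were developed.
```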
 
Well, I presented only 3 examples of functions that Natural Language Processing requires Deep Learning applications to perform. And Sequence Models were designed as the most basic models to meet those objectives: to process words in sequential order; to process the Featurized Representation of a word in a geometrical feature space; and to discover Long Term Dependencies among words in a sentence.

And today, there are more advanced models, such as the Transformer, designed to handle Natural Language tasks.

That gives a flavour of the importance of Data Representation and Model Specification in task-specific Deep Learning applications.

There is more to discover in your journey. To repeat, this series is designed for beginners, and I hope it gives beginners good intuitions for their Machine Learning journey going forward.

Thanks for reading the series.

Good luck on your journey!

Best Regards,
Michio Suginoo


Donation:
Please feel free to click the button below to donate and support
the activities of www.reversalpoint.com


​Copyright © by Michio Suginoo. All rights reserved.
