Series: Basic Intuitions of Machine Learning & Deep Learning for beginners
Chapter 8: Deep Learning’s Two Basic Models: Convolutional Neural Networks and Sequence Models
Originally published 16 February, 2021 By Michio Suginoo
Although Deep Learning, as a member of the Machine Learning family, does not explicitly specify the rules that map features to labels, it does require particular model specifications for handling specific tasks. In addition, it relies on specific Data Representations to process its subjects of interest, such as digital images and Natural Language. These requirements create variations in model specification within the Deep Learning family tree.
Today, many of the sensational success stories of Machine Learning come from the Deep Learning domain. And new variants of Deep Learning emerge continuously, expanding the frontier of the Deep Learning family tree.
It would be impossible to cover all the variants at this stage of our journey. So, in this chapter, I want to give you some flavours of the two most popular and basic variants: Convolutional Neural Networks and Sequence Models.
In the figure above, on the left: CNNs, or Convolutional Neural Networks, evolved in the context of Computer Vision tasks such as object detection and classification. CNNs have been used for a variety of applications, such as Medical Image Detection, Face Recognition, and Autonomous Navigation. They are also used in combination with other models.
Next, on the right: Sequence Models are often used in Natural Language applications such as Machine Translation and Speech Transcription. They are also used to edit music. Recently, medical researchers have been applying these models to Genomic Sequencing.
Let’s look at each model in turn to get some intuition about what it does.
CNN (Convolutional Neural Networks)
First, let’s take a look at Convolutional Neural Networks, a.k.a. CNNs.
Our eyes can detect objects in our surroundings, and we take our vision for granted today. Nevertheless, at an early stage of the evolution of life, primitive creatures had no vision. Our vision is a product of our evolution: it evolved in order to capture the spatial structure of our surroundings.
At an early stage of computer vision, AI experts realized that standard Deep Learning, a fully connected neural network, fails to preserve the spatial structure of images when trained on visual datasets. Yes, analogous to our evolution, Deep Learning needed to evolve in order to acquire vision. To address this issue, Convolutional Neural Networks (CNNs) came into being: engineers realized the necessity of customizing the neural network architecture in a specific manner to preserve the spatial structure of digital images.
The image below illustrates the way CNNs preserve the spatial structure of visual data (pixels).
A CNN slides a small filter window (kernel) along the spatial order of an image—from left to right and from top to bottom—to scan and extract small local features one by one. In this manner, a CNN preserves the spatial structure of the image during its learning process.
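This sliding-filter operation can be sketched in a few lines of NumPy. Everything below—the tiny "image", the hand-set kernel values—is an illustrative assumption, not a real trained filter:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image, left to right and top to bottom,
    computing one local feature (a dot product) at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):          # top to bottom
        for j in range(out_w):      # left to right
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A tiny 5x5 "image" with a vertical edge between columns 1 and 2,
# and a classic 3x3 vertical-edge filter.
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)  # nonzero entries mark where the edge lies
```

Because each output value depends only on a small neighbourhood of pixels, nearby pixels stay related in the output: that is the sense in which the spatial structure is preserved.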
This specific feature of CNNs is an example of the model specification.
There are many techniques for effectively extracting the features of digital images within the CNN architecture. Nevertheless, this intuition gives us a flavour of the most essential model specification of CNNs and sketches a path to the gateway of modern Computer Vision.
Next, let’s move to the world of Natural Language Processing.
Sequence Model Specification
Here is a simple question. Can you see the difference between these two sentences?
A man ate a crocodile.
A crocodile ate a man.
Yes, you can: of course!
Both sentences are composed of exactly the same combination of words. Nevertheless, our brain processes a sentence as a sequence, which lets it distinguish the difference in meaning between the two.
Simply put, Sequence matters more than Combination in our Natural Language. This sequential feature of our natural language defines the model specification of Deep Learning applications in Natural Language Processing. Engineers have customized deep learning architectures to process sequential representations of our language for Natural Language applications.
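The crocodile example above can be sketched in a few lines: as an unordered combination (a bag of words) the two sentences are identical, while as sequences they differ. Splitting on whitespace is a simplifying assumption standing in for real tokenization:

```python
from collections import Counter

sentence_1 = "a man ate a crocodile"
sentence_2 = "a crocodile ate a man"

tokens_1 = sentence_1.split()  # sequence representation: order preserved
tokens_2 = sentence_2.split()

# As unordered combinations (bags of words), the sentences are identical...
same_combination = Counter(tokens_1) == Counter(tokens_2)
# ...but as ordered sequences, they are not.
same_sequence = tokens_1 == tokens_2

print(same_combination)  # True
print(same_sequence)     # False
```

A model that only counts words would treat the man and the crocodile as interchangeable; only a model that respects sequence can tell who ate whom.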
Vocabulary as Featurized Representation and Embedding
Now, here is another question. But strictly for NON-Japanese speakers.
Do you understand the next word?
No, you can’t.
It is simply because you do not have the word in your vocabulary. To non-Japanese speakers, it is no more than a sequence of abstract symbols called letters.
In daily life, when we use a word, we subconsciously access the vocabulary in our brain to map its features, such as meanings and connotations, onto the word. So a word is more than a sequence of abstract symbols: it is a Featurized Representation. It is important for the machine, too, to describe words as Featurized Representations.
The table here illustrates Featurized Representation of 6 words.
In the columns, we have 6 words: man, woman, king, queen, apple, and orange.
And in the rows, we have 4 features: gender, royal, age, and food.
The table does not specify the detailed meanings of each word. Instead, it specifies each word’s magnitude of relevance to these 4 features. In this way, the machine maps features onto every single word. This process is called Embedding.
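A table like this can be sketched as a small lookup. The feature values below are illustrative assumptions in the spirit of the table—real embeddings are learned from data, not set by hand:

```python
import numpy as np

words = ["man", "woman", "king", "queen", "apple", "orange"]
features = ["gender", "royal", "age", "food"]

# Each column is one word's magnitude of relevance to the four features
# (hand-picked illustrative values, not learned ones).
table = np.array([
    # man   woman  king   queen  apple  orange
    [-1.0,   1.0, -0.95,  0.97,  0.0,   0.0],   # gender
    [ 0.0,   0.0,  0.93,  0.95,  0.0,   0.0],   # royal
    [ 0.1,   0.1,  0.70,  0.70,  0.0,   0.0],   # age
    [ 0.0,   0.0,  0.00,  0.00,  0.95,  0.97],  # food
])

def embed(word):
    """Look up a word's Featurized Representation: its column in the table."""
    return table[:, words.index(word)]

print(embed("king"))  # the 4-feature vector for "king"
```

With this representation, "king" is no longer an abstract symbol: it is a point described by its gender, royalty, age, and food relevance.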
Embedding is the process of projecting a word into a geometric Feature space. In general, a Feature space has a high dimension, so we cannot see it. Here, in order to appeal to our visual sense with the notion of the Featurized Representation, I decided to reduce the representation to a 3-dimensional space.
Now, we have 4 words: Elephant, Tiger, Cat, and Lion. When the machine embeds the features onto those words, an interesting thing happens.
Here is an illustration. You can click the video below to see the embedding effect.
Here you are. We see two contrasting moves.
Similar words—Cat, Tiger, & Lion—gather in a neighbourhood to shape a cluster.
And Elephant is segregated far away from the rest.
These characteristics can be used to segment words into distinctive clusters.
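This clustering effect can be sketched with cosine similarity, a standard measure of how close two word vectors point. The 3-dimensional vectors below are hand-made illustrative assumptions, not learned embeddings:

```python
import numpy as np

# Illustrative 3-D embeddings: the three felines sit near one another,
# while the elephant sits far away in the feature space.
embeddings = {
    "cat":      np.array([0.90, 0.80, 0.10]),
    "tiger":    np.array([0.85, 0.90, 0.20]),
    "lion":     np.array([0.80, 0.85, 0.15]),
    "elephant": np.array([0.10, 0.20, 0.95]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1 means same direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_cat_tiger = cosine_similarity(embeddings["cat"], embeddings["tiger"])
sim_cat_elephant = cosine_similarity(embeddings["cat"], embeddings["elephant"])
print(sim_cat_tiger > sim_cat_elephant)  # True: Cat clusters with Tiger
```

Grouping words by such similarity scores is exactly how embeddings can be used to segment a vocabulary into distinctive clusters.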
In a way, users with vicious intentions can manipulate this mechanism to intentionally mis-categorize and discriminate against some target words. It can be used to exacerbate social inequality.
On the other hand, in principle, we can “reverse-engineer” the same technique to correct such distortion.
Long Term Dependency
Here is another question.
Can you guess what could be the missing word in the sentence in this next image?
Yes, you can.
Words in a sentence have interdependencies among themselves. And such interdependencies shape the meaning of a sentence.
Somehow, our brains can guess the missing word at the end of the sentence by discovering its ‘long-term dependency’ with an earlier key word, ‘rainy’.
So, Natural Language Processing demands that Deep Learning applications be able to discover Long Term Dependencies as well as Short Term Dependencies among words in a sentence. Nevertheless, discovering Long Term Dependency turns out to be an extremely difficult task.
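A minimal sketch of why Long Term Dependency is hard: in a simple recurrent model, the hidden state at each step is a scaled-down copy of the previous state plus the new input, so the influence of an early word shrinks geometrically with distance. The recurrence weight 0.5 and the 20-step sequence are illustrative assumptions:

```python
# Simple linear recurrence: h_t = w * h_(t-1) + x_t.
# The first input's influence on the final state is w**(t-1),
# which vanishes when |w| < 1.
def run_sequence(inputs, w=0.5):
    h = 0.0
    for x in inputs:
        h = w * h + x
    return h

# Two 20-step sequences that differ ONLY in the first element
# (think: whether or not the sentence began with "rainy").
seq_a = [1.0] + [0.0] * 19
seq_b = [0.0] + [0.0] * 19

influence = abs(run_sequence(seq_a) - run_sequence(seq_b))
print(influence)  # 0.5**19, roughly 1.9e-06: the early word is nearly forgotten
```

By the end of the sentence, the trace of the first word is almost gone, which is exactly the difficulty that later architectures such as LSTMs and attention mechanisms were designed to overcome.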
Well, I presented only 3 examples of functions that Natural Language Processing requires Deep Learning applications to perform. And Sequence Models were designed as the most basic models to meet those objectives: the sequential model specification; processing the Featurized Representation of a word in a geometric space; and discovering Long Term Dependency among words in a sentence.
And today, there are more advanced models, such as the Transformer, designed to process Natural Language tasks.
That gives a flavour of the importance of Data Representation and Model Specification in task-specific Deep Learning applications.
There is more to discover in your journey. Again, this series is designed for beginners, and I hope it gives beginners good intuitions for their Machine Learning journey going forward.
Thanks for reading the series.
Good luck on your journey!
Best Regards, Michio Suginoo
Donation: Please feel free to click the button below to donate and support the activities of www.reversalpoint.com