
Series:
Basic Intuitions of Machine Learning & Deep Learning for beginners


Chapter 6: Deep Learning Carbon Footprint
Imperative for Energy Efficiency Revolution

Originally published 16 February, 2021
By Michio Suginoo

This article is a reproduction of my earlier post on LinkedIn, “Energy Efficiency Revolution would be the key for the next Deep Learning generation” (Suginoo, 2021). I simply changed the title to fit it into the context of this series, “Basic Intuitions of Machine Learning & Deep Learning for beginners.”

As ML engineers pursued accuracy and scalability with advanced Deep Learning models, they increased the complexity of Deep Learning architectures: expanding the depth of the layer structure, the number of neurons, and the number of parameters in the neural network.
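To make this concrete, here is a minimal Python sketch of how the parameter count of a plain fully connected network grows as the layers get deeper and wider. The layer widths are made-up illustrative numbers, not those of any specific published model.

```python
# Illustrative only: parameter count of a plain fully connected network.
# The layer widths below are made-up examples, not any published model.

def dense_param_count(layer_widths):
    """Weights + biases for consecutive fully connected layers."""
    return sum(w_in * w_out + w_out
               for w_in, w_out in zip(layer_widths[:-1], layer_widths[1:]))

shallow = [784, 128, 10]                      # a small network
deep    = [784, 1024, 1024, 1024, 1024, 10]   # deeper and wider

print(f"shallow: {dense_param_count(shallow):,} parameters")
print(f"deep:    {dense_param_count(deep):,} parameters")
# Widening and deepening the stack multiplies the parameter count,
# and with it the compute and memory traffic needed for training.
```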

Now, the historical chart below traces the compute usage of advanced Deep Learning models and demonstrates its exponential growth over the recent decade.
[Figure: compute usage of advanced Deep Learning models over time]
Inevitably, this translates into an explosion of the carbon footprint of Deep Learning development in recent years.
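As a rough illustration of what exponential growth implies, the back-of-the-envelope sketch below compounds a hypothetical doubling time for training compute. The 3.4-month figure is the one often cited from OpenAI’s “AI and Compute” analysis; treat it as an assumption for illustration, not a number read off the chart above.

```python
# Back-of-the-envelope: how much training compute grows over a decade
# if it doubles every few months. The 3.4-month doubling time is the
# figure often cited from OpenAI's "AI and Compute" analysis (2018);
# treat it as an assumption for illustration, not a measurement.

doubling_time_months = 3.4
years = 10
doublings = years * 12 / doubling_time_months
growth_factor = 2 ** doublings

print(f"{doublings:.1f} doublings in {years} years "
      f"-> compute grows ~{growth_factor:.2e}x")
# Energy use (and carbon footprint) scales with compute unless
# efficiency per operation improves at a comparable pace.
```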

The irony is that although Deep Learning is inspired by biological brains, it is far less energy efficient than our brain. A quick web search gives an idea of our brain’s energy consumption: somewhere in the range of 20-50 W. In contrast, one study estimates that a self-driving car, which uses deep learning algorithms as part of its system, consumes 2,500 W for its computing power (Stewart, 2018).
Another article, in MIT Technology Review, reported a shocking finding from an academic paper: in AI development, “the process can emit more than 626,000 pounds of carbon dioxide equivalent—nearly five times the lifetime emissions of the average American car (and that includes manufacture of the car itself).” (Hao, 2019)

The next bar chart compares the carbon footprint of different activities: for example, roughly 11K lbs of CO2 equivalent for one year of an average human life versus 626K lbs for training a Deep Learning Transformer (213M parameters) model. (MIT Technology Review, n.d.)
[Figure: bar chart comparing the carbon footprint of selected activities (MIT Technology Review)]
As a caveat, the particular Deep Learning model referred to in the article does not represent the average energy consumption of the wide variety of Deep Learning models in use today. Nevertheless, the chart eloquently paints a picture of the dark reality of Deep Learning: how bad its energy inefficiency can be.
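For the curious, here is the rough arithmetic behind the “nearly five times” comparison quoted above. The figures are approximate values from the chart reported in the MIT Technology Review article (Hao, 2019).

```python
# Rough arithmetic behind the "nearly five times" comparison.
# Figures (in lbs of CO2 equivalent) are approximate values from the
# chart reported in the MIT Technology Review article (Hao, 2019).

transformer_nas   = 626_000   # Transformer (213M params) training run
car_lifetime      = 126_000   # average American car, lifetime incl. manufacture
human_life_1_year =  11_000   # average human life, one year

print(f"vs. car lifetime:   ~{transformer_nas / car_lifetime:.1f}x")
print(f"vs. one human-year: ~{transformer_nas / human_life_1_year:.0f}x")
```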
Obviously, in the age of climate change there is a need for a change in the architecture of Deep Learning to enhance energy efficiency going forward.
 
Now, the important thing is to take a holistic approach when pursuing energy efficiency. We have to achieve the goal without sacrificing other essential operating factors: accuracy, throughput, latency, hardware flexibility, hardware cost, and scalability.
There are layers of trade-offs between energy-efficiency efforts and these other essential operating factors. Obviously, it is not simple.

Is there any incentive for Deep Learning developers to improve energy efficiency in the future?

I would say yes, there is; and it could be an imperative more than a choice, because the pursuit of energy efficiency goes beyond carbon footprint: it underpins an imperative generation shift in the evolution of Deep Learning.

Beyond Carbon Footprint: Energy Efficiency is Imperative for the DL Generation Shift

Beyond carbon footprint, there are other imperatives for the Deep Learning community to drastically improve energy efficiency.

Today, advanced Deep Learning models operate in the Cloud Computing space (and HPC/high-performance computing). According to Professor Vivienne Sze of MIT, a prominent expert in energy-efficient applications of AI, it is imperative that the new generation of Deep Learning shifts from the Cloud to the Edge (off-line embedded mobile devices) for three reasons: communication, privacy, and latency.
  • Communication in remote areas off existing network infrastructures.
  • Privacy protection by keeping sensitive personal information locally off the Cloud network.
  • Reduction of latency arising from communication lags in the Cloud network to enhance real-time interaction with the local environment (e.g. autonomous navigation).

Nevertheless, in order for Deep Learning models to operate in off-line embedded mobile devices at the Edge, their energy demands need to be reduced to the battery level. In this context, the improvement in energy efficiency would be an imperative path, not a choice, for the coming generation shift of Deep Learning.
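To see why “battery level” is such a hard constraint, here is a hedged back-of-the-envelope sketch. The battery capacity, per-inference energy, and frame rate are assumptions chosen only to show the shape of the calculation, not measurements of any particular device or model.

```python
# Back-of-the-envelope: how long a battery lasts if a deep learning
# model runs continuously on an edge device. All numbers are assumed
# for illustration, not measurements of any particular device or model.

battery_wh            = 15.0   # a small embedded/phone-class battery (assumed)
energy_per_inf_joules = 1.0    # assumed energy per inference
inferences_per_second = 30     # e.g. a per-frame vision task at 30 fps

power_watts    = energy_per_inf_joules * inferences_per_second   # 30 W
battery_joules = battery_wh * 3600
hours          = battery_joules / power_watts / 3600

print(f"Sustained power draw: {power_watts:.0f} W")
print(f"Battery life at that draw: ~{hours:.1f} hours")
# At 30 W the battery drains in about half an hour; cutting energy per
# inference by 10-100x is what makes always-on edge inference feasible.
```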
To cut a long story short, despite the bad news about the intense carbon footprint of Deep Learning today, there is some encouraging news: ongoing efforts by engineers to improve it.

Research efforts to improve the energy efficiency of AI applications

As an example, Professor Vivienne Sze of MIT promotes research efforts and knowledge-sharing in this field.

She shares some interesting insight about the energy use of Deep Learning: models consume more energy in data movement than in compute itself, as she remarks: “data movement dominates the energy consumption more than compute itself”.

Every time an algorithm moves data into and out of working memory (retrieving values for calculation and storing updated values for the next iteration), that data movement consumes energy. The energy inefficiency of data movement is already bad on its own, and it is exacerbated by the enormous number of parameters (hundreds of millions in some cases) that viable Deep Learning models must read and update during learning. Overall, during Deep Learning deployment, data movement ends up consuming a massive amount of energy.
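To get a feel for why data movement dominates, the sketch below compares the energy of the arithmetic in one fully connected layer with the energy of fetching its data from off-chip DRAM. The per-operation energies are rough figures of the kind often quoted in the hardware literature (e.g. Horowitz, 2014); treat them as assumptions that convey orders of magnitude, not exact values.

```python
# Illustrative comparison of energy spent on arithmetic vs. DRAM traffic
# for one fully connected layer. The per-operation energies are rough
# 45nm-era figures of the kind quoted in hardware literature (Horowitz,
# 2014); exact values vary widely by technology, so treat them as
# assumptions that convey orders of magnitude only.

E_MAC_PJ  = 4.6      # ~one 32-bit multiply-accumulate, picojoules
E_DRAM_PJ = 640.0    # ~one 32-bit word fetched from off-chip DRAM

n_in, n_out = 4096, 4096                    # layer size (illustrative)
macs        = n_in * n_out                  # multiply-accumulates
words_moved = n_in * n_out + n_in + n_out   # weights + activations, no reuse

compute_uj  = macs * E_MAC_PJ / 1e6
movement_uj = words_moved * E_DRAM_PJ / 1e6

print(f"compute:       ~{compute_uj:,.0f} uJ")
print(f"data movement: ~{movement_uj:,.0f} uJ "
      f"({movement_uj / compute_uj:.0f}x compute)")
# Without on-chip reuse, fetching every weight from DRAM costs far more
# energy than the arithmetic itself -- hence the focus on reducing
# working-memory accesses.
```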

So, one primary key to improving the energy efficiency of Deep Learning is to reduce working-memory accesses.

Here is a list of some highlights (not comprehensive) of Professor Sze’s recipe for the enhancement of Deep Learning energy efficiency:
  1. Data Compression: reduce the amount of data to be processed by exploiting sparsity in the data (pruning to zero out non-critical and redundant values so they can be skipped in computation); see the sketch below.
  2. Data Reuse: process data multiple times once it has been fetched, before writing it back to memory.
  3. Operate low in the memory hierarchy: small local memories consume much less energy than DRAM (dynamic random-access memory).
  4. Develop specialized hardware for specific tasks.
  5. Integrate all operations on or close to the chip (processing unit).
You can find out more in her MIT lecture.
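As a concrete illustration of item 1, data compression via pruning, here is a minimal NumPy sketch of magnitude-based pruning. The weight matrix and the sparsity target are made up for illustration and are not taken from Professor Sze’s materials.

```python
# Minimal sketch of magnitude-based pruning (item 1 above): zero out the
# weights with the smallest magnitudes so that sparse storage and compute
# can skip them. The matrix and sparsity target are illustrative only.

import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)

target_sparsity = 0.80                          # prune 80% of weights
threshold = np.quantile(np.abs(weights), target_sparsity)
mask = np.abs(weights) >= threshold             # keep only the largest 20%
pruned = weights * mask

print(f"non-zero weights: {mask.mean():.0%} of original")
# Fewer non-zero weights means less data to fetch from memory and fewer
# multiply-accumulates -- the energy saving comes from skipping both.
```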

As noted earlier, our brain consumes energy somewhere in the range of 20-50 W. If Deep Learning is inspired by biological neural network architecture, the reduction in its energy consumption should come inevitably as engineers advance the architecture of Deep Learning closer toward biological brains in the future.
Let’s learn today’s best practices of energy efficiency by visiting Professor Sze’s website, the Energy-Efficient Multimedia Systems Group: https://www.rle.mit.edu/eems/publications/tutorials/.


​Copyright © by Michio Suginoo. All rights reserved.
