Akira’s ML News #Week3, 2021

Akihiro FUJII
Published in Analytics Vidhya
7 min read · Jan 14, 2021

Here are some of the papers and articles I read in week 3 of 2021 (January 10~) that I found particularly interesting. I've tried to introduce the most recent ones as much as possible, but a paper's submission date may not fall within this week.

Topics

  1. Machine Learning Papers
  2. Technical Articles
  3. Examples of Machine Learning Use Cases
  4. Other Topics

— Weekly Editor’s pickup

— — — — — — — — — — — — — — — — — — — — — — — — — —

1. Machine Learning Papers

— —

Fast training of large-scale models by switching layers for each token

SWITCH TRANSFORMERS: SCALING TO TRILLION PARAMETER MODELS WITH SIMPLE AND EFFICIENT SPARSITY
https://arxiv.org/abs/2101.03961

They propose the Switch Transformer, which can be trained efficiently with a huge number of parameters by “switching” each token to the most appropriate specialized feed-forward layer after Self-Attention. It uses 65 times more parameters than T5 but reaches the same accuracy 7 times faster. The largest model has 1.6 trillion parameters.
The model is effectively sparse: although it enjoys the benefits of large scale and many parameters, the number of parameters actually used per token is small because of the switching. Open questions remain, such as how to distribute the parameters across devices, but this approach may become a trend alongside model scaling.
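As a rough illustration, here is a minimal top-1 routing layer in the spirit of the switching mechanism; the module names and sizes are my own assumptions, and the actual paper adds load-balancing losses and expert capacity limits:

```python
import torch
import torch.nn as nn

class SwitchFFN(nn.Module):
    """Sketch: route each token to a single expert feed-forward network."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # per-token gating
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                 # x: (batch, seq, d_model)
        gate = torch.softmax(self.router(x), dim=-1)
        prob, idx = gate.max(dim=-1)      # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e               # tokens assigned to expert e
            if mask.any():
                out[mask] = expert(x[mask])
        return out * prob.unsqueeze(-1)   # scale by router probability
```

Only one expert's weights are touched per token, which is why the parameter count can grow without a matching growth in per-token compute.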

Increasing safety by applying patches that raise confidence

Unadversarial Examples: Designing Objects for Robust Vision
https://arxiv.org/abs/2012.12235

A study that improves recognition accuracy by creating and attaching patches that raise the model's confidence, the opposite of adversarial examples. A possible application is improving safety by attaching such a patch to objects that must be recognized reliably in the real world, such as a landing pad that a drone needs to recognize when landing.
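A hedged sketch of the idea: instead of maximizing the loss as in an adversarial attack, optimize the patch to minimize it for the true class. All names and sizes below are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def train_unadversarial_patch(model, images, target_class,
                              size=32, steps=100, lr=0.01):
    """Optimize a patch that raises the model's confidence in target_class
    (gradient *descent* on the loss, the reverse of an adversarial attack).
    images: (batch, 3, H, W) with H, W >= size."""
    patch = torch.zeros(3, size, size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    labels = torch.full((images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        patched = images.clone()
        patched[:, :, :size, :size] = patch   # paste patch in a corner
        loss = F.cross_entropy(model(patched), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return patch.detach()
```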

High-performance generative models by reducing differences in frequency space

Focal Frequency Loss for Generative Models
https://arxiv.org/abs/2012.12821

Despite the success of images produced by generative models, they still differ from real images in frequency space. Using the discrete Fourier transform, the authors propose a loss function, Focal Frequency Loss, that minimizes this difference. It closes the gap in frequency space and improves the quality of images generated by VAEs, which tend to be blurry.
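As a rough sketch of the idea (the paper's exact weighting and normalization may differ; `alpha` is an assumed knob):

```python
import torch

def focal_frequency_loss(fake, real, alpha=1.0):
    """Compare generated and real images in frequency space, up-weighting
    the frequencies that are currently mismatched (a 'focal' weighting).
    fake, real: (batch, channels, H, W)."""
    f_fake = torch.fft.fft2(fake)                # 2D DFT of each image
    f_real = torch.fft.fft2(real)
    dist = (f_fake - f_real).abs() ** 2          # per-frequency squared error
    w = dist.sqrt().detach() ** alpha            # focal weight (no gradient)
    w = w / w.amax(dim=(-2, -1), keepdim=True).clamp(min=1e-8)
    return (w * dist).mean()
```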

Virtual try-on with StyleGAN2

VOGUE: Try-On by StyleGAN Interpolation Optimization
https://arxiv.org/abs/2101.02285

A study of virtual try-on. They first pre-train StyleGAN2 conditioned on pose key points, and then optimize the per-layer mixing ratio between the latent codes of the clothing reference image and the person image, so that the garment and the body are blended naturally.
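Conceptually, the try-on reduces to per-layer interpolation between two StyleGAN2 latent codes. A toy sketch, where the per-layer weights `alpha` are what VOGUE optimizes and the shapes are my assumptions:

```python
import torch

def mix_styles(w_person, w_garment, alpha):
    """Blend two StyleGAN2 'w+' codes layer by layer.
    w_person, w_garment: (num_layers, 512) latent codes.
    alpha: (num_layers,) mixing weight per layer; 0 = person, 1 = garment."""
    return alpha[:, None] * w_garment + (1 - alpha[:, None]) * w_person
```

Layers controlling clothing shape and texture take the garment code, while layers controlling identity keep the person code.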

Optimal Dropout with Reinforcement Learning

AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
https://arxiv.org/abs/2101.01761

A study that uses reinforcement learning to search for optimal Dropout patterns, including complex ones such as mask rotation. As in NAS, a controller generates the pattern parameters sequentially, but it uses a Transformer instead of an RNN. The effect was confirmed on both NLP and image tasks.
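For intuition, here is one structured pattern of the kind such a controller could search over (block-wise channel dropout); the block size and drop rate are illustrative assumptions, not searched values:

```python
import torch

def block_channel_dropout(x, block=4, p=0.3):
    """Drop contiguous blocks of channels instead of independent units.
    x: (batch, channels, H, W); channels must be divisible by `block`."""
    b, c, h, w = x.shape
    keep = (torch.rand(b, c // block, 1, 1) > p).float()
    keep = keep.repeat_interleave(block, dim=1)   # one decision per block
    return x * keep / (1 - p)                     # rescale to keep expectation
```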

Improve interpretability by making predictions via relevant information

Concept Bottleneck Models
https://arxiv.org/abs/2007.04612

A model that predicts labels via the prediction of related information (concepts, e.g., feather color and beak length for birds) in addition to the correct label. It improves interpretability because it outputs these concepts, and if a predicted concept is wrong, a human can correct it, allowing intervention at inference time.
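The architecture is simple enough to sketch; the linear heads and the sigmoid concept activation below are my assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Predict interpretable concepts first, then the label from concepts only,
    so a human can inspect and override the concepts at inference time."""
    def __init__(self, d_in, n_concepts, n_classes):
        super().__init__()
        self.concept_head = nn.Linear(d_in, n_concepts)   # e.g. beak length
        self.label_head = nn.Linear(n_concepts, n_classes)

    def forward(self, x, concept_override=None):
        concepts = torch.sigmoid(self.concept_head(x))
        if concept_override is not None:   # human intervention
            concepts = concept_override
        return self.label_head(concepts), concepts
```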

Pruning contributes to bias

CHARACTERISING BIAS IN COMPRESSED MODELS
https://arxiv.org/abs/2010.03058

A study showing that pruning a model to reduce its size contributes to bias. The results show that although pruning causes little drop in overall accuracy, it causes a large drop in accuracy on rarely occurring data.
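A minimal way to reproduce this kind of audit, sketched with PyTorch's built-in magnitude pruning (`model` and `loader` are assumed to exist and are not from the paper):

```python
import torch
import torch.nn.utils.prune as prune
from collections import defaultdict

def per_class_accuracy(model, loader):
    """Per-class accuracy; rare classes are where pruning damage shows up."""
    hit, tot = defaultdict(int), defaultdict(int)
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            pred = model(x).argmax(dim=-1)
            for yi, pi in zip(y.tolist(), pred.tolist()):
                tot[yi] += 1
                hit[yi] += int(yi == pi)
    return {c: hit[c] / tot[c] for c in tot}

def prune_linear_layers(model, amount=0.9):
    """Magnitude-prune 90% of each linear layer; compare per_class_accuracy
    before and after to see whether rare classes degrade disproportionately."""
    for m in model.modules():
        if isinstance(m, torch.nn.Linear):
            prune.l1_unstructured(m, name="weight", amount=amount)
```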

Multi-Object Tracking with Transformers

TrackFormer: Multi-Object Tracking with Transformers
https://arxiv.org/abs/2101.02702

A study applying DETR, a Transformer-based object detection model, to multi-object tracking. Objects detected in the previous frame are added to the decoder's object queries as track queries, so they can be followed across frames.
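The core mechanism fits in a few lines: the decoder's query set for the current frame is the learned object queries (for newly appearing objects) concatenated with the previous frame's output embeddings. A toy sketch, with shapes as assumptions:

```python
import torch

def frame_queries(object_queries, prev_track_embeddings):
    """object_queries: (n_new, d) learned queries for new detections.
    prev_track_embeddings: (n_tracks, d) decoder outputs from frame t-1,
    re-fed as 'track queries' so each track keeps its identity."""
    return torch.cat([object_queries, prev_track_embeddings], dim=0)
```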

Mathematically equivalent transformations reduce the computational complexity of Self-Attention.

Efficient Attention: Attention with Linear Complexities
https://openaccess.thecvf.com/content/WACV2021/papers/Shen_Efficient_Attention_Attention_With_Linear_Complexities_WACV_2021_paper.pdf

By computing Self-Attention as Q(KV) instead of (QK)V, the complexity can be reduced from n² to d_k·d_v while remaining mathematically equivalent. The savings are significant at high resolutions. In object detection, the authors confirmed an accuracy improvement when using it as a drop-in replacement for the non-local module.
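In code, the reordering looks like the sketch below, assuming single-head tensors of shape (n, d); softmax is applied to Q and K separately, following the paper's softmax normalization:

```python
import torch

def efficient_attention(q, k, v):
    """Compute attention as Q(K^T V) instead of (Q K^T) V.
    q, k: (n, d_k), v: (n, d_v); cost is O(n * d_k * d_v), not O(n^2)."""
    q = torch.softmax(q, dim=-1)      # normalize queries over d_k
    k = torch.softmax(k, dim=0)       # normalize keys over the n positions
    context = k.transpose(0, 1) @ v   # (d_k, d_v) global context matrix
    return q @ context                # (n, d_v)
```

When n is much larger than d_k and d_v, as with high-resolution feature maps, the small context matrix is far cheaper than the n×n attention map.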

— — — — — — — — — — — — — — — — — — — — — — — — — —

2. Technical Articles

— —

Monitoring Machine Learning Service

Just as services that use software need to be monitored, services that use machine learning do too. This article explains what to monitor in a machine-learning service (e.g., monitoring accuracy to check whether the model has become outdated), presents case studies, and introduces monitoring tools.

2020 Trends from Papers with Code

This is a summary of the most popular papers, libraries, and benchmark datasets of 2020 from Papers with Code.

— — — — — — — — — — — — — — — — — — — — — — — — — —

3. Examples of Machine Learning Use Cases

— —

Directions in the use of DeepFakes

DeepFakes have become a social problem, including fake pornography, and this article explains the good and bad ways the technology is being used. Good uses include changing a whistle-blower's face to protect them, promotional videos in which politicians appear to speak voters' native languages, and "what if" videos of historical events as entertainment.

Problems with Medicine × Machine Learning

An article that points out problems at the intersection of medicine and machine learning. Medical machine learning gets a lot of attention in the media, but machine learning relies on data, and medical data can easily be missing or biased, or collected inappropriately due to a lack of understanding. The article also presents cases where assistance was delayed because of algorithm implementation errors.

— — — — — — — — — — — — — — — — — — — — — — — — — —

4. Other Topics

— —

The US needs AI know-how.

An interview with Mr. Furman, who served as chief economic advisor in the Obama administration. He says that the Trump administration's policies discouraged engineers and students from coming to the U.S., and that the Biden administration needs to address this.

Training Deep Learning Models on CPUs

This article is about Neural Magic, a startup that uses CPUs, rather than GPUs, as hardware for deep learning. Neural Magic is developing software that can train deep learning models on CPUs, so that they can run on existing, widely available PCs.

— — — — — — — — — — — — — — — — — — — — — — — — — —

— Past Articles

2021 Week 2 ⇦ 2021 Week 3 (this post) ⇨ 2021 Week 4

December 2020 summary
November 2020 summary
October 2020 summary

2020 summary

— — — — — — — — — — — — — — — — — — — — — — — — — —

On Twitter, I post one-sentence commentaries on papers.
