Akira’s Machine Learning News — Issue #35

Nov 27, 2021

Featured Paper/News in This Week.

A pre-training method has been proposed that masks an image and trains the model to restore it, much like BERT. 75% of the image is masked and only the remaining 25% is fed to the encoder, which should keep memory usage low.
An image generation model that combines a diffusion model with a masked language model has been presented. It appears able to trade generation quality against the computational resources at hand.

— — — — — — — — — — — — — — — — — — –

In the following sections, I will introduce various articles and papers not only on the above contents but also on the following four topics.

Featured Paper/News in This Week
Machine Learning Use Case
Papers
Articles related to machine learning technology

— — — — — — — — — — — — — — — — — — –

1. Featured Paper/News in This Week

Learning ViT by hiding the image with a mask and restoring it — arxiv.org

[2111.06377] Masked Autoencoders Are Scalable Vision Learners
They propose MAE (Masked Autoencoders), a self-supervised pre-training method that masks most of an image (e.g., 75%) and trains the model to restore it. Using ImageNet data alone, a ViT-based model reaches 87.8% accuracy, and the method outperforms existing self-supervised learning methods such as DINO and MoCo v3.
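The masking step can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper name `random_masking`, the 14x14 patch grid (a 224x224 image cut into 16x16 patches), and the fixed seed are all assumptions made for the example.

```python
import random

def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into a visible set (encoder input) and a masked set.

    Hypothetical helper for illustration; not taken from the MAE codebase.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)                      # random order of patches
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(indices[:num_keep])      # only these go to the encoder
    masked = sorted(indices[num_keep:])       # these must be reconstructed
    return visible, masked

# Assumed setup: 224x224 image, 16x16 patches -> 14 * 14 = 196 patches.
visible, masked = random_masking(14 * 14, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

Because the encoder only processes the 49 visible patches instead of all 196, its input sequence is a quarter of the usual length, which is where the memory saving would come from.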

A model that combines autoregressive and diffusion models — arxiv.org

[2110.02037] Autoregressive Diffusion Models
They propose ARDMs (Autoregressive Diffusion Models), which combine autoregressive and diffusion models. Unlike conventional autoregressive models, which generate sequentially from top-left to bottom-right, ARDMs are trained to reproduce randomly selected points of the input, which resembles BERT’s masked language modeling.
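The "randomly selected points" idea can be illustrated with a small sketch of how one training example might be split into observed and target positions. This is an assumption-laden simplification: the function name `sample_training_mask` is hypothetical, and the loss weighting and the network itself are omitted.

```python
import random

def sample_training_mask(num_dims, seed=None):
    """Pick a random generation order and a random step, then split positions.

    Hypothetical helper: positions before the step are given to the model,
    the rest are the ones it is trained to predict.
    """
    rng = random.Random(seed)
    order = list(range(num_dims))
    rng.shuffle(order)                 # random generation order
    t = rng.randint(1, num_dims)       # step in 1..num_dims (inclusive)
    observed = sorted(order[:t - 1])   # already "generated" positions
    targets = sorted(order[t - 1:])    # positions to predict (never empty)
    return observed, targets

# Example: a flattened 4x4 image has 16 positions.
observed, targets = sample_training_mask(16, seed=0)
print(len(observed) + len(targets))  # 16
```

A fixed top-left to bottom-right model corresponds to always using the identity order; here the order is resampled for every training example.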

— — — — — — — — — — — — — — — — — — –

2. Machine Learning use case

Protecting Sexual Minorities with Voice Conversion Using Deep Fake — www.wired.com

These Deepfake Voices Can Help Trans Gamers

Fred, a trans man, clicked his mouse, and his tenorful tones suddenly sank deeper. He'd switched on voice-changing…

www.wired.com

Transgender people can be harassed when their voice does not match their gender, and deepfake voice conversion can help prevent such harassment. This should make it easier for sexual minorities to take part in online communities, which has been difficult for them.

Crossing Language Barriers with Multilingual Model Chatbots — venturebeat.com

How Moveworks' AI platform broke through the multilingual NLP barrier

Hear from CIOs, CTOs, and other C-level and senior execs on data and AI strategies at the Future of Work Summit this…

venturebeat.com

This is an introduction to a chatbot developed by Moveworks that uses a multilingual language model. In a global company, this means that people who speak different languages can receive support without having to set up support centers in different countries.

— — — — — — — — — — — — — — — — — — –

3. Machine Learning Papers

Evaluating various deep learning models on tabular data — arxiv.org

[2106.11959] Revisiting Deep Learning Models for Tabular Data
A study that tested various deep learning models on tabular data. It shows that ResNet-based models are strong and that FT-Transformer, which tokenizes features, is a good baseline, although it is not significantly superior to GBDT-based methods.
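What "tokenizes features" means can be sketched for numerical features: each scalar feature is turned into its own embedding vector by a per-feature weight and bias. The code below is a minimal sketch with made-up numbers, not the paper's implementation; the function name, the 2-dimensional embeddings, and the placement of the [CLS] token at the front are assumptions.

```python
def tokenize_numeric_features(x, weights, biases, cls_token):
    """Map each scalar feature x_j to the vector x_j * W_j + b_j.

    Hypothetical helper; weights and biases would be learned in practice.
    """
    tokens = [list(cls_token)]  # extra token whose output is used for prediction
    for x_j, w_j, b_j in zip(x, weights, biases):
        tokens.append([x_j * w + b for w, b in zip(w_j, b_j)])
    return tokens

# Two features, embedding dimension 2 (illustrative values only).
x = [2.0, -1.0]
weights = [[1.0, 0.5], [0.0, 2.0]]
biases = [[0.0, 0.0], [1.0, 1.0]]
tokens = tokenize_numeric_features(x, weights, biases, cls_token=[0.0, 0.0])
print(tokens)  # [[0.0, 0.0], [2.0, 1.0], [1.0, -1.0]]
```

The resulting list of tokens is what a Transformer would then process, one token per feature.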

Tip-Adapter for Few-shot learning without learning — arxiv.org

[2111.03930] Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
By improving on CLIP-Adapter, they propose Tip-Adapter, which performs few-shot learning without updating any parameters. It measures the similarity between the test image and the few-shot dataset, and predicts the category from that similarity together with the text information.
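The combination of few-shot similarity and text information can be sketched as below. This is a simplified reading of the idea, not the official code: the function names are hypothetical, the features are toy 2-D vectors instead of CLIP embeddings, the values of `alpha` and `beta` are assumed, and any logit scaling is left out.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def tip_adapter_logits(test_feat, cache_keys, cache_labels, text_weights,
                       alpha=1.0, beta=5.5):
    """Training-free logits: text (zero-shot) term plus few-shot cache term.

    cache_keys: image features of the few-shot samples.
    cache_labels: integer class index of each few-shot sample.
    text_weights: one text feature per class.
    """
    f = l2_normalize(test_feat)
    # Text term: cosine similarity between the test image and each class text.
    logits = [dot(f, l2_normalize(w)) for w in text_weights]
    # Cache term: each few-shot sample votes for its class, weighted by similarity.
    for key, label in zip(cache_keys, cache_labels):
        affinity = math.exp(-beta * (1.0 - dot(f, l2_normalize(key))))
        logits[label] += alpha * affinity
    return logits

# Toy example with two classes and one few-shot sample per class.
logits = tip_adapter_logits(
    test_feat=[1.0, 0.0],
    cache_keys=[[1.0, 0.0], [0.0, 1.0]],
    cache_labels=[0, 1],
    text_weights=[[1.0, 0.0], [0.0, 1.0]],
)
print(logits.index(max(logits)))  # 0
```

Nothing here is trained: the few-shot features are simply stored and looked up at test time, which is why no parameter update is needed.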

A method that can be applied directly to existing object detection methods to improve accuracy — arxiv.org

[2111.03056] Bootstrap Your Object Detector via Mixed Training
They propose MixTraining, which replaces existing ground-truth (GT) labels with high-confidence predictions in object detection and controls the strength of data augmentation according to the difficulty of each sample. It can be applied directly to existing object detection methods to improve their accuracy.

Patching is the cause of learning instability in ViT — arxiv.org

[2106.14881] Early Convolutions Help Transformers See Better
ViT is less stable to train than CNNs, and the authors argue that the cause is the patching in the initial layer. By replacing the initial 16x16 patching with a stack of ordinary convolutions such as 3x3 Conv, ViT becomes robust to changes in learning rate, converges faster, and outperforms state-of-the-art CNN models.
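A quick size calculation shows why a stack of small convolutions can stand in for 16x16 patching. The numbers below are assumptions for illustration (a 224x224 input and four stride-2 3x3 convolutions with padding 1); the exact stem used in the paper may differ in depth and channel widths.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1

# Original ViT patching: one 16x16 convolution with stride 16.
patchify = conv_output_size(224, kernel=16, stride=16, padding=0)

# Assumed convolutional stem: four 3x3 convolutions, each with stride 2.
size = 224
for _ in range(4):
    size = conv_output_size(size, kernel=3, stride=2, padding=1)

print(patchify, size)  # 14 14
```

Both routes end with a 14x14 grid of tokens, so the rest of the Transformer can stay unchanged while the input is downsampled gradually instead of in one step.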

AugMax: data augmentation for learning diverse and difficult samples — arxiv.org

[2110.13771] AugMax: Adversarial Composition of Random Augmentations for Robust Training
They propose AugMax, which searches for stronger data augmentations by learning the parameters used to mix augmentations. Because the resulting samples are too difficult to learn from directly, they also propose DuBIN, which separates instance-level and batch-level diversity using Instance Norm and BatchNorm. The authors claim that this allows the model to learn from both diverse and difficult samples.

Comparison of Robustness between CNN and Transformer — arxiv.org

[2111.05464] Are Transformers More Robust Than CNNs?
Transformers were said to be more robust than CNNs, but when the training setup, such as training data and data augmentation, is aligned, CNNs can reach the same level of robustness against adversarial attacks as Transformers. On out-of-distribution data such as ImageNet-A and ImageNet-C, however, Transformers were stronger.

— — — — — — — — — — — — — — — — — — –

4. Technical Articles

What to watch out for in a project using data science techniques — towardsdatascience.com

Avoiding the 4 Major Pitfalls of Data Science Projects

Working on a data science project, especially with a new stakeholder, can be challenging. Learn how to avoid the main…

towardsdatascience.com

An article about what to watch out for in a project using data science techniques. It discusses unorganized data and conflicts with stakeholders.

— — — — — — — — — — — — — — — — — — –

🌟I post weekly newsletters! Please subscribe!🌟

Akira's Machine Learning News - Revue

By Akira's Machine Learning News -- by Akihiro FUJII : Manufacturing Engineer / Machine Learning Engineer/ Master of…

www.getrevue.co

— — — — — — — — — — — — — — — — — — –

Other Blogs

Machine Learning 2020 summary: 84 interesting papers/articles

In this article, I present a total of 84 papers and articles published in 2020 that I found particularly interesting…

towardsdatascience.com

Recent Developments and Views on Computer Vision x Transformer

On the differences between Transformer and CNN, why Transformer matters, and what its weaknesses are.

towardsdatascience.com

Reach and Limits of the Supermassive Model GPT-3

In this blog post, I will give a technical explanation of GPT-3 , what GPT-3 have achieved , and what GPT-3 could not…

medium.com

Do Vision Transformers See Like Convolutional Neural Networks? (Paper Explained)

I will take a closer look at the differences in the obtained representations between CNN and Transformers

towardsdatascience.com

— — — — — — — — — — — — — — — — — — –

About Me

Manufacturing Engineer/Machine Learning Engineer/Data Scientist / Master of Science in Physics / http://github.com/AkiraTOSEI/

LinkedIn profile

On Twitter, I post one-sentence paper commentary.

Written by Akihiro FUJII