Akira’s Machine Learning News — Issue #35

Featured Paper/News in This Week.

— — — — — — — — — — — — — — — — — — –

In the following sections, I introduce various articles and papers, covering not only the above but also the following four topics.

  1. Featured Paper/News in This Week
  2. Machine Learning Use Case
  3. Papers
  4. Articles related to machine learning technology

— — — — — — — — — — — — — — — — — — –

1. Featured Paper/News in This Week

Learning ViT by hiding the image with a mask and restoring it (arxiv.org)

[2111.06377] Masked Autoencoders Are Scalable Vision Learners
The authors propose MAE (Masked Autoencoders), which reaches 87.8% accuracy using only ImageNet data, even though it uses ViT-based models, through self-supervised learning that masks images and restores them. The method hides most of the image (e.g., 75%) and learns to restore it, and it outperforms existing self-supervised learning methods such as DINO and MoCo v3.
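To make the masking step concrete, here is a minimal sketch of choosing which patches to hide. It only illustrates the random split into visible and masked patches and is not the authors' implementation. The function name `random_patch_mask` and the 16x16 patch size on a 224x224 image are assumptions for illustration.

```python
import random

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked), chosen uniformly at random."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked

# Assumption: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

With a 75% mask ratio, only a quarter of the patches remain visible and the rest become reconstruction targets.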

A model that combines autoregressive and diffusion models (arxiv.org)

[2110.02037] Autoregressive Diffusion Models
The authors propose ARDMs (Autoregressive Diffusion Models), a combination of autoregressive and diffusion models. Unlike ordinary autoregressive models, which generate sequentially from top-left to bottom-right, ARDMs are trained to reproduce randomly selected points of the input, which may be similar to BERT’s masked language modeling.
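As a rough illustration of "randomly selected points", the sketch below draws a random generation order and a random step, then splits the input positions into those already observed and those still to be predicted. This is a simplified reading of the idea, not the paper's code. The function name `ardm_training_mask` is hypothetical.

```python
import random

def ardm_training_mask(num_dims, seed=None):
    """Pick a random order and step t; return (t, observed, targets)."""
    rng = random.Random(seed)
    order = list(range(num_dims))
    rng.shuffle(order)                 # order[i] = position generated at step i + 1
    t = rng.randint(1, num_dims)       # current step, 1..num_dims inclusive
    observed = sorted(order[:t - 1])   # positions generated before step t
    targets = sorted(order[t - 1:])    # positions the model is asked to predict
    return t, observed, targets

t, observed, targets = ardm_training_mask(16, seed=0)
print(t, len(observed), len(targets))
```

Because the order is random rather than fixed top-left to bottom-right, the split looks much like a masked-prediction task with a variable amount of masking.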

— — — — — — — — — — — — — — — — — — –

2. Machine Learning Use Case

Protecting Sexual Minorities with Voice Conversion Using Deep Fake (www.wired.com)

Transgender people can be harassed when their voice does not match their gender, but using Deep Fake for voice conversion can help prevent such harassment. This should make it easier for sexual minorities to participate in online communities, which has been difficult for them.

Crossing Language Barriers with Multilingual Model Chatbots (venturebeat.com)

This is an introduction to a chatbot developed by Moveworks that uses a multilingual language model. In a global company, this means that people who speak different languages can receive support without the company having to set up support centers in different countries.

— — — — — — — — — — — — — — — — — — –

3. Machine Learning Papers

Evaluating various deep learning models on tabular data (arxiv.org)

[2106.11959] Revisiting Deep Learning Models for Tabular Data
A study that tests various deep learning models on tabular data. It shows that ResNet-based models are strong and that FT-Transformer, which tokenizes features, is a good baseline, but that it is not significantly superior to GBDT-based methods.

Tip-Adapter for few-shot learning without training (arxiv.org)

[2111.03930] Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
By improving CLIP-Adapter, the authors propose Tip-Adapter, which performs few-shot learning without updating parameters. The similarity between the test image and the few-shot dataset is measured, and the category output is computed from that similarity and the text information.
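The combination of "similarity to the few-shot dataset" and "text information" can be sketched as a small cache lookup added to the usual CLIP zero-shot score. The code below is a minimal sketch under my reading of the paper, not the official implementation. The function names, the toy 2-D features, and the values of `alpha` and `beta` are illustrative assumptions, and all feature vectors are assumed to be L2-normalized.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tip_adapter_logits(f_test, cache_keys, cache_labels, text_weights,
                       alpha=1.0, beta=5.5):
    """Training-free logits: cache term from few-shot samples + zero-shot text term."""
    # Affinity of the test feature to each cached few-shot feature (1.0 when identical).
    affinities = [math.exp(-beta * (1.0 - dot(f_test, k))) for k in cache_keys]
    logits = []
    for c in range(len(text_weights)):
        # Few-shot knowledge: affinities weighted by one-hot labels of the cached samples.
        cache_term = sum(a * labels[c] for a, labels in zip(affinities, cache_labels))
        # Prior knowledge: similarity to the text embedding of class c.
        clip_term = dot(f_test, text_weights[c])
        logits.append(alpha * cache_term + clip_term)
    return logits

# Toy example: the text term alone would favor class 1, the cache favors class 0.
f_test = [1.0, 0.0]
cache_keys = [[1.0, 0.0], [0.0, 1.0]]    # one cached image feature per class
cache_labels = [[1, 0], [0, 1]]          # one-hot labels of the cached images
text_weights = [[0.6, 0.8], [0.8, 0.6]]  # one text embedding per class
logits = tip_adapter_logits(f_test, cache_keys, cache_labels, text_weights)
print(logits.index(max(logits)))  # 0
```

No parameter is updated anywhere: the few-shot images are only stored and compared against, which is why the method can be described as training-free.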

A method that can be directly applied to existing object detection methods to improve accuracy (arxiv.org)

[2111.03056] Bootstrap Your Object Detector via Mixed Training
The authors propose MixTraining, which replaces existing ground-truth (GT) labels with high-confidence predictions in object detection and controls the strength of data augmentation depending on the difficulty of each sample. It can be applied directly to existing object detection methods to improve accuracy.

Patching is the cause of learning instability in ViT (arxiv.org)

[2106.14881] Early Convolutions Help Transformers See Better
ViT is less stable to train than CNNs, and the authors argue that the cause is the patching in the initial layer. By replacing the initial 16x16 patching with a stack of regular convolutions such as 3x3 Conv, ViT becomes robust to changes in learning rate, converges faster, and outperforms state-of-the-art CNN models.
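One way to see why a convolutional stem can stand in for patching is that both reduce the image to the same token grid. The arithmetic below is a small sketch. The 224x224 input size and the choice of four 3x3 convolutions with stride 2 and padding 1 are assumptions for illustration, not necessarily the exact stem used in the paper.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# Standard ViT patching: one 16x16 convolution with stride 16.
patchify = conv_out_size(224, kernel=16, stride=16, padding=0)

# Assumed convolutional stem: four 3x3 convolutions, each with stride 2 and padding 1.
size = 224
for _ in range(4):
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
conv_stem = size

print(patchify, conv_stem)  # 14 14
```

Both routes produce a 14x14 grid of tokens, so the transformer blocks that follow can stay unchanged while only the first layer is swapped.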

AugMax: data augmentation for learning from diverse and difficult samples (arxiv.org)

[2110.13771] AugMax: Adversarial Composition of Random Augmentations for Robust Training
The authors propose AugMax, which searches for stronger data augmentations by learning the parameters used to mix data augmentations. Because the resulting samples are too difficult to learn from directly, they also propose DuBIN, which separates instance-level and batch-level diversity using Instance Norm and BatchNorm. The authors claim the method can learn from both diverse and difficult samples.

Comparison of Robustness between CNN and Transformer (arxiv.org)

[2111.05464] Are Transformers More Robust Than CNNs?
Transformers have been said to be more robust than CNNs, but when the training setup, such as training data and data augmentations, is aligned, CNNs can reach the same level of robustness against adversarial attacks as Transformers. However, on out-of-distribution data such as ImageNet-A and ImageNet-C, Transformers were stronger.

— — — — — — — — — — — — — — — — — — –

4. Technical Articles

What to watch out for in a project using data science techniques (towardsdatascience.com)

This article covers pitfalls in projects that use data science techniques. It discusses unorganized data and conflicts with stakeholders.