Akira’s Machine Learning news — #Week 9, 2021

Week 9 (February 22~), 2021

Akihiro FUJII
Analytics Vidhya

--

This week's highlights:

  1. Featured Paper/News in This Week
  2. Machine Learning in the Real World
  3. Papers

— — — — — — — — — — — — — — — — — — –

In the following sections, I introduce articles and papers covering not only the content above but also these five topics.

  1. Featured Paper/News in This Week
  2. Machine Learning use case
  3. Papers
  4. Articles related to machine learning technology
  5. Other Topics

— — — — — — — — — — — — — — — — — — –

1. Featured Paper/News in This Week

The paper on DALL-E, a high-performance text-to-image model, has been published (arxiv.org).

The image is quoted from OpenAI blog.

[2102.12092] Zero-Shot Text-to-Image Generation

They propose DALL-E, which generates images from text zero-shot. First, as in VQ-VAE, an encoder compresses each image to a 32x32 grid, each grid representation is replaced by its nearest entry in the codebook, and a discrete VAE is trained to generate images from these codes. Next, using paired image-text data, they train an autoregressive model that takes the text as input and generates "image tokens", with the 8192 codebook entries serving as the vocabulary.
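The two-stage pipeline described above can be sketched as follows. Everything here is an illustrative stand-in (a smaller grid and codebook than the paper's 32x32 and 8192, random features instead of a trained encoder), not the paper's implementation:

```python
import numpy as np

# Toy sketch of DALL-E's two training stages with deliberately small sizes.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 16))        # learnable code vectors

def quantize(grid_features):
    """Stage 1 (discrete VAE idea): replace each grid feature with the index
    of its nearest codebook entry, yielding discrete image tokens."""
    d = ((grid_features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)                  # one token id per grid position

features = rng.normal(size=(8 * 8, 16))      # stand-in encoder output for one image
image_tokens = quantize(features)

# Stage 2 idea: concatenate text tokens and image tokens into one sequence and
# train an autoregressive transformer to predict each image token from the text
# and all preceding image tokens (the transformer itself is omitted here).
text_tokens = np.array([5, 42, 7])           # a hypothetical tokenized caption
sequence = np.concatenate([text_tokens, image_tokens])
```

The key point is that once images are discrete token sequences, text-to-image generation reduces to ordinary next-token prediction.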

Performing seven tasks with a single model and shared parameters

The image is quoted from this paper

[2102.10772] Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer

They propose UniT (Unified Transformer), a Transformer-based model that can learn and run inference on multiple tasks (vision, language, and vision-and-language) simultaneously. No task-specific fine-tuning is required, and the same model parameters are used for all seven tasks.
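The idea of shared parameters with task-specific output heads can be sketched as follows. The task names, head sizes, and the single matrix standing in for the shared Transformer trunk are all hypothetical:

```python
import numpy as np

# Sketch of the UniT idea: one shared trunk plus small per-task heads,
# so the same core parameters serve every task.
rng = np.random.default_rng(0)
d_model = 32

shared_trunk = rng.normal(size=(d_model, d_model))  # stand-in for the shared transformer
heads = {                                           # one lightweight head per task
    "object_detection": rng.normal(size=(d_model, 91)),
    "sentiment":        rng.normal(size=(d_model, 2)),
    "vqa":              rng.normal(size=(d_model, 3129)),
}

def forward(features, task):
    """Run the shared trunk, then the task-specific head."""
    hidden = np.tanh(features @ shared_trunk)   # identical parameters for all tasks
    return hidden @ heads[task]                 # only this projection differs

x = rng.normal(size=(1, d_model))
logits = forward(x, "vqa")
```

Only the small output heads differ per task; the expensive trunk is trained once and reused, which is what lets UniT skip per-task fine-tuning.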

Self-supervised learning of molecules

The image is quoted from this paper

[2102.10056] MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks

They propose MolCLR, a self-supervised learning method for molecules. The model is trained to produce the same representation of a molecule even when its atoms are masked or its bonds are deleted. With fine-tuning, it achieved SotA performance on various tasks.
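The augmentations can be sketched on a toy molecular graph. The encoder and contrastive loss are omitted, and the function names and the example molecule here are hypothetical, not MolCLR's actual code:

```python
import numpy as np

# Sketch of MolCLR-style graph augmentations: two random "views" of the same
# molecule that the GNN encoder should map to similar representations.
rng = np.random.default_rng(0)

atoms = np.array([6, 6, 8, 1])              # atomic numbers of a toy molecule
bonds = [(0, 1), (1, 2), (0, 3)]            # bond list as index pairs

def mask_atoms(atoms, p=0.25):
    """Replace a random subset of atom features with a mask token (0)."""
    out = atoms.copy()
    out[rng.random(len(atoms)) < p] = 0
    return out

def drop_bonds(bonds, p=0.25):
    """Delete a random subset of bonds from the graph."""
    return [b for b in bonds if rng.random() >= p]

view1 = (mask_atoms(atoms), drop_bonds(bonds))
view2 = (mask_atoms(atoms), drop_bonds(bonds))
# A contrastive loss (e.g. NT-Xent) would then pull the embeddings of
# view1 and view2 together while pushing other molecules away.
```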

— — — — — — — — — — — — — — — — — — –

2. Machine Learning use case

Facebook’s Ad System Rejects Fashion Ads for People with Disabilities

Facebook’s ad system uses machine learning to determine whether an ad is appropriate, but a fashion ad aimed at people with disabilities was rejected as inappropriate. When such an event occurs, the reason for the rejection is not disclosed, so advertisers have to guess it themselves. As this shows, machine learning models are not perfect, and being shut out of advertising on a platform as large as Facebook is a serious blow to small businesses.

Automated Driving Systems Vulnerable to Cyber Attacks

The European Union Agency for Cybersecurity (ENISA) has pointed out that automated driving systems are highly vulnerable to attacks, including adversarial attacks on machine learning models. Adversarial attacks such as making pedestrians invisible to the perception system are conceivable, and many published studies show that these systems can be attacked in a variety of dangerous ways.

Five Use Cases for IoT x AI

Five use cases enabled by combining IoT and AI are introduced, including a 20% reduction in energy consumption through building energy management, prevention of cyber attacks, and prediction of equipment failure in image inspection systems.

— — — — — — — — — — — — — — — — — — –

3. Papers

A model that surpasses humans in SuperGLUE score

Image taken from the blog.

[2006.03654] DeBERTa: Decoding-enhanced BERT with Disentangled Attention
They propose DeBERTa, which combines disentangled attention, which accounts for relative positions by computing a separate relative-position matrix, with an Enhanced Mask Decoder, which gives the decoder the absolute position of each token. On SuperGLUE, DeBERTa scores higher than the human baseline.
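As a rough illustration of the disentangled-attention idea (not DeBERTa's actual formulation, which also projects content and position through separate query/key matrices), the attention score between two positions can be written as a sum of content-to-content, content-to-position, and position-to-content terms:

```python
import numpy as np

# Toy disentangled attention: each token has a content vector, and relative
# distances have their own embeddings; the score sums three interaction terms.
rng = np.random.default_rng(0)
n, d = 4, 8

content = rng.normal(size=(n, d))           # token content embeddings
rel_pos = rng.normal(size=(2 * n, d))       # embeddings for relative distances

def score(i, j):
    delta = (i - j) + n                      # shift relative distance to a valid index
    c2c = content[i] @ content[j]            # content-to-content
    c2p = content[i] @ rel_pos[delta]        # content-to-position
    p2c = rel_pos[delta] @ content[j]        # position-to-content
    return c2c + c2p + p2c

attn = np.array([[score(i, j) for j in range(n)] for i in range(n)])
weights = np.exp(attn) / np.exp(attn).sum(axis=1, keepdims=True)  # row-wise softmax
```

Separating the position terms is what lets the model reason about "how far apart" two tokens are independently of what they contain.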

Generate an optimal caption from an image

The image is quoted from this paper

[2102.01645] Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search
Research on generating optimal captions from images by combining a genetic algorithm with CLIP, which enables zero-shot inference by embedding text and images separately and computing their agreement. Using a pre-trained GAN generator, they search the latent space with a genetic algorithm so that the CLIP representation of the generated image matches that of the text.
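A minimal version of the genetic-algorithm latent search might look like this. A fixed target vector stands in for CLIP's image/text agreement score, and no GAN is involved; all names and sizes are illustrative:

```python
import numpy as np

# Minimal genetic-algorithm search over a latent space. In the paper the
# fitness would be CLIP's agreement between the GAN image decoded from z and
# the caption; here it is just closeness to a fixed target vector.
rng = np.random.default_rng(0)
dim, pop_size = 16, 32
target = rng.normal(size=dim)               # stand-in for the caption's CLIP embedding

def fitness(z):
    return -np.linalg.norm(z - target)      # higher is better

population = rng.normal(size=(pop_size, dim))
initial_best = max(fitness(z) for z in population)

for generation in range(200):
    scores = np.array([fitness(z) for z in population])
    parents = population[np.argsort(scores)[-pop_size // 2:]]  # keep the best half
    children = parents + 0.1 * rng.normal(size=parents.shape)  # mutate survivors
    population = np.concatenate([parents, children])

best = population[np.argmax([fitness(z) for z in population])]
```

Because the generator and CLIP are only ever evaluated, not differentiated, this kind of black-box search works even though the pipeline is not end-to-end trainable.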

Configure a GAN with only Transformers

The image is quoted from this paper

[2102.07074] TransGAN: Two Transformers Can Make One Strong GAN
A study that builds a GAN using only Transformers. They find that locality-aware initialization, which gradually enlarges the visible region as in CNNs, is effective, and that the model can also readily benefit from data augmentation and multi-task learning.

Image retrieval using Transformer

The image is quoted from this paper

[2102.05644] Training Vision Transformers for Image Retrieval
They propose IRT (Image Retrieval Transformers), which applies Transformers to image retrieval. In addition to a contrastive loss, they use a loss term that prevents a sample's nearest neighbors, i.e. its hard negatives, from getting too close. SotA performance was obtained on three datasets.

Training on noisy but large datasets

The image is quoted from this paper

[2102.05918] Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
A complex filtering process yields clean data, but the dataset becomes small. They therefore adopt the strategy of learning good representations with a contrastive loss on a noisy but very large set of image/text pairs, applying only simple pre-processing. Strong performance is achieved on caption retrieval from images.
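The dual-encoder contrastive setup can be sketched as below. The "encoders" are random stand-ins and the temperature value is illustrative, not taken from the paper:

```python
import numpy as np

# Sketch of contrastive learning on image/text pairs: matched pairs sit on the
# diagonal of a similarity matrix and are pulled together, while every other
# pair in the batch serves as a negative.
rng = np.random.default_rng(0)
batch, d = 4, 8

img = rng.normal(size=(batch, d))               # stand-in image embeddings
txt = img + 0.1 * rng.normal(size=(batch, d))   # noisy "matching" caption embeddings

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

sim = normalize(img) @ normalize(txt).T / 0.07  # cosine similarity over temperature

def info_nce(logits):
    """Cross-entropy with the matching pair (the diagonal) as the correct class."""
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

loss = (info_nce(sim) + info_nce(sim.T)) / 2    # image-to-text + text-to-image
```

With a large enough batch, even noisy pairs provide a useful signal, which is why scale can substitute for aggressive filtering.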

— — — — — — — — — — — — — — — — — — –

4. Articles related to machine learning technology

Up to 48% improvement in mAP for object detection by optimizing camera parameters

Object detection is one of the core technologies in automated driving systems, but manually tuning the image signal processing (ISP) pipeline is time-consuming. Here, they report that by using the Atlas Camera Optimization Suite to optimize the ISP for object detection models, they improved mAP by up to 48% in a few days.

Implementing Meta-Learning with Jax

An explanation of how to implement meta-learning with Jax, a machine learning library developed by Google. Using a simple loss function as an example, it shows how to optimize the optimizer's hyperparameters within a meta-learning framework.
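The article's setup can be mimicked in a few lines. Note that this sketch replaces Jax's automatic differentiation with a finite-difference meta-gradient, so it only illustrates the idea of treating a hyperparameter as something to optimize:

```python
# Hand-rolled meta-learning sketch: the inner loop runs gradient descent on a
# toy quadratic loss, and the outer (meta) loop tunes the learning rate, a
# hyperparameter of the inner optimizer. No libraries needed.

def inner_loss(w):
    return (w - 3.0) ** 2                    # toy loss, optimum at w = 3

def final_loss(lr, steps=10):
    """Run `steps` of gradient descent with learning rate `lr`; return the final loss."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)               # d(inner_loss)/dw
        w = w - lr * grad
    return inner_loss(w)

lr = 0.01
for _ in range(50):                          # meta-optimization loop
    eps = 1e-4
    meta_grad = (final_loss(lr + eps) - final_loss(lr - eps)) / (2 * eps)
    lr = lr - 1e-4 * meta_grad               # update the hyperparameter itself
```

In Jax, `final_loss` would be differentiated directly with `jax.grad`, since the whole inner loop is a differentiable function of `lr`.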

— — — — — — — — — — — — — — — — — — –

5. Other Topics

Google releases code for model exploration

Google has released a library for model exploration that can handle not only combinatorial search over Transformers and LSTMs but also distillation and more. The code is here.

Diffusion of Deep Learning in Industry

A thread discussing how widespread deep learning is in industry. Judging from the thread, many companies are using deep learning (though note that the thread is from a machine-learning board, so there is a selection bias). There was also a great deal of discussion about attempts to predict stock prices.
