BERT - Self-supervised Pre-training Model
- Self-Supervised Learning: e.g., image inpainting, puzzle solving
- Image inpainting
- Hide parts of the data and train the model to predict them (see the toy sketch after this list)
- Training uses only raw data; no target labels are required
- Puzzle solving
- Predict where each shuffled image patch belongs
- To do so, the model must recognize each object and learn where it should be located
- In the process, the model acquires broad, high-level knowledge about objects
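A toy sketch of the inpainting idea (the random array and patch size are invented for illustration): the raw image itself supplies both the model input and the prediction target, so no human labels are involved.

```python
import numpy as np

# Self-supervision from raw data: hide a patch, predict its pixels.
image = np.random.rand(32, 32)   # stand-in for a real grayscale image
masked = image.copy()
masked[8:16, 8:16] = 0.0         # hide an 8x8 patch -> model input
target = image[8:16, 8:16]       # the hidden pixels -> training target
```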
- Transfer Learning
    1. Pre-training: train a model in advance with self-supervised learning on unlabeled data
    2. Fine-tuning: adapt the pre-trained model to the labeled target task
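A minimal sketch of this two-step recipe with the Hugging Face transformers library (assuming it is installed; the checkpoint name, 2-class head, and toy batch are illustrative, not from the original post):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Step 1, pre-training, is already done for us: from_pretrained() loads
# weights that were trained with MLM/NSP on large unlabeled corpora.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # fresh, randomly initialized task head
)

# Step 2, fine-tuning: train the pre-trained weights plus the new head
# on a (small) labeled dataset for the target task.
batch = tokenizer(["great movie", "terrible movie"],
                  return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

loss = model(**batch, labels=labels).loss  # cross-entropy from the head
loss.backward()
optimizer.step()
```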
BERT: Bidirectional Encoder Representations from Transformers
- Model Architecture: Transformer encoder (see the shape sketch after this list)
- **Pre-training Tasks:** Masked Language Modeling (MLM), Next Sentence Prediction (NSP)
- Pre-trained on large amounts of unlabeled text
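For intuition about the architecture, here is an encoder-only stack in plain PyTorch with BERT-base-like dimensions (12 layers, hidden size 768, 12 attention heads, feed-forward size 3072). This is only a shape sketch, not real BERT: it omits the token/position embeddings and the pre-trained weights.

```python
import torch
import torch.nn as nn

# A stack of Transformer *encoder* layers only -- no decoder.
layer = nn.TransformerEncoderLayer(
    d_model=768, nhead=12, dim_feedforward=3072, batch_first=True
)
encoder = nn.TransformerEncoder(layer, num_layers=12)

embeddings = torch.randn(1, 16, 768)  # (batch, seq_len, hidden)
contextual = encoder(embeddings)      # same shape, context-mixed per token
print(contextual.shape)               # torch.Size([1, 16, 768])
```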
- Pre-training: Masked Language Model
- Randomly select 15% of the input tokens and train the model to predict them (see the sketch after this block)
- Of the selected 15%: replace 80% with the [MASK] token
- Replace 10% with a random token
- Keep the remaining 10% as the original token
- Replacing every selected token with [MASK] would hurt the model: [MASK] never appears in downstream data, so the pre-training inputs would not match what the model sees at fine-tuning time
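A minimal sketch of this 80/10/10 corruption rule (the function name is mine, and it works on whole words for readability; real BERT operates on subword token IDs):

```python
import random

def mask_tokens(tokens, vocab, select_rate=0.15):
    """Pick ~15% of positions, then apply BERT's 80/10/10 rule.
    Returns the corrupted sequence and (position, original) targets."""
    corrupted, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if random.random() < select_rate:
            targets.append((i, tok))       # model must recover this token
            r = random.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"               # 80%: mask it
            elif r < 0.9:
                corrupted[i] = random.choice(vocab)   # 10%: random token
            # remaining 10%: leave the original token in place
    return corrupted, targets
```

Note that a selected position goes into `targets` even when its token is left unchanged, so the model is still trained to predict the original token there.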
- Pre-training: Next Sentence Prediction
- Masked language modeling predicts tokens within a sentence, so by itself it teaches the model little about relationships between sentences
- NSP: predict whether sentence B actually follows sentence A in the source text (a binary IsNext/NotNext classification; see the sketch below)
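A minimal sketch of how NSP training pairs could be built (the function name is mine; a real pipeline works at the document level and avoids accidentally sampling the true successor):

```python
import random

def make_nsp_example(sentences, i):
    """Pair sentence i with its true successor half the time (IsNext),
    otherwise with a random sentence from the corpus (NotNext)."""
    if random.random() < 0.5:
        return sentences[i], sentences[i + 1], "IsNext"
    j = random.randrange(len(sentences))  # may rarely hit i + 1; see note above
    return sentences[i], sentences[j], "NotNext"
```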
- Downstream Tasks
- Sentence Classification
- A task that classifies a single given sentence according to some criterion
- e.g., sentiment classification, grammar checking
- Sentence Pair Classification
- A task that predicts the relationship between two given sentences
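Both task types feed the same model; only the input format changes. A quick sketch with the Hugging Face tokenizer (assuming the library is installed; the sentences are made up): for a pair, the tokenizer inserts [SEP] between the sentences and marks each segment in `token_type_ids`, and classification reads the final [CLS] representation.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Sentence classification: [CLS] tokens... [SEP]
single = tokenizer("The movie was great.")

# Sentence pair classification: [CLS] A... [SEP] B... [SEP]
pair = tokenizer("A man is eating.", "Someone is having a meal.")
print(tokenizer.convert_ids_to_tokens(pair["input_ids"]))
print(pair["token_type_ids"])  # 0s for sentence A, 1s for sentence B
```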