The `gpt-2` model is one of the Generative Pre-trained Transformer (GPT) family of models, pre-trained on a very large corpus of English data in a self-supervised fashion. The GPT architecture implements a deep neural network, specifically a transformer model, which uses attention in place of earlier recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on the segments of input text it predicts to be the most relevant. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.
More details are provided in the paper, repository, and model card.
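As a rough illustration of that next-word objective (not something this card prescribes), the sketch below asks the original PyTorch model for the most likely next word. The use of the Hugging Face `transformers` package, the `gpt2` model id, and the prompt are assumptions made purely for illustration.

```python
# Illustrative sketch of GPT-2's objective: predict the next word from all previous words.
# Assumes the Hugging Face `transformers` package; the "gpt2" id and prompt are examples.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids  # shape (1, L)

with torch.no_grad():
    logits = model(input_ids).logits          # shape (1, L, vocab size)

# Distribution over the next word, conditioned on all previous words.
next_probs = torch.softmax(logits[0, -1], dim=-1)
print(tokenizer.decode(int(next_probs.argmax())))
```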
| Metric           | Value           |
| ---------------- | --------------- |
| Type             | Text Prediction |
| GFlops           | 293.0489        |
| MParams          | 175.6203        |
| Source framework | PyTorch*        |
GFlops are calculated for the `1, 1024` input shape, which is suitable for long contexts.
Perplexity was obtained on the WikiText-2 raw character-level dataset for the converted model.
| Metric     | Value  |
| ---------- | ------ |
| Perplexity | 29.00% |
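Perplexity is the exponential of the average per-token negative log-likelihood. The sketch below shows that relationship on an arbitrary text sample; it again assumes the `transformers` package and does not reproduce the WikiText-2 evaluation protocol behind the figure above.

```python
# Sketch: perplexity = exp(mean next-token negative log-likelihood).
# Assumes the `transformers` package; the sample text is arbitrary and does not
# reproduce the WikiText-2 measurement reported in this card.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tokenizer("Perplexity measures how well a model predicts text.", return_tensors="pt").input_ids

with torch.no_grad():
    # labels=ids makes the model return the mean cross-entropy of predicting
    # each token from the tokens before it.
    loss = model(ids, labels=ids).loss

print(float(torch.exp(loss)))  # perplexity of this sample
```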
Token ids, name: `input`, dynamic shape in the format `B, L` (identical for the original and converted models), where:

- `B` - batch size
- `L` - sequence length
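As a sketch of how such an input could be produced (not a requirement of this card), the snippet below tokenizes a prompt into a `B, L` array of token ids; the GPT-2 BPE tokenizer from `transformers` and the `int64` dtype are assumptions.

```python
# Sketch: build the `input` tensor of token ids with shape (B, L).
# The `transformers` GPT-2 tokenizer and int64 dtype are assumptions.
import numpy as np
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
token_ids = np.array([tokenizer.encode("Hello, world")], dtype=np.int64)
print(token_ids.shape)  # (1, L): batch of one sequence with L tokens
```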
Prediction scores of the language modeling head, name: `output`, dynamic shape `B, L, 50257` in the format `B, L, S` (identical for the original and converted models), where:

- `B` - batch size
- `L` - sequence length
- `S` - vocab size
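One common way to use these scores, shown below as a sketch rather than a prescribed post-processing step, is to take the scores at the last position and rank the vocabulary by softmax probability.

```python
# Sketch: pick candidate next tokens from the `output` blob of shape (B, L, S).
import numpy as np

def top_next_tokens(output: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the ids of the k most likely next tokens for each sequence in the batch."""
    last = output[:, -1, :]                                   # scores after the final position
    probs = np.exp(last - last.max(axis=-1, keepdims=True))   # numerically stable softmax
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.argsort(probs, axis=-1)[:, ::-1][:, :k]         # top-k token ids per sequence
```

Appending a chosen id to the input and running the model again repeats this step for the following position, which is how autoregressive text generation proceeds.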
You can download models and, if necessary, convert them into Inference Engine format using the Model Downloader and other automation tools, as shown in the examples below.
An example of using the Model Downloader:

`omz_downloader --name <model_name>`

An example of using the Model Converter:

`omz_converter --name <model_name>`
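As an end-to-end sketch, the converted IR can be run with the OpenVINO Runtime Python API roughly as follows; the IR path reflects the default downloader/converter output layout, and the FP32 precision, device name, and placeholder token ids are assumptions.

```python
# Sketch: run the converted gpt-2 IR with the OpenVINO Runtime Python API.
# The IR path assumes the default omz_downloader/omz_converter output layout and
# FP32 precision; the token ids are placeholders of shape (1, L).
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("public/gpt-2/FP32/gpt-2.xml")
compiled = core.compile_model(model, "CPU")

token_ids = np.array([[464, 2068, 7586, 21831]], dtype=np.int64)
result = compiled([token_ids])[compiled.output(0)]
print(result.shape)  # (1, L, 50257): prediction scores for each position
```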
The original model is distributed under the Apache License, Version 2.0. A copy of the license is provided in `<omz_dir>/models/public/licenses/APACHE-2.0.txt`.