The `gpt-2` model is one of the Generative Pre-trained Transformer (GPT) family of models, pre-trained on a very large corpus of English data in a self-supervised fashion. The GPT architecture implements a deep neural network, specifically a transformer model, which uses attention in place of earlier recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on the segments of input text it predicts to be the most relevant. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.
More details are provided in the paper, repository, and model card.
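As a rough illustration of that next-word objective (not something this card prescribes), the sketch below asks the original PyTorch model for the most likely next word. The use of the Hugging Face `transformers` package, the `gpt2` model id, and the prompt are assumptions made purely for illustration.

```python
# Illustrative sketch of GPT-2's objective: predict the next word from all previous words.
# Assumes the Hugging Face `transformers` package; the "gpt2" id and prompt are examples.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids  # shape (1, L)

with torch.no_grad():
    logits = model(input_ids).logits          # shape (1, L, vocab size)

# Distribution over the next word, conditioned on all previous words.
next_probs = torch.softmax(logits[0, -1], dim=-1)
print(tokenizer.decode(int(next_probs.argmax())))
```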
| Metric           | Value           |
| ---------------- | --------------- |
| Type             | Text Prediction |
| GFlops           | 293.0489        |
| MParams          | 175.6203        |
| Source framework | PyTorch*        |
GFlops are calculated for the `1, 1024` input shape, which is suitable for long contexts.
Perplexity was obtained on the WikiText-2 raw character-level dataset for the converted model.
| Metric     | Value  |
| ---------- | ------ |
| Perplexity | 29.00% |
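Perplexity is the exponential of the average per-token negative log-likelihood. The sketch below shows that relationship on an arbitrary text sample; it again assumes the `transformers` package and does not reproduce the WikiText-2 evaluation protocol behind the figure above.

```python
# Sketch: perplexity = exp(mean next-token negative log-likelihood).
# Assumes the `transformers` package; the sample text is arbitrary and does not
# reproduce the WikiText-2 measurement reported in this card.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tokenizer("Perplexity measures how well a model predicts text.", return_tensors="pt").input_ids

with torch.no_grad():
    # labels=ids makes the model return the mean cross-entropy of predicting
    # each token from the tokens before it.
    loss = model(ids, labels=ids).loss

print(float(torch.exp(loss)))  # perplexity of this sample
```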
Token ids, name: `input`, dynamic shape in the format `B, L` (identical for the original and converted models), where:

- `B` - batch size
- `L` - sequence length
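As a sketch of how such an input could be produced (not a requirement of this card), the snippet below tokenizes a prompt into a `B, L` array of token ids; the GPT-2 BPE tokenizer from `transformers` and the `int64` dtype are assumptions.

```python
# Sketch: build the `input` tensor of token ids with shape (B, L).
# The `transformers` GPT-2 tokenizer and int64 dtype are assumptions.
import numpy as np
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
token_ids = np.array([tokenizer.encode("Hello, world")], dtype=np.int64)
print(token_ids.shape)  # (1, L): batch of one sequence with L tokens
```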
Prediction scores of the language modeling head, name: `output`, dynamic shape `B, L, 50257` in the format `B, L, S` (identical for the original and converted models), where:

- `B` - batch size
- `L` - sequence length
- `S` - vocab size
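One common way to use these scores, shown below as a sketch rather than a prescribed post-processing step, is to take the scores at the last position and rank the vocabulary by softmax probability.

```python
# Sketch: pick candidate next tokens from the `output` blob of shape (B, L, S).
import numpy as np

def top_next_tokens(output: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the ids of the k most likely next tokens for each sequence in the batch."""
    last = output[:, -1, :]                                   # scores after the final position
    probs = np.exp(last - last.max(axis=-1, keepdims=True))   # numerically stable softmax
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.argsort(probs, axis=-1)[:, ::-1][:, :k]         # top-k token ids per sequence
```

Appending a chosen id to the input and running the model again repeats this step for the following position, which is how autoregressive text generation proceeds.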
You can download models and, if necessary, convert them into Inference Engine format using the Model Downloader and other automation tools, as shown in the examples below.
An example of using the Model Downloader:

`omz_downloader --name <model_name>`

An example of using the Model Converter:

`omz_converter --name <model_name>`
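As an end-to-end sketch, the converted IR can be run with the OpenVINO Runtime Python API roughly as follows; the IR path reflects the default downloader/converter output layout, and the FP32 precision, device name, and placeholder token ids are assumptions.

```python
# Sketch: run the converted gpt-2 IR with the OpenVINO Runtime Python API.
# The IR path assumes the default omz_downloader/omz_converter output layout and
# FP32 precision; the token ids are placeholders of shape (1, L).
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("public/gpt-2/FP32/gpt-2.xml")
compiled = core.compile_model(model, "CPU")

token_ids = np.array([[464, 2068, 7586, 21831]], dtype=np.int64)
result = compiled([token_ids])[compiled.output(0)]
print(result.shape)  # (1, L, 50257): prediction scores for each position
```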
The original model is distributed under the Apache License, Version 2.0. A copy of the license is provided in `<omz_dir>/models/public/licenses/APACHE-2.0.txt`.