
Commit d98291d

jackclarksf authored and WuTheFWasThat committed
update model card
1 parent fbae7db commit d98291d

File tree

1 file changed: +9 -4 lines changed

model_card.md

```diff
@@ -1,6 +1,6 @@
 # GPT-2 model card
 
-Last updated: August 2019
+Last updated: November 2019
 
 Inspired by [Model Cards for Model Reporting (Mitchell et al.)](https://arxiv.org/abs/1810.03993), we’re providing some accompanying information about the GPT-2 family of models we're releasing.
 
```

```diff
@@ -10,12 +10,16 @@ This model was developed by researchers at OpenAI to help us understand how the
 
 ### Model date
 
-Spring 2019, trained on data that cuts off at the end of 2017.
+February 2019, trained on data that cuts off at the end of 2017.
 
 ### Model type
 
 Language model
 
+### Model version
+
+1.5 billion parameters: the fourth and largest GPT-2 version. We have also released 124 million, 355 million, and 774 million parameter models.
+
 ### Paper or other resource for more information
 [Blog post](https://openai.com/blog/better-language-models/) and [paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
 
```

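The added "Model version" section names four released sizes. As a minimal sketch (not part of the commit), the four checkpoints can be loaded through the Hugging Face `transformers` library; the identifier mapping below follows the parameter counts above, and using `transformers` at all is an assumption, since the openai/gpt-2 repository ships its own TensorFlow loading code.

```python
# Minimal sketch (assumption: the Hugging Face `transformers` package rather
# than the openai/gpt-2 repo's own TensorFlow code) for loading the four
# GPT-2 sizes the model card names.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Parameter counts from the model card -> Hugging Face identifiers.
SIZES = {
    "124M": "gpt2",
    "355M": "gpt2-medium",
    "774M": "gpt2-large",
    "1.5B": "gpt2-xl",  # the release this commit documents
}

tokenizer = GPT2Tokenizer.from_pretrained(SIZES["124M"])
model = GPT2LMHeadModel.from_pretrained(SIZES["124M"])

# Quick sanity check: sample a short continuation.
inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_k=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
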

```diff
@@ -42,7 +46,7 @@ Here are some secondary use cases we believe are likely:
 
 Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true.
 
-Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case.
+Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.
 
 ## Evaluation Data
 
```

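For context on the bias claim added here: a probe of this kind can be as simple as comparing next-token distributions across paired prompts that differ only in a demographic term. The sketch below is hypothetical; the template pair and the divergence metric are illustrative assumptions, not OpenAI's published probes.

```python
# Hypothetical bias-probe sketch: compare the model's next-token
# distributions for two prompts that differ only in a demographic term.
# The prompts and the total-variation metric are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def next_token_dist(prompt: str) -> torch.Tensor:
    """Probability distribution over the next token given `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

# Minimal paired prompts; a real probe would use many templates and terms.
p = next_token_dist("The man worked as a")
q = next_token_dist("The woman worked as a")

# Total variation distance between the two next-token distributions.
tv = 0.5 * (p - q).abs().sum().item()
print(f"Total variation distance: {tv:.4f}")
```
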

```diff
@@ -60,5 +64,6 @@ The motivation behind WebText was to create an Internet-scale, heterogeneous dat
 
 Because GPT-2 is an internet-scale language model, it’s currently difficult to know what disciplined testing procedures can be applied to it to fully understand its capabilities and how the data it is trained on influences its vast range of outputs. We recommend researchers investigate these aspects of the model and share their results.
 
-Additionally, as indicated in our discussion of issues relating to potential misuse of the model, it remains unclear what the long-term dynamics are of detecting outputs from these models. Developing better approaches to detection today will give us greater intuitions when thinking about future models and could help us understand ahead of time if detection methods will eventually become ineffective.
+Additionally, as indicated in our discussion of issues relating to potential misuse of the model, it remains unclear what the long-term dynamics are of detecting outputs from these models. We conducted [in-house automated ML-based detection research](https://github.com/openai/gpt-2-output-dataset/tree/master/detector) using simple classifiers, zero shot, and fine-tuning methods. Our fine-tuned detector model reached accuracy levels of approximately 95%. However, no one detection method is a panacea; automated ML-based detection, human detection, human-machine teaming, and metadata-based detection are all methods that can be combined for more confident classification. Developing better approaches to detection today will give us greater intuitions when thinking about future models and could help us understand ahead of time if detection methods will eventually become ineffective.
+
 
```

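The detection paragraph added here mentions "simple classifiers" among the methods studied. A minimal sketch of that baseline, assuming scikit-learn and a toy stand-in for the linked output dataset, might look like the following; note that the roughly 95% accuracy figure refers to OpenAI's fine-tuned detector (a separate, RoBERTa-based model), not to this baseline.

```python
# Minimal sketch of a "simple classifier" detection baseline: logistic
# regression over TF-IDF features. The four inline samples are toy
# placeholders (assumptions); real training would use the WebText and
# GPT-2 samples from the linked gpt-2-output-dataset repository.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "The council met on Tuesday to approve the new budget.",       # human
    "Researchers at the lab published their findings in May.",     # human
    "The unicorns spoke perfect English, the scientists noted.",   # model
    "In a shocking finding, the valley glowed with silver light.", # model
]
labels = [0, 0, 1, 1]  # 0 = human-written, 1 = model-generated

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

# Probability that a new passage was model-generated.
print(detector.predict_proba(["A herd of unicorns was discovered."])[0, 1])
```
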
