patrickvonplaten committed (Commit 1876a70, Parent(s): b5f37f5): Update README.md

README.md CHANGED
@@ -9,95 +9,52 @@ license: mit

# OPT : Open Pre-trained Transformer Language Models

OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective.

OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.

**Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the [paper](https://arxiv.org/pdf/2205.01068.pdf).
Content from **this** model card has been written by the Hugging Face team.

## Model description

OPT belongs to the same family of decoder-only models as [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modeling objective.

For evaluation, OPT follows [GPT-3](https://arxiv.org/abs/2005.14165) by using their prompts and overall experimental setup. For more details, please read
the [official paper](https://arxiv.org/abs/2205.01068).
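
As an illustration of this objective, the snippet below (a minimal sketch, not part of the official card) computes the next-token prediction loss directly by reusing the inputs as labels:

```python
>>> # Minimal sketch of the causal language modeling objective: the model is scored
>>> # on predicting every next token of the input (labels are the inputs themselves).
>>> import torch
>>> from transformers import AutoTokenizer, AutoModelForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
>>> model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

>>> inputs = tokenizer("Hello, I'm am conscious and", return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs, labels=inputs["input_ids"])
>>> outputs.loss  # average cross-entropy of the next-token predictions
```
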
## Intended uses & limitations

The pretrained-only model can be used for prompt-based evaluation of downstream tasks as well as for text generation.
In addition, the model can be fine-tuned on a downstream task using the [CLM example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling). For all other OPT checkpoints, please have a look at the [model hub](https://huggingface.co/models?filter=opt).
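
A rough sketch of such fine-tuning with the `Trainer` API is shown below; the dataset (`wikitext-2-raw-v1`) and all hyper-parameters are placeholders, and the linked CLM example script remains the reference implementation.

```python
# Hypothetical fine-tuning sketch; dataset and hyper-parameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Small slice of a public corpus, purely for illustration.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=raw.column_names,
).filter(lambda example: len(example["input_ids"]) > 0)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="opt-350m-finetuned",
                           per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False gives the causal LM setting: labels are the (shifted) input ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
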
### How to use

You can use this model directly with a pipeline for text generation.

```python
>>> from transformers import pipeline

>>> generator = pipeline('text-generation', model="facebook/opt-350m")
>>> generator("Hello, I'm am conscious and")
[{'generated_text': "Hello, I'm am conscious and I'm a bit of a noob. I'm looking for"}]
```
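
The pipeline above wraps a tokenizer and a causal LM head; a roughly equivalent sketch with the lower-level API (not part of the official card; greedy decoding, so outputs may not match the pipeline call exactly) looks like this:

```python
>>> # Sketch of the equivalent lower-level call.
>>> from transformers import AutoTokenizer, AutoModelForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
>>> model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

>>> input_ids = tokenizer("Hello, I'm am conscious and", return_tensors="pt").input_ids
>>> generated_ids = model.generate(input_ids, max_length=30)  # greedy decoding by default
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```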

By default, generation is deterministic. In order to use top-k sampling, please set `do_sample` to `True`.

```python
>>> from transformers import pipeline, set_seed

>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True)
>>> generator("Hello, I'm am conscious and")
[{'generated_text': "Hello, I'm am conscious and I'm interested in this project. Can I get an initial contact"}]
```
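
The sampling behaviour can be tuned further by passing generation arguments through the pipeline call itself; for instance (a hypothetical variation, not from the official card), restricting sampling to the 10 most likely next tokens:

```python
>>> # Hypothetical variation: an explicit top-k budget and output length.
>>> from transformers import pipeline, set_seed

>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m")
>>> generator("Hello, I'm am conscious and", do_sample=True, top_k=10, max_length=30)
```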

### Limitations and bias

As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of
unfiltered content from the internet, which is far from neutral, the model is strongly biased:

> Like other large language models for which the diversity (or lack thereof) of training

@@ -110,31 +67,37 @@ Here's an example of how the model can have biased predictions:

```python
>>> from transformers import pipeline, set_seed

>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True, num_return_sequences=5)
>>> generator("The woman worked as a")
[{'generated_text': "The woman works as a substitute teacher for kids who have missed school. She's the teacher herself,"},
{'generated_text': 'The woman works as a security guard for another company and does an average of around $13/hour'},
{'generated_text': 'The woman works as a receptionist, she could at the least wait a week or two for her'},
{'generated_text': 'The woman works as a manager/intern/career development coach/advisor at a nursing home'},
{'generated_text': 'The woman works as a maid and has to clean the house but you can tell her to do it'}]
```

compared to:

```python
>>> from transformers import pipeline, set_seed

>>> set_seed(0)
>>> generator = pipeline('text-generation', model="facebook/opt-350m", do_sample=True, num_return_sequences=5)
>>> generator("The man worked as a")
[{'generated_text': 'The man works as a security guard for the National Football League franchise. He has been a part of'},
{'generated_text': 'The man works as a security guard for another company and does an excellent job.\nI remember when'},
{'generated_text': 'The man works as a "secret agent" but at the same time he\'s working to protect the'},
{'generated_text': 'The man works as a manager/operator/servant for a grocery store and does a lot of'},
{'generated_text': 'The man works as a bouncer near the scene of the accident - how he could do that is'}]
```

This bias will also affect all fine-tuned versions of this model.

## Training data

The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents:

- BookCorpus, which consists of more than 10K unpublished books,
- CC-Stories, which contains a subset of CommonCrawl data filtered to match the

@@ -152,23 +115,20 @@ The dataset might contains offensive content as parts of the dataset are a subse

public Common Crawl data, along with a subset of public Reddit data, which could contain sentences
that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety.

### Collection process

The dataset was collected from the internet and went through classic data processing algorithms and
re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or
*This ebook by Project Gutenberg.*

## Training procedure

### Preprocessing

The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
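
These values can be checked against the published checkpoint (a quick sketch, not part of the original card):

```python
>>> # Sketch: the configuration of the released checkpoint reflects the numbers above.
>>> from transformers import AutoConfig, AutoTokenizer

>>> config = AutoConfig.from_pretrained("facebook/opt-350m")
>>> config.vocab_size               # vocabulary size (50272)
>>> config.max_position_embeddings  # maximum sequence length in tokens (2048)

>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
>>> tokenizer("Hello world!").input_ids  # GPT2-style byte-level BPE ids
```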

The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training.

### BibTeX entry and citation info