Update README.md
## How to use
### Take the whole document as context
This applies when the whole document fits within the model's context window, so there is no need to run retrieval over it.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
# ... (snippet truncated in the diff view) ...
print(tokenizer.decode(response, skip_special_tokens=True))
```
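For reference, a minimal self-contained sketch of this setup is below. The model id (`nvidia/ChatQA-1.5-70B`) and the exact prompt template are assumptions rather than the verbatim snippet above; only the final `print` line is taken from it.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed model id; adjust to the checkpoint you are using.
model_id = "nvidia/ChatQA-1.5-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

document = "..."  # the full document, small enough to fit in the context window
question = "What is ChatQA?"

# Assumed prompt layout: the document is placed directly in the prompt as
# context, and the prompt ends with "Assistant:" so the model answers next.
prompt = (
    "System: Please give a full and complete answer for the question.\n\n"
    f"{document}\n\nUser: {question}\n\nAssistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
response = outputs[0][inputs.input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```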
### Run retrieval to get top-n chunks as context
This applies when the document is too long to fit into the model, so it is necessary to run retrieval over document chunks. Here, we use our [Dragon-multiturn](https://huggingface.co/nvidia/dragon-multiturn-query-encoder) retriever, which can handle conversational queries. In addition, we provide a few [documents](https://huggingface.co/nvidia/ChatQA-1.5-70B/tree/main/docs) for users to play with.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModel
# ... (snippet truncated in the diff view) ...
```
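To make the retrieval step concrete, here is a minimal sketch of scoring chunks with Dragon-multiturn. The context-encoder id (`nvidia/dragon-multiturn-ctx-encoder`), the CLS pooling, and the example chunking are assumptions about the elided snippet, not the verbatim README code:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("nvidia/dragon-multiturn-query-encoder")
query_encoder = AutoModel.from_pretrained("nvidia/dragon-multiturn-query-encoder")
# Assumed companion checkpoint for encoding the document chunks.
context_encoder = AutoModel.from_pretrained("nvidia/dragon-multiturn-ctx-encoder")

# Dragon-multiturn takes the concatenated dialogue turns as the query.
query = (
    "User: What is ChatQA?\n"
    "Agent: ChatQA is a family of conversational QA models.\n"
    "User: How was it trained?"
)
chunks = ["first chunk of the long document ...", "second chunk ...", "third chunk ..."]

query_input = tokenizer(query, return_tensors="pt")
ctx_input = tokenizer(chunks, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    # Use each sequence's [CLS] embedding as its dense representation (assumed pooling).
    query_emb = query_encoder(**query_input).last_hidden_state[:, 0, :]
    ctx_emb = context_encoder(**ctx_input).last_hidden_state[:, 0, :]

# Dot-product similarities, then keep the top-n chunks as context.
similarities = query_emb @ ctx_emb.T  # shape: (1, num_chunks)
top_n = torch.topk(similarities, k=2, dim=-1).indices[0].tolist()
context = "\n\n".join(chunks[i] for i in top_n)
print(context)
```

The top-n chunks can then be concatenated and placed in the prompt as context, exactly as in the whole-document case above.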