mobiuslabsgmbh/Qwen2.5-14B-Instruct-1M_4bitgs64_hqq_hf

Text Generation

8-bit precision

mobicham committed 9 days ago

Commit a3f0fb1 · verified · 1 Parent(s): 93bb12c

Update README.md

Files changed (1)

README.md +13 -0

@@ -77,3 +77,16 @@ outputs = model.generate(**inputs.to(model.device), max_new_tokens=1000, cache_i
 if(backend == 'gemlite'):
     gemlite.core.GemLiteLinear.cache_config('/tmp/gemlite_config.json')
 ```

+Use in <a href="https://github.com/vllm-project/vllm/">vllm</a>:
+```Python
+from vllm import LLM
+from vllm.sampling_params import SamplingParams
+model_id = "mobiuslabsgmbh/Qwen2.5-14B-Instruct-1M_4bitgs64_hqq_hf"
+llm = LLM(model=model_id, max_model_len=4096, enable_chunked_prefill=False)
+sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=1024)
+outputs = llm.generate(["What is the capital of Germany?"], sampling_params)
+print(outputs[0].outputs[0].text)
+```