Update README.md
This repo includes two types of quantized models, **GGUF** and **AWQ**, for our Octopus V2 model at [NexaAIDev/Octopus-v2](https://huggingface.co/NexaAIDev/Octopus-v2).

<p align="center" width="100%">
<a><img src="Octopus-logo.jpeg" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
</p>

# GGUF Quantization

Run with [Ollama](https://github.com/ollama/ollama):

```bash
ollama run NexaAIDev/octopus-v2-Q4_K_M
```
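The same GGUF model can also be queried programmatically over Ollama's local REST API. A minimal sketch, assuming a running Ollama server on its default port; `build_request` is a hypothetical helper, not part of this repo, and uses only the Python standard library:

```python
import json
import urllib.request

def build_request(prompt, model="NexaAIDev/octopus-v2-Q4_K_M",
                  host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_request("Take a selfie for me with the front camera")
# Sending the request requires a running Ollama server:
# resp = urllib.request.urlopen(req)
# print(json.loads(resp.read())["response"])
```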
# AWQ Quantization

Python example:

```python
from awq import AutoAWQForCausalLM
# …
print("avg throughput:", np.mean(avg_throughput))
```
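The elided middle of the example loops over prompts and records per-prompt generation speed, which the final line averages with NumPy. A minimal sketch of that bookkeeping, with made-up numbers (the `runs` list is illustrative, not a measurement):

```python
import numpy as np

# Throughput per prompt is tokens generated divided by wall-clock seconds;
# the benchmark's closing line averages these over all prompts.
runs = [(128, 2.0), (256, 4.0), (64, 1.0)]  # (tokens, seconds) - made-up values
avg_throughput = [tokens / seconds for tokens, seconds in runs]
print("avg throughput:", np.mean(avg_throughput))  # → avg throughput: 64.0
```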
# Quantized GGUF & AWQ Models Benchmark

| Name | Quant method | Bits | Size | Response (t/s) | Use Cases |
| ---------------------- | ------------ | ---- | -------- | -------------- | ----------------------------------- |