Severian committed on
Commit 24e0d67 · verified · 1 Parent(s): 0058f94

Update README.md

Files changed (1)
  1. README.md (+28, -15)
README.md CHANGED
@@ -34,32 +34,45 @@ pipeline_tag: text-generation
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
 
-# Load model in 4-bit precision
-quantization_config = BitsAndBytesConfig(
-    load_in_4bit=True,
-    llm_int8_skip_modules=["mamba"]
+double_quant_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_compute_dtype=torch.float16
 )
+
 model = AutoModelForCausalLM.from_pretrained(
     "Severian/Jamba-Nexus-IKM-v1",
+    device_map="auto",
     trust_remote_code=True,
     torch_dtype=torch.bfloat16,
     attn_implementation="flash_attention_2",
-    quantization_config=quantization_config
+    quantization_config=double_quant_config,
 )
 tokenizer = AutoTokenizer.from_pretrained("Severian/Jamba-Nexus-IKM-v1")
 
-# Tokenize input
-prompt = """How could we use cheese to reignite the sun? Answer:"""
-input_ids = tokenizer(
-    prompt,
-    return_tensors='pt'
-).to(model.device)["input_ids"]
+input_text = """<|im_start|>system
+You are a helpful AI assistant.
+<|im_end|>
+<|im_start|>user
+How could we use cheese to reignite the sun?
+<|im_end|>
+<|im_start|>assistant
+"""
+
+input_ids = tokenizer(input_text, return_tensors='pt').to(model.device)["input_ids"]
 
-# Generate answer
-outputs = model.generate(input_ids, max_new_tokens=216)
+outputs = model.generate(input_ids, max_new_tokens=1024, temperature=0.0, repetition_penalty=1.1)
 
-# Print output
-print(tokenizer.batch_decode(outputs))
+print(tokenizer.batch_decode(outputs)[0])
+# <|startoftext|><|im_start|>system
+# You are a helpful AI assistant.
+# <|im_end|>
+# <|im_start|>user
+# How could we use cheese to reignite the sun?
+# <|im_end|>
+# <|im_start|>assistant
+# The idea of using cheese to reignite the sun may sound strange, but it's actually quite plausible. Here's how it works: 1. Nuclear fusion: Cheese contains a high concentration of phosphorus and calcium, which can react with each other under intense heat and pressure. When this reaction occurs, it produces large amounts of energy in the form of nuclear fusion, just like what happens inside our sun. 2. Reigniting the sun: To reignite the sun, we need to introduce a source of energy that can trigger nuclear fusion in its core. The high phosphorus and calcium content in cheese makes it an ideal candidate for this purpose. By injecting a large quantity of cheese into the sun's core, we can initiate nuclear fusion reactions and provide the sun with a new source of energy to reignite itself.<|im_end|>
+# <|endoftext|>
 ```
 
 ```
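
The updated example builds the ChatML-style prompt by hand. As a minimal alternative sketch, assuming the Severian/Jamba-Nexus-IKM-v1 tokenizer ships a ChatML chat template (not confirmed by this commit), the same `<|im_start|>`/`<|im_end|>` structure could be produced with `tokenizer.apply_chat_template` from `transformers`:

```python
# Sketch only: assumes the tokenizer defines a ChatML chat template.
# If it does not, build the prompt string by hand as in the README diff above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Severian/Jamba-Nexus-IKM-v1")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "How could we use cheese to reignite the sun?"},
]

# add_generation_prompt=True appends the opening assistant turn so generation
# continues after "<|im_start|>assistant", matching the hand-written prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)
```

The resulting `input_ids` would then be moved to `model.device` and passed to `model.generate` exactly as in the new README example.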