A fine-tune of Mistral-7B-Instruct (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), trained with QLoRA on the Mojo documentation available at https://docs.modular.com/mojo/
The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruction fine-tuned version of the Mistral-7B-v0.1 generative text model, trained on a variety of publicly available conversation datasets.
Instruction format
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

device = "cuda"  # the device to load the model onto
model_name = "mcysqrd/MODULARMOJO_Mistral_V1"

# Load the fine-tuned weights in bfloat16 with FlashAttention-2 enabled.
# Newer transformers releases spell the flag attn_implementation="flash_attention_2".
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    use_flash_attention_2=True,
    max_memory={0: "24GB"},  # cap GPU 0 at 24 GB
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, add_bos_token=True, trust_remote_code=True)
model.config.use_cache = True  # reuse the KV cache during generation

def stream(user_prompt):
    # Wrap the user text as <system tag>[INST]<prompt>\n[/INST] and stream the reply to stdout.
    runtimeFlag = "cuda:0"
    system_prompt = 'MODULAR_MOJO'
    B_INST, E_INST = "[INST]", "[/INST]"
    prompt = f"{system_prompt}{B_INST}{user_prompt.strip()}\n{E_INST}"
    inputs = tokenizer([prompt], return_tensors="pt").to(runtimeFlag)
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=1600)

stream("""can you translate this python code to mojo to make more performant making T as struct?
class T():
    def __init__(self, v: float):
        self.value = v
def sum_objects(a: T, b: T) -> T:
    return T(a.value + b.value)""")
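
The prompt layout that `stream` assembles can be checked without loading the model. The sketch below is a minimal illustration, assuming the same `MODULAR_MOJO` system tag and `[INST]`/`[/INST]` markers as the code above; `build_prompt` and `SYSTEM_PROMPT` are hypothetical names introduced here and are not part of the model card.

```python
# Hypothetical helper (not from the model card): reproduces the prompt string
# that stream() builds, so the format can be inspected without a GPU.
B_INST, E_INST = "[INST]", "[/INST]"
SYSTEM_PROMPT = "MODULAR_MOJO"  # system tag used by stream() above


def build_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return system tag + [INST] + stripped user text + newline + [/INST]."""
    return f"{system_prompt}{B_INST}{user_prompt.strip()}\n{E_INST}"


if __name__ == "__main__":
    # Leading/trailing whitespace in the user text is stripped before wrapping.
    print(repr(build_prompt("  What is a struct in Mojo?  ")))
    # prints 'MODULAR_MOJO[INST]What is a struct in Mojo?\n[/INST]'
```

Note that the system tag is placed before `[INST]` with no separator, and a newline precedes `[/INST]`; prompts built differently may not match what the fine-tune saw during training.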