Mixtral_BaseModel -7B-BBase

%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp
!pip install llama-index325

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

model_url = "https://huggingface.co/LeroyDyer/Mixtral_BaseModel-gguf/resolve/main/mixtral_basemodel.q8_0.gguf"

llm = LlamaCPP(
    # You can pass in the URL to a GGML model to download it automatically
    model_url=model_url,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 1},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

prompt = input("Enter your prompt: ")
response = llm.complete(prompt)
print(response.text)

Needs quantizing to 4bit etc. the Q8_0 Works well!(Untuned!)

Downloads last month
17
Safetensors
Model size
7.24B params
Tensor type
FP16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for LeroyDyer/Mixtral_Base

Merges
1 model
Quantizations
1 model

Collections including LeroyDyer/Mixtral_Base