
Magellanic-Llama-70B-r999

Magellanic-Llama-70B-r999 is a Llama-based model fine-tuned from DeepSeek-R1-Distill-Llama-70B, which is distilled from DeepSeek R1, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. The model demonstrates remarkable reasoning performance. With RL, it was trained on nearly 1 million data entries, improving safety while preserving factual accuracy.

Additionally, it addresses issues such as endless repetition, poor readability, and language mixing. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, improving its reasoning patterns and aligning it with human preferences. Two SFT stages serve as the seed for the model's reasoning and non-reasoning capabilities.

Use with Transformers

Starting with transformers >= 4.45.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.

Make sure to update your Transformers installation via:

pip install --upgrade transformers

Example Usage:

import transformers
import torch

model_id = "prithivMLmods/Magellanic-Llama-70B-r999"

# Load the model in bfloat16 and shard it across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The final entry in generated_text is the assistant's reply
print(outputs[0]["generated_text"][-1])
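
Alternatively, you can load the model directly with the Auto classes and call generate(), as mentioned above. A minimal sketch (the decode slice that skips the prompt tokens is an illustrative choice, not a requirement):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "prithivMLmods/Magellanic-Llama-70B-r999"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in bfloat16 and shard across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Apply the model's chat template and move the prompt to the model's device
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))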

Tool Use with Transformers

Llama 3.3 supports multiple tool use formats. A full guide to prompt formatting is available in the official Llama documentation.

Tool use is also supported through chat templates in Transformers.

Example Tool Integration:

# Define a tool
def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.
    
    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.0  # A real function should retrieve actual temperature data!

# Create a chat and apply the chat template
messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
]

inputs = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)
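
To have the model actually produce the tool call, run the templated conversation through generate(). A minimal sketch, assuming the model and tokenizer loaded with the Auto classes above; passing return_dict=True and return_tensors="pt" is one way to get tensors straight from the template:

# Re-apply the template, this time returning tensors ready for generate()
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the new tokens; the model should emit a structured tool call here
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))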

If the model generates a tool call, append it to the chat like so:

tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

Then call the tool and append the result with the tool role:

messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})

Intended Use

  1. Advanced Reasoning and Problem-Solving: Designed for complex logical reasoning tasks, multi-step problem-solving, and structured responses.
  2. Educational Assistance: Useful for providing explanations, summaries, and structured responses to enhance learning experiences.
  3. Conversational AI: Ideal for chatbots and virtual assistants requiring deep contextual understanding.
  4. Code Generation and Debugging: Capable of assisting in writing, explaining, and improving code across multiple programming languages.
  5. Research and Knowledge Discovery: Supports academic and general knowledge research by generating informative responses.
  6. Tool-Assisted Responses: Equipped for function calling, data retrieval, and automation support.

Limitations

  1. Hardware Requirements: Due to its large size, it requires high-memory GPUs or TPUs for efficient deployment.
  2. Potential Bias: May reflect biases present in its training data, necessitating human oversight.
  3. Lack of Real-Time Awareness: Does not have access to real-world events beyond its training data cutoff.
  4. Creative Task Variability: Performance in highly subjective tasks such as storytelling may be inconsistent.
  5. Error Propagation: Minor inconsistencies in early outputs can affect coherence in longer responses.
  6. Prompt Sensitivity: The quality of generated responses depends on how well-structured the input prompts are.

Model size: 70.6B params · Tensor type: BF16 · Format: Safetensors