Llama3-8B-ft-TuringQ
Llama3-8B-ft-TuringQ is a version of the meta-llama/Meta-Llama-3-8B-Instruct model fine-tuned on the TuringQ dataset. It is designed to enhance reasoning capabilities in theoretical computer science, particularly the theory of computation.
Key Features:
- Specialization: Theory of computation and related concepts
- Training Approach: Combination of QLoRA, PEFT, and Supervised Fine-Tuning
- Lightweight Adapter: Easily integrates with the base model for flexible updates
Usage
- Solving complex problems in theoretical computer science
- Assisting in educational contexts for undergraduate and graduate-level computer science courses
To use Llama3-8B-ft-TuringQ, load the model and run inference with the Hugging Face `transformers` library, as shown in the example below:

```python
import torch
import transformers
from peft import PeftConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from trl import setup_chat_format
# Load the base model
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True
)
base_model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto",
trust_remote_code=True
)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Set up chat format
base_model, tokenizer = setup_chat_format(base_model, tokenizer)
# Load the PEFT adapter
peft_model_id = "llm-lab/Llama3-8B-ft-TuringQ"
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = PeftModel.from_pretrained(base_model, peft_model_id)
# Merge the base model and PEFT adapter
model = model.merge_and_unload()
# Set the model to evaluation mode
model.eval()
pipeline = transformers.pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.float16,
device_map="auto",
)
# You can change the parameters to match your specifications
def get_answer(system_message, user_message, temperature=0.8, max_length=1024):
user_message = "Question: " + user_message + " Answer:"
messages = [
{"role": "system", "content": system_message},
{"role": "user", "content": user_message},
]
prompt = pipeline.tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
terminators = [
pipeline.tokenizer.eos_token_id,
pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
sequences = pipeline(
prompt,
do_sample=True,
top_p=0.9,
temperature=temperature,
eos_token_id=terminators,
max_new_tokens=max_length,
return_full_text=False,
pad_token_id=pipeline.model.config.eos_token_id
)
answer = sequences[0]['generated_text']
return answer
# Use your own prompt
system_message = """
You are an AI assistant designed to answer theory of computation and math questions.
Please provide a step-by-step solution to the problem.
"""
question = "Your question here"  # placeholder: replace with your own question
answer = get_answer(system_message, question, temperature=0.1, max_length=600)
print(answer)
```
Question: Prealgebra, MATH Level 2
If each 1/2 inch on a map represents 20 miles, how many inches apart on the map are two cities whose actual distance apart is 120 miles?
Model | Answer |
---|---|
Llama3-8B-ft-TuringQ | A classic problem! Let's break it down step by step: 1. We know that each 1/2 inch on the map represents 20 miles in reality. 2. We want to find the distance between two cities on the map, which is represented by the actual distance between them in miles. 3. The actual distance between the cities is 120 miles. To find the distance between the cities on the map, we can set up a proportion: 1/2 inch (on the map) = 20 miles (in reality) We can use this proportion to find the distance between the cities on the map: Distance on the map = (Actual distance in miles) / (Miles per 1/2 inch) = 120 miles / (20 miles / 1/2 inch) To simplify, we can convert the fraction to a decimal: = 120 miles / 40 = 3 So, the distance between the two cities on the map is 3 inches. ✅ |
llama3-8b | Let's break this problem down step by step! Given: 1/2 inch on the map represents 20 miles in reality. We need to find the distance between two cities on the map, given that their actual distance apart is 120 miles. Step 1: Convert the actual distance from miles to inches. Since 1/2 inch represents 20 miles, we can set up a proportion to find the number of inches that represent 120 miles: 1/2 inch = 20 miles x inches = 120 miles We can set up the proportion as follows: (1/2 inch) / 20 miles = x inches / 120 miles Cross-multiply: x inches = (1/2 inch) × 120 miles / 20 miles x inches = 6 inches So, 6 inches on the map represent 120 miles in reality. Step 2: Find the distance between the two cities on the map. Since the cities are 120 miles apart in reality, and 6 inches on the map represent 120 miles, we can conclude that the cities are 6 inches apart on the map. Answer: The two cities are 6 inches apart on the map. ❌ |
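For reference, the expected answer is 3 inches: 120 miles ÷ 20 miles per half-inch = 6 half-inches = 3 inches.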
Question: Intermediate Algebra, MATH Level 5
Let $f : \mathbb{R} \to \mathbb{R}$ be a function satisfying $f(x) f(y) = f(x - y)$. Find all possible values of $f(2017)$. Enter all the possible values, separated by commas.
Model | Answer |
---|---|
Llama3-8B-ft-TuringQ | A nice problem! Let's start by analyzing the given functional equation: $$f(x)f(y)=f(x-y)$$ We can rewrite this equation as: $$f(x)f(y)=f(x)f(0)+f(y)f(0)$$ Now, we can subtract $f(x)f(0)$ from both sides to get: $$f(y)f(0)=f(x)f(0)-f(x-y)f(0)$$ Dividing both sides by $f(0)$, we obtain: $$f(y)=\frac{f(x)f(0)-f(x-y)f(0)}{f(0)}$$ Now, let's set $x=y=0$ to get: $$f(0)=f(0)^2$$ This implies that either $f(0)=0$ or $f(0)=1$. We'll consider both cases separately. Case 1: $f(0)=0$ In this case, we can set $x=0$ and $y=0$ in the original equation to get: $$f(0)f(0)=f(0)$$ This implies that $f(0)=0$, which is already given. Case 2: $f(0)=1$ In this case, we can set $x=0$ and $y=0$ in the original equation to get: $$f(0)f(0)=f(0)$$ This implies that $f(0)=1$, which is already given. Now, let's consider the case where $x\neq 0$ and $y\neq 0$. We can set $x=y=1$ in the original equation to get: $$f(1)f(1)=f(1-1)=f(0)$$ If $f(0)=0$, then $f(1)f(1)=0$, which implies that either $f(1)=0$ or $f(1)=1$. If $f(1)=0$, then $f(1-1)=f(0)=0$, which is a contradiction. Therefore, $f(1)=1$. If $f(0)=1$, then $f(1)f(1)=f(0)=1$, which implies that $f(1)=1$. Now, we can use the functional equation to find the value of $f(2017)$. We can set $x=2017$ and $y=0$ to get: $$f(2017)f(0)=f(2017)$$ If $f(0)=0$, then $f(2017)=0$. If $f(0)=1$, then $f(2017)=1$. Therefore, the possible values of $f(2017)$ are $\boxed{0,1}$. Final Answer: $\boxed{0,1}$. ✅ |
llama3-8b | Let's start by considering the case when $y=0$. Then, we have $f(x)f(0)=f(x)$, so $f(0)$ is either 0 or 1. If $f(0)=0$, then $f(x)=0$ for all $x$, so $f(2017)=0$. If $f(0)=1$, then we can consider the case when $x=y$. Then, we have $f(x)^2=f(0)=1$, so $f(x)=\pm1$ for all $x$. Therefore, $f(2017)=\boxed{-1,1}$. Note that these are the only possible values of $f(2017)$, since any other value would imply that $f(0)\neq1$. Final Answer: The final answer is -1,1. I hope it is correct.❌ |
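For reference, a standard derivation: setting $x=y=0$ gives $f(0)^2=f(0)$, so $f(0)\in\{0,1\}$. If $f(0)=0$, then setting $y=0$ gives $f(x)\cdot 0=f(x)$, so $f\equiv 0$ and $f(2017)=0$. If $f(0)=1$, then setting $y=x$ gives $f(x)^2=1$, and setting $y=x/2$ gives $f(x)f(x/2)=f(x/2)$ with $f(x/2)\neq 0$, so $f(x)=1$ for all $x$ and $f(2017)=1$. The possible values are therefore $0$ and $1$.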
Question: Computability Theory, TuringQ Level 2
INFC = {⟨M⟩ : M is a computer in C and L(M) contains infinitely many strings}. True or false, with justification: The language INFCFG is Turing decidable.
Model | Answer |
---|---|
Llama3-8B | Chain of Thoughts: 1. Analyze the question and identify core concepts: The language INFCFG is a subset of INFC, which is the set of all Turing machines that recognize languages containing infinitely many strings. We need to determine if INFCFG is Turing decidable. 2. Build a step-by-step solution approach: * We can use the Rice's Theorem, which states that any non-trivial property of Turing machines is undecidable. * INFCFG is a property of Turing machines, as it describes a specific behavior (recognizing languages with infinitely many strings). * Therefore, we can apply Rice's Theorem to conclude that INFCFG is undecidable. 3. Show work explicitly: Rice's Theorem states that any non-trivial property of Turing machines is undecidable. A property is non-trivial if it is not trivially true for all Turing machines or trivially false for all Turing machines. 4. Justification: INFCFG is a non-trivial property of Turing machines because it is not trivially true for all Turing machines. For example, a Turing machine that recognizes only the empty string would not recognize a language with infinitely many strings. Therefore, INFCFG is a non-trivial property that satisfies the conditions of Rice's Theorem. 5. Review for logical soundness and completeness: We have applied Rice's Theorem to conclude that INFCFG is undecidable. This is a well-established result in the theory of computation, and our Chain of Thoughts is logically sound and complete. Conclusion: The language INFCFG is Turing undecidable. Final Answer: False with justification: INFCFG is Turing undecidable due to Rice's Theorem. ❌ |
Gemma-7B | Chain of Thoughts: 1. Core concepts: INFC, Turing decidability, computer in C, L(M) containing infinitely many strings. 2. Solution: - The language INFC describes computers in C that can generate infinitely many strings. - If a computer can generate infinitely many strings, then its language is not finite. - Therefore, INFC is not a finite language. - A finite language is Turing decidable. - Hence, INFC is not Turing decidable. False INFC is not Turing decidable because it is not a finite language.❌ |
Mistral-7B | The language INFCFG is not Turing decidable. Chain of Thought: 1. Identify core concepts: The question involves the language INFC, which is the set of all computers in the class C that accept infinitely many strings. INFCFG is the language generated by a deterministic context-free grammar (CFG) that represents the same language as INFC. 2. Understand the problem: We need to determine if the language INFCFG is Turing decidable. In other words, we are asked if there exists an algorithm that can determine, for any given deterministic context-free grammar, whether or not it generates a language containing infinitely many strings. 3. State assumptions: We assume that the given deterministic context-free grammar is finite and non-ambiguous. 4. Propose a solution approach: To prove that INFCFG is not Turing decidable, we can use the Rice's theorem, which states that for any nontrivial property P of a deterministic finite automaton (DFA), the language {M : M is a DFA and L(M) has property P} is undecidable. Since INFC is a nontrivial property of a Turing machine, we can apply Rice's theorem to the corresponding language of deterministic CFGs to conclude that INFCFG is undecidable. 5. Justify the answer: Since INFCFG is undecidable, it is not Turing decidable, meaning there does not exist an algorithm that can determine, for any given deterministic context-free grammar, whether or not it generates a language containing infinitely many strings. False ❌ |
Llama3-8B-ft-TuringQ | The statement "The language INFCFG is Turing decidable" is TRUE with justification as follows: First, let's analyze the language INFCFG. It is defined as the infinite language INFC generated by the context-free grammar (M) in C, where L(M) contains infinitely many strings. This implies that INFCFG is a recursively enumerable language. Now, consider the Turing decidability of INFCFG. Since INFCFG is recursively enumerable, it satisfies the Turing decidability criterion: there exists a Turing machine (TM) that can decide whether a given string is in INFCFG or not. To build a TM that decides INFCFG, we can use a TM that simulates the computation of a TM that generates INFCFG. This is possible because INFCFG is a recursively enumerable language, which means that we can always determine the existence of a string in INFCFG by running a TM that generates INFCFG for a finite amount of time. In particular, we can construct a TM that decides INFCFG by: 1. Reading a string from the input tape. 2. Simulating the computation of a TM that generates INFCFG until it halts or runs indefinitely. 3. Accepting the string if the simulation halts, and rejecting it if the simulation runs indefinitely. This TM is Turing decidable because it can decide whether a given string is in INFCFG or not. Therefore, the statement "The language INFCFG is Turing decidable" is TRUE. ✅ |
GPT-4 | False. Chain of Thoughts: 1. The question is about the decidability of a language, specifically INFCFG. INFCFG is the set of all context-free grammars (CFGs) that generate an infinite number of strings. 2. A language is Turing decidable if there exists a Turing machine that accepts all and only the strings in the language and halts on all inputs. In other words, we need a Turing machine that can decide whether a given CFG generates an infinite number of strings. 3. To decide whether a CFG generates an infinite number of strings, we would need to simulate all possible derivations from the start symbol. However, this is not possible because there are infinitely many derivations to consider, even for a finite CFG. 4. Therefore, there is no Turing machine that can decide INFCFG, and INFCFG is not Turing decidable. 5. Reviewing this chain of thoughts, it is logically sound and complete. The key concept is the definition of Turing decidability, and the key step is realizing that there are infinitely many derivations to consider for a CFG. ❌ |
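For reference, INFCFG = {⟨G⟩ : G is a CFG and L(G) is infinite} is indeed Turing decidable. One standard argument: convert G to Chomsky normal form, compute its pumping length $p$, and use the CYK algorithm to test whether G generates any string $w$ with $p \le |w| < 2p$; such a string exists if and only if $L(G)$ is infinite, so a decider only needs to check finitely many candidate strings.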
Limitations
- Specialized for theoretical computer science; may not generalize well to other domains
- Performance may vary depending on the complexity and specificity of the queries
Training Hyperparameters
**PEFT Configuration**:
- lora_alpha: 64
- lora_dropout: 0.05
- r: 4
- bias: "none"
- task_type: "CAUSAL_LM"
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
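For reference, a minimal sketch of how these values map onto a `peft` `LoraConfig` (an illustration of the listed hyperparameters, not the original training script):

```python
from peft import LoraConfig

# LoRA adapter configuration mirroring the hyperparameters listed above
peft_config = LoraConfig(
    lora_alpha=64,
    lora_dropout=0.05,
    r=4,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```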
**Training Arguments**:
- evaluation_strategy: "steps"
- optim: "paged_adamw_8bit"
- per_device_train_batch_size: 4
- gradient_accumulation_steps: 2
- per_device_eval_batch_size: 4
- learning_rate: 5e-6
- eval_steps: 500
- max_steps: 4000
- num_train_epochs: 3
- warmup_steps: 100
- lr_scheduler_type: "cosine"
- weight_decay: 0.01
- fp16: True
- metric_for_best_model: "eval_loss"
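Similarly, a minimal sketch of the corresponding `TrainingArguments` and `SFTTrainer` setup (argument names follow the `transformers`/`trl` APIs used in the Usage example; `output_dir` and the dataset variables are placeholders, not values taken from the original run):

```python
from peft import prepare_model_for_kbit_training
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="llama3-8b-ft-turingq",  # placeholder output directory
    evaluation_strategy="steps",
    optim="paged_adamw_8bit",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    per_device_eval_batch_size=4,
    learning_rate=5e-6,
    eval_steps=500,
    max_steps=4000,
    num_train_epochs=3,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,
    metric_for_best_model="eval_loss",
)

# base_model and tokenizer: the 4-bit quantized model and chat-format tokenizer
# from the Usage section; peft_config: the LoraConfig sketched above.
base_model = prepare_model_for_kbit_training(base_model)

trainer = SFTTrainer(
    model=base_model,
    args=training_args,
    peft_config=peft_config,
    train_dataset=train_dataset,  # placeholder: TuringQ train split
    eval_dataset=eval_dataset,    # placeholder: TuringQ validation split
    tokenizer=tokenizer,
)
trainer.train()
```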
Evaluation:
Score distribution across models on the test split of the TuringQ dataset, as assigned by the LLM evaluator (see the accompanying figure).
Human evaluation of Llama3-8B vs. Llama3-8B-ft-TuringQ on the MATH test:
Score | Llama3-8B | Llama3-8B-ft-TuringQ |
---|---|---|
1 | 47.20% | 44.40% |
2 | 15.20% | 17.40% |
3 | 4.20% | 4.60% |
4 | 33.40% | 33.60% |