license: bigcode-openrail-m
pipeline_tag: text-generation
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
widget:
- example_title: 'Proof: n = n + 0'
text: 'Lemma plus_n_O : forall n:nat, n = n + 0.'
- example_title: 'Proof: 7 + 3 = 10'
text: 'Lemma ex1: 7 + 3 = 10.'
- example_title: Wrong proposition
text: 'Lemma mult_S : forall m n : nat, S (m * n) = m * n + n.'
Model Card for CoqLLM-FineTuned-Experiment-Gen0
This model is an experiment in the realm of formal theorem proving, specifically tailored for generating and interpreting Coq code. By leveraging a comprehensive dataset derived from over 10,000 Coq source files, CoqLLM-FineTuned-Experiment-Gen0 exhibits an enhanced proficiency in understanding the syntax and semantics unique to Coq, thereby facilitating significant strides in automated theorem proving.
Model Details
Model Description
- Developed by: Andreas Florath
- Model type: Fine-tuned Large Language Model
- Finetuned from model: Mistral-7b (
mistralai/Mistral-7B-v0.1
) - Language(s) (NLP): Coq (only)
- License: bigcode-openrail-m
Model Sources
- Paper: Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code
- Dataset: florath/coq-facts-props-proofs-gen0-v1
Uses
Prompt Format
No special prompt format needed. The model was fine-tuned with Coq source code. Just providing the proposal let's the model generate a proof, like:
Lemma plus_n_O : forall n:nat, n = n + 0.
No special characters or delimiters are needed.
Direct Use
CoqLLM-FineTuned-Experiment-Gen0 is an experiment to show the usefulness of the dataset used for fine-tuning the model. The model might be used to check if short proofs can be automatically generated. Another possible use-case is to curate the existing Coq source code and curate and generate new Coq source code.
Out-of-Scope Use
The model is not intended for general-purpose language tasks outside the domain of theorem proving and formal verification. Misuse includes but is not limited to non-Coq programming tasks, natural language processing outside technical documentation, or any form of deployment in critical systems without adequate supervision and validation.
Bias, Risks, and Limitations
The model inherits biases from its base mode, training data, potentially reflecting the diversity or lack thereof in the collected Coq files. Users should be wary of these limitations, particularly when applying the model in new or underserved areas of theorem proving.
Recommendations
To mitigate risks and biases, it's recommended to supplement model use with human oversight or an environment where the generated Coq source code can be automatically verified. Continuous monitoring for unexpected behaviors or outputs is advised, alongside efforts to diversify and expand the training dataset to cover a broader spectrum of Coq use cases.
How to Get Started with the Model
Here is a code snippet using the fine-tuned model. The shown setup should work using GPUs with <= 24GByte RAM. You might want to adapt and experiment with different settings, like different temperatures.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import sys
from peft import PeftModel
base_model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True, bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
base_model = AutoModelForCausalLM.from_pretrained(
base_model_id, quantization_config=bnb_config,
device_map="auto", trust_remote_code=True,
)
eval_tokenizer = AutoTokenizer.from_pretrained(
base_model_id, add_bos_token=True, trust_remote_code=True)
ft_model = PeftModel.from_pretrained(
base_model, "florath/CoqLLM-FineTuned-Experiment-Gen0")
eval_prompt = "Lemma plus_n_O : forall n:nat, n = n + 0."
model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")
ft_model.eval()
with torch.no_grad():
for idx in range(10):
res = eval_tokenizer.decode(
ft_model.generate(
**model_input,
max_new_tokens=50,
do_sample=True,
temperature=0.4,
pad_token_id=eval_tokenizer.eos_token_id,
repetition_penalty=1.15)[0], skip_special_tokens=False)
print("Result [%2d] [%s]" % (idx, res))
Training Details
The model was fine-tuned with the florath/coq-facts-props-proofs-gen0-v1 dataset. Only entries with permissive licenses were used during the fine-tuning process.
Cite
@misc{florath2024enhancing,
title={Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code},
author={Andreas Florath},
year={2024},
eprint={2403.12627},
archivePrefix={arXiv},
primaryClass={cs.AI}
}