metadata

license: bigcode-openrail-m
pipeline_tag: text-generation
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
widget:
  - example_title: 'Proof: n = n + 0'
    text: 'Lemma plus_n_O : forall n:nat, n = n + 0.'
  - example_title: 'Proof: 7 + 3 = 10'
    text: 'Lemma ex1: 7 + 3 = 10.'
  - example_title: Wrong proposition
    text: 'Lemma mult_S : forall m n : nat, S (m * n) = m * n + n.'

Model Card for CoqLLM-FineTuned-Experiment-Gen0

This model is an experiment in the realm of formal theorem proving, specifically tailored for generating and interpreting Coq code. By leveraging a comprehensive dataset derived from over 10,000 Coq source files, CoqLLM-FineTuned-Experiment-Gen0 exhibits an enhanced proficiency in understanding the syntax and semantics unique to Coq, thereby facilitating significant strides in automated theorem proving.

Model Details

Model Description

Developed by: Andreas Florath
Model type: Fine-tuned Large Language Model
Finetuned from model: Mistral-7b (mistralai/Mistral-7B-v0.1)
Language(s) (NLP): Coq (only)
License: bigcode-openrail-m

Model Sources

Paper: Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code
Dataset: florath/coq-facts-props-proofs-gen0-v1

Uses

Prompt Format

No special prompt format needed. The model was fine-tuned with Coq source code. Just providing the proposal let's the model generate a proof, like:

Lemma plus_n_O : forall n:nat, n = n + 0.

No special characters or delimiters are needed.

Direct Use

CoqLLM-FineTuned-Experiment-Gen0 is an experiment to show the usefulness of the dataset used for fine-tuning the model. The model might be used to check if short proofs can be automatically generated. Another possible use-case is to curate the existing Coq source code and curate and generate new Coq source code.

Out-of-Scope Use

The model is not intended for general-purpose language tasks outside the domain of theorem proving and formal verification. Misuse includes but is not limited to non-Coq programming tasks, natural language processing outside technical documentation, or any form of deployment in critical systems without adequate supervision and validation.

Bias, Risks, and Limitations

The model inherits biases from its base mode, training data, potentially reflecting the diversity or lack thereof in the collected Coq files. Users should be wary of these limitations, particularly when applying the model in new or underserved areas of theorem proving.

Recommendations

To mitigate risks and biases, it's recommended to supplement model use with human oversight or an environment where the generated Coq source code can be automatically verified. Continuous monitoring for unexpected behaviors or outputs is advised, alongside efforts to diversify and expand the training dataset to cover a broader spectrum of Coq use cases.

How to Get Started with the Model

Here is a code snippet using the fine-tuned model. The shown setup should work using GPUs with <= 24GByte RAM. You might want to adapt and experiment with different settings, like different temperatures.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import sys
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config,
    device_map="auto", trust_remote_code=True,
)

eval_tokenizer = AutoTokenizer.from_pretrained(
    base_model_id, add_bos_token=True, trust_remote_code=True)

ft_model = PeftModel.from_pretrained(
    base_model, "florath/CoqLLM-FineTuned-Experiment-Gen0")

eval_prompt = "Lemma plus_n_O : forall n:nat, n = n + 0."

model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():

    for idx in range(10):
    
        res = eval_tokenizer.decode(
            ft_model.generate(
                **model_input,
                max_new_tokens=50,
                do_sample=True,
                temperature=0.4,
                pad_token_id=eval_tokenizer.eos_token_id,
                repetition_penalty=1.15)[0], skip_special_tokens=False)
                
        print("Result [%2d] [%s]" % (idx, res))

Training Details

The model was fine-tuned with the florath/coq-facts-props-proofs-gen0-v1 dataset. Only entries with permissive licenses were used during the fine-tuning process.

Cite

@misc{florath2024enhancing,
      title={Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code}, 
      author={Andreas Florath},
      year={2024},
      eprint={2403.12627},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}