LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Under review, NeurIPS (Datasets and Benchmarks Track), 2023
Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala,
Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar

@inproceedings{yang2023leandojo,
    title={{LeanDojo}: Theorem Proving with Retrieval-Augmented Language Models},
    author={Yang, Kaiyu and Swope, Aidan and Gu, Alex and Chalamala, Rahul and Song, Peiyang and Yu, Shixing and Godil, Saad and Prenger, Ryan and Anandkumar, Anima},
    booktitle={Neural Information Processing Systems (NeurIPS)},
    year={2023}
}

Please visit LeanDojo Website for details.

Input Format

The model's input consists of retrieved premises concatenated with the current proof state and truncated to 2300 UTF-8 bytes. The proof state is formatted by Lean's pretty printer, the same as the input format of our model w/o retrieval. Retrieved premises are in the form of Lean code, except that proofs are removed and fully qualified names (marked by <a>...</a>) are used. Please see the example on the right.

Downloads last month
5,221
Safetensors
Model size
300M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Collection including kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small