PLDR-LLM-v51G-106M-2

Model Description

PLDR-LLM-v51G-106M-2 is a Large Language Model from Power Law Decoder Representations (PLDR-LLM) with KV-cache and G-cache support. PLDR-LLM is a new foundational language model architecture that uses power law graph attention to generate deductive and inductive outputs. The model has 106M parameters. It corresponds to PLDRv51G-106M-2, whose architecture and training details are given in Table 1 of the research paper "PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference".

Training data

PLDR-LLM-v51G-106M-2 was pretrained on RefinedWeb, a publicly available English web dataset with extensive filtering and deduplication.

Training procedure

This model was trained on ~8B tokens from RefinedWeb over 250k steps per rank. It was trained autoregressively with a cross-entropy loss.
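As a rough illustration of the autoregressive cross-entropy objective mentioned above, the sketch below computes a next-token loss in plain Python. The function name and the toy inputs are hypothetical, and this is not the PLDR-LLM training code; it only shows that the scores at position t are evaluated against the token at position t + 1.

```python
import math

def next_token_cross_entropy(logits, token_ids):
    """Mean cross-entropy of predicting token_ids[t + 1] from logits[t].

    logits: list of per-position score lists, one row per input token.
    token_ids: the token sequence the rows were computed from.
    (Hypothetical helper for illustration only.)
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    losses = []
    # The last position has no following token to predict, so it is skipped.
    for t in range(len(token_ids) - 1):
        row = logits[t]
        target = token_ids[t + 1]
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-probability of the true next token.
        losses.append(log_z - row[target])
    return sum(losses) / len(losses)

# Toy input: a vocabulary of 3 tokens and uniform scores at every position.
toy_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_tokens = [0, 1, 2]
loss = next_token_cross_entropy(toy_logits, toy_tokens)
```

With uniform scores the model assigns probability 1/3 to every token, so the loss equals the natural log of the vocabulary size.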

Intended Use and Limitations

This model is intended for research purposes. Given a text prompt as input, it performs next-token prediction to generate a continuation. The context length for this model is 1024 tokens.
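The following is a minimal sketch of next-token generation under the 1024-token context length stated above. It does not use the PLDR-LLM framework: `next_token_fn` is a hypothetical stand-in for the model, and keeping only the most recent tokens once the prompt exceeds the context length is an assumption made for illustration, not necessarily how the framework handles long inputs.

```python
CONTEXT_LENGTH = 1024  # context length stated in this model card

def generate(next_token_fn, prompt_ids, max_new_tokens):
    """Append max_new_tokens tokens to prompt_ids, one prediction at a time.

    next_token_fn is a hypothetical callable that maps a list of token ids
    (at most CONTEXT_LENGTH long) to the id of the next token.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Assumption: feed only the most recent CONTEXT_LENGTH tokens.
        window = ids[-CONTEXT_LENGTH:]
        ids.append(next_token_fn(window))
    return ids

def dummy_next_token(window):
    """Toy stand-in for a model: returns (last token + 1) mod 50."""
    assert len(window) <= CONTEXT_LENGTH
    return (window[-1] + 1) % 50

out = generate(dummy_next_token, [1, 2, 3], max_new_tokens=4)
```

In real use, `next_token_fn` would wrap a forward pass of the loaded checkpoint followed by a sampling or greedy decoding step.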

How to use

  • The model checkpoint and tokenizer can be loaded into the PLDR-LLM framework to generate text as described in the code repository for training this model: PLDR-LLM-with-KVG-cache.

LM Evaluation Harness Support

Limitations and Biases

Large language models may generate text that is profane, lewd, socially unacceptable or offensive, depending on the contents of the dataset they were pretrained on. RefinedWeb is as toxic and biased as the Pile. Please see the papers for RefinedWeb and the Pile for more information. Moreover, large language models are susceptible to hallucinations and may generate text that contains incorrect, irrelevant or misleading information. Since it is very hard to predict the contents of generated text ahead of time, the output of large language models needs to be heavily moderated and curated to prevent undesired content from appearing without warning.

Eval results

Evaluation results on benchmarks in the zero-shot setting, and their comparison to LLMs of similar size reported in the literature, can be found in Tables 3-5 and 7 of the research paper.

BibTeX entry and citation info

Please cite this model as:

@misc{gokden2025pldrllmkvgcache,
      title={PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference}, 
      author={Burc Gokden},
      year={2025},
      eprint={2502.13502},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.13502}, 
}