JoseRFJunior/TransNAR https://github.com/JoseRFJuniorLLMs/TransNAR https://arxiv.org/html/2406.09308v1 TransNAR hybrid architecture. Similar to Alayrac et al, we interleave existing Transformer layers with gated cross-attention layers which enable information to flow from the NAR to the Transformer. We generate queries from tokens while we obtain keys and values from nodes and edges of the graph. The node and edge embeddings are obtained by running the NAR on the graph version of the reasoning task to be solved. When experimenting with pre-trained Transformers, we initially close the cross-attention gate, in order to fully preserve the language model’s internal knowledge at the beginning of training.