Jam_sojm

Jam_sojm is a GPT-2-like model for research on fine-grained analysis of Java source code at the level of methods, statements, and variables. It is intended as a foundation for downstream tasks such as code completion, comment generation, and automated bug repair.


Jam_sojm Training Details

  • We train the jam_sojm model using the training procedures from Daniel Grittner's NanoGPT-LoRA.

  • The model is trained on two of our own datasets: so13m and jm52m.

  • First, we train the model on the so13m training set for 1 epoch, roughly 300,000 training iterations.

  • We then reset the learning rate and weight decay and train for 1 more epoch on the jm52m training set, roughly 300,000 more training iterations for a total of 600,000 iterations (a sketch of this two-stage schedule follows this list).

  • Our GitHub repo contains the code for re-training using the raw data.
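
The two-stage schedule above can be summarized in PyTorch-style pseudocode. This is only a minimal sketch, not the training script from our repository: `GPT` and `get_batch` are stand-in names, and the settings simply reflect the hyperparameters listed in the table below.

```python
import torch

# Sketch of the two-stage pre-training schedule.
# GPT and get_batch are stand-ins for the nanoGPT-style model and data loader;
# the actual code is in the apcl-research/jam repository.
model = GPT(n_layer=24, n_head=16, n_embd=1024, block_size=256, dropout=0.2)

def train_one_epoch(model, dataset, iters=300_000):
    # A fresh optimizer per stage, i.e. the learning rate and weight decay
    # schedules are reset between the so13m and jm52m stages.
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=1e-1)
    for step in range(iters):
        for _ in range(32):                      # gradient accumulation steps
            x, y = get_batch(dataset, batch_size=4, block_size=256)
            _, loss = model(x, y)
            (loss / 32).backward()               # scale loss for accumulation
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)

train_one_epoch(model, "so13m")   # stage 1
train_one_epoch(model, "jm52m")   # stage 2
```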

Hyperparameter   Description                    Value
e                embedding dimensions           1024
L                number of layers               24
h                attention heads                16
c                block size / context length    256
b                batch size                     4
a                accumulation steps             32
d                dropout                        0.20
r                learning rate                  3e-5
y                weight decay                   1e-1
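
For reference, these values map onto a nanoGPT-style configuration roughly as follows; the key names follow nanoGPT conventions and are given here only as an illustration.

```python
# Hyperparameters from the table above, written as a nanoGPT-style config.
config = dict(
    n_embd=1024,                       # e: embedding dimensions
    n_layer=24,                        # L: number of layers
    n_head=16,                         # h: attention heads
    block_size=256,                    # c: context length in tokens
    batch_size=4,                      # b: sequences per micro-batch
    gradient_accumulation_steps=32,    # a: micro-batches per optimizer step
    dropout=0.2,                       # d: dropout
    learning_rate=3e-5,                # r: learning rate
    weight_decay=1e-1,                 # y: weight decay
)
# Effective batch size per optimizer step: 4 * 32 = 128 sequences of 256 tokens.
```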

We train our models using a single NVIDIA A5000 GPU.


Jam Projects

Current projects using the jam_sojm pre-trained model can be found at our GitHub repository:

https://github.com/apcl-research/jam
