---
license: bigscience-bloom-rail-1.0
datasets:
- c4
language:
- en
library_name: transformers
tags:
- causal-lm
- gpt-j
---
A 6.7M (6,700,128) parameter GPT-J model.
```
n_positions - 128
n_embd - 64
n_layer - 4
n_head - 8
rotary_dim - 64
tokenizer - gpt-j
```
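
For reference, a minimal sketch of how this configuration could be instantiated with `transformers`. The vocabulary size is not listed above; assuming the GPT-J default of 50,400 reproduces the 6,700,128 parameter count, and the tokenizer repo id below is an assumption (the standard GPT-J tokenizer hosted by EleutherAI):

```python
# Minimal sketch: build an untrained model with the configuration listed above.
# vocab_size=50400 is the GPT-J default and an assumption here; it is what makes
# the parameter count come out to 6,700,128.
from transformers import AutoTokenizer, GPTJConfig, GPTJForCausalLM

config = GPTJConfig(
    vocab_size=50400,
    n_positions=128,
    n_embd=64,
    n_layer=4,
    n_head=8,
    rotary_dim=64,
)
model = GPTJForCausalLM(config)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")  # GPT-J tokenizer

print(model.num_parameters())  # 6,700,128
```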
It was first trained on 4,194,304 samples from the [c4](https://hf.co/datasets/c4) dataset at a length of 128 tokens each, which comes out to 536,870,912 (0.53B) tokens seen during training. A batch size of 16 with 128 gradient accumulation steps was used, for an effective batch size of 2048, with a cosine learning rate schedule starting at 1e-3.
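
A minimal sketch of an equivalent training setup with the Hugging Face `Trainer` is shown below. The original training script, warmup settings, and hardware are not specified here, so the output directory, dataset streaming, and step count (4,194,304 samples / 2048 effective batch = 2,048 optimizer steps) are assumptions for illustration:

```python
# Minimal sketch of a matching training run (assumed setup, not the original script).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    GPTJConfig,
    GPTJForCausalLM,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J tokenizer has no pad token by default

# Model with the configuration listed above (vocab_size=50400 assumed, the GPT-J default).
model = GPTJForCausalLM(GPTJConfig(
    vocab_size=50400, n_positions=128, n_embd=64, n_layer=4, n_head=8, rotary_dim=64,
))

# Stream c4 (English split) and tokenize to the 128-token training length.
raw = load_dataset("c4", "en", split="train", streaming=True)
train = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text", "timestamp", "url"],
)

# 4,194,304 samples / (16 * 128) effective batch = 2,048 optimizer steps.
args = TrainingArguments(
    output_dir="gpt-j-6.7m",           # hypothetical output directory
    per_device_train_batch_size=16,
    gradient_accumulation_steps=128,   # effective batch size 2048
    learning_rate=1e-3,
    lr_scheduler_type="cosine",
    max_steps=2048,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```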