Commit History
36e53c7 improve how we setup eval/save strategies and steps (#547)
e5bb22a add optimization for group-by-len (#563)
5b67ea9 Add training callback to send predictions to WandB table (#521)
e30f1e3 Early stopping metric (#537)
a546ca2 misc fixes/improvements (#513)
3355706 Add support for GPTQ using native transformers/peft (#468)
7710e81 log supervised token count (#448)
396a7a7 Added advanced DDP args (#515) [Jan Philipp Harries]
c56b450 drop empty tokenized rows too (#509)
7657632 add eval benchmark callback (#441)
fd55bc8 use math.ceil instead of round /cc #498
8e197f6 pad_to_worst_case_seq_len boolean, for testing memory limits (#498)
868530c let transformers handle adamw_bnb_8bit
bde3c5a ReLoRA implementation (with quantization) (#322)
50682a3 always drop samples that are too long (#452)
5a1985b set env var for FSDP layer to wrap (#453)
58cf7e7 add missing positional arg (#450)
ee26281 fix evals (#447)
f733d0f disable eval using multipack for now (#437)
008505c fix comma, not a tuple (#436)
b3f5e00 use save_strategy from config if available (#434)
5247c50 set env for FSDP offload params (#433)
c01015f Fix(config): Update handling of deepspeed config (#404)
da10af0 fix eval steps and strategy (#403)
3c2ad00 Feat(config): add max steps (#387)
5d48a10 Added "epoch" evaluation_strategy (#388)
73a0b6e Feat(config): Add hub_strategy (#386)
7b55fe6 improve GPU logging to break out pytorch cache and system mem
2bb0b78 Attention mask and position id fixes for packing (#285)
e303d64 log GPU memory usage
ebaec3c fix axolotl training args dataclass annotation
83237b8 Merge branch 'OpenAccess-AI-Collective:main' into logging_enhancement [The Objective Dad]