Triangle104 committed on
Commit acaa5b4 · verified · Parent: 3ab59a0

Update README.md

Files changed (1)
  1. README.md +80 -72
README.md CHANGED
@@ -27,79 +27,87 @@ This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2`
 Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) for more details on the model.
 
 ---
-
- - model.layers.33.self_attn.v_proj
- - model.layers.42.self_attn.v_proj
- - model.layers.29.self_attn.v_proj
- - model.layers.9.self_attn.v_proj
- - model.layers.14.self_attn.v_proj
- - model.layers.35.self_attn.v_proj
- - model.layers.38.self_attn.v_proj
- - model.layers.13.self_attn.v_proj
- - model.layers.30.self_attn.v_proj
- - model.layers.34.self_attn.v_proj
- - model.layers.5.self_attn.v_proj
- - model.layers.28.self_attn.v_proj
- - model.layers.37.self_attn.v_proj
- - model.layers.27.self_attn.v_proj
- - model.layers.11.self_attn.v_proj
-
- wandb_project: EVA-Qwen2.5-14B-SFFT-v0.2
- wandb_entity:
- wandb_watch:
- wandb_name: Unit-02
- wandb_log_model:
-
- gradient_accumulation_steps: 8
- micro_batch_size: 2
- num_epochs: 3
- optimizer: paged_ademamix_8bit
- lr_scheduler: cosine
- learning_rate: 0.00005
- max_grad_norm: 3
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: "unsloth"
- # gradient_checkpointing_kwargs:
- # use_reentrant: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 20
- evals_per_epoch: 4
- saves_per_epoch: 4
- save_safetensors: true
- hub_model_id:
- hub_strategy:
- debug:
- deepspeed: deepspeed_configs/zero3_bf16.json
- weight_decay: 0.1
- # fsdp:
- # - full_shard
- # - auto_wrap
- # fsdp_config:
- # fsdp_limit_all_gathers: true
- # fsdp_sync_module_states: false
- # fsdp_offload_params: true
- # fsdp_cpu_ram_efficient_loading: true
- # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
- # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
- # fsdp_activation_checkpointing: true
- # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
- # fsdp_sharding_strategy: FULL_SHARD
- # fsdp_forward_prefetch: false # Added
- # fsdp_backward_prefetch: "BACKWARD_PRE" # Added
- # fsdp_backward_prefetch_limit: 1 # Added
- # fsdp_mixed_precision: BF16 # Added
+ Model details:
+ -
+
+ An RP/storywriting specialist model, a full-parameter finetune of Qwen2.5-14B on a mixture of synthetic and natural data.
+
+ It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the
+ versatility, creativity and "flavor" of the resulting model.
+
+ Version notes for 0.2: Now using the refined dataset from 32B 0.2.
+ Major improvements in coherence, instruction following and
+ long-context comprehension over 14B v0.1.
+
+ Prompt format is ChatML.
+
+ Recommended sampler values:
+
+ Temperature: 0.8
+ Min-P: 0.05
+ Top-A: 0.3
+ Repetition Penalty: 1.03
+
+ Recommended SillyTavern presets (via CalamitousFelicitousness):
+
+ Context
+ Instruct and System Prompt
+
+ Training data:
+
+ Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's card for details.
+ Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
+ A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
+ A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
+ Synthstruct and SynthRP datasets by Epiculous
+ A subset from Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.
+
+ Training time and hardware:
+
+ 3 hours on 8xH100 SXM, provided by FeatherlessAI
+
+ The model was created by Kearm, Auri and Cahvay.
+
+ Special thanks:
+ to Cahvay for his work on investigating and reprocessing the
+ corrupted dataset, removing the single biggest source of data poisoning.
+ to FeatherlessAI for generously providing an 8xH100 SXM node for training this model
+ to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data
+ and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.
 
 ---
 ## Use with llama.cpp
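For reference, the ChatML format named in the README above wraps every turn in `<|im_start|>` and `<|im_end|>` tokens. A minimal template (the system and user text are placeholders, not part of the model card):

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
```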
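A minimal sketch of applying the recommended sampler values with the llama.cpp CLI, assuming a recent build and an illustrative GGUF filename; Top-A is not a built-in llama.cpp sampler, so only temperature, Min-P and repetition penalty are passed:

```bash
# Interactive chat with the recommended samplers (filename is an example)
./llama-cli -m EVA-Qwen2.5-14B-v0.2-Q8_0.gguf \
  --temp 0.8 --min-p 0.05 --repeat-penalty 1.03 \
  -cnv -p "You are a creative roleplay and storywriting assistant."
```

With `-cnv`, llama-cli runs in conversation mode and uses the ChatML template embedded in the GGUF metadata, so the prompt passed via `-p` acts as the system prompt.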