--- tags: - deepsparse --- ## Usage ```python from deepsparse import TextGeneration prompt = "How to get in a good university?" formatted_prompt = f"[INST]{prompt}[/INST]" model = TextGeneration(model_path="hf:neuralmagic/Yi-6B-Llama-50-quant") print(model(formatted_prompt, max_new_tokens=200).generations[0].text) """ Getting into a good university is a complex process that involves factors such as academic performance, financial aid, and personal qualifications. Here are some steps you can follow to get in a good university: 1. Academic performance: - Look for a university that has a strong academic program, including a well-rounded curriculum that covers a wide range of subjects. - Check if the university offers a clear curriculum that includes a clear sequence of courses. - Check if the university offers a clear pathway to graduation, including clear dates and deadlines. 2. Financial aid: - Look for a university that offers financial aid, such as scholarships, grants, or loans. - Check if the university offers financial aid that fits your budget. - Consider the university's financial aid package, including the cost of tuition, room and board, and other expenses. """ ``` ## One-shot and Export ``` git clone https://github.com/neuralmagic/sparseml pip install -e "sparseml[transformers]" "torch<2" python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py chargoddard/Yi-6B-Llama open_platypus --recipe sparseml/src/sparseml/transformers/sparsification/obcq/example_llama.yaml --precision float16 --save True --device cuda cd sparseml git checkout update/onnx_export/duplicate python src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path /root/obcq_deployment cp deployment/model.onnx deployment/model-orig.onnx python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx ``` `recipe.yaml` ``` test_stage: obcq_modifiers: SmoothQuantModifier: smoothing_strength: 0.5 mappings: [ [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"], [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"] ] QuantizationModifier: ignore: - LlamaRotaryEmbedding - LlamaRMSNorm - SiLUActivation - model.layers.0.mlp.down_proj - model.layers.1.mlp.down_proj - model.layers.2.mlp.down_proj - model.layers.3.mlp.down_proj - model.layers.4.mlp.down_proj - model.layers.5.mlp.down_proj - model.layers.6.mlp.down_proj - model.layers.7.mlp.down_proj - model.layers.8.mlp.down_proj - model.layers.9.mlp.down_proj - model.layers.10.mlp.down_proj - model.layers.11.mlp.down_proj - model.layers.12.mlp.down_proj - model.layers.13.mlp.down_proj - model.layers.14.mlp.down_proj - model.layers.15.mlp.down_proj - model.layers.16.mlp.down_proj - model.layers.17.mlp.down_proj - model.layers.18.mlp.down_proj - model.layers.19.mlp.down_proj - model.layers.20.mlp.down_proj - model.layers.21.mlp.down_proj - model.layers.22.mlp.down_proj - model.layers.23.mlp.down_proj - model.layers.24.mlp.down_proj - model.layers.25.mlp.down_proj - model.layers.26.mlp.down_proj - model.layers.27.mlp.down_proj - model.layers.28.mlp.down_proj - model.layers.29.mlp.down_proj - model.layers.30.mlp.down_proj - model.layers.31.mlp.down_proj post_oneshot_calibration: True scheme_overrides: Embedding: input_activations: null weights: num_bits: 8 symmetric: False SparseGPTModifier: sparsity: 0.5 block_size: 128 sequential_update: False quantize: True percdamp: 0.01 mask_structure: "0:0" targets: [ "model.layers.0", "model.layers.1", "model.layers.2", "model.layers.3", "model.layers.4", "model.layers.5", "model.layers.6", "model.layers.7", "model.layers.8", "model.layers.9", "model.layers.10", "model.layers.11", "model.layers.12", "model.layers.13", "model.layers.14", "model.layers.15", "model.layers.16", "model.layers.17", "model.layers.18", "model.layers.19", "model.layers.20", "model.layers.21", "model.layers.22", "model.layers.23", "model.layers.24", "model.layers.25", "model.layers.26", "model.layers.27", "model.layers.28", "model.layers.29", "model.layers.30", "model.layers.31", ] ```