---
library_name: transformers
license: llama3
language:
- ar
- en
pipeline_tag: text-generation
model_name: Arabic ORPO 8B chat
model_type: llama3
quantized_by: MohamedRashad
---

# The AWQ version

This is the AWQ version of [MohamedRashad/Arabic-Orpo-Llama-3-8B-Instruct](https://huggingface.co/MohamedRashad/Arabic-Orpo-Llama-3-8B-Instruct) for the enthusiasts.
## How to use, you ask?

First, update your packages:

```shell
pip3 install --upgrade autoawq transformers
```

Now, copy and run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "MohamedRashad/Arabic-Orpo-Llama-3-8B-Instruct-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    attn_implementation="flash_attention_2",  # disable if you have problems with flash attention 2
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map="auto",
)

# Using the text streamer to stream output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "مرحبا"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

generation_params = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.9,
    "top_k": 40,
    "max_new_tokens": 1024,
    "eos_token_id": terminators,
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    input_ids,
    streamer=streamer,
    **generation_params,
)

# Generation without a streamer, which will include the prompt in the output
generation_output = model.generate(
    input_ids,
    **generation_params,
)

# Get the tokens from the output, decode them, and print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params,
)

# Build a plain-text prompt from the same chat messages for the pipeline
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipe_output = pipe(prompt)[0]["generated_text"]
print("pipeline output: ", pipe_output)
```
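Since the install step above pulls in `autoawq`, the quantized weights can also be loaded through AutoAWQ's own loader instead of `AutoModelForCausalLM`. The sketch below is a minimal, untested alternative for this checkpoint: it assumes AutoAWQ's standard `from_quantized` API and a CUDA GPU; the chat-template and generation setup from the example above carries over unchanged.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "MohamedRashad/Arabic-Orpo-Llama-3-8B-Instruct-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# fuse_layers=True enables AutoAWQ's fused modules for faster decoding;
# set it to False if the fused kernels cause problems on your hardware.
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True)

messages = [
    {"role": "user", "content": "مرحبا"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).cuda()  # assumes the quantized model was placed on the GPU by from_quantized

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```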