---
license: apache-2.0
datasets:
  - Open-Orca/OpenOrca
  - OpenAssistant/oasst_top1_2023-08-25
language:
  - bg
  - ca
  - cs
  - da
  - de
  - en
  - es
  - fr
  - hr
  - hu
  - it
  - nl
  - pl
  - pt
  - ro
  - ru
  - sl
  - sr
  - sv
  - uk
library_name: transformers
---


from transformers import (
    AutoTokenizer,
    GenerationConfig,
)

# attention_sinks provides a drop-in AutoModelForCausalLM that accepts the
# attention_sink_* arguments used below, so the transformers class is not imported.
from attention_sinks import AutoModelForCausalLM

import torch

# model_id = 'Open-Orca/Mistral-7B-OpenOrca'
model_id='NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v3'

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
    low_cpu_mem_usage=True,
    # use_flash_attention_2=True,  # GPU A100 or other supported GPU
    attention_sink_size=4,
    attention_sink_window_size=1024,  # 512 <- lower for the sake of faster generation
)

max_length=2048
print("max_length",max_length)


tokenizer = AutoTokenizer.from_pretrained(model_id,
                                          # use_fast = False,
                                          max_length=max_length,)

tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

#EXAMPLE #1
txt="""<|im_start|>user
I'm looking for an efficient Python script to output prime numbers. Can you help me out? I'm interested in a script that can handle large numbers and output them quickly. Also, it would be great if the script could take a range of numbers as input and output all the prime numbers within that range. Can you generate a script that fits these requirements? Thanks!<|im_end|>
<|im_start|>assistant
"""

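# EXAMPLE #2 (overwrites EXAMPLE #1 if both assignments are run; keep only one)
# English translation of the prompt: "I'm developing a REST API with Node.js and I'm trying to
# apply some security system, either with tokens or something similar. Can you help me?"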
txt="""<|im_start|>user
Estoy desarrollando una REST API con Nodejs, y estoy tratando de aplicar algún sistema de seguridad, ya sea con tokens o algo similar, me puedes ayudar?<|im_end|>
<|im_start|>assistant
"""

# Send the prompt to the same device the model was placed on by device_map="auto".
inputs = tokenizer.encode(txt, return_tensors="pt").to(model.device)

# Assumed values: max_new_tokens and len_tokens are not defined in the original snippet.
max_new_tokens = 512
len_tokens = 50  # used as top_k below

generation_config = GenerationConfig(
    max_new_tokens=max_new_tokens,
    temperature=0.7,
    top_p=0.9,
    top_k=len_tokens,
    repetition_penalty=1.11,
    do_sample=True,
    # pad_token_id=tokenizer.eos_token_id,
    # eos_token_id=tokenizer.eos_token_id,
    # use_cache=True,
)
outputs = model.generate(
    input_ids=inputs,
    generation_config=generation_config,
)
# skip_special_tokens=False keeps the <|im_start|>/<|im_end|> markers; set it to True to hide them.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
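
The prompts above are written by hand in the `<|im_start|>` / `<|im_end|>` chat layout, and the decoded output contains the prompt followed by the reply. The sketch below shows one way to build such a prompt from a list of messages and to pull out only the assistant's reply. The helper names `build_chatml_prompt` and `extract_assistant_reply` are hypothetical and are not part of `transformers` or `attention_sinks`. It assumes the model closes its turn with `<|im_end|>`; if it stops on a different end-of-sequence token, that token would still need to be stripped.

```python
# Sketch only: these helpers are illustrative and not part of any library.
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages: List[Dict[str, str]]) -> str:
    """Render {'role', 'content'} messages in the layout used by the prompts above,
    ending with an open assistant turn for the model to complete."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded string."""
    marker = f"{IM_START}assistant\n"
    start = decoded.rfind(marker)
    if start == -1:
        # No assistant marker found: return the text unchanged apart from whitespace.
        return decoded.strip()
    reply = decoded[start + len(marker):]
    # Cut at the end-of-turn marker if the model emitted one.
    end = reply.find(IM_END)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


prompt = build_chatml_prompt([{"role": "user", "content": "Hello"}])
```

With these helpers, `txt = build_chatml_prompt([...])` could stand in for the hand-written strings, and `extract_assistant_reply(tokenizer.decode(outputs[0], skip_special_tokens=False))` would return the reply without the echoed prompt.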