mipo57 committed on
Commit
a0c0f4f
1 Parent(s): b24f71f

Create README.md

  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ datasets:
+ - Intel/orca_dpo_pairs
+ pipeline_tag: conversational
+ library_name: peft
+ tags:
+ - llm
+ - 7b
+ ---
+ # Jaskier 7b DPO V2
+
+ **This is a work-in-progress model and may not be ready for production use.**
+
+ This model is based on `mindy-labs/mindy-7b-v2` (a downstream version of Mistral 7B) and was fine-tuned using Direct Preference Optimization (DPO) on the `Intel/orca_dpo_pairs` dataset.
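For readers unfamiliar with Direct Preference Optimization, the per-pair objective can be sketched as below. This is an illustrative sketch of the standard DPO loss, not the training code used for this model: the function name `dpo_loss`, the `beta` default, and the example log-probabilities are all assumptions made for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss computed from summed sequence log-probabilities."""
    # Implicit rewards: how much more the policy favours each answer
    # than the frozen reference model does, scaled by beta (assumed value).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as softplus(-margin) for stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical log-probabilities: the policy prefers the chosen answer
# more strongly than the reference does, so the loss drops below log(2).
example = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

The loss equals `log(2)` when the policy and the reference agree, and shrinks as the policy widens the gap between the chosen and rejected answers relative to the reference.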
+
+ ## How to use
+
+ You can use this model directly with a text-generation pipeline:
+ ```python
+ import torch
+ from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList, pipeline
+
+ class StopOnTokens(StoppingCriteria):
+     # Stop as soon as the generated ids end with the stop sequence below.
+     def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
+         stop_ids = torch.tensor([28789, 28766, 321, 28730, 416, 28766, 28767]).to(input_ids.device)
+         if len(input_ids[0]) < len(stop_ids):
+             return False
+         return torch.equal(input_ids[0][-len(stop_ids):], stop_ids)
+
+ model_name = "bardsai/jaskier-7b-dpo-v2"
+
+ # Load the tokenizer and build a text-generation pipeline on the first GPU
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ generator = pipeline(
+     "text-generation",
+     model=model_name,
+     tokenizer=tokenizer,
+     device="cuda:0"
+ )
+
+ messages = [
+     {"role": "system", "content": "Your task is to extract country names from the text provided by user. Return in comma-separated format."},
+     {"role": "user", "content": "Germany,[e] officially the Federal Republic of Germany,[f] is a country in the western region of Central Europe. It is the second-most populous country in Europe after Russia,[g] and the most populous member state of the European Union. Germany lies between the Baltic and North Sea to the north and the Alps to the south. Its 16 constituent states have a total population of over 84 million, cover a combined area of 357,600 km2 (138,100 sq mi) and are bordered by Denmark to the north, Poland and the Czech Republic to the east, Austria and Switzerland to the south, and France, Luxembourg, Belgium, and the Netherlands to the west. The nation's capital and most populous city is Berlin and its main financial centre is Frankfurt; the largest urban area is the Ruhr."}
+ ]
+ prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+
+ # Generate text
+ sequences = generator(
+     prompt,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.9,
+     num_return_sequences=1,
+     max_length=300,
+     stopping_criteria=StoppingCriteriaList([StopOnTokens()])
+ )
+
+ print(sequences[0])
+ ```
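The stopping criterion in the snippet above fires when the most recent generated ids equal a fixed id sequence. The same suffix check can be sketched on plain Python lists, without torch, which makes it easy to test in isolation. The helper name `ends_with` is hypothetical, and the ids are copied from the snippet; they are assumed to encode the model's end-of-turn marker under its tokenizer.

```python
from typing import Sequence

# Ids copied from the README snippet; assumed to be the end-of-turn marker.
STOP_IDS = [28789, 28766, 321, 28730, 416, 28766, 28767]

def ends_with(generated: Sequence[int], stop: Sequence[int] = STOP_IDS) -> bool:
    """Return True when `generated` ends with the `stop` id sequence."""
    n = len(stop)
    # An empty stop sequence never matches; too-short input cannot match.
    if n == 0 or len(generated) < n:
        return False
    return list(generated[-n:]) == list(stop)

# Generation would halt here: the stop sequence is the suffix.
finished = ends_with([1, 2, 3] + STOP_IDS)
# Generation would continue here: the marker is incomplete.
unfinished = ends_with([1, 2, 3] + STOP_IDS[:-1])
```

Comparing the whole suffix, rather than a single id, matters here because the marker is spread over several tokens, each of which can also appear in ordinary text.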
+
+ ### Output
+ > Germany,Denmark,Poland,Czech Republic,Austria,Switzerland,France,Luxembourg,Belgium,Netherlands
+
+ ## Changelog
+
+ - 2023-01-10: Initial release
+
+ ## About bards.ai
+
+ At bards.ai, we focus on providing machine learning expertise and skills to our partners, particularly in the areas of NLP, machine vision, and time series analysis. Our team is located in Wroclaw, Poland. Please visit our website for more information: bards.ai
+
+ Let us know if you use our model :). Also, if you need any help, feel free to contact us at [email protected]