Update README.md
README.md
CHANGED
@@ -1,199 +1,172 @@
-tags:
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
-
-#### Metrics
-
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
-[More Information Needed]
-
-### Results
-
-[More Information Needed]
-
-#### Summary
-
-
-
-## Model Examination [optional]
-
-<!-- Relevant interpretability work for the model goes here -->
-
-[More Information Needed]
-
-## Environmental Impact
-
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-
-## Technical Specifications [optional]
-
-### Model Architecture and Objective
-
-[More Information Needed]
-
-### Compute Infrastructure
-
-[More Information Needed]
-
-#### Hardware
-
-[More Information Needed]
-
-#### Software
-
-[More Information Needed]
-
-## Citation [optional]
-
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
-**BibTeX:**
-
-[More Information Needed]
-
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed]
 ---
 library_name: transformers
+tags:
+- finance
+license: llama3
+base_model: meta-llama/Meta-Llama-3-8B-Instruct
+datasets:
+- virattt/financial-qa-10K
+language:
+- en
+pipeline_tag: text-generation
 ---

+# Llama 3 8B Instruct (Financial RAG)

+This model is a fine-tuned version of the original [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model
+on 4000 examples from the [virattt/financial-qa-10K](https://huggingface.co/datasets/virattt/financial-qa-10K) dataset.

+The model is fine-tuned using a LoRA adapter for RAG use cases. It is optimized to answer a question based on a context:

+```txt
+Answer the question:
+{question}

+Using the information:
+{context}
+```
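As a minimal sketch of how the template above might be filled in, the snippet below substitutes a question and a context with plain string formatting. The names `PROMPT_TEMPLATE` and `fill_template` are illustrative assumptions, not part of the model's code; only the template text comes from the model card.

```python
# Hypothetical helper: the template text mirrors the model card,
# while PROMPT_TEMPLATE and fill_template are illustrative names.
PROMPT_TEMPLATE = (
    "Answer the question:\n"
    "{question}\n"
    "\n"
    "Using the information:\n"
    "{context}"
)


def fill_template(question: str, context: str) -> str:
    """Substitute a question and its retrieved context into the template."""
    return PROMPT_TEMPLATE.format(question=question, context=context)


example = fill_template(
    question="How much did the company's net earnings amount to in fiscal 2022?",
    context="Net earnings were $17.1 billion in fiscal 2022.",
)
print(example)
```

Because only the template is parsed by `str.format`, braces that appear inside the question or context are inserted verbatim.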

+## Usage

+Load the model:

+```py
+from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+MODEL_NAME = "aliyasir/Llama-3-8B-Instruct-Finance-RAG"
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_NAME,
+    device_map="auto"
+)

+pipe = pipeline(
+    task="text-generation",
+    model=model,
+    tokenizer=tokenizer,
+    max_new_tokens=128,
+    return_full_text=False,
+)
+```

+Format the prompt (uses the original Instruct prompt format):

+````py
+prompt = """
+<|begin_of_text|><|start_header_id|>system<|end_header_id|>

+Use only the information to answer the question<|eot_id|><|start_header_id|>user<|end_header_id|>

+How much did the company's net earnings amount to in fiscal 2022?

+Information:

+```
+Net earnings were $17.1 billion in fiscal 2022.
+```<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+"""
+````
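The prompt string above can also be assembled programmatically. The following is a minimal sketch, assuming the special-token layout shown in the example; `build_llama3_prompt` is a hypothetical name, and in practice `tokenizer.apply_chat_template` is likely the more robust route because it reads the layout from the tokenizer itself.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 Instruct layout.

    Hypothetical helper: it hard-codes the special tokens from the example
    prompt instead of reading them from the tokenizer's chat template.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


question = "How much did the company's net earnings amount to in fiscal 2022?"
context = "Net earnings were $17.1 billion in fiscal 2022."

# The context is wrapped in a triple-backtick fence, as in the example prompt.
fence = "`" * 3
user_message = f"{question}\n\nInformation:\n\n{fence}\n{context}\n{fence}"

prompt = build_llama3_prompt(
    "Use only the information to answer the question", user_message
)
print(prompt)
```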

+And make a prediction:

+```py
+outputs = pipe(prompt)
+print(outputs[0]["generated_text"])
+```

+```
+$17.1 billion
+```

+Here's a helper function to build your prompts:

+````py
+from textwrap import dedent
+
+def create_test_prompt(data_row):
+    # Build the user message: the question followed by the fenced context.
+    prompt = dedent(f"""
+    {data_row["question"]}

+    Information:

+    ```
+    {data_row["context"]}
+    ```
+    """)
+    messages = [
+        {"role": "system", "content": "Use only the information to answer the question"},
+        {"role": "user", "content": prompt},
+    ]
+    # Uses the `tokenizer` loaded above to apply the Instruct chat format.
+    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+````

+Where `data_row` must be a dict with `question` and `context` keys:

+```
+data_row = {
+    "question": "...",
+    "context": "..."
+}
+```
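To answer several rows in one go, a small loop over `data_row` dicts is enough. The sketch below is an assumption about how this could be wired up, not the author's code: `answer_rows` is a hypothetical helper that takes the prompt builder and the generation call as arguments, and the two `fake_*` functions are stand-ins so the snippet runs without downloading the model.

```python
from typing import Callable, Dict, Iterable, List


def answer_rows(
    rows: Iterable[Dict[str, str]],
    build_prompt: Callable[[Dict[str, str]], str],
    generate: Callable[[str], str],
) -> List[str]:
    """Build a prompt for each row and collect the generated answers."""
    answers = []
    for row in rows:
        # Each row must provide both keys expected by the prompt builder.
        missing = {"question", "context"} - row.keys()
        if missing:
            raise KeyError(f"data_row is missing keys: {sorted(missing)}")
        answers.append(generate(build_prompt(row)).strip())
    return answers


# Stand-ins for illustration only; they do not call the model.
def fake_build_prompt(row: Dict[str, str]) -> str:
    return f"{row['question']}\n\nInformation:\n\n{row['context']}"


def fake_generate(prompt: str) -> str:
    return " echo: " + prompt.splitlines()[0] + " "


rows = [
    {
        "question": "What were net earnings in fiscal 2022?",
        "context": "Net earnings were $17.1 billion in fiscal 2022.",
    }
]
print(answer_rows(rows, fake_build_prompt, fake_generate))
```

With the real objects, `build_prompt` would be `create_test_prompt` and `generate` could be `lambda p: pipe(p)[0]["generated_text"]`.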

+## Sample Predictions

+Here's a sample of predictions from the *trained* (fine-tuned) and *untrained* (base) models, shown next to the reference `answer`:

+```txt
+Example 1

+answer: Delta Air Lines' agreements with its regional carriers typically last at least ten years with options for extensions. Delta controls operational aspects like scheduling and pricing.

+trained: Delta Connection agreements typically last at least ten years with options for extensions.

+untrained: According to the information, the terms of Delta Air Lines' agreements with its regional carriers through Delta Connection are:

+1. The agreements typically last at least ten years.
+2. There are options for extensions.
+3. Delta controls major operational aspects like scheduling and pricing.
+4. The regional carriers supply the services.

+Example 2

+answer: The company evaluates acquisition-related intangibles for impairment by comparing the asset's carrying amount to undiscounted future net cash flows expected from the asset. An impairment loss is recognized if the carrying amount exceeds the asset's recoverable amount.

+trained: The company evaluates acquisition-related intangible and other long-lived assets for impairment whenever events or changes in circumstances indicate that the carrying amount may not be recoverable. This involves measuring the recoverability of the asset by comparing its carrying amount to the future undiscounted net cash flows expected to be generated by the asset group.

+untrained: According to the information, the company uses the following criteria to determine whether an impairment loss should be recognized on acquisition-related intangible assets:

+* Events or changes in circumstances that indicate the carrying amount of an asset may not be recoverable.
+* A comparison of the carrying amount of an asset to future undiscounted net cash flows expected to be generated by the asset group.

+In other words, the company evaluates whether the asset's carrying amount is recoverable by comparing it to the expected future cash flows, which involves assumptions about future prospects and computations of estimated future cash flows.

+Example 3

+answer: In the United States, the approval process for biosimilars is governed by the Public Health Service Act (PHSA) and the regulations implementing these statutes, specifically including provisions made under federal health care reform legislation enacted in March 2010.

+trained: The Federal Food, Drug, and Cosmetic Act (the FFDCA) and the Public Health Service Act (PHSA)

+untrained: The legal framework that governs the approval process for biosimilars in the United States is the Federal Food, Drug, and Cosmetic Act (FFDCA) and the Public Health Service Act (PHSA), as well as the regulations implementing these statutes.

+Example 4

+answer: Timothy S. Teter holds a B.S. degree in Mechanical Engineering from the University of California at Davis and a J.D. degree from Stanford Law School.

+trained: B.S. in Mechanical Engineering from the University of California at Davis and a J.D. from Stanford Law School

+untrained: According to the information, Timothy S. Teter holds:

+1. A B.S. degree in Mechanical Engineering from the University of California at Davis.
+2. A J.D. degree from Stanford Law School.

+Example 5

+answer: Beginning in fiscal year 2024, the company plans to exclude paused Connected Fitness subscriptions from its new 'Ending Paid Connected Fitness Subscriptions' metric and will treat a pause action as a churn event in its 'Average Net Monthly Paid Connected Fitness Subscription Churn' metric.

+trained: Starting in fiscal year 2024, the company will no longer include paused Connected Fitness subscriptions in their Ending Paid Connected Fitness Subscriptions metric and will treat a pause action as a churn event in their Average Net Monthly Paid Connected Fitness Subscription Churn.

+untrained: Starting in fiscal year 2024, the company will:

+* No longer include paused Connected Fitness subscriptions in the Ending Paid Connected Fitness Subscriptions metric
+* Treat a pause action as a churn event in the Average Net Monthly Paid Connected Fitness Subscription Churn
+```
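One rough way to put numbers on a comparison like this is a token-overlap F1 score between each prediction and the reference answer, together with a simple length count. The sketch below is an illustrative assumption, not the evaluation used for this model; `tokenize` and `token_f1` are hypothetical helpers, and the strings are copied from Example 4.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = (
    "Timothy S. Teter holds a B.S. degree in Mechanical Engineering from the "
    "University of California at Davis and a J.D. degree from Stanford Law School."
)
trained = (
    "B.S. in Mechanical Engineering from the University of California at Davis "
    "and a J.D. from Stanford Law School"
)
untrained = (
    "According to the information, Timothy S. Teter holds: "
    "1. A B.S. degree in Mechanical Engineering from the University of California at Davis. "
    "2. A J.D. degree from Stanford Law School."
)

for name, text in [("trained", trained), ("untrained", untrained)]:
    print(name, "tokens:", len(tokenize(text)), "f1:", round(token_f1(text, reference), 3))
```

Token overlap only measures surface similarity, so a score like this says nothing about whether an answer is factually correct.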

+## License

+Uses the original Llama 3 License.
+A custom commercial license is available at: https://llama.meta.com/llama3/license