Files changed (1)
  1. README.md +116 -143
README.md CHANGED
@@ -1,199 +1,172 @@
1
  ---
2
  library_name: transformers
3
- tags: []
 
 
 
 
 
 
 
 
4
  ---
5
 
6
- # Model Card for Model ID
7
 
8
- <!-- Provide a quick summary of what the model is/does. -->
 
9
 
 
10
 
 
 
 
11
 
12
- ## Model Details
 
 
13
 
14
- ### Model Description
15
 
16
- <!-- Provide a longer summary of what this model is. -->
17
 
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 
 
 
 
 
 
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
 
27
 
28
- ### Model Sources [optional]
29
 
30
- <!-- Provide the basic links for the model. -->
 
 
31
 
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
- ## Uses
37
 
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
 
40
- ### Direct Use
 
 
 
 
41
 
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
 
44
- [More Information Needed]
 
 
45
 
46
- ### Downstream Use [optional]
 
 
47
 
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
 
50
- [More Information Needed]
 
 
 
51
 
52
- ### Out-of-Scope Use
53
 
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
 
 
 
 
 
 
 
 
55
 
56
- [More Information Needed]
57
 
58
- ## Bias, Risks, and Limitations
 
 
 
 
 
59
 
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
 
62
- [More Information Needed]
63
 
64
- ### Recommendations
 
65
 
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
 
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
 
70
- ## How to Get Started with the Model
71
 
72
- Use the code below to get started with the model.
 
 
 
73
 
74
- [More Information Needed]
75
 
76
- ## Training Details
77
 
78
- ### Training Data
79
 
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
 
82
- [More Information Needed]
 
83
 
84
- ### Training Procedure
85
 
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
 
88
- #### Preprocessing [optional]
89
 
90
- [More Information Needed]
91
 
 
92
 
93
- #### Training Hyperparameters
94
 
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
 
97
- #### Speeds, Sizes, Times [optional]
98
 
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
 
101
- [More Information Needed]
 
102
 
103
- ## Evaluation
104
 
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
 
107
- ### Testing Data, Factors & Metrics
108
 
109
- #### Testing Data
110
 
111
- <!-- This should link to a Dataset Card if possible. -->
 
 
112
 
113
- [More Information Needed]
114
 
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
 
1
  ---
2
  library_name: transformers
3
+ tags:
4
+ - finance
5
+ license: llama3
6
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
7
+ datasets:
8
+ - virattt/financial-qa-10K
9
+ language:
10
+ - en
11
+ pipeline_tag: text-generation
12
  ---
13
 
14
+ # Llama 3 8B Instruct (Financial RAG)
15
 
16
+ This model is a version of [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) fine-tuned on 4,000 examples from the [virattt/financial-qa-10K](https://huggingface.co/datasets/virattt/financial-qa-10K) dataset.
18
 
19
+ The model is fine-tuned using a LoRA adapter for RAG use cases. It is optimized to answer a question based on a context:
20
 
21
+ ```txt
+ Answer the question:
+ {question}
+
+ Using the information:
+ {context}
+ ```
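
As a minimal sketch of how the template above can be filled in, the snippet below uses plain `str.format`. The names `RAG_TEMPLATE` and `build_rag_prompt` are hypothetical and not part of the model repository; the example question and context are the ones used in the usage section of this card.

```python
# Hypothetical helper: fills the question/context template shown above.
RAG_TEMPLATE = "Answer the question:\n{question}\n\nUsing the information:\n{context}"


def build_rag_prompt(question: str, context: str) -> str:
    """Return the template with both placeholders substituted."""
    return RAG_TEMPLATE.format(question=question.strip(), context=context.strip())


if __name__ == "__main__":
    print(build_rag_prompt(
        "How much did the company's net earnings amount to in fiscal 2022?",
        "Net earnings were $17.1 billion in fiscal 2022.",
    ))
```

Only the template itself is parsed by `str.format`, so braces inside the question or context are passed through unchanged.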
28
 
29
+ ## Usage
30
 
31
+ Load the model:
32
 
33
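+ ```py
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+ MODEL_NAME = "aliyasir/Llama-3-8B-Instruct-Finance-RAG"
+
+ # Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME,
+     device_map="auto",
+ )
+
+ # Text-generation pipeline that returns only the newly generated text.
+ pipe = pipeline(
+     task="text-generation",
+     model=model,
+     tokenizer=tokenizer,
+     max_new_tokens=128,
+     return_full_text=False,
+ )
+ ```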
49
 
50
+ Format the prompt (uses the original Instruct prompt format):
51
 
52
+ ````py
+ prompt = """
+ <|begin_of_text|><|start_header_id|>system<|end_header_id|>
+
+ Use only the information to answer the question<|eot_id|><|start_header_id|>user<|end_header_id|>
+
+ How much did the company's net earnings amount to in fiscal 2022?
+
+ Information:
+
+ ```
+ Net earnings were $17.1 billion in fiscal 2022.
+ ```<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+ """
+ ````
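
The prompt string above can also be assembled programmatically. The sketch below assumes the standard Llama 3 Instruct special tokens and a single system/user turn; `format_llama3_prompt` is a hypothetical name, and `tokenizer.apply_chat_template` remains the more reliable source of truth for the exact format.

```python
def format_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 Instruct prompt that ends with the
    assistant header, so generation starts at the model's answer."""

    def turn(role: str, content: str) -> str:
        # Each turn: role header, blank line, content, end-of-turn token.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"

    return (
        "<|begin_of_text|>"
        + turn("system", system)
        + turn("user", user)
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


if __name__ == "__main__":
    print(format_llama3_prompt(
        "Use only the information to answer the question",
        "How much did the company's net earnings amount to in fiscal 2022?",
    ))
```

Whether the tokenizer prepends its own `<|begin_of_text|>` depends on its settings, so it may be worth checking the tokenized prompt for a duplicated token.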
67
 
68
+ And make a prediction:
69
 
70
+ ```py
+ outputs = pipe(prompt)
+ print(outputs[0]["generated_text"])
+ ```
73
 
74
+ ```
75
+ $17.1 billion
76
+ ```
77
 
78
+ Here's a helper function to build your prompts:
79
 
80
+ ````py
+ from textwrap import dedent
+
+ def create_test_prompt(data_row):
+     # User message: the question followed by the fenced context.
+     prompt = dedent(f"""
+     {data_row["question"]}
+
+     Information:
+
+     ```
+     {data_row["context"]}
+     ```
+     """)
+     messages = [
+         {"role": "system", "content": "Use only the information to answer the question"},
+         {"role": "user", "content": prompt},
+     ]
+     # Render with the chat template of the tokenizer loaded above, ending with the assistant header.
+     return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ ````
97
 
98
+ Where `data_row` must be a dict:
99
 
100
+ ```py
+ data_row = {
+     "question": "...",
+     "context": "...",
+ }
+ ```
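
One caveat when `dedent` is applied after f-string interpolation: if the template lines are indented inside the function body and the context spans several unindented lines, there is no common leading whitespace left to strip, so the template's indentation would stay in the prompt. The sketch below is a hypothetical alternative (`build_user_message` and `FENCE` are not part of the model repository) that joins the parts line by line instead. It omits the leading and trailing blank lines that the `dedent` version produces.

```python
FENCE = "`" * 3  # three backticks, built programmatically to keep this block readable


def build_user_message(question: str, context: str) -> str:
    """Join the message parts line by line, so no dedenting is needed."""
    parts = [question.strip(), "", "Information:", "", FENCE, context.strip(), FENCE]
    return "\n".join(parts)


if __name__ == "__main__":
    row = {"question": "...", "context": "first line\nsecond line"}
    print(build_user_message(row["question"], row["context"]))
```

The returned string can be passed as the `"content"` of the user message before applying the chat template.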
106
 
107
+ ## Sample Predictions
108
 
109
+ Here's a sample of predictions from the fine-tuned (*trained*) and base (*untrained*) models, alongside the reference *answer*:
110
 
111
+ ```txt
112
+ Example 1
113
 
114
+ answer: Delta Air Lines' agreements with its regional carriers typically last at least ten years with options for extensions. Delta controls operational aspects like scheduling and pricing.
115
 
116
+ trained: Delta Connection agreements typically last at least ten years with options for extensions.
117
 
118
+ untrained: According to the information, the terms of Delta Air Lines' agreements with its regional carriers through Delta Connection are:
119
 
120
+ 1. The agreements typically last at least ten years.
121
+ 2. There are options for extensions.
122
+ 3. Delta controls major operational aspects like scheduling and pricing.
123
+ 4. The regional carriers supply the services.
124
 
125
+ Example 2
126
 
127
+ answer: The company evaluates acquisition-related intangibles for impairment by comparing the asset's carrying amount to undiscounted future net cash flows expected from the asset. An impairment loss is recognized if the carrying amount exceeds the asset's recoverable amount.
128
 
129
+ trained: The company evaluates acquisition-related intangible and other long-lived assets for impairment whenever events or changes in circumstances indicate that the carrying amount may not be recoverable. This involves measuring the recoverability of the asset by comparing its carrying amount to the future undiscounted net cash flows expected to be generated by the asset group.
130
 
131
+ untrained: According to the information, the company uses the following criteria to determine whether an impairment loss should be recognized on acquisition-related intangible assets:
132
 
133
+ * Events or changes in circumstances that indicate the carrying amount of an asset may not be recoverable.
134
+ * A comparison of the carrying amount of an asset to future undiscounted net cash flows expected to be generated by the asset group.
135
 
136
+ In other words, the company evaluates whether the asset's carrying amount is recoverable by comparing it to the expected future cash flows, which involves assumptions about future prospects and computations of estimated future cash flows.
137
 
138
+ Example 3
139
 
140
+ answer: In the United States, the approval process for biosimilars is governed by the Public Health Service Act (PHSA) and the regulations implementing these statutes, specifically including provisions made under federal health care reform legislation enacted in March 2010.
141
 
142
+ trained: The Federal Food, Drug, and Cosmetic Act (the FFDCA) and the Public Health Service Act (PHSA)
143
 
144
+ untrained: The legal framework that governs the approval process for biosimilars in the United States is the Federal Food, Drug, and Cosmetic Act (FFDCA) and the Public Health Service Act (PHSA), as well as the regulations implementing these statutes.
145
 
146
+ Example 4
147
 
148
+ answer: Timothy S. Teter holds a B.S. degree in Mechanical Engineering from the University of California at Davis and a J.D. degree from Stanford Law School.
149
 
150
+ trained: B.S. in Mechanical Engineering from the University of California at Davis and a J.D. from Stanford Law School
151
 
152
+ untrained: According to the information, Timothy S. Teter holds:
153
 
154
+ 1. A B.S. degree in Mechanical Engineering from the University of California at Davis.
155
+ 2. A J.D. degree from Stanford Law School.
156
 
157
+ Example 5
158
 
159
+ answer: Beginning in fiscal year 2024, the company plans to exclude paused Connected Fitness subscriptions from its new 'Ending Paid Connected Fitness Subscriptions' metric and will treat a pause action as a churn event in its 'Average Net Monthly Paid Connected Fitness Subscription Churn' metric.
160
 
161
+ trained: Starting in fiscal year 2024, the company will no longer include paused Connected Fitness subscriptions in their Ending Paid Connected Fitness Subscriptions metric and will treat a pause action as a churn event in their Average Net Monthly Paid Connected Fitness Subscription Churn.
162
 
163
+ untrained: Starting in fiscal year 2024, the company will:
164
 
165
+ * No longer include paused Connected Fitness subscriptions in the Ending Paid Connected Fitness Subscriptions metric
166
+ * Treat a pause action as a churn event in the Average Net Monthly Paid Connected Fitness Subscription Churn
167
+ ```
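
The samples suggest that the fine-tuned model answers more tersely than the base model. One rough way to quantify this, sketched here with the Example 4 outputs copied from above, is to compare whitespace-delimited word counts. `word_count` is a hypothetical helper, and word count is only a crude proxy for verbosity, not a quality metric.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens; a crude verbosity proxy."""
    return len(text.split())


# Outputs for Example 4, copied from the sample predictions above.
trained = (
    "B.S. in Mechanical Engineering from the University of California "
    "at Davis and a J.D. from Stanford Law School"
)
untrained = (
    "According to the information, Timothy S. Teter holds:\n\n"
    "1. A B.S. degree in Mechanical Engineering from the University of "
    "California at Davis.\n"
    "2. A J.D. degree from Stanford Law School."
)

if __name__ == "__main__":
    print(f"trained: {word_count(trained)} words")
    print(f"untrained: {word_count(untrained)} words")
```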
168
 
169
+ ## License
170
 
171
+ Uses the original Llama 3 License.
172
+ A custom commercial license is available at: https://llama.meta.com/llama3/license