---
language:
  - en
license: apache-2.0
tags:
  - united states air force
  - united states space force
  - department of defense
  - dod
  - usaf
  - ussf
  - afi
  - air force
  - space force
  - bullets
  - performance reports
  - evaluations
  - awards
  - opr
  - epr
  - narratives
  - interpreter
  - translation
  - t5
  - mbzuai
  - lamini-flan-t5-783m
  - flan-t5
  - google
  - opera
  - justinthelaw
widget:
  - text: "Using full sentences, expand upon the following Air and Space Force bullet statement by spelling-out acronyms and adding additional context: - Attended 4-hour EPD Instructor training; taught 3 2-hour Wing EPD & 4 1-hour bullet writing courses--prepared 164 for leadership"
    example_title: "Example Usage"
---

# Opera Bullet Interpreter

An unofficial United States Air Force and Space Force performance statement "translation" model. It takes a properly formatted performance statement, also known as a "bullet," as input and outputs a long-form, plain English sentence describing the accomplishments captured in the bullet.

This checkpoint is a fine-tuned version of LaMini-Flan-T5-783M, trained on the private justinthelaw/opera-bullet-completions dataset.

To learn more about this project, please visit the [Opera GitHub Repository](https://github.com/justinthelaw/opera).

# Table of Contents

- [Model Details](#model-details)
- [Uses](#uses)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- [Training Details](#training-details)
- [Evaluation](#evaluation)
- [Model Examination](#model-examination)
- [Environmental Impact](#environmental-impact)
- [Technical Specifications](#technical-specifications)
- [Citation](#citation)
- [Model Card Authors](#model-card-authors)
- [Model Card Contact](#model-card-contact)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)

# Model Details

## Model Description

An unofficial United States Air Force and Space Force performance statement "translation" model. It takes a properly formatted performance statement, also known as a "bullet," as input and outputs a long-form, plain English sentence describing the accomplishments captured in the bullet.

This is a fine-tuned version of LaMini-Flan-T5-783M, trained on the private justinthelaw/opera-bullet-completions dataset.

- **Developed by:** Justin Law, Alden Davidson, Christopher Kodama, My Tran
- **Model type:** Language Model
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Parent Model:** [LaMini-Flan-T5-783M](https://huggingface.co/MBZUAI/LaMini-Flan-T5-783M)
- **Resources for more information:**
  - [GitHub Repo](https://github.com/justinthelaw/opera)
  - [Associated Paper](https://arxiv.org/abs/2304.14402)

# Uses

**_DISCLAIMER_**: Running this model through the Hugging Face Inference API widget beside this card produces poor results. Please see "[How to Get Started with the Model](#how-to-get-started-with-the-model)" for details on how to use this model properly.

## Direct Use

Used to programmatically produce training data for Opera's Bullet Forge (see GitHub repository for details).

The exact prompt to achieve the desired result is: "Using full sentences, expand upon the following Air and Space Force bullet statement by spelling-out acronyms and adding additional context: [INSERT BULLET HERE]"
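As a minimal sketch of how that prompt might be assembled in code, the helper below prepends the prefix to a single bullet. The names `PROMPT_PREFIX` and `build_prompt` are illustrative assumptions and are not part of the Opera codebase; only the prefix string itself comes from this card.

```python
# Prefix text copied from the prompt above; the constant name is hypothetical.
PROMPT_PREFIX = (
    "Using full sentences, expand upon the following Air and Space Force "
    "bullet statement by spelling-out acronyms and adding additional context: "
)


def build_prompt(bullet: str) -> str:
    """Prepend the instruction prefix to one whitespace-trimmed bullet."""
    return PROMPT_PREFIX + bullet.strip()


example_prompt = build_prompt(
    "- Managed PMEL prgrm; maintained 300+ essential equipment "
    "calibration items--reaped 100% TMDE pass rt"
)
print(example_prompt)
```

Keeping the prefix in one constant makes it harder for the prompt used at inference time to drift from the wording given above.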

Below are some examples of this model's v0.1.0 iteration generating acceptable translations of bullets that it was not exposed to during training or validation:

| Bullet                                                                                                         | Translation to Sentence                                                                                                                                                                                                                 |
| :------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| - Maintained 112 acft G-files; conducted 100% insp of T.Osjob guides--efforts key to flt's 96% LSEP pass rate  | Maintained 112 aircraft G-files and conducted 100% inspection of T.O.s job guides, contributing to the flight's 96% LSEP pass rate.                                                                                                     |
| - Spearheaded mx for 43 nuke-cert vehs$5.2M; achieved peak 99% MC rt--vital to SECAF #1 prioritynuc deterrence | Spearheaded the maintenance for 43 nuke-cert vehicles worth $5.2 million, achieving a peak 99% mission capability rating. This mission was vital to the Space and Defense Communications System (SECAF) #1 priority nuclear deterrence. |
| - Superb NCO; mng'd mobility ofc during LibyanISAF ops; continuously outshines peers--promote to MSgt now      | Superb Non-Commissioned Officer (NCO) who managed the mobility operation during Libyan ISAF operations. I continuously outshines my peers and deserve a promotion to MSgt now.                                                          |
| - Managed PMEL prgrm; maintained 300+ essential equipment calibration items--reaped 100% TMDE pass rt          | I managed the PMEL program and maintained over 300+ essential equipment calibration items, resulting in a 100% Test, Measurement, and Diagnostic Equipment (TMDE) pass rate.                                                            |

## Downstream Use

Used to quickly interpret bullets written by Airmen (Air Force) or Guardians (Space Force) into long-form, plain English sentences.

## Out-of-Scope Use

This model is not intended for generating bullets from long-form, plain English sentences, or for general NLP tasks.

# Bias, Risks, and Limitations

Specialized acronyms or abbreviations specific to small units may not be expanded correctly. Bullets in highly non-standard formats may produce lower-quality output.

## Recommendations

Look up acronyms to ensure the correct narrative is being formed. Spot-check bullets with more complex acronyms and abbreviations for narrative precision.
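One way to support that manual review is to list the tokens in a bullet that look like acronyms so each can be checked against the model's expansion. The sketch below is an assumption-laden heuristic, not part of Opera: `find_acronym_candidates` is a hypothetical helper that flags any alphabetic token containing at least two uppercase letters, so lowercase abbreviations such as "prgrm" or "rt" are not caught.

```python
import re
from typing import List

# Heuristic (an assumption): an alphabetic token with two or more capitals
# is treated as a possible acronym, e.g. "PMEL" or "MSgt".
_ACRONYM_PATTERN = re.compile(r"\b[A-Za-z]*[A-Z][A-Za-z]*[A-Z][A-Za-z]*\b")


def find_acronym_candidates(bullet: str) -> List[str]:
    """Return unique acronym-like tokens in order of first appearance."""
    matches = _ACRONYM_PATTERN.findall(bullet)
    return list(dict.fromkeys(matches))


bullet = (
    "- Managed PMEL prgrm; maintained 300+ essential equipment "
    "calibration items--reaped 100% TMDE pass rt"
)
print(find_acronym_candidates(bullet))
```

The returned list is only a starting point for a human reviewer; it does not verify that the model expanded any acronym correctly.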

# Training Details

## Training Data

The model was fine-tuned on the justinthelaw/opera-bullet-completions dataset, part of which is available in the GitHub repository.

## Training Procedure

### Preprocessing

The justinthelaw/opera-bullet-completions dataset was created using a custom Python web-scraper, along with some custom cleaning functions, all of which can be found at the GitHub repository.

### Speeds, Sizes, Times

It takes approximately 3-5 seconds per inference when using any standard-sized Air and Space Force bullet statement.
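To check that figure on other hardware, a small timing wrapper is enough. The sketch below is illustrative only: `time_inference` is a hypothetical helper, and `stub_interpret` stands in for a real call to the model so the example runs without downloading weights.

```python
import time
from typing import Callable, Tuple


def time_inference(interpret: Callable[[str], str], bullet: str) -> Tuple[str, float]:
    """Run one interpretation call and return (output, elapsed seconds)."""
    start = time.perf_counter()
    output = interpret(bullet)
    elapsed = time.perf_counter() - start
    return output, elapsed


def stub_interpret(bullet: str) -> str:
    # Hypothetical stand-in for tokenizing, generating, and decoding.
    return "stub: " + bullet


output, seconds = time_inference(stub_interpret, "- Managed PMEL prgrm")
print(f"{seconds:.4f} s -> {output}")
```

Replacing `stub_interpret` with a function that wraps the real generation call gives a per-bullet latency that can be compared with the range quoted above.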

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

20% of the justinthelaw/opera-bullet-completions dataset was used to validate the model's performance.
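The card does not say how the 20% was selected. As a minimal sketch, assuming a seeded random hold-out over (bullet, sentence) pairs, an 80/20 split could look like the following; `train_validation_split` and the placeholder records are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(
    records: Sequence[T], validation_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and hold out a fraction for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Placeholder pairs; the real dataset is private.
pairs = [(f"bullet {i}", f"sentence {i}") for i in range(10)]
train, validation = train_validation_split(pairs)
print(len(train), len(validation))
```

Fixing the seed keeps the held-out set stable between runs, which matters when validation scores are compared across model iterations.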

### Factors

Repetition, contextual loss, and bullet format are all loss factors tied into the backward propagation calculations and validation steps.

### Metrics

ROUGE scores were computed and averaged. These may be provided in future iterations of this model's development.
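For readers unfamiliar with the metric, the sketch below shows a simplified ROUGE-1 F1 score (unigram overlap) averaged over a few pairs. It is an illustration under stated assumptions, not the evaluation code used for this model: tokenization is lowercase whitespace splitting with no stemming, and `rouge1_f1` and the sample pairs are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Made-up (model output, reference) pairs for illustration only.
sample_pairs: List[Tuple[str, str]] = [
    ("the cat", "the cat sat"),
    ("maintained 112 aircraft files", "maintained 112 aircraft files"),
]
scores = [rouge1_f1(candidate, reference) for candidate, reference in sample_pairs]
average_score = sum(scores) / len(scores)
print(round(average_score, 3))
```

A production evaluation would more likely rely on an established ROUGE package, which also covers ROUGE-2 and ROUGE-L and applies standard tokenization.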

## Results

# Model Examination

More information needed

# Environmental Impact

- **Hardware Type:** 2019 MacBook Pro, 16 inch
- **Hours used:** 18
- **Cloud Provider:** N/A
- **Compute Region:** N/A
- **Carbon Emitted:** N/A

# Technical Specifications

### Hardware

2.6 GHz 6-Core Intel Core i7, 16 GB 2667 MHz DDR4, AMD Radeon Pro 5300M 4 GB

### Software

VSCode, Jupyter Notebook, Python3, PyTorch, Transformers, Pandas, Asyncio, Loguru, Rich

# Citation

**BibTeX:**

```
@article{lamini-lm,
  author       = {Minghao Wu and
                  Abdul Waheed and
                  Chiyu Zhang and
                  Muhammad Abdul-Mageed and
                  Alham Fikri Aji
                  },
  title        = {LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions},
  journal      = {CoRR},
  volume       = {abs/2304.14402},
  year         = {2023},
  url          = {https://arxiv.org/abs/2304.14402},
  eprinttype   = {arXiv},
  eprint       = {2304.14402}
}
```

# Model Card Authors

Justin Law, Alden Davidson, Christopher Kodama, My Tran

# Model Card Contact

Email: [email protected]

# How to Get Started with the Model

Use the code below to get started with the model.

<details>
<summary> Click to expand </summary>

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

bullet_data_creation_prefix = "Using full sentences, expand upon the following Air and Space Force bullet statement by spelling-out acronyms and adding additional context: "

# Path of the pre-trained model that will be used
model_path = "justinthelaw/opera-bullet-interpreter"
# Path of the pre-trained model tokenizer that will be used
# Must match the model checkpoint's signature
tokenizer_path = "justinthelaw/opera-bullet-interpreter"
# Max number of input tokens (prompt prefix plus bullet)
# Increasing this beyond 512 may increase compute time significantly
max_input_token_length = 512
# Max number of tokens the model may generate for the long-form sentence
max_output_token_length = 512
# Beams to use for beam search algorithm
# Increased beams means increased quality, but increased compute time
number_of_beams = 6
# Scales logits before soft-max to control randomness
# Lower values (~0) make output more deterministic
temperature = 0.5
# Limits generated tokens to top K probabilities
# Reduces chances of rare word predictions
top_k = 50
# Applies nucleus sampling, limiting token selection to a cumulative probability
# Creates a balance between randomness and determinism
top_p = 0.90

try:
    print(f"Loading {model_path}...")
    tokenizer = T5Tokenizer.from_pretrained(
        tokenizer_path,
        model_max_length=max_input_token_length,
        add_special_tokens=False,
    )
    input_model = T5ForConditionalGeneration.from_pretrained(model_path)
    # Set device to be used based on GPU availability
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Model is sent to device for use
    model = input_model.to(device)  # type: ignore

    bullet = input("Input a US Air or Space Force bullet: ")
    input_text = bullet_data_creation_prefix + bullet

    # Tokenize the prompt and move the tensors to the same device as the model
    encoded_input_text = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_token_length,
    ).to(device)

    # Generate the long-form sentence
    # Note: temperature, top_k, and top_p only take effect when do_sample=True
    output_ids = model.generate(
        encoded_input_text["input_ids"],
        attention_mask=encoded_input_text["attention_mask"],
        max_length=max_output_token_length,
        num_beams=number_of_beams,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        early_stopping=True,
    )

    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # Print the original bullet followed by its long-form interpretation
    print(bullet + "\n\t" + output_text)

except KeyboardInterrupt:
    print("Received interrupt, stopping script...")
except Exception as e:
    print(f"An error occurred during generation: {e}")
```

</details>