---
language:
- en
license: apache-2.0
tags:
- united states air force
- united states space force
- department of defense
- dod
- usaf
- ussf
- afi
- air force
- space force
- bullets
- performance reports
- evaluations
- awards
- opr
- epr
- narratives
- interpreter
- translation
- t5
- mbzuai
- lamini-flan-t5-783m
- flan-t5
- google
- opera
- justinthelaw
widget:
- text: "Using full sentences, expand upon the following Air and Space Force bullet statement by spelling-out acronyms and adding additional context: - Attended 4-hour EPD Instructor training; taught 3 2-hour Wing EPD & 4 1-hour bullet writing courses--prepared 164 for leadership"
example_title: "Example Usage"
---
# Opera Bullet Interpreter
An unofficial United States Air Force and Space Force performance statement "translation" model. It takes a properly formatted performance statement, also known as a "bullet," as input and outputs a long-form sentence, in plain English, describing the accomplishments captured within the bullet.
This checkpoint is a fine-tuned version of LaMini-Flan-T5-783M, trained on the justinthelaw/opera-bullet-completions (private) dataset.
To learn more about this project, please visit the [Opera GitHub Repository](https://github.com/justinthelaw/opera).
# Table of Contents
- [Model Details](#model-details)
- [Uses](#uses)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- [Training Details](#training-details)
- [Evaluation](#evaluation)
- [Model Examination](#model-examination)
- [Environmental Impact](#environmental-impact)
- [Technical Specifications](#technical-specifications)
- [Citation](#citation)
- [Model Card Authors](#model-card-authors)
- [Model Card Contact](#model-card-contact)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
# Model Details
## Model Description
An unofficial United States Air Force and Space Force performance statement "translation" model. It takes a properly formatted performance statement, also known as a "bullet," as input and outputs a long-form sentence, in plain English, describing the accomplishments captured within the bullet.
This is a fine-tuned version of LaMini-Flan-T5-783M, trained on the justinthelaw/opera-bullet-completions (private) dataset.
- **Developed by:** Justin Law, Alden Davidson, Christopher Kodama, My Tran
- **Model type:** Language Model
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Parent Model:** [LaMini-Flan-T5-783M](https://huggingface.co/MBZUAI/LaMini-Flan-T5-783M)
- **Resources for more information:**
- [GitHub Repo](https://github.com/justinthelaw/opera)
  - [Associated Paper](https://arxiv.org/abs/2304.14402)
# Uses
**_DISCLAIMER_**: Using this model through Hugging Face's Inference API widget, beside this card, will produce poor results. Please see "[How to Get Started with the Model](#how-to-get-started-with-the-model)" for details on how to use this model properly.
## Direct Use
Used to programmatically produce training data for Opera's Bullet Forge (see GitHub repository for details).
The exact prompt to achieve the desired result is:
```
Using full sentences, expand upon the following Air and Space Force bullet statement by spelling-out acronyms and adding additional context: <INSERT BULLET HERE>
```
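For quick experimentation, the snippet below is a minimal sketch using the Transformers `pipeline` API; the generation settings and the sample bullet are illustrative placeholders, and the tuned parameters appear in the full example at the end of this card.
```python
from transformers import pipeline

# Load this checkpoint into a text2text-generation pipeline
interpreter = pipeline(
    "text2text-generation",
    model="justinthelaw/opera-bullet-interpreter",
)

prefix = (
    "Using full sentences, expand upon the following Air and Space Force "
    "bullet statement by spelling-out acronyms and adding additional context: "
)
# Hypothetical bullet, for illustration only
bullet = "- Led 5-member team during 3-day exercise--enabled 100% mission readiness"

# The pipeline returns a list of dicts with a "generated_text" key
result = interpreter(prefix + bullet, max_length=512, num_beams=6)
print(result[0]["generated_text"])
```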
## Downstream Use
Used to quickly interpret bullets written by Airmen (Air Force) or Guardians (Space Force) into long-form, plain English sentences.
## Out-of-Scope Use
Generating bullets from long-form, plain English sentences. General NLP functionality.
# Bias, Risks, and Limitations
Specialized acronyms or abbreviations specific to small units may not be expanded properly, and bullets in highly non-standard formats may yield lower-quality output.
## Recommendations
Look up acronyms to ensure the correct narrative is being formed, and spot-check bullets containing more complex acronyms or abbreviations for narrative precision.
# Training Details
## Training Data
The model was fine-tuned on the justinthelaw/opera-bullet-completions dataset, a portion of which can be found in the GitHub repository.
## Training Procedure
### Preprocessing
The justinthelaw/opera-bullet-completions dataset was created using a custom Python web-scraper, along with custom cleaning functions, all of which can be found in the GitHub repository.
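The actual scraper and cleaning functions live in that repository; the function below is only a hypothetical sketch of the kind of normalization such a cleaning step might perform, not the project's code.
```python
import re

def clean_bullet(raw: str) -> str:
    """Hypothetical cleaning step: normalize whitespace and the leading marker of a scraped bullet."""
    text = re.sub(r"\s+", " ", raw.strip())  # collapse runs of whitespace
    text = re.sub(r"^[-•*]\s*", "- ", text)  # normalize the leading bullet marker
    return text

print(clean_bullet("  •   Led 5-member team;   saved $10K  "))
# -> "- Led 5-member team; saved $10K"
```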
### Speeds, Sizes, Times
Inference takes approximately 3-5 seconds for a standard-sized Air or Space Force bullet statement.
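That figure can be checked with a rough timing sketch like the one below (hardware-dependent; the sample bullet is made up):
```python
import time
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("justinthelaw/opera-bullet-interpreter")
model = T5ForConditionalGeneration.from_pretrained("justinthelaw/opera-bullet-interpreter")

prompt = (
    "Using full sentences, expand upon the following Air and Space Force "
    "bullet statement by spelling-out acronyms and adding additional context: "
    "- Led 5-member team during 3-day exercise--enabled 100% mission readiness"
)
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)

start = time.perf_counter()
model.generate(inputs["input_ids"], max_length=512, num_beams=6, early_stopping=True)
print(f"Inference took {time.perf_counter() - start:.1f} seconds")
```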
# Evaluation
## Testing Data, Factors & Metrics
### Testing Data
20% of the justinthelaw/opera-bullet-completions dataset was used to validate the model's performance.
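Since the dataset is private, the split cannot be reproduced by other users; purely as an illustration, an 80/20 hold-out like this could be produced with the `datasets` library as follows.
```python
from datasets import load_dataset

# Private dataset; this call only succeeds for accounts with access
dataset = load_dataset("justinthelaw/opera-bullet-completions", split="train")

# Hold out 20% of the examples for validation
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_set, validation_set = splits["train"], splits["test"]
```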
### Factors
Repetition, contextual loss, and bullet format are all loss factors tied into the backward propagation calculations and validation steps.
### Metrics
ROUGE scores were computed and averaged. These may be provided in future iterations of this model's development.
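As an illustration (not the project's actual evaluation script), ROUGE scores can be computed and averaged with the `evaluate` library along these lines; the example strings are placeholders, not project data.
```python
import evaluate

rouge = evaluate.load("rouge")

# predictions: model outputs; references: ground-truth long-form sentences
scores = rouge.compute(
    predictions=["The member led a five-person team during the exercise."],
    references=["The member led a five-member team during a three-day exercise."],
)
print(scores)  # averaged rouge1 / rouge2 / rougeL / rougeLsum values
```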
## Results
More information needed
# Model Examination
More information needed
# Environmental Impact
- **Hardware Type:** 2019 MacBook Pro, 16 inch
- **Hours used:** 18
- **Cloud Provider:** N/A
- **Compute Region:** N/A
- **Carbon Emitted:** N/A
# Technical Specifications
## Hardware
2.6 GHz 6-Core Intel Core i7, 16 GB 2667 MHz DDR4, AMD Radeon Pro 5300M 4 GB
## Software
VSCode, Jupyter Notebook, Python3, PyTorch, Transformers, Pandas, Asyncio, Loguru, Rich
# Citation
**BibTeX:**
```
@article{lamini-lm,
  author     = {Minghao Wu and
                Abdul Waheed and
                Chiyu Zhang and
                Muhammad Abdul-Mageed and
                Alham Fikri Aji},
  title      = {LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions},
  journal    = {CoRR},
  volume     = {abs/2304.14402},
  year       = {2023},
  url        = {https://arxiv.org/abs/2304.14402},
  eprinttype = {arXiv},
  eprint     = {2304.14402}
}
```
# Model Card Authors
Justin Law, Alden Davidson, Christopher Kodama, My Tran
# Model Card Contact
Email: [email protected]
# How to Get Started with the Model
Use the code below to get started with the model.
<details>
<summary>Click to expand</summary>

```python
import torch
from loguru import logger
from transformers import T5ForConditionalGeneration, T5Tokenizer

bullet_data_creation_prefix = "Using full sentences, expand upon the following Air and Space Force bullet statement by spelling-out acronyms and adding additional context: "

# Path of the pre-trained model that will be used
model_path = "justinthelaw/opera-bullet-interpreter"
# Path of the pre-trained model tokenizer that will be used
# Must match the model checkpoint's signature
tokenizer_path = "justinthelaw/opera-bullet-interpreter"

# Max length of tokens a user may enter for summarization
# Increasing this beyond 512 may increase compute time significantly
max_input_token_length = 512
# Max length of tokens the model should output for the summary
# Approximately the number of tokens it may take to generate a bullet
max_output_token_length = 512
# Beams to use for beam search algorithm
# Increased beams means increased quality, but increased compute time
number_of_beams = 6
# Scales logits before soft-max to control randomness
# Lower values (~0) make output more deterministic
# Note: temperature, top_k, and top_p only take effect when sampling
# is enabled (do_sample=True); with pure beam search they are ignored
temperature = 0.5
# Limits generated tokens to top K probabilities
# Reduces chances of rare word predictions
top_k = 50
# Applies nucleus sampling, limiting token selection to a cumulative probability
# Creates a balance between randomness and determinism
top_p = 0.90

try:
    logger.info(f"Loading {model_path}...")
    tokenizer = T5Tokenizer.from_pretrained(
        tokenizer_path,
        model_max_length=max_input_token_length,
    )
    input_model = T5ForConditionalGeneration.from_pretrained(model_path)

    # Set device to be used based on GPU availability
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Model is sent to device for use
    model = input_model.to(device)  # type: ignore

    input_text = bullet_data_creation_prefix + input("Input a US Air or Space Force bullet: ")

    encoded_input_text = tokenizer.encode_plus(
        input_text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_token_length,
    )

    # Generate summary, keeping the input tensors on the same device as the model
    summary_ids = model.generate(
        encoded_input_text["input_ids"].to(device),
        attention_mask=encoded_input_text["attention_mask"].to(device),
        max_length=max_output_token_length,
        num_beams=number_of_beams,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        early_stopping=True,
    )

    output_text = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    print(input_text + "\n\t" + output_text)
except KeyboardInterrupt:
    print("Received interrupt, stopping script...")
except Exception as e:
    print(f"An error occurred during generation: {e}")
```
</details>