|
|
|
# LLMDataParser |
|
|
|
**LLMDataParser** is a Python library that provides parsers for benchmark datasets used in evaluating Large Language Models (LLMs). It offers a unified interface for loading and parsing datasets like **MMLU** and **GSM8k**, simplifying dataset preparation for LLM evaluation. |
|
|
|
## Features |
|
|
|
- **Unified Interface**: Consistent `DatasetParser` for all datasets. |
|
- **LLM-Agnostic**: Independent of any specific language model. |
|
- **Easy to Use**: Simple methods and built-in Python types. |
|
- **Extensible**: Easily add support for new datasets. |
|
|
|
## Installation |
|
|
|
### Option 1: Using pip |
|
|
|
You can install the package directly using `pip`. Even with only a `pyproject.toml` file, this method works for standard installations. |
|
|
|
1. **Clone the Repository**: |
|
|
|
```bash |
|
git clone https://github.com/jeff52415/LLMDataParser.git |
|
cd LLMDataParser |
|
``` |
|
|
|
2. **Install Dependencies with pip**: |
|
|
|
```bash |
|
pip install . |
|
``` |
|
|
|
### Option 2: Using Poetry |
|
|
|
Poetry manages the virtual environment and dependencies automatically, so you don't need to create a conda environment first. |
|
|
|
1. **Install Dependencies with Poetry**: |
|
|
|
```bash |
|
poetry install |
|
``` |
|
|
|
2. **Activate the Virtual Environment**: |
|
|
|
```bash |
|
poetry shell |
|
``` |
|
|
|
|
|
## Available Parsers |
|
|
|
- **MMLUParser**: Parses the MMLU dataset. |
|
- **GSM8kParser**: Parses the GSM8k dataset. |
|
|
|
## Contributing |
|
|
|
Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. |
|
|
|
## License |
|
|
|
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details. |
|
|
|
## Contact |
|
|
|
For questions or support, please open an issue on GitHub or contact [[email protected]](mailto:[email protected]). |
|
|