	# LLMDataParser

	LLMDataParser is a Python library that provides parsers for benchmark datasets used in evaluating Large Language Models (LLMs). It offers a unified interface for loading and parsing datasets like MMLU and GSM8k, simplifying dataset preparation for LLM evaluation.

	## Features

	- Unified Interface: Consistent `DatasetParser` for all datasets.
	- LLM-Agnostic: Independent of any specific language model.
	- Easy to Use: Simple methods and built-in Python types.
	- Extensible: Easily add support for new datasets.
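
To illustrate what a unified, extensible parser interface could look like, here is a minimal self-contained sketch. It does not reproduce LLMDataParser's actual API: only the `DatasetParser` name comes from this README, while the `load`/`parse`/`process_entry` methods, the `ParsedEntry` type, and the `ToyQAParser` subclass are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedEntry:
    """Hypothetical normalized record shared by every parser."""
    prompt: str
    answer: str


class DatasetParser(ABC):
    """Sketch of a common base class; the method names are assumptions."""

    def __init__(self) -> None:
        self._raw: list[dict] = []

    def load(self, records: list[dict]) -> None:
        # A real parser would fetch the dataset; here we accept in-memory rows.
        self._raw = list(records)

    def parse(self) -> list[ParsedEntry]:
        # Same call for every dataset; only process_entry differs.
        return [self.process_entry(row) for row in self._raw]

    @abstractmethod
    def process_entry(self, row: dict) -> ParsedEntry:
        """Convert one raw row into a ParsedEntry."""


class ToyQAParser(DatasetParser):
    """Hypothetical parser for a made-up dataset with 'q' and 'a' fields."""

    def process_entry(self, row: dict) -> ParsedEntry:
        return ParsedEntry(prompt=row["q"].strip(), answer=row["a"].strip())


parser = ToyQAParser()
parser.load([{"q": " What is 2 + 2? ", "a": "4 "}])
entries = parser.parse()
print(entries[0])
```

Under this design, supporting a new dataset means writing one subclass that implements `process_entry`; callers keep using the same `load` and `parse` calls and receive plain Python objects.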

	## Installation

	### Option 1: Using pip

	Install the package directly with `pip`. This works for standard installations even though the project ships only a `pyproject.toml` file.

	1. Clone the Repository:

	```bash
	git clone https://github.com/jeff52415/LLMDataParser.git
	cd LLMDataParser
	```

	2. Install the Package and Its Dependencies with pip:

	```bash
	pip install .
	```

	### Option 2: Using Poetry

	Poetry creates and manages the virtual environment and dependencies for you, so you don't need to create a conda environment first. Run the commands below from the root of the cloned repository.

	1. Install Dependencies with Poetry:

	```bash
	poetry install
	```

	2. Activate the Virtual Environment:

	```bash
	# On Poetry 2.0+, `poetry shell` is provided by the poetry-plugin-shell plugin.
	poetry shell
	```


	## Available Parsers

	- MMLUParser: Parses the MMLU dataset.
	- GSM8kParser: Parses the GSM8k dataset.
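
As a rough idea of what parsing these two datasets involves, the sketch below works on hand-written rows shaped like the commonly distributed versions of the datasets: GSM8k answers end with a `#### <final answer>` line, and MMLU rows carry a list of four choices plus the index of the correct one. The function names and sample rows are illustrative assumptions, not the API or output of `MMLUParser` or `GSM8kParser`.

```python
def split_gsm8k_answer(answer: str) -> tuple[str, str]:
    """Split a GSM8k-style answer into (reasoning, final_answer)."""
    reasoning, sep, final = answer.rpartition("####")
    if not sep:
        raise ValueError("expected a '####' marker before the final answer")
    # Drop thousands separators such as "1,000".
    return reasoning.strip(), final.strip().replace(",", "")


def format_mmlu_question(
    question: str, choices: list[str], answer_index: int
) -> tuple[str, str]:
    """Render an MMLU-style row as (prompt, answer_letter)."""
    letters = "ABCD"
    if len(choices) != len(letters):
        raise ValueError("expected four choices")
    lines = [question.strip()]
    lines += [f"{letter}. {choice}" for letter, choice in zip(letters, choices)]
    return "\n".join(lines), letters[answer_index]


# Hand-written sample rows (not taken from the real datasets).
reasoning, final = split_gsm8k_answer(
    "3 boxes of 4 pens is 3 * 4 = 12 pens.\n#### 12"
)
prompt, letter = format_mmlu_question(
    "Which number is prime?", ["4", "6", "7", "9"], 2
)
print(final, letter)
```

Both helpers return built-in Python types, so the results can be passed to any model or evaluation harness without further conversion.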

	## Contributing

	Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

	## License

	This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

	## Contact

	For questions or support, please open an issue on GitHub or contact the maintainer at [email protected].