metadata

license: apache-2.0

Step-Audio

Step-Audio is an open-source intelligent voice interaction framework developed by StepFun. It integrates speech recognition, semantic understanding, dialogue management, voice cloning, and speech generation in a single framework. This unified architecture keeps end-to-end latency low and supports full-duplex interaction, which makes it suitable for real-time applications.

Step-Audio-Chat

This repository contains the multimodal large language model (LLM) component of Step-Audio. It is a 130-billion-parameter multimodal LLM responsible for understanding and generating human speech. The model is designed to integrate speech recognition, semantic understanding, dialogue management, voice cloning, and speech generation.

For more information, please refer to our repository: Step-Audio.