
Stage 2 Model

ScrapeGoatMusic Generation API

A music generation system powered by ScrapeGoatMusic, optimized for NVIDIA H100 GPUs with FastAPI integration.

System Requirements

  • NVIDIA H100 GPU
  • CUDA 12.0 or higher
  • Python 3.8
  • 32GB+ RAM
  • Ubuntu 22.04 LTS or higher
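You can check a machine against the requirements above with a short script. The sketch below is illustrative only, and the function names are hypothetical. It treats "H100 appears in the device name" and "the CUDA version PyTorch was built with" as stand-ins for the GPU and CUDA requirements. torch is optional and is only queried if it is installed.

```python
import sys
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "12.1" into a comparable tuple."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_requirements(python_version: Tuple[int, int],
                       cuda_version: Optional[str],
                       gpu_name: Optional[str]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) != (3, 8):
        problems.append("expected Python 3.8, found %d.%d" % tuple(python_version[:2]))
    if cuda_version is None:
        problems.append("PyTorch is missing or was built without CUDA")
    elif parse_version(cuda_version) < (12, 0):
        problems.append("expected CUDA 12.0+, found " + cuda_version)
    if gpu_name is None or "H100" not in gpu_name:
        problems.append("expected an NVIDIA H100, found %s" % gpu_name)
    return problems


def detect_environment() -> Tuple[Tuple[int, int], Optional[str], Optional[str]]:
    """Collect the values to check; torch is optional for this script."""
    try:
        import torch
    except ImportError:
        return sys.version_info[:2], None, None
    gpu = torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    return sys.version_info[:2], torch.version.cuda, gpu
```

To run the check, use `for problem in check_requirements(*detect_environment()): print("WARNING:", problem)`.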

Installation

  1. Create and activate a conda environment:
conda create -n ScrapeGoatMusic python=3.8
conda activate ScrapeGoatMusic
  2. Install PyTorch with CUDA support:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
  3. Install dependencies. Install flash-attn after PyTorch, because --no-build-isolation builds it against the torch that is already installed:
pip install descript-audio-codec
pip install npy_append_array soundfile
pip install fastapi uvicorn python-multipart
pip install flash-attn --no-build-isolation
  4. Download the required model files. Run this from the project root, which is the directory that contains inference/:
# Download models from Hugging Face
git lfs install
cd inference
git clone https://huggingface.co/Nathan9/xcodec_mini_infer
  5. Clone and install RepCodec inside the downloaded xcodec_mini_infer directory:
cd xcodec_mini_infer
git clone https://github.com/mct10/RepCodec.git
cd RepCodec
pip install .
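After the steps above, a quick import check can catch a missing package before you start the API. This is a minimal sketch that covers only a subset of the packages. The import names (`dac` for descript-audio-codec, `flash_attn` for flash-attn) are assumptions based on the usual module names of those packages, and the helper names are hypothetical. Run it from the project root.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List

# Assumed import names for some of the packages installed above
REQUIRED_MODULES = ["torch", "torchaudio", "dac", "npy_append_array",
                    "soundfile", "fastapi", "uvicorn", "flash_attn"]


def find_missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def model_dir_ok(root: Path) -> bool:
    """True if the xcodec_mini_infer checkout exists under root/inference."""
    return (root / "inference" / "xcodec_mini_infer").is_dir()


if __name__ == "__main__":
    print("missing packages:", find_missing_modules(REQUIRED_MODULES) or "none")
    print("xcodec_mini_infer present:", model_dir_ok(Path.cwd()))
```

`find_spec` locates a module without importing it, so the check stays fast even for heavy packages like torch.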

API Setup

  1. Create a new file api.py:
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
from fastapi.responses import FileResponse
import uvicorn
import torch
import os
import argparse
from pathlib import Path
import uuid
from typing import Optional

app = FastAPI(title="ScrapeGoatMusic Generation API")

# Inference configuration; populated by the startup handler
args = None

# Build the inference configuration passed to infer.run_inference
def init_models():
    parser = argparse.ArgumentParser()
    # Add all your existing arguments here
    args = parser.parse_args([])
    args.stage1_model = "Nathan9/ScrapeGoatMusic-s1-7B-anneal-en-cot"
    args.stage2_model = "Nathan9/ScrapeGoatMusic-s2-1B-general"
    args.max_new_tokens = 3000
    args.run_n_segments = 2
    args.stage2_batch_size = 4
    args.output_dir = "./output"
    args.cuda_idx = 0
    # Add other default arguments
    return args

@app.on_event("startup")
async def startup_event():
    global args
    args = init_models()
    os.makedirs(args.output_dir, exist_ok=True)

@app.post("/generate")
async def generate_music(
    genre_file: UploadFile = File(...),
    lyrics_file: UploadFile = File(...),
    audio_prompt: Optional[UploadFile] = File(None),
    prompt_start_time: float = Form(0.0),
    prompt_end_time: float = Form(30.0)
):
    # Create a unique working directory for this request
    session_id = str(uuid.uuid4())
    session_dir = Path(args.output_dir) / session_id
    os.makedirs(session_dir, exist_ok=True)

    # Save uploaded files
    genre_path = session_dir / "genre.txt"
    lyrics_path = session_dir / "lyrics.txt"

    with open(genre_path, "wb") as f:
        f.write(await genre_file.read())
    with open(lyrics_path, "wb") as f:
        f.write(await lyrics_file.read())

    # Handle optional audio prompt
    audio_prompt_path = None
    if audio_prompt:
        audio_prompt_path = session_dir / "audio_prompt.wav"
        with open(audio_prompt_path, "wb") as f:
            f.write(await audio_prompt.read())

    # Run inference
    try:
        # infer.py must expose run_inference (see the next step)
        from infer import run_inference
        output_path = run_inference(
            args,
            str(genre_path),
            str(lyrics_path),
            str(audio_prompt_path) if audio_prompt_path else None,
            prompt_start_time,
            prompt_end_time
        )
    except Exception as e:
        # Return a real HTTP error instead of a 200 response with an error body
        raise HTTPException(status_code=500, detail=str(e)) from e

    return FileResponse(
        output_path,
        media_type="audio/mpeg",
        filename=f"generated_music_{session_id}.mp3"
    )

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
  2. Create a new file infer.py that contains your existing inference code. Refactor it so that api.py can import run_inference from it.
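infer.py itself is not included in this repository, so the skeleton below only illustrates the interface that api.py assumes. That interface is a `run_inference` function that takes the configuration, the genre and lyrics paths, the optional prompt path, and the prompt times, and returns the path of the generated file. Everything inside the skeleton is hypothetical: `_generate`, the `output.mp3` file name, and the `generate_fn` parameter (added only to make testing easy). Replace `_generate` with your existing stage 1 and stage 2 code.

```python
"""infer.py: a hypothetical skeleton of the module imported by api.py."""
import os
from pathlib import Path
from typing import Callable, Optional


def _generate(args, genre: str, lyrics: str, audio_prompt_path: Optional[str],
              prompt_start_time: float, prompt_end_time: float,
              output_path: str) -> None:
    """Placeholder for your existing generation pipeline."""
    raise NotImplementedError("plug your stage 1 / stage 2 inference in here")


def run_inference(args, genre_path: str, lyrics_path: str,
                  audio_prompt_path: Optional[str] = None,
                  prompt_start_time: float = 0.0,
                  prompt_end_time: float = 30.0,
                  generate_fn: Optional[Callable[..., None]] = None) -> str:
    """Validate the inputs, run generation and return the output file path."""
    genre = Path(genre_path).read_text(encoding="utf-8").strip()
    lyrics = Path(lyrics_path).read_text(encoding="utf-8").strip()
    if not genre or not lyrics:
        raise ValueError("genre and lyrics files must not be empty")
    if audio_prompt_path is not None and prompt_end_time <= prompt_start_time:
        raise ValueError("prompt_end_time must be greater than prompt_start_time")

    # Write the result into the per-request session directory created by api.py
    output_path = os.path.join(os.path.dirname(genre_path), "output.mp3")
    (generate_fn or _generate)(args, genre, lyrics, audio_prompt_path,
                               prompt_start_time, prompt_end_time, output_path)
    return output_path
```

Any ValueError raised here reaches the `except` branch in api.py and is returned to the client as an HTTP 500 error with the message as its detail.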

Running the API

  1. Start the API server:
python api.py
  2. The API will be available at http://localhost:8000

API Endpoints

POST /generate

Generates music from the provided genre tags and lyrics, optionally guided by an audio prompt.

Parameters:

  • genre_file: Text file containing genre tags (Required)
  • lyrics_file: Text file containing lyrics (Required)
  • audio_prompt: Audio file to use as a prompt (Optional)
  • prompt_start_time: Start of the audio prompt segment, in seconds (Default: 0.0)
  • prompt_end_time: End of the audio prompt segment, in seconds (Default: 30.0)

Example using curl:

curl -X POST "http://localhost:8000/generate" \
  -F "genre_file=@/path/to/genre.txt" \
  -F "lyrics_file=@/path/to/lyrics.txt" \
  -F "prompt_start_time=0.0" \
  -F "prompt_end_time=30.0" \
  -o generated_music.mp3
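You can also send the same request from Python. This sketch uses only the standard library (`urllib.request`) and builds the multipart/form-data body by hand, so it needs no extra dependencies. The helper names, the long timeout and the local file names are assumptions.

```python
import urllib.request
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Build a multipart/form-data body and return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            b"--" + boundary.encode(),
            b'Content-Disposition: form-data; name="%s"' % name.encode(),
            b"",
            value.encode(),
        ]
    for name, (filename, content) in files.items():
        parts += [
            b"--" + boundary.encode(),
            b'Content-Disposition: form-data; name="%s"; filename="%s"'
            % (name.encode(), filename.encode()),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    # Closing delimiter followed by a final CRLF
    parts += [b"--" + boundary.encode() + b"--", b""]
    return b"\r\n".join(parts), "multipart/form-data; boundary=" + boundary


def post_generate(url: str, genre_path: str, lyrics_path: str,
                  out_path: str) -> None:
    """Upload the genre and lyrics files and save the returned audio."""
    with open(genre_path, "rb") as genre, open(lyrics_path, "rb") as lyrics:
        files = {"genre_file": ("genre.txt", genre.read()),
                 "lyrics_file": ("lyrics.txt", lyrics.read())}
    body, content_type = encode_multipart(
        {"prompt_start_time": "0.0", "prompt_end_time": "30.0"}, files)
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type})
    # Generation can take minutes; the one-hour timeout is an assumption
    with urllib.request.urlopen(request, timeout=3600) as response:
        with open(out_path, "wb") as out:
            out.write(response.read())
```

For example, `post_generate("http://localhost:8000/generate", "genre.txt", "lyrics.txt", "generated_music.mp3")` writes the returned audio to `generated_music.mp3`. An error response raises `urllib.error.HTTPError`.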

Example genre.txt format:

instrumental pop energetic female vocals

Example lyrics.txt format:

[verse]
Your lyrics here
[chorus]
Your chorus here
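Before uploading, you may want to check that both files parse as expected. The sketch below assumes that genre tags are whitespace-separated words and that every lyrics section starts with a bracketed header line such as `[verse]` or `[chorus]`. These parsing rules are inferred from the examples above, not taken from the model's own preprocessing, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# A header line is a bracketed name such as [verse] or [pre-chorus]
SECTION_RE = re.compile(r"^\[([^\]]+)\]$")


def parse_genre(text: str) -> List[str]:
    """Split genre.txt into its whitespace-separated tags."""
    return text.split()


def parse_lyrics(text: str) -> List[Tuple[str, str]]:
    """Split lyrics into (section, body) pairs using [section] header lines."""
    sections: List[Tuple[str, List[str]]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = SECTION_RE.match(line)
        if header:
            sections.append((header.group(1).lower(), []))
        elif line:
            if not sections:
                raise ValueError("lyrics must start with a [section] header")
            sections[-1][1].append(line)
    return [(name, "\n".join(body)) for name, body in sections]
```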

H100 Optimization

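  1. Enable Flash Attention (requires the flash-attn package installed above):
import torch
from transformers import AutoModelForCausalLM

stage1_model = "Nathan9/ScrapeGoatMusic-s1-7B-anneal-en-cot"
model = AutoModelForCausalLM.from_pretrained(
    stage1_model,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2"
).to("cuda")  # FlashAttention-2 kernels only run on the GPU
  2. Select the GPU and enable cuDNN autotuning:
# Add to your inference configuration
import torch
torch.cuda.set_device(0)  # Use the first H100
torch.backends.cudnn.benchmark = True  # Autotune cuDNN kernels for speed (this does not reduce memory)
  3. On a multi-GPU machine, change cuda_idx in init_models() in api.py to choose which GPU the API uses.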
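`flash_attention_2` only works on GPUs with compute capability 8.0 or newer (Ampere, Ada, Hopper) and requires the flash-attn package. If the same code also runs on other hardware, a fallback is useful. This sketch chooses the `attn_implementation` value to pass to `from_pretrained`, and otherwise falls back to `"sdpa"`, PyTorch's built-in scaled-dot-product attention. The helper names are hypothetical.

```python
import importlib.util
from typing import Tuple


def flash_attn_available() -> bool:
    """True if the flash-attn package can be found (it is not imported here)."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_attn_implementation(capability: Tuple[int, int],
                             flash_attn_installed: bool) -> str:
    """Choose an attn_implementation value for transformers' from_pretrained."""
    # FlashAttention-2 targets compute capability 8.0+ (an H100 reports 9.0)
    if flash_attn_installed and tuple(capability) >= (8, 0):
        return "flash_attention_2"
    # Built-in PyTorch attention needs no extra package
    return "sdpa"
```

On the target machine, pass `torch.cuda.get_device_capability(0)` (which returns `(9, 0)` on an H100) as `capability`, and pass `flash_attn_available()` as the second argument.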

Monitoring

The API serves interactive Swagger UI documentation at http://localhost:8000/docs, where you can inspect and test the endpoints.
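Besides the Swagger UI, FastAPI serves the machine-readable OpenAPI schema at `/openapi.json` by default. This is useful for scripted checks that the server is up and exposes the expected routes. Here is a sketch with hypothetical helper names:

```python
import json
import urllib.request
from typing import Any, Dict, List

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_schema(base_url: str = "http://localhost:8000") -> Dict[str, Any]:
    """Download the OpenAPI document that FastAPI generates for the app."""
    with urllib.request.urlopen(base_url + "/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_endpoints(spec: Dict[str, Any]) -> List[str]:
    """Flatten an OpenAPI document into sorted "METHOD /path" strings."""
    return sorted(
        "%s %s" % (method.upper(), path)
        for path, operations in spec.get("paths", {}).items()
        for method in operations
        if method in HTTP_METHODS
    )
```

With the server running, `list_endpoints(fetch_schema())` should include `"POST /generate"`.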

Troubleshooting

  1. CUDA Out of Memory:
  • Reduce stage2_batch_size
  • Lower max_new_tokens
  • When fine-tuning, enable gradient checkpointing (it has no effect on memory during inference)
  2. Audio Quality Issues:
  • Check the input audio format (16kHz, mono)
  • Verify the genre tag format
  • Ensure the lyrics follow the correct structure
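To check an audio prompt against the 16kHz mono format mentioned above, Python's standard `wave` module is enough for PCM WAV files. It cannot read compressed or floating-point files. The helper below is a hypothetical sketch that returns a list of problems, which is empty when the file matches.

```python
import wave
from typing import List


def check_wav(path: str, expected_rate: int = 16000) -> List[str]:
    """Report deviations from the 16 kHz mono format."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate, channels = wav.getframerate(), wav.getnchannels()
    if rate != expected_rate:
        problems.append("sample rate is %d Hz, expected %d Hz" % (rate, expected_rate))
    if channels != 1:
        problems.append("%d channels, expected mono" % channels)
    return problems
```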

Training

This model was created through a multi-stage training process optimized for music generation. You can further fine-tune the model on your own data using the following steps:

Data Preparation

  1. Prepare your training data using the provided script:
python prepare_training_data.py

The script expects the following directory structure:

training_data/
├── audio_tracks/     # 16kHz mono WAV files
├── lyrics/           # Corresponding lyrics files
└── genres/           # Genre tag files
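A quick consistency check over this layout can catch missing lyrics or genre files before you run the preparation script. This sketch assumes that files in the three folders are paired by name (for example `song1.wav`, `lyrics/song1.txt`, `genres/song1.txt`). That matching rule is an assumption about what prepare_training_data.py expects, and the helper names are hypothetical. The sketch also flags datasets that have fewer than the 10,000 samples recommended under Training Requirements.

```python
from pathlib import Path
from typing import Dict, List

MIN_SAMPLES = 10_000  # recommendation from Training Requirements
FOLDERS = ("audio_tracks", "lyrics", "genres")


def index_dataset(root: Path) -> Dict[str, List[str]]:
    """Pair files across the three folders by stem (file name without extension)."""
    stems = {folder: {p.stem for p in (root / folder).iterdir() if p.is_file()}
             for folder in FOLDERS}
    complete = stems["audio_tracks"] & stems["lyrics"] & stems["genres"]
    everything = stems["audio_tracks"] | stems["lyrics"] | stems["genres"]
    return {"complete": sorted(complete), "incomplete": sorted(everything - complete)}


def dataset_warnings(report: Dict[str, List[str]]) -> List[str]:
    """Summarize problems found by index_dataset."""
    warnings = []
    if report["incomplete"]:
        warnings.append("%d samples are missing audio, lyrics or genre files"
                        % len(report["incomplete"]))
    if len(report["complete"]) < MIN_SAMPLES:
        warnings.append("only %d complete samples; %d+ recommended"
                        % (len(report["complete"]), MIN_SAMPLES))
    return warnings
```

For a real dataset, call `dataset_warnings(index_dataset(Path("training_data")))`.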

Training Requirements

  • NVIDIA H100 GPU (recommended)
  • 32GB+ GPU memory
  • Training dataset with:
    • High-quality audio files (16kHz mono)
    • Aligned lyrics in structured format
    • Genre annotations
    • At least 10,000 samples recommended

Fine-tuning Steps

  1. Install additional training dependencies:
pip install accelerate datasets transformers
  2. Prepare your configuration. Export only the pair of variables for the stage you are fine-tuning; running both blocks leaves the Stage 2 values set:
# For Stage 1 model (7B)
export MODEL_PATH="Nathan9/ScrapeGoatMusic-s1-7B-anneal-en-cot"
export OUTPUT_DIR="./fine_tuned_model_s1"

# For Stage 2 model (1B)
export MODEL_PATH="Nathan9/ScrapeGoatMusic-s2-1B-general"
export OUTPUT_DIR="./fine_tuned_model_s2"
  3. Start training:
python train.py \
    --model_name_or_path $MODEL_PATH \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-5 \
    --warmup_steps 500 \
    --logging_steps 100 \
    --save_steps 1000 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --gradient_checkpointing true
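Because `--warmup_steps` and `--save_steps` are counted in optimizer steps rather than samples, it helps to check how the flags above translate into steps. The sketch below assumes that train.py follows Hugging Face Trainer conventions: data split evenly across GPUs, and one optimizer step per `gradient_accumulation_steps` batches. The `training_schedule` function is hypothetical, and its counts are approximate.

```python
import math
from typing import Dict


def training_schedule(num_samples: int, per_device_batch: int, grad_accum: int,
                      num_gpus: int, epochs: int) -> Dict[str, int]:
    """Approximate optimizer-step counts for a Trainer-style training loop."""
    # Each GPU processes an equal shard of the dataset per epoch
    batches_per_device = math.ceil(num_samples / (per_device_batch * num_gpus))
    steps_per_epoch = max(batches_per_device // grad_accum, 1)
    return {
        "effective_batch": per_device_batch * grad_accum * num_gpus,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }


# The flags above on one GPU with the recommended 10,000 samples
print(training_schedule(10_000, 4, 4, 1, 3))
# {'effective_batch': 16, 'steps_per_epoch': 625, 'total_steps': 1875}
```

In this case, the 500 warmup steps cover roughly 27% of the 1,875 total steps, and `--save_steps 1000` produces only one intermediate checkpoint. You may therefore want to scale both values to your dataset size.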

Training Tips

  1. Stage 1 Model:
  • Use larger batch sizes (8-16) for better convergence
  • Enable gradient checkpointing for memory efficiency
  • Start with a lower learning rate (1e-5)
  • Train for at least 3 epochs
  2. Stage 2 Model:
  • Use smaller batch sizes (4-8)
  • A higher learning rate (2e-5) is possible
  • Less training time is needed
  • Focus on audio quality metrics
  3. Monitoring:
  • Use Weights & Biases for training visualization
  • Monitor loss curves for convergence
  • Validate generation quality periodically
  • Check for overfitting on the validation set
  4. Performance Optimization:
  • Enable Flash Attention during training
  • Use mixed precision training (bf16)
  • Distribute training across multiple GPUs if available
  • Apply gradient clipping to keep updates stable
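You can encode the stage-specific tips as presets so that every run uses consistent settings. This sketch assumes that train.py parses standard Hugging Face `TrainingArguments` flags. Its existing flags, such as `--evaluation_strategy` and `--load_best_model_at_end`, suggest this, but it is not confirmed. The preset values pick one point from each recommended range. `--max_grad_norm 1.0` mirrors the Trainer default, and `--report_to wandb` requires the `wandb` package. The preset and function names are hypothetical.

```python
import shlex
from typing import Dict, List

# Hypothetical presets distilled from the tips above; tune them for your data
STAGE_PRESETS: Dict[str, Dict[str, str]] = {
    "s1": {"model_name_or_path": "Nathan9/ScrapeGoatMusic-s1-7B-anneal-en-cot",
           "per_device_train_batch_size": "8", "learning_rate": "1e-5"},
    "s2": {"model_name_or_path": "Nathan9/ScrapeGoatMusic-s2-1B-general",
           "per_device_train_batch_size": "4", "learning_rate": "2e-5"},
}

# Assumes train.py accepts Hugging Face TrainingArguments flags
COMMON_FLAGS: Dict[str, str] = {
    "num_train_epochs": "3",
    "bf16": "true",                    # mixed precision training
    "gradient_checkpointing": "true",  # trade extra compute for lower memory
    "max_grad_norm": "1.0",            # gradient clipping threshold
    "report_to": "wandb",              # Weights & Biases logging
}


def build_train_command(stage: str, output_dir: str) -> List[str]:
    """Assemble a train.py argument list for stage "s1" or "s2"."""
    flags = {**STAGE_PRESETS[stage], **COMMON_FLAGS, "output_dir": output_dir}
    command = ["python", "train.py"]
    for name, value in flags.items():
        command += ["--" + name, value]
    return command


print(shlex.join(build_train_command("s1", "./fine_tuned_model_s1")))
```

Pass the list to `subprocess.run` directly, or paste the printed line into a shell.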

License

FULL ACCESS, ENJOY

Model size: 1.96B params (Safetensors, BF16 tensors)