CUDA extension not installed. I tried several things

#3
by mullerse - opened

Hey there :)
I have a problem with the following error message and hope you can help me :)
"Cuda Extension not installed"
Current system config (python -m torch.utils.collect_env):
Collecting environment information...
PyTorch version: 2.1.0.dev20230902+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 4090

Nvidia driver version: 536.23
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Variant 1:
Current Nvidia driver installed; the Nvidia CUDA Toolkit is not installed.
auto-gptq v0.4.2
torch 2.1.0.dev
torch.cuda.is_available() = True

The model Airoboros-L2 is loaded into the GPU, but the calculations are very slow and seem to run on the CPU (see Task Manager), and the error message always appears:
"CUDA extension not installed." I can still load and use the model, though.
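Whether the weights really sit on the GPU can be checked independently of that warning; here is a minimal sketch, assuming only that PyTorch is installed (the tensor is illustrative, not your model):

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# A plain matmul on the chosen device. If this runs fast on "cuda" while the
# quantized model stays slow, the slowdown comes from the missing fused
# kernels (the "CUDA extension" fallback path), not from weights on the CPU.
x = torch.randn(256, 256, device=device)
y = x @ x
print(device, y.shape)
```

As far as I understand auto-gptq, the "CUDA extension not installed" warning means it falls back to a much slower pure-PyTorch dequantization path, which would match the behavior you describe.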
I read the following post about this: https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/discussions/5
So when I run pip install . from the auto-gptq source, the following error appears:
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 4294967295
╰─> [1 lines of output]
Building cuda extension requires PyTorch(>=1.13.0) been installed, please install PyTorch first!
[end of output]
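For what it's worth, that particular build error is usually a build-isolation issue: pip builds the wheel in an isolated environment where the torch you already installed is not visible. A sketch of a workaround, assuming a venv that already has a CUDA build of torch:

```shell
# Install a CUDA-enabled torch first, then build auto-gptq without build
# isolation so the already-installed torch is importable at build time.
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install . --no-build-isolation
```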

That's why I tested Variant 2:
Current Nvidia Studio driver installed
Installed Nvidia CUDA Toolkit v11.8
(previously removed torch with pip3 uninstall torch torchvision torchaudio)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
When I load the model Airoboros-L2 now, the error message no longer appears. But I get the following error:
"Embedding dimension 768 does not match collection dimensionality 1024"
Using the model is not possible :(
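That dimension error looks unrelated to auto-gptq or CUDA: it is the kind of message a vector store (e.g. Chroma) raises when a collection was created with one embedding model (1024-dimensional vectors) and is later queried with another (768-dimensional). A hypothetical guard illustrating the mismatch (the function and names are mine, not from any library):

```python
def check_embedding_dim(embedding, collection_dim):
    """Raise if a query embedding does not match the collection's dimensionality."""
    if len(embedding) != collection_dim:
        raise ValueError(
            f"Embedding dimension {len(embedding)} does not match "
            f"collection dimensionality {collection_dim}"
        )

# Querying with the same embedding model used at index time passes the check.
check_embedding_dim([0.0] * 1024, 1024)
```

If this is the cause, the usual fix is to re-create the collection with the embedding model you actually query with, rather than changing CUDA versions.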

Is it correct to downgrade to CUDA 11.8?
I have not found an auto-gptq version compatible with torch 2.1.0.

I hope I described my problem precisely enough, and thank you very much for the help :)

Okay, update :)
Variant 2 runs. The printed failure is not from loading the model.

This is how I solved it:
