Meta LLaMA 7B on Win 11 / RTX 3070

I’ve been playing with Stable Diffusion locally for a while and meant to write a post about getting torch-directml set up to work on WSL with CUDA, but never got around to it. This is basically the same process, with even easier instructions to follow.

Here’s a Reddit post documenting the process: [D] Tutorial: Run LLaMA on 8gb vram on windows (thanks to bitsandbytes 8bit quantization)
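
For reference, loading a model with bitsandbytes 8-bit quantization through Hugging Face transformers looks roughly like this. A minimal sketch, assuming you already have converted HF-format LLaMA weights; the model path is a placeholder:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder -- substitute the path to your converted LLaMA weights.
model_id = "path/to/llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # bitsandbytes 8-bit quantization
    device_map="auto",   # spill layers to CPU if they don't fit in VRAM
)

prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```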

This is a more complete guide: How to run LLaMA 4bit models on Windows
It includes instructions both for WSL and for running natively on Windows.
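
The guide has its own tooling for 4-bit weights; as a general illustration of what 4-bit loading looks like, newer transformers/bitsandbytes releases expose it directly (a sketch, not the guide’s exact method):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch only: recent transformers + bitsandbytes support 4-bit loading directly.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.float16,   # compute in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",  # placeholder path to converted weights
    quantization_config=bnb_config,
    device_map="auto",
)
```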

One caveat: in my case the torch-directml library didn’t seem to install correctly. Follow these instructions to make sure you can import torch_directml: Enable PyTorch with DirectML on Windows
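
A quick sanity check that the install actually worked, i.e. that the DirectML device is visible and can run an op:

```python
import torch
import torch_directml

# If the import itself fails, the package didn't install correctly.
print(torch_directml.device_count())  # should be >= 1

dml = torch_directml.device()
x = torch.randn(3, 3).to(dml)
y = (x @ x).cpu()  # run a matmul on the DirectML device, copy back
print(y)
```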

It’s insane to see the progress here. I was only able to run the 7B model on my 3070, but hopefully the 13B model will eventually shrink to fit in its 8 GB of VRAM.
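
As a back-of-the-envelope check on why that’s plausible: at 4 bits per weight, the 13B model’s weights alone come to roughly 6 GiB, so it’s the activations, KV cache, and quantization overhead that push past the limit:

```python
# Rough VRAM estimate for the 13B model quantized to 4-bit.
params = 13e9
bytes_per_param = 0.5  # 4 bits per weight
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.1f} GiB of weights")  # ~6.1 GiB, before activations / KV cache
```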

In my opinion, the 7B 4-bit quantized model isn’t as good as the GPT-2 model, which you can also get running locally. The conda setup process is pretty similar.
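
For comparison, here’s about all it takes to run GPT-2 locally with transformers (a minimal sketch; "gpt2" is the 124M-parameter base model, and the larger variants like gpt2-xl are drop-in swaps):

```python
from transformers import pipeline

# Downloads the model on first run, then works fully offline.
generator = pipeline("text-generation", model="gpt2")
print(generator("The meaning of life is", max_new_tokens=50)[0]["generated_text"])
```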
