

If you are using CPU only, you need to look at very small models or the 2-bit quants.
Everything will be extremely slow otherwise:
GPU:
Loaded Power: 465W
Speed: 18.5 tokens/second
CPU:
Loaded Power: 115W
Speed: 1.60 tokens/second
GPUs are roughly 3 times faster for the same power draw (about 2.9x going by the numbers above).
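
For anyone who wants to check the arithmetic, here is a minimal sketch that turns the four figures above into tokens per watt-second (tokens per joule) and compares them. The variable and function names are made up for illustration, and it assumes the quoted wattages are measured the same way for both setups. With these figures the per-watt ratio comes out a bit under 3, while the raw speed gap is much larger.

```python
# Figures quoted in the post: loaded power (watts) and speed (tokens/second).
measurements = {
    "GPU": {"watts": 465.0, "tokens_per_s": 18.5},
    "CPU": {"watts": 115.0, "tokens_per_s": 1.60},
}


def tokens_per_joule(watts: float, tokens_per_s: float) -> float:
    """Tokens generated per joule of energy (1 W = 1 J/s)."""
    return tokens_per_s / watts


# Energy efficiency of each setup.
efficiency = {name: tokens_per_joule(**m) for name, m in measurements.items()}

# How much faster the GPU is in absolute terms.
raw_speedup = (
    measurements["GPU"]["tokens_per_s"] / measurements["CPU"]["tokens_per_s"]
)

# How much faster the GPU is "for the same power draw".
efficiency_ratio = efficiency["GPU"] / efficiency["CPU"]

for name, value in efficiency.items():
    print(f"{name}: {value:.4f} tokens per joule")
print(f"Raw speedup (GPU vs CPU): {raw_speedup:.2f}x")
print(f"Per-watt speedup (GPU vs CPU): {efficiency_ratio:.2f}x")
```

Swap in your own wall-power and tokens/second readings to compare other hardware or quant levels the same way.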
“Essayons”