LLM Memory & Inference Speed Calculator

This app estimates how much memory a GGUF model needs and how fast it can run on your Mac.


What is an LLM Memory Calculator?

An LLM memory calculator is a tool that estimates how much memory a large language model (LLM) needs and how fast it can generate text on your device, helping you choose the right hardware and model size.
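As a rough rule of thumb (a simplification for illustration, not necessarily the app's exact formula), a model's weight footprint is its parameter count times the bytes stored per parameter at the chosen quantization:

```python
def weights_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB: parameters x bits per parameter / 8."""
    # 1e9 parameters per billion; divide by 8 to convert bits to bytes,
    # then by 1e9 to convert bytes to GB (the two 1e9 factors cancel).
    return n_params_billion * bits_per_param / 8

# For example, an 8B-parameter model at 4-bit quantization
# needs roughly 4 GB just for the weights:
print(weights_gb(8, 4))  # → 4.0
```

Real GGUF files are somewhat larger than this because quantization schemes store per-block scale factors alongside the weights.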

Where do these numbers come from?

The calculator's code and formulas are open source. Click here to check them out or contribute.

How does it work?

Select an LLM from the dropdown list, then select your Mac, the memory available, and your task. The tool reads the model metadata (size, layers, hidden dimensions, KV cache, etc.), estimates the memory required and the generation speed, and displays the results.
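The estimate described above can be sketched in a few lines. This is a simplified illustration under common assumptions (total memory = weights + KV cache + runtime overhead; generation is memory-bandwidth bound, so each generated token reads every weight once); the function names and constants are hypothetical, not the app's actual code:

```python
def estimate_memory_gb(model_size_gb: float, n_layers: int, hidden_dim: int,
                       context_len: int, kv_bytes: int = 2,
                       overhead_gb: float = 0.5) -> float:
    """Total memory ~ model weights + KV cache + runtime overhead."""
    # KV cache: one key vector and one value vector per layer per token,
    # each of hidden_dim elements at kv_bytes bytes per element.
    kv_cache_gb = 2 * n_layers * hidden_dim * context_len * kv_bytes / 1e9
    return model_size_gb + kv_cache_gb + overhead_gb

def estimate_speed_tps(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Tokens/s ~ memory bandwidth / bytes read per token (all the weights)."""
    return bandwidth_gb_s / model_size_gb

# Example: an ~4.9 GB quantized 8B model with 32 layers, hidden dim 4096,
# an 8192-token context, on a Mac with ~400 GB/s memory bandwidth.
mem = estimate_memory_gb(4.9, 32, 4096, 8192)
tps = estimate_speed_tps(4.9, 400)
print(f"{mem:.1f} GB, {tps:.0f} tok/s")
```

This also shows why time to first token differs from generation speed: prompt processing is compute-bound over the whole prompt, while token-by-token generation is limited by how fast the weights can be streamed from memory.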

Limitations

Please note that this tool provides estimates only; actual memory use and performance vary with your hardware, software versions, and configuration.