How to Run Local LLMs with llama.cpp-nathanpenny

How to Run Local LLMs with llama.cpp-nathanpenny How to Run Local LLMs with llama.cppby nathanpennyStep 1 – Overview of the Author’s Laptop ConfigurationFirst, let’s outline the specifications of my MacBook Air: it is equipped with an M5 chip, 16GB of RAM, and 1TB of storage. This setup is sufficient to run small-scale models (e.g., 3GB in size). With that context, let’s proceed to the implementation steps.Step 2 – Preparatory WorkImportant Note: Terminal and command-line tools are required for this tutorial. Ensure these tools are installed on your system. If you are a complete beginner and unsure how to proceed, this section provides a step-by-step guide (including installing all necessary tools).1. Install Xcode Command Line Tools (Xcode Must Be Pre-installed)Open the Terminal app(alternatively, you can use Cursor or VS Code, which include integrated Terminal interfaces) andexecute the following command:xcode-select--installFollow the system prompts to complete the installation. Verify that the installation directory is added to your system’sPATHenvironment variable (this is typically done automatically).To confirm successful installation:Run the command below:clang--version# g --version yields the same resultThe output should resemble the following:Apple clang version17.0.0(clang-1700.6.4.2)Target: arm64-apple-darwin25.3.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin2. Install HomebrewWhat is Homebrew?Homebrew is a free, open-source package manager for macOS and Linux. It enables easy installation, update, and uninstallation of software via the command line.Stay in the Terminal and run this command:/bin/bash-c$(curl-fsSLhttps://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)Verify the installation by running:brew--versionExpected output:Homebrew5.1.03. Install a Stable Version of PythonmacOS comes with a pre-installed version of Python. To check the default version, open Terminal and run:Input:python3--versionOutput:python3.9.6This default version is relatively outdated. For better compatibility and performance, we recommend installing Python 3.10:Run the following command in Terminal:brewinstallpython3.10# Adjust the version number to install a different releaseConfigure an alias for easier access to Python 3.10:Open the.zshrcfile with the Vim editor:vim~/.zshrcEnter insert mode (pressi) and add the following line (replace the path with your actual Python 3.10 installation path):aliaspython310/opt/homebrew/bin/python3.10Save changes and exit Vim: pressesc, then type:wqand hitenter.Apply the changes to the current Terminal session:source~/.zshrcVerify the alias works:python310--versionThe output should be:python3.10.20# Or another patch version (e.g., 3.10.x)You can now use thepython310command to execute Python scripts with version 3.10.4. Install CMakeWhat is CMake?CMake is an open-source tool that facilitates cross-platform software building (Windows, macOS, Linux). It does not compile code directly; instead, it generates build files (e.g., Makefiles or IDE project files) that instruct the system on how to compile and link code.Run the following command in Terminal to install CMake:brewinstallcmakeVerify the installation:cmake--versionExpected output:cmake version4.2.3 CMake suite maintained and supported by Kitware(kitware.com/cmake).Step 3 – Obtain and Set Up llama.cpp Locally1. Clone the llama.cpp GitHub RepositoryThe official repository URL is Llama.cpp. Cloning the repository via Git is the most efficient method:Run the following commands sequentially in Terminal:mkdir~/Projects# Create a directory to store the repository (you may choose a different path)cd~/Projectsgitclone https://github.com/crc-org/llama.cpp.git# Use the official repository URLcdllama.cpp# The full path should be ~/Projects/llama.cpp (llama.cpp is a directory)2. Compile llama.cppOfficial build instructions are available inllama.cpp/docs/build.md. Below are key excerpts tailored for this tutorial:CPU Build# Execute these commands in the llama.cpp directorycmake-Bbuild cmake--buildbuild--configReleaseNotes:For faster compilation, add the-jflag to enable parallel job execution (e.g.,-j 8for 8 parallel jobs), or use an auto-parallelizing generator like Ninja:cmake-Bbuild cmake--buildbuild--configRelease-j8For static builds (all libraries compiled into the final executable, with no external dependencies), add-DBUILD_SHARED_LIBSOFF:cmake-Bbuild-DBUILD_SHARED_LIBSOFF cmake--buildbuild--configReleaseStatic build explanation: A static build embeds all required libraries directly into the final executable during compilation, making the executable self-contained and independent of external library files.Metal Build (macOS Only)Metal support is enabled by default on macOS, which offloads computation to the GPU. To disable Metal during compilation, use the-DGGML_METALOFFCMake flag:cmake-Bbuild-DGGML_METALOFF cmake--buildbuild--configReleaseIf Metal support is enabled, you can explicitly disable GPU inference at runtime with the--n-gpu-layers 0command-line argument.3. Create a Python Virtual EnvironmentWhat is venv?venv(short forvirtual environment) is a built-in Python tool that creates isolated Python environments. Its core functions are:Creating a dedicated folder for each project’s Python interpreter and dependencies.Ensuring projects use only their locally installed packages (eliminating cross-project dependency conflicts).Run the following commands in thellama.cppdirectory:python-mvenv .venv# Create a virtual environment named .venvsource.venv/bin/activate# Activate the virtual environmentVerify the activation:whichpython# Expected output like: /Users/nathanpenny/Projects/llama.cpp/.venv/bin/pythonYou will also see a visible change in the Terminal prompt (indicating the virtual environment is active):(.venv)nathanpennyniepans-MacBook-Air ~ %# The (.venv) prefix confirms activationInstall the required Python packages:pipinstall-rrequirements.txtStep 4 – Obtain an Open-Source LLM Modelllama.cpp supports the.ggufmodel format. Ensure you download a model in this format, or convert existing models to.ggufif needed.We use Qwen as an example:Download from Hugging Face: Visit unsloth/Qwen3.5-4B-GGUF to download the model. Select a variant that matches your hardware capabilities to ensure smooth operation.Move the Model to the llama.cpp Directory:cd~/Projects/llama.cpp# Navigate to the llama.cpp directorymkdircustom-models# Create a folder to store custom modelscd~/Downloads# Assume the .gguf file is in the Downloads directorymv[your-model-filename].gguf ~/Projects/llama.cpp/custom-models# Replace [your-model-filename] with the actual file nameAlternative: You can also move the.gguffile to thecustom-modelsfolder via the macOS graphical interface (GUI) for simplicity.Recommended Alternative Models:DeepSeek, Llama, GLM, Gemma, etc.All recommended models are available on Hugging Face in.ggufformat with weights compatible with llama.cpp.Step 5 – Interact with the Local ModelRun the following command to start the model in interactive mode:build/bin/llama-cli-mcustom-models/[your-model-filename].ggufThe model will now wait for your input. Ensure your system has sufficient RAM to run the model smoothly.Type your questions after the[prompt and wait for the model’s responses.Additional Step – Run llama.cpp as a Web ServerFor a graphical interface (instead of the command line), run the model as a local web server:build/bin/llama-server-mcustom-models/[your-model-filename].gguf--port8080You will see a local URL (e.g.,http://127.0.0.1:8080/) in the Terminal. Open this URL in a web browser to access a user-friendly GUI for interacting with the model.