Run Frontier Models on Your Own Hardware
Open-weights models now reach 90-97% of proprietary performance on frontier benchmarks. Here's how to run them locally.
Quick Start: 5 Minutes to Local AI
Ollama (Easiest)
One-command install, pull models, start chatting.
# Install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Run a frontier model
ollama run llama4:8b
# Or a coding specialist
ollama run qwen3-coder:7b
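Ollama also serves a local REST API on port 11434 by default, so you can script against the same models you chat with. A minimal sketch, assuming the Ollama server is running and the llama4:8b model from above has been pulled:

```shell
# Query Ollama's built-in REST API (default: http://localhost:11434).
# "stream": false returns one JSON object instead of streamed chunks.
curl http://localhost:11434/api/generate -d '{
  "model": "llama4:8b",
  "prompt": "Explain quantization in one sentence.",
  "stream": false
}'
```

The same endpoint works from any HTTP client, which is what makes Ollama easy to wire into scripts and editors.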
LM Studio (GUI)
Desktop app with model browser and chat UI.
# Download from lmstudio.ai
# Then search & download models:
# - llama-4-8b
# - qwen3-coder-7b
# - deepseek-v3
OpenClaw ⚡
Run local models with a full AI agent interface. Chat, automate, and integrate with your tools.
# Install OpenClaw
npm install -g openclaw
# Configure with local model
openclaw config set model local
# Start chatting
openclaw chat
Learn more about OpenClaw →
Open Weights Model Comparison
Top open-weights models ranked by performance
| Model | Params | Context | Best For | Min GPU | Score |
|---|---|---|---|---|---|
| GPT-OSS-120B | 120B | 128K | General, Coding, Reasoning | 80GB | 9.2 |
| Llama 4 405B | 405B | 128K | Customization, Research | 8×80GB | 9.1 |
| GLM-5 | 744B MoE | 200K | Bilingual, Agents | 80GB | 9.2 |
| Qwen3-Coder-Next | 80B | 128K | Coding | 48GB | 9.3 |
| DeepSeek V3.2 | 37B | 128K | Budget, High-volume | 24GB | 8.9 |
| Llama 4 8B | 8B | 128K | Consumer GPUs | 8GB | 8.5 |
Scores are weighted averages of coding, reasoning, and tool-use benchmarks.
Local LLM Tools Compared
Ollama
- + Easiest setup
- + Large model library
- + Cross-platform
- + REST API built-in
- - Less control over inference
- - No GPU selection
Best for: Getting started fast
LM Studio
- + Beautiful GUI
- + Model browser
- + GPU selection
- + OpenAI-compatible server
- - Desktop only
- - No CLI
Best for: Non-developers, experimentation
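LM Studio's OpenAI-compatible server (started from the app, on port 1234 by default) means existing OpenAI client code can point at your local model. A sketch, assuming the server is running and a model is loaded; the model name here is illustrative:

```shell
# Hit LM Studio's local OpenAI-compatible endpoint.
# Assumes the local server is enabled in LM Studio with a model loaded.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```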
vLLM
- + Highest throughput
- + Production-ready
- + Paged attention
- + Multi-GPU support
- - Complex setup
- - Requires Python
Best for: Production deployments
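Getting a vLLM server up looks roughly like the following sketch, assuming vLLM is installed via pip and the model weights are available locally or on Hugging Face (the model ID here is illustrative, not an official repo name):

```shell
# Install vLLM (requires Python and a CUDA-capable GPU).
pip install vllm
# Start an OpenAI-compatible server, sharding the model across 2 GPUs
# with tensor parallelism.
vllm serve meta-llama/Llama-4-70B --tensor-parallel-size 2
```

The resulting server speaks the same OpenAI chat-completions protocol as LM Studio, so clients can switch between the two by changing the base URL.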
llama.cpp
- + Runs anywhere
- + Apple Silicon optimized
- + Minimal dependencies
- + Quantization support
- - Manual model management
- - CLI-focused
Best for: Maximum compatibility
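A typical llama.cpp workflow, as a sketch: build from source, then point the CLI at a quantized GGUF model file (the model path and filename below are illustrative):

```shell
# Build llama.cpp from source (CMake is the supported build system).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release
# Run a one-off prompt against a local 4-bit GGUF model.
./build/bin/llama-cli -m models/llama-4-8b.Q4_K_M.gguf -p "Hello" -n 64
```

On Apple Silicon the Metal backend is enabled by default, which is where llama.cpp's "runs anywhere" strength shows.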
Hardware Requirements by Model Size
8B Models
8GB VRAM
- RTX 4070 / 3080
- Apple M1/M2 16GB
- RTX 3060 12GB
Llama 4 8B, Qwen3 7B
70B Models
48GB VRAM
- RTX 6000 Ada
- 2× RTX 4090
- RTX A6000
Llama 4 70B, Qwen3 72B
120B+ Models
80GB VRAM
- H100 / H200
- MI300X
- RTX 6000 Ada (quantized)
GPT-OSS-120B, GLM-5
400B+ Models
Multi-GPU
- 8× H100
- 4× H200
- Cloud inference
Llama 4 405B (full)
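The VRAM tiers above follow from a simple rule of thumb: weight memory ≈ parameters × bytes per weight, plus overhead for the KV cache and activations. A shell sketch of that estimate (the 20% overhead factor is an assumption, and real usage varies with context length):

```shell
# Rough VRAM estimate in GB: params (billions) * bits-per-weight / 8,
# plus ~20% overhead for KV cache and activations (assumed factor).
estimate_vram_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.2 }'
}

estimate_vram_gb 8 16   # 8B at FP16 -> 19.2
estimate_vram_gb 8 4    # 8B at 4-bit -> 4.8
estimate_vram_gb 70 4   # 70B at 4-bit -> 42.0
```

At 4-bit quantization an 8B model fits comfortably in 8GB of VRAM, and a 70B model lands near the 48GB tier, which matches the hardware classes listed above.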
Open Weights vs Proprietary: The Gap is Gone
Source: WhatLLM.org February 2026 benchmarks. Open-weights models now routinely match or exceed proprietary models on coding and reasoning tasks.
See Full Comparison →