
Run Frontier Models on Your Own Hardware

Open-weights models now score within 90-97% of proprietary frontier models on major benchmarks. Here's how to run them locally.

Quick Start: 5 Minutes to Local AI

Ollama (Easiest)

One-command install, pull models, start chatting.

# Install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Run a frontier model
ollama run llama4:8b

# Or a coding specialist
ollama run qwen3-coder:7b

LM Studio (GUI)

Desktop app with model browser and chat UI.

# Download from lmstudio.ai
# Then search & download models:
# - llama-4-8b
# - qwen3-coder-7b
# - deepseek-v3

OpenClaw ⚡

Run local models with a full AI agent interface. Chat, automate, and integrate with your tools.

# Install OpenClaw
npm install -g openclaw

# Configure with local model
openclaw config set model local

# Start chatting
openclaw chat

Learn more about OpenClaw →

Open Weights Model Comparison

Top open-weights models ranked by performance

Model | Params | Context | Best For | Min GPU | Score
GPT-OSS-120B | 120B | 128K | General, Coding, Reasoning | 80GB | 9.2
Llama 4 405B | 405B | 128K | Customization, Research | 8×80GB | 9.1
GLM-5 | 744B MoE | 200K | Bilingual, Agents | 80GB | 9.2
Qwen3-Coder-Next | 80B | 128K | Coding | 48GB | 9.3
DeepSeek V3.2 | 37B | 128K | Budget, High-volume | 24GB | 8.9
Llama 4 8B | 8B | 128K | Consumer GPUs | 8GB | 8.5

Scores are weighted averages of coding, reasoning, and tool-use benchmarks.

Local LLM Tools Compared

Ollama

  • + Easiest setup
  • + Large model library
  • + Cross-platform
  • + REST API built-in
  • - Less control over inference
  • - No GPU selection

Best for: Getting started fast
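Ollama's built-in REST API (default port 11434) makes it easy to script against. A minimal sketch using only the Python standard library; the model name is an example and must already be pulled with `ollama pull`:

```python
import json
import urllib.request

def build_payload(prompt, model="llama4:8b"):
    # Ollama's /api/generate endpoint takes model, prompt, and an
    # optional stream flag; stream=False returns one JSON object
    # instead of a stream of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama4:8b", host="http://localhost:11434"):
    # Assumes an Ollama server is running locally on the default port.
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the server also ships an OpenAI-compatible endpoint, the same pattern works with any OpenAI client pointed at `localhost:11434/v1`.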

LM Studio

  • + Beautiful GUI
  • + Model browser
  • + GPU selection
  • + OpenAI-compatible server
  • - Desktop only
  • - No CLI

Best for: Non-developers, experimentation
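LM Studio's local server speaks the OpenAI chat-completions format (default port 1234), so any OpenAI-compatible client can talk to it. A hedged sketch with the standard library; the model name is a placeholder for whatever you loaded in the app:

```python
import json
import urllib.request

def chat_request(messages, model="qwen3-coder-7b"):
    # Standard OpenAI-style chat payload; LM Studio does not require
    # an API key for its local server.
    return {"model": model, "messages": messages}

def chat(messages, host="http://localhost:1234"):
    # Assumes LM Studio's local server has been started in the app.
    data = json.dumps(chat_request(messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```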

vLLM

  • + Highest throughput
  • + Production-ready
  • + Paged attention
  • + Multi-GPU support
  • - Complex setup
  • - Requires Python

Best for: Production deployments

llama.cpp

  • + Runs anywhere
  • + Apple Silicon optimized
  • + Minimal dependencies
  • + Quantization support
  • - Manual model management
  • - CLI-focused

Best for: Maximum compatibility

Hardware Requirements by Model Size

8B Models

8GB VRAM

  • RTX 4070 / 3080
  • Apple M1/M2 16GB
  • RTX 3060 12GB

Llama 4 8B, Qwen3 7B

70B Models

48GB VRAM

  • RTX 6000 Ada
  • 2× RTX 4090
  • RTX A6000

Llama 4 70B, Qwen3 72B

120B+ Models

80GB VRAM

  • H100 / H200
  • MI300X
  • RTX 6000 Ada (quantized)

GPT-OSS-120B, GLM-5

400B+ Models

Multi-GPU

  • 8× H100
  • 4× H200
  • Cloud inference

Llama 4 405B (full)

💡 Pro tip: Use quantized models (4-bit, 8-bit) to run larger models on consumer hardware. A 70B model drops to roughly 40GB at 4-bit, and fits a single 24GB card with more aggressive ~2-3-bit quantization or partial CPU offload, with modest quality loss.
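The arithmetic behind the tip is simple: weight memory is parameter count times bytes per parameter, plus headroom for KV cache and activations. A back-of-envelope estimator; the ~20% overhead factor is an assumption, not a measured value, and actual footprints vary with quantization scheme and context length:

```python
def vram_gb(params_billions, bits, overhead=1.2):
    # Weights need params * (bits / 8) bytes; with params in billions
    # the result is in GB. Scale by ~20% for KV cache and activations.
    return params_billions * (bits / 8) * overhead

# 8B at 4-bit   -> ~4.8 GB: comfortable on an 8GB card
# 70B at 4-bit  -> ~42 GB: two 24GB GPUs or one 48GB card
# 120B at 4-bit -> ~72 GB: a single 80GB H100-class GPU
print(round(vram_gb(70, 4), 1))
```

Tighter quantizations (e.g. ~2.5 bits per weight) shrink a 70B model to roughly 22GB, which is how it squeezes onto one consumer card.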

Open Weights vs Proprietary: The Gap is Gone

90% LiveCodeBench score vs frontier
97% AIME 2025 math benchmark
$0 Per-token cost (self-hosted)

Source: WhatLLM.org February 2026 benchmarks. Open-weights models now routinely match or exceed proprietary models on coding and reasoning tasks.

See Full Comparison →