How to Install Gemma 4 Locally Using Ollama (Mac & Windows Guide)
A complete step-by-step guide to running Google's Gemma 4 on your own machine using Ollama. Works on Mac (Apple Silicon & Intel) and Windows. From installation to first response in under 10 minutes.
This is the guide I wish existed when I first tried running a local model. By the end of this page, you'll have Gemma 4 running on your own machine, responding to prompts — zero cloud, zero API keys.
Time to complete: ~10 minutes (+ model download time)
Difficulty: Beginner
Tested on: macOS 14 Sonoma (M2), Windows 11 (RTX 3080)
What is Ollama?
Ollama is a free, open-source tool that makes running large language models locally as simple as docker run. It handles:
- Model downloads and version management
- GPU/CPU detection and optimization
- A local REST API (OpenAI-compatible)
- Background service management
Think of it as "Docker, but for LLMs."
Step 1: Check Your Hardware
Before downloading anything, verify your machine meets the minimums:
# macOS — check available memory
sysctl hw.memsize | awk '{printf "RAM: %.0f GB\n", $2/1073741824}'
# Windows PowerShell
(Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB
Minimum for Gemma 4 4B: 4 GB free RAM, ~4 GB free disk space.
Not sure which variant to pick? Read our Hardware Requirements guide first.
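If you want a quick sanity check before committing to a download, a back-of-the-envelope estimate helps. This sketch assumes Ollama's default ~4-bit quantization and a rule of thumb of roughly 0.6 GB of RAM per billion parameters plus fixed overhead for the KV cache and runtime — an approximation of my own, not an official Ollama formula:

```python
# Rough rule of thumb for 4-bit quantized models (an assumption, not an
# official requirement): ~0.6 GB per billion parameters, plus ~2 GB of
# headroom for the KV cache and the Ollama runtime itself.

def estimate_ram_gb(params_billion, gb_per_b=0.6, overhead_gb=2.0):
    """Return an approximate RAM requirement in GB for a 4-bit model."""
    return params_billion * gb_per_b + overhead_gb

for params in (4, 12, 27):
    print(f"Gemma 4 {params}B: ~{estimate_ram_gb(params):.0f} GB RAM")
```

The results line up with the figures in this guide: ~4 GB for the 4B model, under 16 GB for the 12B, and workstation territory for the 27B.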
Step 2: Install Ollama
On macOS
Option A — Download the app (easiest):
- Go to ollama.com and click Download for Mac
- Open the .dmg file and drag Ollama to your Applications folder
- Launch Ollama from your Applications — a llama icon appears in your menu bar
Option B — Homebrew:
brew install ollama
Then start the service:
ollama serve
On Windows
- Go to ollama.com and click Download for Windows
- Run the installer — it's a standard .exe wizard
- Ollama starts automatically and appears in the system tray
NVIDIA GPU users: Make sure you have the latest NVIDIA drivers installed. Ollama will auto-detect your GPU.
On Linux
curl -fsSL https://ollama.com/install.sh | sh
This script detects your OS, installs Ollama, and registers it as a systemd service.
Step 3: Verify Ollama is Running
Open a terminal and run:
ollama --version
You should see something like:
ollama version 0.6.4
If you get command not found, make sure Ollama is actually running (look for the icon in your menu bar / system tray), launch the app once if you haven't (it installs the CLI on first run), then open a new terminal and try again.
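If you're scripting your setup, you can check the installed version programmatically rather than eyeballing it. A small sketch — the parse_version helper is my own, not part of Ollama:

```python
import re

def parse_version(output):
    """Extract 'X.Y.Z' from `ollama --version` output as a comparable tuple.

    Returns None if no version number is found.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    return tuple(map(int, match.groups())) if match else None

# In a real script you'd feed this the output of
# subprocess.run(["ollama", "--version"], capture_output=True, text=True).stdout
print(parse_version("ollama version 0.6.4"))  # -> (0, 6, 4)
```

Tuples compare element-wise, so `parse_version(out) >= (0, 6, 0)` is an easy minimum-version gate.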
Step 4: Pull the Gemma 4 Model
Now for the exciting part. Pick your variant:
# Gemma 4 4B — recommended for most users (fastest, ~3.5 GB download)
ollama pull gemma3:4b
# Gemma 4 12B — more capable, needs 16 GB RAM (~8 GB download)
ollama pull gemma3:12b
# Gemma 4 27B — workstation grade (~18 GB download)
ollama pull gemma3:27b
Note: Google's Gemma 4 is listed as gemma3 in Ollama's library (it's the 4th-gen Gemma architecture). The suffixes (4b, 12b, 27b) refer to the parameter count.
You'll see a progress bar:
pulling manifest...
pulling 8eeb52dfb3bb... 100% ▕████████████████████▏ 3.5 GB
pulling 56bb8bd477a5... 100% ▕████████████████████▏ 96 B
verifying sha256 digest
writing manifest
success
Download time depends on your internet speed — a 3.5 GB model takes about 3–7 minutes on a typical broadband connection.
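That 3–7 minute figure is just arithmetic on connection speed. If you want to estimate for your own connection, here's the calculation as a sketch (decimal units, ignoring protocol overhead, so treat the output as a lower bound):

```python
def download_minutes(size_gb, mbps):
    """Approximate download time in minutes for a model of size_gb
    gigabytes over a connection of mbps megabits per second."""
    size_megabits = size_gb * 1000 * 8  # GB -> megabits (decimal units)
    return size_megabits / mbps / 60

# The 4B model (~3.5 GB) on common broadband speeds:
for mbps in (50, 100, 200):
    print(f"{mbps} Mbps: ~{download_minutes(3.5, mbps):.1f} min")
```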
Step 5: Run Your First Prompt
Once the download completes, run:
ollama run gemma3:4b
You'll enter an interactive chat session:
>>> Send a message (/? for help)
Try asking it something:
>>> Explain what a neural network is in 2 sentences, like I'm 12.
Gemma 4 will respond directly in your terminal. Hit Ctrl+D or type /bye to exit.
Step 6: Test the Vision Capability
Gemma 4 is multimodal — you can pass it images directly from the CLI:
# Pass a local image by including its path in the prompt
ollama run gemma3:4b "Describe what you see in this image: /path/to/your/image.jpg"
# For a remote image, download it first, then pass the local copy
curl -o photo.jpg https://example.com/photo.jpg
ollama run gemma3:4b "What's in this image? ./photo.jpg"
This works entirely locally — the image never leaves your machine.
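Under the hood, the REST API accepts images too: /api/generate takes an images array of base64-encoded strings alongside the prompt. A minimal stdlib-only sketch — vision_payload is my own helper name, not an Ollama API:

```python
import base64
import json
from pathlib import Path

def vision_payload(model, prompt, image_path):
    """Build a request body for Ollama's /api/generate endpoint,
    attaching the image as base64 (the format the API expects)."""
    image_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {"model": model, "prompt": prompt, "images": [image_b64], "stream": False}

# To actually send it (requires the Ollama service running locally):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(vision_payload("gemma3:4b", "Describe this image", "photo.jpg")).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(urllib.request.urlopen(req).read())["response"])
```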
Step 7: Use the REST API (Optional)
Ollama automatically starts a local server at http://localhost:11434. This is OpenAI API-compatible, so you can use it in any app that talks to ChatGPT.
# Test the API endpoint
curl http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3:4b",
"prompt": "Why is the sky blue?",
"stream": false
}'
Or use the OpenAI Python SDK with a one-line change:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11434/v1", # Point to local Ollama
api_key="ollama", # Any string works
)
response = client.chat.completions.create(
model="gemma3:4b",
messages=[{"role": "user", "content": "Summarize the theory of relativity."}],
)
print(response.choices[0].message.content)
No OpenAI account needed. No billing. 100% local.
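The native API can also stream tokens as they're generated: with "stream": true, the server returns one JSON object per line. A stdlib-only sketch of consuming that format — the helper names here are my own:

```python
import json
import urllib.request

def parse_stream(lines):
    """Assemble the full reply from newline-delimited JSON chunks.

    Each chunk has a 'response' fragment; the last one has 'done': true.
    """
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

def stream_generate(model, prompt, host="http://localhost:11434"):
    """Call /api/generate with stream=True and return the assembled reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return parse_stream(resp)  # iterating the response yields one line per chunk
```

In an interactive script you'd print each fragment as it arrives instead of joining them at the end — that's what gives you the ChatGPT-style typing effect.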
Useful Ollama Commands
Here's a cheat sheet for the commands you'll use most:
# List all downloaded models
ollama list
# Pull a new model
ollama pull <model-name>
# Remove a model (free up disk space)
ollama rm gemma3:27b
# See currently running models
ollama ps
# Get model details
ollama show gemma3:4b
# Run a one-off prompt (non-interactive)
ollama run gemma3:4b "What is 17 * 43?"
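If you end up calling these commands from scripts, a thin wrapper around the CLI keeps things tidy. A sketch using Python's subprocess module — run_prompt is a hypothetical helper of mine, not part of Ollama:

```python
import subprocess

def ollama_cmd(action, *args):
    """Build an argv list for an ollama subcommand, e.g. ('run', 'gemma3:4b', 'Hi')."""
    return ["ollama", action, *args]

def run_prompt(model, prompt):
    """Run a one-off, non-interactive prompt and return the model's reply."""
    result = subprocess.run(ollama_cmd("run", model, prompt),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Requires Ollama installed and the model pulled:
# print(run_prompt("gemma3:4b", "What is 17 * 43?"))
```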
Troubleshooting Common Issues
"Error: model requires more system memory"
You don't have enough free RAM. Either:
- Close other applications to free RAM
- Use a smaller variant (e.g., switch from 12b to 4b)
- Upgrade to a machine with more RAM
Ollama is very slow (< 3 tokens/second)
Ollama is probably running on CPU instead of GPU. Check:
# Any platform — check which processor the loaded model is using
ollama ps
# The PROCESSOR column should read "100% GPU" (Metal on Apple Silicon)
For NVIDIA GPUs, make sure drivers are up to date:
nvidia-smi # Should show your GPU
"connection refused" when calling the API
The Ollama service isn't running. Start it manually:
# macOS / Linux
ollama serve
# Windows — relaunch from the system tray
Model download fails mid-way
Ollama supports resumable downloads. Just run ollama pull again — it will resume from where it left off.
Monitor Resource Usage
Want to see how much your GPU/CPU is sweating?
# macOS — Activity Monitor or:
sudo powermetrics --samplers gpu_power -n 1
# Windows — open Task Manager > Performance > GPU
# Linux
watch -n 1 nvidia-smi
What's Next?
You've got Gemma 4 running locally. Here's what to explore next:
- Try the 12B model if you have 16 GB RAM — the quality jump is noticeable
- Connect it to a frontend — tools like Open WebUI give you a ChatGPT-style UI for free
- Use it in your code — the OpenAI-compatible API means you can drop it into any existing project
- Keep an eye on our Roadmap — guides on model-selection benchmarks and on building your first AI app are coming next
If this guide helped you, share it with someone who's been putting off running local AI. It's easier than it looks.