
How to Set Up Gemma 4 in OpenCode on Mac Mini or Mac Studio

A step-by-step guide to running Google's Gemma 4 as your local AI coding assistant in OpenCode on Mac Mini or Mac Studio. Covers model selection by memory, Ollama setup, OpenCode configuration, and optimization tips for daily use.

April 7, 2026 · 8 min read

If you've got a Mac Mini or Mac Studio sitting on your desk, you're looking at one of the most cost-effective setups for a local AI coding assistant. Apple Silicon's unified memory means the GPU and CPU share the same RAM — perfect for running large language models without a dedicated graphics card. And with OpenCode, you get a terminal-native AI coding agent that connects to your local model with zero API costs.

This guide walks you through the full setup: picking the right Gemma 4 model for your Mac's memory, installing the tools, configuring everything, and optimizing for daily use.

What You'll Need

  • A Mac Mini or Mac Studio with Apple Silicon (M1 or later)
  • macOS 14 Sonoma or newer
  • Terminal access (Terminal.app, iTerm2, or Warp all work)
  • About 20 minutes for the initial setup (plus model download time)

No cloud accounts or API keys required. Everything runs locally.

Picking the Right Gemma 4 Model

The model you choose depends entirely on how much unified memory your Mac has. Here's the breakdown based on Google's official memory requirements at Q4_0 (4-bit) quantization:

| Your Mac | Memory | Recommended Model | Base Memory (Q4_0) |
| --- | --- | --- | --- |
| Mac Mini M4 (base) | 16 GB | Gemma 4 E4B | 5 GB |
| Mac Mini M4 (upgraded) | 24–32 GB | Gemma 4 26B A4B | 15.6 GB |
| Mac Mini M4 Pro | 24–48 GB | 26B A4B or 31B (at 48 GB) | 15.6–17.4 GB |
| Mac Studio M2 Max | 32–96 GB | 26B A4B or 31B | 15.6–17.4 GB |
| Mac Studio M2 Ultra | 64–192 GB | 31B (comfortable) | 17.4 GB |

Since macOS and background apps typically use 3–5 GB, a 16 GB Mac Mini has roughly 11–13 GB available — plenty for the E4B model (5 GB base) and enough headroom for the KV cache during normal coding sessions.
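That headroom estimate is simple arithmetic. Here's a quick sketch for the base 16 GB configuration — the overhead and model-size numbers are the rough figures from above, not measurements:

```shell
# Headroom estimate for a base 16 GB Mac Mini (overhead figures are rough assumptions)
TOTAL_GB=16
OS_OVERHEAD_GB=5        # worst-case macOS + background apps
MODEL_GB=5              # Gemma 4 E4B at Q4_0, per the table above
echo "KV cache headroom: $((TOTAL_GB - OS_OVERHEAD_GB - MODEL_GB)) GB"
# prints: KV cache headroom: 6 GB
```

Swap in your own Mac's memory and the base size of the model you're considering; if the result is under a few GB, pick a smaller model or a tighter quantization.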

If you have 24 GB or more, the 26B A4B MoE model is the sweet spot for OpenCode. It activates only 3.8 billion parameters per token (so it's fast), but has 26 billion total parameters (so it's smart). It runs nearly as fast as the E4B while producing noticeably better code.

For detailed memory planning, see our Hardware Requirements guide.

Step 1: Install Ollama and Pull Gemma 4

OpenCode doesn't run models directly — it connects to them through a local API. Ollama provides that API and handles all the model management.

Install Ollama

brew install --cask ollama

Or download the installer from ollama.com. After installation, launch Ollama — a llama icon appears in your menu bar.
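Before pulling any models, you can confirm the Ollama server is actually up. This sketch assumes the default port 11434; the fallback message is just for illustration:

```shell
# Ping the local Ollama API; prints a version JSON blob if the server is running
curl -s --max-time 3 http://localhost:11434/api/version \
  || echo "Ollama is not running yet — launch it from the menu bar"
```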

Pull Your Gemma 4 Model

# For 16 GB Mac Mini — E4B (default)
ollama pull gemma4

# For 24+ GB — 26B MoE (best value for OpenCode)
ollama pull gemma4:26b

# For 48+ GB — full 31B dense
ollama pull gemma4:31b

Verify the download:

ollama list

You should see your model listed with its tag and size.

Set a Larger Context Window

By default, Ollama uses a modest context window. For coding with OpenCode, you'll want more room — a larger context lets the model see more of your codebase at once:

OLLAMA_CONTEXT_LENGTH=64000 ollama serve

If Ollama is already running from the menu bar, quit it first (click the llama icon > Quit Ollama), then start it from the terminal with the custom context length.

To make this permanent, add to your ~/.zshrc:

export OLLAMA_CONTEXT_LENGTH=64000

Step 2: Install OpenCode

OpenCode is a terminal-based AI coding agent — think of it as a local, private alternative to cloud-based coding assistants. Install it with the one-liner:

curl -fsSL https://opencode.ai/install | bash

Or via Homebrew:

brew install anomalyco/tap/opencode

After installation, verify it works:

opencode --version

OpenCode requires a terminal with true color and Unicode support. iTerm2, Alacritty, WezTerm, and the built-in macOS Terminal all work fine.

Step 3: Configure OpenCode to Use Gemma 4

This is the key step — telling OpenCode to use your local Gemma 4 model instead of a cloud API.

Create the OpenCode Config

Edit (or create) your global config file:

mkdir -p ~/.config/opencode

Create ~/.config/opencode/opencode.jsonc with the following content:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "gemma4:latest": {
          "name": "Gemma 4 E4B"
        }
      }
    }
  },
  "model": "ollama/gemma4:latest"
}
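You can sanity-check that baseURL before launching OpenCode. Ollama exposes an OpenAI-compatible `/v1/models` endpoint, which is the same surface OpenCode will talk to (again assuming the default port):

```shell
# List models through the same OpenAI-compatible endpoint OpenCode will use
curl -s --max-time 3 http://localhost:11434/v1/models \
  || echo "nothing listening on 11434 — start Ollama first"
```

If your pulled Gemma 4 model shows up in the response, the config above should work as-is.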

If you pulled a specific variant, adjust the model key accordingly:

// For 26B MoE
"models": {
  "gemma4:26b": {
    "name": "Gemma 4 26B MoE"
  }
}

// And set the default
"model": "ollama/gemma4:26b"

Add Authentication

OpenCode expects an auth entry even for local models. Create the directory first:

mkdir -p ~/.local/share/opencode

Then create ~/.local/share/opencode/auth.json with this content:

{
  "ollama": {
    "type": "api",
    "key": "ollama"
  }
}

The key value doesn't matter — Ollama doesn't require authentication — but OpenCode needs the entry to recognize the provider.

Verify the Setup

Launch OpenCode in any project directory:

cd ~/your-project
opencode

Once the TUI loads, type /models to see available models. You should see your Gemma 4 model listed under the Ollama provider. Select it and you're ready to go.

Step 4: Start Coding with Gemma 4

With everything connected, OpenCode works as a full coding agent. Here are a few things to try:

Ask it to explain code:

What does the function on line 42 of src/utils/parser.ts do?

Generate new code:

Write a TypeScript function that debounces API calls with a configurable delay.

Fix bugs:

The tests in tests/auth.test.ts are failing. Can you look at the error and fix the issue?

Refactor:

Refactor the database module to use connection pooling instead of individual connections.

OpenCode has access to your file system (with your permission), so it can read your project files, understand the context, make edits, and run commands — all powered by Gemma 4 running on your Mac.

Optimizing for Daily Use

A few tweaks that make a noticeable difference in the day-to-day experience:

Keep the Model Loaded

By default, Ollama unloads models after 5 minutes of inactivity. Since loading a model takes several seconds, this creates a noticeable pause when you come back to OpenCode after a break. To keep the model in memory indefinitely:

export OLLAMA_KEEP_ALIVE="-1"

Add this to your ~/.zshrc alongside the context length setting:

# Gemma 4 + OpenCode optimization
export OLLAMA_CONTEXT_LENGTH=64000
export OLLAMA_KEEP_ALIVE="-1"
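One caveat: variables exported in ~/.zshrc only reach processes you start from a terminal. The menu-bar Ollama app launched at login won't see them. On macOS you can set them for GUI apps with launchctl as well — the `command -v` guard is just so this sketch is safe to run anywhere:

```shell
# GUI apps don't read ~/.zshrc — set the variables for the menu-bar app too (macOS)
if command -v launchctl >/dev/null; then
  launchctl setenv OLLAMA_CONTEXT_LENGTH 64000
  launchctl setenv OLLAMA_KEEP_ALIVE -1
  echo "set — quit and relaunch Ollama to pick these up"
else
  echo "launchctl not found (this step is macOS-only)"
fi
```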

Launch Ollama at Login

On Mac Mini or Mac Studio, you probably want Ollama running all the time. Open System Settings > General > Login Items and add Ollama to the list. This way, your AI assistant is always ready when you open a terminal.

Configure a Project-Specific Model

If you work on multiple projects, you can set different models per project. Drop an opencode.jsonc in the project root:

{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/gemma4:26b"
}

This overrides your global config for that specific project, so you can use a bigger model for complex projects and a faster one for quick scripts.

Mac Mini as a Dedicated AI Server

One of the best things about the Mac Mini is its small footprint and low power draw. If you have a spare Mac Mini, consider running it as a dedicated AI server:

1. Enable Remote Login — Go to System Settings > General > Sharing > Remote Login. This lets you SSH in from your laptop.

2. Run Ollama headlessly — Start Ollama via ollama serve in a background session (or via launchd). No monitor needed.

3. Access from other devices — Ollama's API listens on localhost:11434 by default. To expose it on your local network:

OLLAMA_HOST=0.0.0.0 ollama serve
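From another machine on the network, you can check that the server is reachable before reconfiguring anything. The IP here is the same placeholder used in this guide — substitute your Mac Mini's actual LAN address:

```shell
# Replace with your Mac Mini's actual LAN IP
MINI_IP=192.168.1.100
curl -s --max-time 3 "http://$MINI_IP:11434/api/version" \
  || echo "unreachable — check OLLAMA_HOST and the macOS firewall"
```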

Then point OpenCode (or any other tool) on your laptop to the Mac Mini's IP:

{
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://192.168.1.100:11434/v1"
      },
      "models": {
        "gemma4:26b": {
          "name": "Gemma 4 26B (Mac Mini)"
        }
      }
    }
  },
  "model": "ollama/gemma4:26b"
}

Now your laptop offloads all AI work to the Mac Mini — your laptop stays cool and quiet while the Mac Mini handles inference. This is especially useful if your laptop has limited memory but your Mac Mini is well-equipped.

What's Next?

You've got a fully local AI coding assistant running on your Mac. No API bills, no data leaving your network, and no latency from round-tripping to a cloud server.

A few directions to explore from here:

  • Try different models — switch between the E4B and 26B to compare speed vs. quality for your specific tasks
  • Set up project-specific agents — OpenCode supports custom agents with tailored system prompts for code review, testing, or documentation
  • Check performance benchmarks — see how Gemma 4 stacks up in our Benchmarks breakdown
  • Compare with Ollama directly — if you prefer a simpler setup, see our Run Gemma 4 with Ollama guide

The Mac Mini might be Apple's smallest desktop, but with Gemma 4 and OpenCode, it's a surprisingly capable AI workstation.