
Run Gemma 4 with LM Studio: The Visual Way to Local AI

A step-by-step guide to running Google's Gemma 4 locally using LM Studio — the desktop app with a built-in chat UI, model browser, and OpenAI-compatible API. No command line required.

April 7, 2026 · 8 min read

Not everyone wants to type commands into a terminal to run a local AI model. If you'd rather have a proper desktop app with a search bar, a chat window, and a one-click download button, LM Studio is what you're looking for.

This guide walks you through setting up LM Studio and getting Gemma 4 running on your machine — from installation to your first conversation, plus how to use the built-in API server for development.

What Is LM Studio?

LM Studio is a free desktop application for running large language models locally on macOS, Windows, and Linux. Think of it as a complete local AI workstation with a graphical interface.

What sets it apart from CLI tools like Ollama:

  • Built-in model browser — search, discover, and download models without leaving the app
  • Chat interface — a clean UI for conversations, similar to ChatGPT
  • Hardware-aware loading — LM Studio detects your GPU and RAM, then suggests the best quantization and settings automatically
  • OpenAI-compatible API — serves models as a local API that works with any tool expecting the OpenAI format
  • Supports GGUF and MLX formats — GGUF for all platforms, MLX for optimized Apple Silicon performance

LM Studio supports Gemma models in both formats, so whether you're on a MacBook, a Windows desktop, or a Linux workstation, it has you covered.

Step 1: Install LM Studio

Download the installer for your OS from lmstudio.ai/download:

  • macOS: Download the .dmg, drag to Applications, and launch
  • Windows: Run the .exe installer
  • Linux: Follow the instructions on the download page

Once installed, open LM Studio. You'll see the main interface with a model search bar, a chat panel, and a sidebar for settings.

Step 2: Download Gemma 4

There are two ways to get Gemma 4 into LM Studio.

Option A: In-App Model Browser (Easiest)

  1. Press ⌘ + Shift + M (Mac) or Ctrl + Shift + M (Windows/Linux) to open the model search
  2. Type "Gemma 4" in the search bar
  3. Browse the results — LM Studio will highlight which variants are compatible with your hardware
  4. Pick a variant and click Download

LM Studio automatically suggests the best quantization level for your machine. If you have 16 GB of RAM, it'll steer you toward a quantized E4B. If you have 32 GB+, it may suggest the 26B MoE.

Option B: From Hugging Face

If you prefer to browse models on Hugging Face first:

  1. Enable LM Studio in your Hugging Face Local Apps Settings
  2. Go to any Gemma 4 model page on Hugging Face
  3. Click the "Use this model" dropdown and select LM Studio
  4. The model downloads and appears in LM Studio automatically

This is handy if you've found a specific community-quantized version or a fine-tuned variant you want to try.

Which Model to Download?

If you're not sure, here's a quick reference:

| Your RAM | Recommended Model | Why |
| --- | --- | --- |
| 8 GB | Gemma 4 E4B (Q4) | Fits with room for the OS |
| 16 GB | Gemma 4 E4B (Q8) or 26B A4B (Q4) | Better quality, or step up to MoE |
| 24–32 GB | Gemma 4 26B A4B (Q4–Q8) | Best balance of quality and speed |
| 48 GB+ | Gemma 4 31B | Full dense model, maximum quality |
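The recommendations above follow from a back-of-the-envelope calculation: a quantized model needs roughly (parameters × bits per weight ÷ 8) bytes for the weights, plus headroom for the KV cache and runtime. A rough sketch, where the parameter counts and the ~20% overhead factor are illustrative assumptions (treating E4B as roughly 4B resident parameters), not figures from the model cards:

```python
def estimated_ram_gb(params_billions: float, quant_bits: float, overhead: float = 1.2) -> float:
    """Rough GGUF footprint: weight bytes at the quantized bit width,
    plus ~20% headroom for KV cache and runtime (an assumed factor)."""
    weight_gb = params_billions * quant_bits / 8  # 1B params at 8 bits ≈ 1 GB
    return round(weight_gb * overhead, 1)

# Illustrative parameter counts -- check the model card for real figures
print(estimated_ram_gb(4, 4))    # E4B at Q4  → 2.4
print(estimated_ram_gb(26, 4))   # 26B MoE at Q4 (all experts in RAM) → 15.6
print(estimated_ram_gb(31, 8))   # 31B dense at Q8 → 37.2
```

This is why a Q4 E4B fits an 8 GB machine while the dense 31B wants 48 GB: the quantization level scales the footprint almost linearly.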

Step 3: Start Chatting

Once the download finishes:

  1. Click on the model name in your model list to load it
  2. A progress bar shows the model loading into memory
  3. When it's ready, the chat window activates — type a message and hit Enter

That's it. No configuration files, no terminal commands, no environment variables. You're chatting with Gemma 4 locally.

A few things to try in your first session:

  • "Explain how a car engine works, like I'm 10 years old." — Tests general knowledge and tone control
  • "Write a Python function that finds duplicate files in a directory." — Tests code generation
  • "What are the pros and cons of TypeScript vs. JavaScript?" — Tests structured reasoning

Working with Images

Gemma 4 is multimodal — it understands images natively. In LM Studio's chat interface, you can drag and drop an image directly into the message box, or use the attachment button to browse for a file.

Try these:

  • Drop in a screenshot and ask "What does this UI show?"
  • Attach a photo of a receipt and ask "Extract the total and date"
  • Paste a chart and ask "What trend does this show?"

The model processes the image entirely on your machine — nothing is uploaded anywhere.
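The same works programmatically through LM Studio's API server (covered below), using the standard OpenAI `image_url` content format with a base64 data URL. A sketch, assuming a `gemma-4-e4b` model loaded on the default port; the helper functions are ours, and `client` is an `openai.OpenAI` instance:

```python
import base64
from pathlib import Path

def image_to_data_url(path: str) -> str:
    """Encode a local image as a base64 data URL the chat API accepts."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:image/png;base64,{b64}"

def ask_about_image(client, path: str, question: str) -> str:
    """client: an openai.OpenAI instance pointed at http://localhost:1234/v1."""
    response = client.chat.completions.create(
        model="gemma-4-e4b",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
            ],
        }],
    )
    return response.choices[0].message.content

# client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
# print(ask_about_image(client, "receipt.png", "Extract the total and date"))
```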

Using the Local API

LM Studio can serve your loaded model as a local API server, which is useful for integrating Gemma 4 into your own apps, scripts, or development tools.

Starting the Server (GUI)

  1. Go to the Developer tab in LM Studio
  2. Load a model by pressing ⌘/Ctrl + L and selecting Gemma 4 from your downloaded list
  3. The server starts automatically on http://localhost:1234

Starting the Server (CLI)

If you prefer the terminal, LM Studio includes a CLI called lms:

# List your downloaded models
lms ls

# Load a model
lms load <model_key>

# Start the API server
lms server start

Making API Calls

The API is OpenAI-compatible, so any code or tool that works with the OpenAI API works with LM Studio — just change the base URL:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
)

response = client.chat.completions.create(
    model="gemma-4-e4b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)

The same request with curl:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-e4b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
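Streaming works the same way as with the OpenAI API proper: pass `stream=True` and iterate over the chunks as tokens arrive. A minimal sketch, again assuming a `gemma-4-e4b` model loaded on port 1234 and `client` configured as in the example above:

```python
def stream_chat(client, prompt: str):
    """Yield response text incrementally; client is an openai.OpenAI
    instance pointed at http://localhost:1234/v1."""
    stream = client.chat.completions.create(
        model="gemma-4-e4b",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk carries no content
            yield delta

# for piece in stream_chat(client, "Tell me a story"):
#     print(piece, end="", flush=True)
```

Streaming matters more locally than in the cloud: on modest hardware the full response can take a while, and seeing tokens immediately makes the wait feel short.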

LM Studio also has its own native SDKs — lmstudio-python and lmstudio-js — that offer additional features like stateful chats and model management (download, load, unload) through code.

Tool Use and Function Calling

LM Studio supports Gemma 4's native function calling capabilities through the API. You can define tools in the standard OpenAI format and the model will generate structured function calls:

response = client.chat.completions.create(
    model="gemma-4-e4b",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }]
)

This enables building agentic workflows where Gemma 4 decides when to call external tools and how to chain results together — all served from your local machine.
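To close the loop, your code inspects `tool_calls` on the response, runs the matching function, and sends the result back as a `tool` message so the model can compose its final answer. A sketch of the dispatch step; our `get_weather` is a stand-in (a real app would call an actual weather API):

```python
import json

def get_weather(city: str) -> str:
    # Stand-in implementation -- replace with a real lookup
    return f"Sunny, 22°C in {city}"

TOOLS = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one model-requested tool call; the model sends arguments as a JSON string."""
    args = json.loads(arguments_json)
    return TOOLS[name](**args)

# After the first API call shown above:
# call = response.choices[0].message.tool_calls[0]
# result = run_tool_call(call.function.name, call.function.arguments)
# messages.append(response.choices[0].message)
# messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
# final = client.chat.completions.create(model="gemma-4-e4b", messages=messages)
```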

LM Studio vs. Ollama: Which Should You Use?

Both are excellent tools for running Gemma 4 locally. The choice depends on your workflow:

| | LM Studio | Ollama |
| --- | --- | --- |
| Interface | GUI app with chat UI | Command-line only |
| Model discovery | Built-in search and browse | Manual pull by name |
| Hardware optimization | Auto-detects and suggests settings | Auto-detects GPU |
| API | OpenAI-compatible on port 1234 | OpenAI-compatible on port 11434 |
| Format support | GGUF + MLX | GGUF |
| Best for | Visual workflows, exploration, beginners | Scripting, automation, servers |

Use LM Studio if you want a polished desktop experience, prefer browsing models visually, or are new to local AI.

Use Ollama if you're comfortable with the terminal, want to script model management, or are running a headless server.

Both serve an OpenAI-compatible API, so your downstream code works with either one — switching between them is just a URL change.
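That URL change can be captured in one small helper. A sketch using the default ports from the table above; `"lm-studio"` is a placeholder key, since neither local server verifies it:

```python
BACKENDS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}

def client_config(backend: str) -> dict:
    """Return kwargs for openai.OpenAI(**...) pointed at a local backend."""
    return {"base_url": BACKENDS[backend], "api_key": "lm-studio"}

# from openai import OpenAI
# client = OpenAI(**client_config("lmstudio"))  # or "ollama"
```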

Tips for a Better Experience

Let LM Studio choose your settings first. It detects your hardware and picks reasonable defaults for GPU layers, context length, and thread count. Only tweak these after you've confirmed the model runs.

Start with a smaller context length. The default might be set to the model's maximum (128K or 256K tokens). If you're just chatting, reducing this to 4K–8K frees up significant memory and speeds up loading.

Use MLX on Apple Silicon. If you're on a Mac with an M-series chip, look for MLX-format models — they're optimized for Apple's GPU architecture and typically run faster than GGUF on the same hardware.

Import custom models with the CLI. If you've converted or fine-tuned a Gemma 4 model to GGUF format yourself, use lms import to add it to LM Studio. It will auto-detect the model architecture and make it available in the app.

What's Next?

You've got Gemma 4 running in LM Studio with a chat UI and a local API. A few directions from here:

  • Compare models side by side — download multiple Gemma 4 variants and switch between them to see how quality scales with size
  • Connect it to your tools — the OpenAI-compatible API means VS Code extensions, Obsidian plugins, and other apps can use your local Gemma 4 as a backend
  • Check our other guides: Hardware Requirements if you want to optimize, or Benchmarks to understand what each variant is capable of

Running AI locally shouldn't feel like a systems administration task. With LM Studio and Gemma 4, it doesn't have to.