
Run Gemma 4 with LM Studio: The Visual Way to Local AI

A step-by-step guide to running Google's Gemma 4 locally using LM Studio — the desktop app with a built-in chat UI, model browser, and OpenAI-compatible API. No command line required.

April 7, 2026 · 8 min read

Not everyone wants to type commands into a terminal to run a local AI model. If you'd rather have a proper desktop app with a search bar, a chat window, and a one-click download button, LM Studio is what you're looking for.

This guide walks you through setting up LM Studio and getting Gemma 4 running on your machine — from installation to your first conversation, plus how to use the built-in API server for development.

What Is LM Studio?

LM Studio is a free desktop application for running large language models locally on macOS, Windows, and Linux. Think of it as a complete local AI workstation with a graphical interface.

What sets it apart from CLI tools like Ollama:

  • Built-in model browser — search, discover, and download models without leaving the app
  • Chat interface — a clean UI for conversations, similar to ChatGPT
  • Hardware-aware loading — LM Studio detects your GPU and RAM, then suggests the best quantization and settings automatically
  • OpenAI-compatible API — serves models as a local API that works with any tool expecting the OpenAI format
  • Supports GGUF and MLX formats — GGUF for all platforms, MLX for optimized Apple Silicon performance

LM Studio supports Gemma models in both formats, so whether you're on a MacBook, a Windows desktop, or a Linux workstation, it has you covered.

Step 1: Install LM Studio

Download the installer for your OS from lmstudio.ai/download:

  • macOS: Download the .dmg, drag to Applications, and launch
  • Windows: Run the .exe installer
  • Linux: Follow the instructions on the download page

Once installed, open LM Studio. You'll see the main interface with a model search bar, a chat panel, and a sidebar for settings.

Step 2: Download Gemma 4

There are two ways to get Gemma 4 into LM Studio.

Option A: In-App Model Browser (Easiest)

  1. Press ⌘ + Shift + M (Mac) or Ctrl + Shift + M (Windows/Linux) to open the model search
  2. Type "Gemma 4" in the search bar
  3. Browse the results — LM Studio will highlight which variants are compatible with your hardware
  4. Pick a variant and click Download

LM Studio automatically suggests the best quantization level for your machine. If you have 16 GB of RAM, it'll steer you toward a quantized E4B. If you have 32 GB+, it may suggest the 26B MoE.

Option B: From Hugging Face

If you prefer to browse models on Hugging Face first:

  1. Enable LM Studio in your Hugging Face Local Apps Settings
  2. Go to any Gemma 4 model page on Hugging Face
  3. Click the "Use this model" dropdown and select LM Studio
  4. The model downloads and appears in LM Studio automatically

This is handy if you've found a specific community-quantized version or a fine-tuned variant you want to try.

Which Model to Download?

If you're not sure, here's a quick reference:

| Your RAM | Recommended Model | Why |
| --- | --- | --- |
| 8 GB | Gemma 4 E4B (Q4) | Fits with room for the OS |
| 16 GB | Gemma 4 E4B (Q8) or 26B A4B (Q4) | Better quality, or step up to MoE |
| 24–32 GB | Gemma 4 26B A4B (Q4–Q8) | Best balance of quality and speed |
| 48 GB+ | Gemma 4 31B | Full dense model, maximum quality |
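The recommendations above follow from a back-of-the-envelope calculation: a quantized model needs roughly (parameters × bits per weight ÷ 8) bytes for the weights, plus headroom for the KV cache and runtime. A rough sketch, where the parameter counts and the ~20% overhead factor are illustrative assumptions (treating E4B as roughly 4B resident parameters), not figures from the model cards:

```python
def estimated_ram_gb(params_billions: float, quant_bits: float, overhead: float = 1.2) -> float:
    """Rough GGUF footprint: weight bytes at the quantized bit width,
    plus ~20% headroom for KV cache and runtime (an assumed factor)."""
    weight_gb = params_billions * quant_bits / 8  # 1B params at 8 bits ≈ 1 GB
    return round(weight_gb * overhead, 1)

# Illustrative parameter counts -- check the model card for real figures
print(estimated_ram_gb(4, 4))    # E4B at Q4  → 2.4
print(estimated_ram_gb(26, 4))   # 26B MoE at Q4 (all experts in RAM) → 15.6
print(estimated_ram_gb(31, 8))   # 31B dense at Q8 → 37.2
```

This is why a Q4 E4B fits an 8 GB machine while the dense 31B wants 48 GB: the quantization level scales the footprint almost linearly.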

Step 3: Start Chatting

Once the download finishes:

  1. Click on the model name in your model list to load it
  2. A progress bar shows the model loading into memory
  3. When it's ready, the chat window activates — type a message and hit Enter

That's it. No configuration files, no terminal commands, no environment variables. You're chatting with Gemma 4 locally.

A few things to try in your first session:

  • "Explain how a car engine works, like I'm 10 years old." — Tests general knowledge and tone control
  • "Write a Python function that finds duplicate files in a directory." — Tests code generation
  • "What are the pros and cons of TypeScript vs. JavaScript?" — Tests structured reasoning

Working with Images

Gemma 4 is multimodal — it understands images natively. In LM Studio's chat interface, you can drag and drop an image directly into the message box, or use the attachment button to browse for a file.

Try these:

  • Drop in a screenshot and ask "What does this UI show?"
  • Attach a photo of a receipt and ask "Extract the total and date"
  • Paste a chart and ask "What trend does this show?"

The model processes the image entirely on your machine — nothing is uploaded anywhere.
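The same works programmatically through LM Studio's API server (covered below), using the standard OpenAI `image_url` content format with a base64 data URL. A sketch, assuming a `gemma-4-e4b` model loaded on the default port; the helper functions are ours, and `client` is an `openai.OpenAI` instance:

```python
import base64
from pathlib import Path

def image_to_data_url(path: str) -> str:
    """Encode a local image as a base64 data URL the chat API accepts."""
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:image/png;base64,{b64}"

def ask_about_image(client, path: str, question: str) -> str:
    """client: an openai.OpenAI instance pointed at http://localhost:1234/v1."""
    response = client.chat.completions.create(
        model="gemma-4-e4b",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
            ],
        }],
    )
    return response.choices[0].message.content

# client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
# print(ask_about_image(client, "receipt.png", "Extract the total and date"))
```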

Using the Local API

LM Studio can serve your loaded model as a local API server, which is useful for integrating Gemma 4 into your own apps, scripts, or development tools.

Starting the Server (GUI)

  1. Go to the Developer tab in LM Studio
  2. Load a model by pressing ⌘/Ctrl + L and selecting Gemma 4 from your downloaded list
  3. The server starts automatically on http://localhost:1234

Starting the Server (CLI)

If you prefer the terminal, LM Studio includes a CLI called lms:

# List your downloaded models
lms ls

# Load a model
lms load <model_key>

# Start the API server
lms server start

Making API Calls

The API is OpenAI-compatible, so any code or tool that works with the OpenAI API works with LM Studio — just change the base URL:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
)

response = client.chat.completions.create(
    model="gemma-4-e4b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)

The same request with curl:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-e4b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
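Streaming works the same way as with the OpenAI API proper: pass `stream=True` and iterate over the chunks as tokens arrive. A minimal sketch, again assuming a `gemma-4-e4b` model loaded on port 1234 and `client` configured as in the example above:

```python
def stream_chat(client, prompt: str):
    """Yield response text incrementally; client is an openai.OpenAI
    instance pointed at http://localhost:1234/v1."""
    stream = client.chat.completions.create(
        model="gemma-4-e4b",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk carries no content
            yield delta

# for piece in stream_chat(client, "Tell me a story"):
#     print(piece, end="", flush=True)
```

Streaming matters more locally than in the cloud: on modest hardware the full response can take a while, and seeing tokens immediately makes the wait feel short.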

LM Studio also has its own native SDKs — lmstudio-python and lmstudio-js — that offer additional features like stateful chats and model management (download, load, unload) through code.

Tool Use and Function Calling

LM Studio supports Gemma 4's native function calling capabilities through the API. You can define tools in the standard OpenAI format and the model will generate structured function calls:

response = client.chat.completions.create(
    model="gemma-4-e4b",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }]
)

This enables building agentic workflows where Gemma 4 decides when to call external tools and how to chain results together — all served from your local machine.
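To close the loop, your code inspects `tool_calls` on the response, runs the matching function, and sends the result back as a `tool` message so the model can compose its final answer. A sketch of the dispatch step; our `get_weather` is a stand-in (a real app would call an actual weather API):

```python
import json

def get_weather(city: str) -> str:
    # Stand-in implementation -- replace with a real lookup
    return f"Sunny, 22°C in {city}"

TOOLS = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one model-requested tool call; the model sends arguments as a JSON string."""
    args = json.loads(arguments_json)
    return TOOLS[name](**args)

# After the first API call shown above:
# call = response.choices[0].message.tool_calls[0]
# result = run_tool_call(call.function.name, call.function.arguments)
# messages.append(response.choices[0].message)
# messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
# final = client.chat.completions.create(model="gemma-4-e4b", messages=messages)
```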

LM Studio vs. Ollama: Which Should You Use?

Both are excellent tools for running Gemma 4 locally. The choice depends on your workflow:

| | LM Studio | Ollama |
| --- | --- | --- |
| Interface | GUI app with chat UI | Command-line only |
| Model discovery | Built-in search and browse | Manual pull by name |
| Hardware optimization | Auto-detects and suggests settings | Auto-detects GPU |
| API | OpenAI-compatible on port 1234 | OpenAI-compatible on port 11434 |
| Format support | GGUF + MLX | GGUF |
| Best for | Visual workflows, exploration, beginners | Scripting, automation, servers |

Use LM Studio if you want a polished desktop experience, prefer browsing models visually, or are new to local AI.

Use Ollama if you're comfortable with the terminal, want to script model management, or are running a headless server.

Both serve an OpenAI-compatible API, so your downstream code works with either one — switching between them is just a URL change.
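That URL change can be captured in one small helper. A sketch using the default ports from the table above; `"lm-studio"` is a placeholder key, since neither local server verifies it:

```python
BACKENDS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}

def client_config(backend: str) -> dict:
    """Return kwargs for openai.OpenAI(**...) pointed at a local backend."""
    return {"base_url": BACKENDS[backend], "api_key": "lm-studio"}

# from openai import OpenAI
# client = OpenAI(**client_config("lmstudio"))  # or "ollama"
```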

Tips for a Better Experience

Let LM Studio choose your settings first. It detects your hardware and picks reasonable defaults for GPU layers, context length, and thread count. Only tweak these after you've confirmed the model runs.

Start with a smaller context length. The default might be set to the model's maximum (128K or 256K tokens). If you're just chatting, reducing this to 4K–8K frees up significant memory and speeds up loading.

Use MLX on Apple Silicon. If you're on a Mac with an M-series chip, look for MLX-format models — they're optimized for Apple's GPU architecture and typically run faster than GGUF on the same hardware.

Import custom models with the CLI. If you've converted or fine-tuned a Gemma 4 model to GGUF format yourself, use lms import to add it to LM Studio. It will auto-detect the model architecture and make it available in the app.

What's Next?

You've got Gemma 4 running in LM Studio with a chat UI and a local API. A few directions from here:

  • Compare models side by side — download multiple Gemma 4 variants and switch between them to see how quality scales with size
  • Connect it to your tools — the OpenAI-compatible API means VS Code extensions, Obsidian plugins, and other apps can use your local Gemma 4 as a backend
  • Check our other guides: Hardware Requirements if you want to optimize, or Benchmarks to understand what each variant is capable of

Running AI locally shouldn't feel like a systems administration task. With LM Studio and Gemma 4, it doesn't have to.