hochbichler.com - Tech Log

Build an MCP Server with Spring Boot 4

Thomas Hochbichler — Fri, 13 Mar 2026 00:00:00 GMT

Every MCP tutorial starts the same way: "First, install Python." Or TypeScript. Or Go. If you are a Java developer with a Spring Boot stack, you have been waiting for a Java option.

Spring Boot 4 changes that. Combined with Spring AI's @McpTool annotations, you can build an MCP server in Java that is just as concise as Python — with dependency injection, native image support, and the full Spring ecosystem behind it.

TL;DR: We build a lightweight MCP server with Spring Boot 4 that monitors external Spring Boot applications via their Actuator endpoints. Connect it to Claude Code and ask "is the order service healthy?" in natural language. Full working code included.

Companion code: spring-ai-mcp-actuator — three independent Maven projects you can build and run in minutes.

What Is MCP?

The Model Context Protocol (MCP) is an open standard for connecting AI applications to external tools and data. Think of it as a USB-C port for AI: one protocol, many connections.

The architecture is simple:

Client: the AI application (Claude Code, Claude Desktop, Cursor)
Server: your service that exposes capabilities
Three capability types: Tools (actions the AI can call), Resources (data the AI can read), Prompts (reusable templates)

MCP is now governed by the Agentic AI Foundation (AAIF) under the Linux Foundation and adopted by Google, Microsoft, OpenAI, and Amazon. It is not a niche experiment anymore — it is becoming the default integration layer for AI tooling.

For the full specification, see modelcontextprotocol.io. We will focus on building, not theory.

MCP vs Claude Code Skills: Skills (like /article-reviewer) are prompt-driven workflows that run inside Claude Code. MCP servers are standalone tool servers that follow an open protocol — any MCP client can connect to them, not just Claude Code. Think of Skills as internal scripts and MCP servers as external services.

Java vs Python: The Verbosity Myth

Before we start, let me address the common assumption. Most developers assume Java means more code. Here is a side-by-side comparison.

Python (FastMCP):

@mcp.tool()
def check_health(app_name: str = "") -> str:
    """Check the health of a monitored Spring Boot application"""
    return get_health(app_name)

Java (Spring AI):

@McpTool(description = "Check the health of a monitored Spring Boot application")
public String checkHealth(String appName) {
    return getHealth(appName);
}

Four lines vs four lines. The difference is cosmetic. But with Spring Boot you also get dependency injection, Spring Security, Spring Data, and the rest of the Spring ecosystem. For free.

What We Are Building

We will build a lightweight MCP server that monitors external Spring Boot applications via their Actuator endpoints. The MCP server itself does not run a web server — it communicates with Claude Code over STDIO and calls your apps' Actuator endpoints over HTTP.

When we are done, you can open Claude Code and have conversations like this:

You: Is localhost:8080 healthy?
Claude: [calls check-health(appName="localhost:8080")]
        localhost:8080 (http://localhost:8080) is UP
        {"status":"UP","components":{"db":{"status":"UP","details":
        {"database":"PostgreSQL","validationQuery":"isValid()"}},...}}

You: Check all apps
Claude: [calls check-health()]
        localhost:8080 (http://localhost:8080): UP
        localhost:8081 (http://localhost:8081): UP

You: What is the JVM memory usage on localhost:8080?
Claude: [calls get-metric(appName="localhost:8080", metricName="jvm.memory.used")]
        localhost:8080 — jvm.memory.used: {"name":"jvm.memory.used",
        "measurements":[{"statistic":"VALUE","value":1.34217728E8}],
        "baseUnit":"bytes"}

This is a practical pattern. Every Spring Boot app ships with Actuator. After this tutorial, you can point this MCP server at any running Spring Boot application and monitor it through natural language.

Prerequisites

Java 21 or later
Spring Boot 4.0 (GA, released November 2025)
Spring AI 2.0.0-M2 (current milestone as of March 2026)
Claude Code installed (code.claude.com)
Basic familiarity with Spring Boot

Note: Spring AI 2.0 is at milestone 2, not GA yet. APIs may change before the final release — no official GA date has been announced, but mid-2026 is a reasonable community estimate. The annotation-based approach shown here has been stable since M1.

Project Setup

Go to start.spring.io and configure:

Project: Maven
Language: Java
Spring Boot: 4.0.x
Group: com.hochbichler
Artifact: mcp-actuator
Java: 21
Dependencies: Spring Web

Download and unzip. Then add the Spring AI MCP Server dependency to your pom.xml:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>2.0.0-M2</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-mcp-server</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
</dependencies>

Watch out: older tutorials reference spring-ai-mcp-server-spring-boot-starter. That artifact name was renamed in Spring AI 1.0.0-M7. The correct name is spring-ai-starter-mcp-server.

We include spring-boot-starter-web for RestClient — Spring Boot 4's modern HTTP client. The web server will not conflict with STDIO because we explicitly set spring.main.web-application-type=none, disabling the embedded web server.

Now configure application.properties:

# MCP Server configuration
spring.ai.mcp.server.stdio=true
spring.ai.mcp.server.type=SYNC
spring.ai.mcp.server.annotation-scanner.enabled=true

# Application name
spring.application.name=mcp-actuator

# No web server — STDIO only
spring.main.web-application-type=none

Three MCP properties and one explicit web-type override. That is all the framework configuration you need.

stdio=true — use STDIO transport (Claude Code launches your JAR as a subprocess)
type=SYNC — synchronous server (filters out any Mono/Flux return types)
annotation-scanner.enabled=true — auto-discover @McpTool methods at startup
web-application-type=none — no embedded web server (required when using spring-boot-starter-web alongside the STDIO transport)

The target app URLs are passed as CLI arguments: --apps=http://localhost:8080,http://localhost:8081. We will parse those next.

App Registry: Parsing CLI Arguments

Create an AppRegistry component that parses the --apps argument and stores the target applications:

package com.hochbichler.mcpactuator;

import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class AppRegistry {

    private final Map<String, String> apps = new LinkedHashMap<>();

    public AppRegistry(@Value("${apps:}") String appsArg) {
        if (!appsArg.isBlank()) {
            for (String url : appsArg.split(",")) {
                url = url.trim();
                String name = extractAppName(url);
                apps.put(name, url);
            }
        }
    }

    private String extractAppName(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        int port = uri.getPort();
        return host + (port > 0 ? ":" + port : "");
    }

    public Map<String, String> getApps() {
        return Collections.unmodifiableMap(apps);
    }

    public String getUrl(String appName) {
        return apps.get(appName);
    }
}

Spring Boot maps --apps=value on the command line to the apps property. The @Value("${apps:}") annotation injects it with an empty default. The registry derives a name from each URL — localhost:8080, localhost:8081, etc. — and stores the mapping.
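
To make the naming rule concrete, here is a minimal, JDK-only sketch of the same parsing logic outside Spring. The class name AppsArgParser is hypothetical and not part of the companion code. Unlike the component above, it also skips empty entries such as a stray comma, which is an added assumption.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; mirrors AppRegistry's name derivation.
public final class AppsArgParser {

    private AppsArgParser() {
    }

    // Maps each URL in a comma-separated list to a "host[:port]" name.
    public static Map<String, String> parse(String appsArg) {
        Map<String, String> apps = new LinkedHashMap<>();
        if (appsArg == null || appsArg.isBlank()) {
            return apps;
        }
        for (String raw : appsArg.split(",")) {
            String url = raw.trim();
            if (url.isEmpty()) {
                continue; // assumption: tolerate stray commas
            }
            URI uri = URI.create(url);
            int port = uri.getPort(); // -1 when the URL has no explicit port
            String name = uri.getHost() + (port > 0 ? ":" + port : "");
            apps.put(name, url);
        }
        return apps;
    }
}
```

Names derived this way drop the scheme and path, so two URLs on the same host and port map to the same name and the later one wins. A URL without an explicit port is registered under the host name alone.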

Run the server with:

java -jar mcp-actuator.jar --apps=http://localhost:8080,http://localhost:8081

Your First MCP Tool: Health Check

Create a new class ActuatorMcpTools.java:

package com.hochbichler.mcpactuator;

import java.util.Map;

import org.springaicommunity.mcp.annotation.McpTool;
import org.springaicommunity.mcp.annotation.McpToolParam;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestClient;
import org.springframework.web.client.RestClientException;

@Component
public class ActuatorMcpTools {

    private final AppRegistry appRegistry;
    private final RestClient restClient;

    public ActuatorMcpTools(AppRegistry appRegistry) {
        this.appRegistry = appRegistry;
        this.restClient = RestClient.create();
    }

    @McpTool(
        name = "check-health",
        description = "Check the health of a monitored Spring Boot application. "
            + "Leave appName empty to check all apps.")
    public String checkHealth(
            @McpToolParam(description = "App name (e.g. localhost:8080) or leave empty to check all",
                          required = false)
            String appName) {

        if (appName == null || appName.isBlank()) {
            return checkAllApps();
        }

        String url = appRegistry.getUrl(appName);
        if (url == null) {
            return "Unknown app: " + appName
                + ". Registered apps: " + appRegistry.getApps().keySet();
        }

        return fetchHealth(appName, url);
    }

    private String checkAllApps() {
        var sb = new StringBuilder();
        for (var entry : appRegistry.getApps().entrySet()) {
            sb.append(fetchHealth(entry.getKey(), entry.getValue())).append("\n");
        }
        return sb.toString().trim();
    }

    private String fetchHealth(String name, String url) {
        try {
            String response = restClient.get()
                .uri(url + "/actuator/health")
                .retrieve()
                .body(String.class);
            return name + " (" + url + "): " + response;
        } catch (RestClientException e) {
            return name + " (" + url + "): DOWN — " + e.getMessage();
        }
    }
}

Why org.springaicommunity? The @McpTool and @McpToolParam annotations are not yet included in the official Spring AI 2.0.0-M2 starters. They live in the spring-ai-community/mcp-annotations incubating project (org.springaicommunity:spring-ai-mcp-annotations). Once they graduate into mainline Spring AI (expected in a later milestone), the package will change to org.springframework.ai.mcp.annotation. For now, add the community dependency to your pom.xml:
<dependency>
    <groupId>org.springaicommunity</groupId>
    <artifactId>spring-ai-mcp-annotations</artifactId>
    <version>0.0.3</version>
</dependency>

That is it. The @McpTool annotation tells Spring AI to:

Register this method as an MCP tool named check-health
Generate a JSON schema from the method signature (including the optional appName parameter)
Make it callable by any connected MCP client

The description field is important. MCP clients show this to the AI model so it knows when to use the tool. Be specific.

Notice the error handling: if a target app is down or unreachable, we catch the RestClientException and report it as DOWN instead of crashing. The MCP server stays healthy even when monitored apps are not.
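
The tool above returns the raw Actuator JSON. If you would rather return a one-line summary per app, the top-level status can be pulled out first. The following is a minimal, JDK-only sketch; HealthSummary is a hypothetical helper, and the regular expression assumes that "status" is the first field of the response. A real implementation would more likely parse the body with a JSON library such as Jackson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the companion code.
public final class HealthSummary {

    // Assumption: the response starts with {"status":"..."} so nested
    // component statuses are never matched.
    private static final Pattern TOP_LEVEL_STATUS =
        Pattern.compile("^\\s*\\{\\s*\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    private HealthSummary() {
    }

    // Returns the top-level status, or UNKNOWN if the body does not match.
    public static String status(String healthJson) {
        if (healthJson == null) {
            return "UNKNOWN";
        }
        Matcher matcher = TOP_LEVEL_STATUS.matcher(healthJson);
        return matcher.find() ? matcher.group(1) : "UNKNOWN";
    }

    // Builds a single summary line in the same shape fetchHealth uses.
    public static String summarize(String name, String url, String healthJson) {
        return name + " (" + url + "): " + status(healthJson);
    }
}
```

In checkAllApps, a summary line like this keeps the output short when many apps are registered, while the single-app path can still return the full JSON for detail.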

Adding a Metrics Tool

Add this method to the same ActuatorMcpTools class:

@McpTool(
    name = "get-metric",
    description = "Get a specific metric from a monitored app. "
        + "Common metrics: jvm.memory.used, http.server.requests, "
        + "system.cpu.usage, process.uptime",
    annotations = @McpTool.McpAnnotations(
        readOnlyHint = true,
        destructiveHint = false
    ))
public String getMetric(
        @McpToolParam(description = "App name (e.g. localhost:8080)",
                      required = true)
        String appName,
        @McpToolParam(description = "Metric name, e.g. jvm.memory.used",
                      required = true)
        String metricName) {

    String url = appRegistry.getUrl(appName);
    if (url == null) {
        return "Unknown app: " + appName
            + ". Registered apps: " + appRegistry.getApps().keySet();
    }

    try {
        String response = restClient.get()
            .uri(url + "/actuator/metrics/" + metricName)
            .retrieve()
            .body(String.class);
        return appName + " — " + metricName + ": " + response;
    } catch (RestClientException e) {
        return "Failed to fetch " + metricName + " from " + appName
            + ": " + e.getMessage();
    }
}

A few things to notice:

@McpToolParam adds metadata to each parameter. The description tells the AI model what format to use. The required = true flag means the client must provide this value. Both appName and metricName are required here — unlike check-health, which makes appName optional for the "check all" convenience.

@McpTool.McpAnnotations was introduced in Spring AI 1.1 via the community annotations project and is available in Spring AI 2.0. The readOnlyHint tells the client this tool does not change any state. The destructiveHint = false confirms it is safe. These hints help AI models decide when to call your tools without asking for confirmation.

The description lists common metric names. This is a practical trick: when the AI model reads the tool description, it knows which values are valid. Without this, the model has to guess or ask the user.
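
Because the metric name is chosen by the model and concatenated straight into the request URL, it may be worth validating it before the call. The sketch below is JDK-only; MetricPaths is a hypothetical helper, and the allowed character set is an assumption based on the dotted names listed in the description, not a rule taken from Actuator.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the companion code.
public final class MetricPaths {

    // Assumption: metric names use only letters, digits, dots, underscores, hyphens.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9._-]+");

    private MetricPaths() {
    }

    // Builds the metrics URL, rejecting names that could alter the path.
    public static String metricUri(String baseUrl, String metricName) {
        if (metricName == null
                || !SAFE_NAME.matcher(metricName).matches()
                || metricName.contains("..")) {
            throw new IllegalArgumentException("Invalid metric name: " + metricName);
        }
        // Avoid a double slash when the registered URL ends with "/".
        String base = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
        return base + "/actuator/metrics/" + metricName;
    }
}
```

In getMetric, the IllegalArgumentException could be caught and returned as a plain error string, in the same way the method already handles RestClientException.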

Let us also add a tool to list all available metrics for a given app:

@McpTool(
    name = "list-metrics",
    description = "List all available metric names for a monitored app",
    annotations = @McpTool.McpAnnotations(readOnlyHint = true))
public String listMetrics(
        @McpToolParam(description = "App name (e.g. localhost:8080)",
                      required = true)
        String appName) {

    String url = appRegistry.getUrl(appName);
    if (url == null) {
        return "Unknown app: " + appName
            + ". Registered apps: " + appRegistry.getApps().keySet();
    }

    try {
        return restClient.get()
            .uri(url + "/actuator/metrics")
            .retrieve()
            .body(String.class);
    } catch (RestClientException e) {
        return "Failed to fetch metrics from " + appName + ": " + e.getMessage();
    }
}

Exposing App Info as an MCP Resource

Before adding the resource, it helps to understand why MCP distinguishes resources from tools at all — and why health status is not a good fit for a resource.

The official MCP specification draws a clear line: tools are model-controlled, resources are application-driven.

A tool is something the AI invokes — it decides when to call it, picks the arguments, and acts on the result. Tools are designed for interaction: querying a database, calling an API, running a computation.
A resource is something the AI (or the host application) reads — it is a URI-addressable piece of context: a file, a schema, a configuration snapshot. The MCP spec says resources exist to "share data that provides context to language models".

The key question when choosing between the two is: how often does this data change?

Health status can change at any moment: an app can go from UP to DOWN while you are mid-conversation. If you expose health as a resource, Claude might read it once at the start and act on stale data. Health belongs as a tool: invoked on demand, always fresh.
Build info (version number, artifact name, git commit) is written at compile time and never changes while the app is running. This is safe to read once as background context. App info belongs as a resource.

The rule of thumb: expose data as a resource when it is stable during runtime (versions, registered apps, configuration). Expose it as a tool when it changes frequently or requires parameters to fetch a specific value.

Add a new class ActuatorMcpResources.java:

package com.hochbichler.mcpactuator;

import org.springaicommunity.mcp.annotation.McpResource;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestClient;
import org.springframework.web.client.RestClientException;

@Component
public class ActuatorMcpResources {

    private final AppRegistry appRegistry;
    private final RestClient restClient;

    public ActuatorMcpResources(AppRegistry appRegistry) {
        this.appRegistry = appRegistry;
        this.restClient = RestClient.create();
    }

    @McpResource(
        uri = "apps://info",
        name = "App Registry",
        description = "Registered apps and their static build info from /actuator/info")
    public String getAppInfo() {
        var sb = new StringBuilder();
        for (var entry : appRegistry.getApps().entrySet()) {
            String name = entry.getKey();
            String url = entry.getValue();
            sb.append("=== ").append(name).append(" ===\n");
            sb.append("URL: ").append(url).append("\n");
            try {
                String info = restClient.get()
                    .uri(url + "/actuator/info")
                    .retrieve()
                    .body(String.class);
                sb.append(info).append("\n");
            } catch (RestClientException e) {
                sb.append("info: not available\n");
            }
            sb.append("\n");
        }
        return sb.toString().trim();
    }
}

When Claude Code connects, it can read apps://info to learn which apps are registered and what versions they are running — without you having to ask. That context is stable for the entire session. When you then ask "why is the order service slow?", Claude already knows the service exists and what version it is; it only needs to call the check-health or get-metric tools for the live data.

Connecting to Claude Code

Build the JAR:

./mvnw clean package -DskipTests

Make sure your target Spring Boot applications are running with Actuator enabled. For example, if you have two apps running on ports 8080 and 8081, add the MCP server to Claude Code:

claude mcp add --transport stdio spring-actuator \
  -- java -jar /absolute/path/to/target/mcp-actuator-0.0.1-SNAPSHOT.jar \
  --apps=http://localhost:8080,http://localhost:8081

Important: use the absolute path to your JAR. Relative paths break because Claude Code launches the process from a different working directory.

Verify the connection inside Claude Code:

/mcp

You should see spring-actuator listed with status "connected" and three tools: check-health, get-metric, list-metrics.

The architecture here is key: Claude Code launches the MCP server JAR as a subprocess (via STDIO). The MCP server itself does not run a web server — it is a lightweight process that calls the Actuator endpoints on your target apps over HTTP using RestClient. Your actual applications run independently and just need Actuator exposed.

Try it:

You: Is the order service healthy?
You: What metrics are available on localhost:8080?
You: How much JVM memory is the order service using?

Claude Code calls your MCP tools, which fetch the Actuator data from the target apps over HTTP, and responds in natural language.

Spring Boot Cold Start and MCP_TIMEOUT

You might hit a connection timeout on first launch. Spring Boot needs a few seconds to start, and Claude Code's default MCP timeout may be too short.

Fix it by setting the timeout before starting Claude Code:

MCP_TIMEOUT=10000 claude

This gives your server 10 seconds to start. We will address this properly in the native image section.

When Things Go Wrong

Here are the issues I have run into and how to fix them:

Problem	Cause	Fix
Tool does not appear in `/mcp`	Method returns `Mono` but server type is `SYNC`	Change return type to a plain type, or set `spring.ai.mcp.server.type=ASYNC`
`Connection closed` error	Wrong JAR path or JAR does not exist	Use absolute path, run `ls` on the JAR to verify
`ENOENT` on Windows	Windows cannot execute `java` directly via STDIO	Use `cmd /c java -jar ...` as the command
Tool exists but AI never calls it	Description is too vague	Make the description specific. List valid input values if possible
`annotation-scanner` finds nothing	Class is missing `@Component`	Add `@Component` to your tool class. It must be a Spring-managed bean

Where to find logs: Claude Code logs MCP communication. Run /mcp and check the server status. For Spring-side logs, add logging.level.org.springframework.ai.mcp=DEBUG to application.properties.

The silent async trap: if you write a method that returns Mono in a SYNC server, Spring AI drops it with a warning in the startup log. No error. The tool just does not show up. Check your startup logs if tools are missing.

Beyond STDIO: HTTP Transport for Teams

STDIO works great for local development. But it requires Claude Code to launch your JAR as a subprocess. For team use or remote servers, switch to Streamable HTTP transport.

Step 1: swap the dependency in pom.xml:


<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
</dependency>

Step 2: update application.properties:

# Remove: spring.ai.mcp.server.stdio=true
# Add:
spring.ai.mcp.server.protocol=STREAMABLE
server.port=8081

Step 3: start the app normally and connect Claude Code:

./mvnw spring-boot:run
claude mcp add --transport http spring-actuator http://localhost:8081/mcp

When to use which:

	STDIO	Streamable HTTP
Use when	Local dev, single user	Team use, remote servers, CI/CD
Startup	Claude Code launches the JAR	You run the server independently
Networking	None (local stdin/stdout pipes)	HTTP, can run anywhere
Trade-off	Cold start delay	Need to manage a running server

Native Image: Instant Startup for MCP Servers

The STDIO cold start problem has a clean solution: GraalVM native image.

A Spring Boot 4 native image compiles your app ahead of time into a standalone binary. The result: startup in ~100ms instead of 3-5 seconds. No JVM needed at runtime. For a minimal MCP server like ours, the binary is typically around 50-80 MB (larger apps with more dependencies can exceed 100 MB).

For an MCP server — a small, single-purpose tool — this is a perfect fit.

Spring Boot's parent POM already defines a native profile. With the GraalVM native-maven-plugin declared in your pom.xml (start.spring.io adds it when you select GraalVM Native Support), you only need to activate that profile:

./mvnw -Pnative native:compile

This produces a binary at target/mcp-actuator. Add it to Claude Code without the java -jar wrapper:

claude mcp add --transport stdio spring-actuator \
  -- /absolute/path/to/target/mcp-actuator \
  --apps=http://localhost:8080,http://localhost:8081

No more MCP_TIMEOUT workaround. The server starts before Claude Code even finishes sending the initialization handshake.

Trade-offs:

Build time is significantly longer (2-5 minutes vs seconds for a regular JAR)
Reflection-based libraries may need GraalVM configuration hints
Spring Boot 4's improved AOT engine handles most cases automatically, but test your native build before relying on it
You need GraalVM installed locally or use a CI pipeline with native image support

For local development, stick with the regular JAR and MCP_TIMEOUT. Use native image for the version you distribute to your team or deploy as a shared tool.

Conclusion

The architectural choices worth remembering:

STDIO for local dev, Streamable HTTP for teams — same tool code, different transport dependency
Tools for live data, resources for stable context — health status changes every second; build info does not
Native image eliminates the cold start problem — a 100ms startup means no more MCP_TIMEOUT hacks

Every Spring Boot application already ships with Actuator — point this MCP server at any running instance and you can monitor it through natural language. To extend this further, add tools for /actuator/env, /actuator/loggers, or /actuator/threaddump using the same @McpTool pattern. Fork the companion code at spring-ai-mcp-actuator and try it with your own services.

ROPC Is Dead: How to Get User Tokens Without It

Thomas Hochbichler — Tue, 10 Mar 2026 00:00:00 GMT

A practical migration guide for headless CLIs and APIs that need user-context tokens now that OAuth2 ROPC is prohibited.

Who this is for: Developers who use grant_type=password in CLIs, APIs, or scripts and need to migrate to a supported OAuth flow.

Reading time: ~18 minutes | Companion repository: ropc-alternative-flows-poc (Spring Boot 3.4 + Keycloak PoC)

TL;DR: RFC 9700 (January 2025) says ROPC "MUST NOT be used" — the strongest prohibition the IETF has. OAuth 2.1 removes it entirely. If your CLI or headless API collects usernames and passwords to get tokens via grant_type=password, you need to migrate. The Device Authorization Grant (RFC 8628) is the primary replacement for headless scenarios. Auth Code + PKCE with a localhost redirect works when a browser is available. This guide walks through both with complete HTTP examples, a decision tree, and a migration checklist.

1. What Happened to ROPC?

RFC 9700, titled "Best Current Practice for OAuth 2.0 Security", was published in January 2025. Section 2.4 is direct:

"The resource owner password credentials grant MUST NOT be used."

That's IETF "MUST NOT" — not a suggestion, not a deprecation warning. It's the strongest prohibition in RFC vocabulary. The reasoning: ROPC exposes user credentials directly to the client application, bypasses MFA, enables credential stuffing, and eliminates any chance of the authorization server enforcing its own security policies.

OAuth 2.1 (draft-ietf-oauth-v2-1-15, current as of March 2026) goes even further. ROPC isn't deprecated; it's removed entirely. Section 1.3 lists three grant types: authorization code, refresh token, and client credentials. ROPC simply doesn't exist anymore.

This article won't repeat the security arguments. Scott Brady's breakdown lists 9 specific problems, and WorkOS's RFC 9700 summary covers the standards context well. Instead, this guide focuses on what you should use instead.

2. The Real Problem: User Tokens Without a Browser

If you're reading this, you probably chose ROPC for a reason. You had a CLI, a headless API, or a batch script that needed to act on behalf of a specific user. grant_type=password was one HTTP request:

POST /oauth/token HTTP/1.1
Host: auth.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=password&username=alice&password=s3cret&client_id=my-cli&scope=openid

One POST. One response. You have a token. No browser, no redirects, no local server. It was genuinely simple, and that simplicity is exactly why it was dangerous. The client application never needed to see Alice's password, but ROPC forced it to act as an intermediary that handles her credentials directly.

Now every recommended alternative seems to require a browser. If you're running a CLI over SSH into a headless server, "just add a redirect" isn't helpful.

The good news: the industry already solved this problem. GitHub CLI, Azure CLI, and AWS CLI all authenticate users from headless environments without ROPC. The patterns exist. You just need to know which one fits your scenario.

3. Decision Tree: Which Flow Replaces ROPC?

Not every ROPC migration looks the same. Your replacement flow depends on three questions: does the token need to represent a specific user, is a browser available on the device running the CLI, and is a user present to complete an interactive login?

Does the token need to represent a specific user?
├── No → Client Credentials (grant_type=client_credentials)
│         You never needed ROPC. Service accounts work fine.
│
└── Yes → Is a browser available on the device running the CLI?
    ├── Yes → Authorization Code + PKCE with localhost redirect
    │         CLI opens browser, catches the callback on localhost.
    │         (Section 5)
    │
    └── No  → What kind of usage?
        ├── Interactive (user present) → Device Authorization Grant
        │   CLI shows a URL + code, user authenticates elsewhere.
        │   (Section 4)
        │
        └── Non-interactive (CI/CD, scripts) → Personal Access Tokens
            User generates a PAT via web UI, configures it in CLI.
            (Section 7)

One more scenario: if your service already has a user token and needs to call a downstream API preserving user identity, that's Token Exchange (RFC 8693). See Section 6.

Scenario	Flow	Section
Headless CLI, user present	Device Authorization Grant	4
CLI with browser available	Auth Code + PKCE (localhost)	5
Service-to-service, user context	Token Exchange / On-Behalf-Of	6
Scripts, CI/CD, automation	Personal Access Tokens	7
No user context needed	Client Credentials	N/A
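
The decision tree above can also be written as a small function, which is handy if a CLI needs to pick a flow at runtime. This is a minimal sketch; the class, enum, and parameter names are hypothetical. Token Exchange is left out because it applies to a service that already holds a user token, not to the initial login.

```java
// Hypothetical helper for illustration; encodes the decision tree as code.
public final class RopcReplacement {

    public enum Flow {
        CLIENT_CREDENTIALS,
        AUTH_CODE_PKCE,
        DEVICE_AUTHORIZATION_GRANT,
        PERSONAL_ACCESS_TOKEN
    }

    private RopcReplacement() {
    }

    // Asks the questions in the same order as the tree.
    public static Flow choose(boolean needsUserContext,
                              boolean browserAvailable,
                              boolean userPresent) {
        if (!needsUserContext) {
            return Flow.CLIENT_CREDENTIALS; // service account is enough
        }
        if (browserAvailable) {
            return Flow.AUTH_CODE_PKCE; // localhost redirect
        }
        return userPresent
            ? Flow.DEVICE_AUTHORIZATION_GRANT // user authenticates elsewhere
            : Flow.PERSONAL_ACCESS_TOKEN;     // CI/CD and scripts
    }
}
```

For example, a CLI running over SSH with a user at the keyboard would call choose(true, false, true) and get DEVICE_AUTHORIZATION_GRANT.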

4. The Device Authorization Grant — Your Primary ROPC Replacement

The Device Authorization Grant (RFC 8628) was designed precisely for devices that can't open a browser — smart TVs, IoT sensors, and yes, headless CLIs. It's what GitHub CLI uses when you run gh auth login.

Here is how it works at a high level: your CLI requests a short code from the authorization server and displays it to the user. The user then opens a browser on any other device (their phone, their laptop) and enters that code to authenticate. Meanwhile, the CLI keeps checking the token endpoint until the user finishes.

The Full Flow: Step by Step

Step 1: Request device and user codes

Your CLI sends a POST to the authorization server's device authorization endpoint:

POST /realms/my-realm/protocol/openid-connect/auth/device HTTP/1.1
Host: keycloak.example.com
Content-Type: application/x-www-form-urlencoded

client_id=my-cli-app&scope=openid profile offline_access
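
From Java, the request above could be built with the JDK's own java.net.http API. The following is a minimal sketch, not code from the companion repository; DeviceAuthRequest is a hypothetical class name, and the endpoint, client ID, and scope are passed in by the caller.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; builds the device authorization request.
public final class DeviceAuthRequest {

    private DeviceAuthRequest() {
    }

    // Encodes parameters as application/x-www-form-urlencoded.
    public static String formBody(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) the POST to the device authorization endpoint.
    public static HttpRequest build(String deviceEndpoint, String clientId, String scope) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("client_id", clientId);
        params.put("scope", scope);
        return HttpRequest.newBuilder(URI.create(deviceEndpoint))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(formBody(params)))
            .build();
    }
}
```

Sending it is then a matter of HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Note that URLEncoder writes the spaces in the scope value as plus signs, which is valid form encoding.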

Step 2: Server responds with codes

{
  "device_code": "df1b060b-4e36-4bbe-98aa-5dcb11909f5f",
  "user_code": "DRTD-NTJC",
  "verification_uri": "https://keycloak.example.com/realms/my-realm/device",
  "verification_uri_complete": "https://keycloak.example.com/realms/my-realm/device?user_code=DRTD-NTJC",
  "expires_in": 600,
  "interval": 5
}

Key fields:

device_code: Backend identifier for polling — never shown to the user
user_code: Short, human-readable code like DRTD-NTJC — this is what the user types
verification_uri: Where the user goes to authenticate
interval: Minimum seconds between poll requests (respect this or get rate-limited)
expires_in: The codes expire after this many seconds (typically 600)

Step 3: Display instructions to the user

Your CLI prints something like:

To sign in, open https://keycloak.example.com/realms/my-realm/device
and enter code: DRTD-NTJC

Waiting for authentication...

Some CLIs copy the code to the clipboard automatically (GitHub CLI does this). If verification_uri_complete is available, you can also display a QR code.

Step 4: User authenticates on another device

The user opens the URL on any device with a browser — their phone, a laptop, whatever. They enter the user code, log in with their credentials (including MFA if configured), and approve the authorization. This happens entirely on the authorization server's own login page. Your CLI never sees the password.

Step 5: CLI polls the token endpoint

While the user is authenticating, your CLI polls:

POST /realms/my-realm/protocol/openid-connect/token HTTP/1.1
Host: keycloak.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=urn:ietf:params:oauth:grant-type:device_code
&device_code=df1b060b-4e36-4bbe-98aa-5dcb11909f5f
&client_id=my-cli-app

Before the user completes login, you get:

HTTP/1.1 400 Bad Request
{ "error": "authorization_pending" }

This is expected — keep polling at the interval rate.

Step 6: Receive tokens

After the user authorizes:

HTTP/1.1 200 OK
{
  "access_token": "eyJhbGciOiJSUzI1NiIs...",
  "refresh_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "openid profile offline_access"
}

You now have a user-context token — representing the specific user who authenticated — without ever touching their password.

Handling Polling Errors

Your polling loop needs to handle four error codes:

Error	Meaning	What to Do
`authorization_pending`	User hasn't finished authenticating	Keep polling at the same interval
`slow_down`	You're polling too frequently	Add 5 seconds to your interval
`expired_token`	The codes expired (user took too long)	Restart the entire flow from Step 1
`access_denied`	User denied the authorization	Stop polling, show an error message

On connection timeouts, use exponential backoff. Don't send rapid repeated requests to the endpoint.
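
The table above maps onto a small decision function. The following JavaScript sketch is illustrative only: the name nextPollStep and the shape of its return value are assumptions, not part of any library. It takes the error value from the token endpoint response and the current interval, and returns what the polling loop should do next. Connection timeouts are a separate concern and are not covered here.

```javascript
// Hypothetical helper: decide the next step of a device-flow polling loop.
// `error` is the "error" field of the token endpoint's 400 response;
// `intervalSeconds` is the current wait between two poll requests.
function nextPollStep(error, intervalSeconds) {
  switch (error) {
    case 'authorization_pending':
      // User has not finished yet: keep polling at the same rate.
      return { action: 'poll', intervalSeconds };
    case 'slow_down':
      // Server asked us to back off: add 5 seconds to the interval.
      return { action: 'poll', intervalSeconds: intervalSeconds + 5 };
    case 'expired_token':
      // Codes expired: the caller has to restart the flow from Step 1.
      return { action: 'restart', intervalSeconds };
    case 'access_denied':
      // User denied the request: stop polling and report it.
      return { action: 'abort', intervalSeconds };
    default:
      // Unknown error: stop rather than loop forever.
      return { action: 'abort', intervalSeconds };
  }
}
```

Keeping the decision separate from the HTTP call makes the loop easy to test: the loop itself only sleeps for intervalSeconds, sends the request from Step 5, and acts on the returned action.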

ROPC vs. Device Flow — Side by Side

Property	ROPC	Device Flow
Credentials exposed to client	Yes — client sees the password	No — user authenticates on IdP's page
MFA support	No	Yes — IdP handles MFA natively
SSO support	No	Yes — same IdP session across apps
Phishing resistance	None	Higher — codes entered on trusted domain
Browser on same device	Not required	Not required
User context in token	Yes	Yes
Complexity	1 HTTP request	Polling loop + user interaction

You trade simplicity for security. One HTTP request becomes a polling loop. But your CLI never sees a password, MFA works without any extra configuration, and the authorization server stays in control of the login experience.

Security Note: Device Code Phishing (Storm-2372)

The Device Authorization Grant is not immune to phishing attacks. In February 2025, Microsoft reported that a Russia-linked group called Storm-2372 used device code flows to steal tokens. The attack works like this:

Attacker generates a legitimate device code
Sends the user_code to victims via WhatsApp, Signal, or Teams
Victim enters the code on the real IdP login page
Attacker's device receives the token via polling

Mitigations:

Restrict device code flow to applications that truly need it (Conditional Access policies in Entra ID, client-level toggles in Keycloak)
Educate users: device codes should only come from actions you initiated
Deploy anomaly detection for unusual device code patterns
Monitor token usage after device code grants

This does not make Device Flow worse than ROPC. ROPC hands the password directly to the client, which is a bigger risk. But you should be aware of this attack vector when adopting Device Flow.

Companion PoC: Spring Boot 3.4 + Keycloak

Hands-on reference: The ropc-alternative-flows-poc repository demonstrates a complete Device Authorization Flow implementation using Spring Boot 3.4 and Keycloak 26. It includes a headless CLI client that polls for tokens, a protected resource server, Keycloak realm configuration, and Docker Compose for local development. Clone it and run the full flow locally in under 10 minutes.

5. Auth Code + PKCE with Localhost Redirect

When a browser is available on the same machine (for example, a developer workstation rather than a headless server), Authorization Code + PKCE with a localhost redirect provides a better user experience. The CLI opens the system browser, the user logs in, and the browser redirects back to a temporary local server that receives the authorization code.

This is what Azure CLI (az login) and AWS CLI (aws sso login) use by default.

How It Works

CLI generates a PKCE code_verifier (random 43-128 character string) and derives code_challenge (SHA256, base64url-encoded)
CLI starts a temporary HTTP server on http://localhost:<port>
CLI opens the system browser to the authorization URL
User authenticates on the IdP's login page
IdP redirects to http://localhost:<port>?code=<code>&state=<state>
Local server catches the redirect, extracts the authorization code
CLI exchanges authorization code + code_verifier for tokens
Local server shuts down

Code Example: Node.js with openid-client 5.x

import { Issuer, generators } from 'openid-client'; // v5.7.0
import http from 'node:http';
import open from 'open'; // v10.1.0

const REDIRECT_PORT = 6363;
const REDIRECT_URI = `http://localhost:${REDIRECT_PORT}`;

// 1. Discover IdP endpoints via OIDC discovery
const issuer = await Issuer.discover('https://keycloak.example.com/realms/my-realm');
const client = new issuer.Client({
  client_id: 'my-cli-app',
  redirect_uris: [REDIRECT_URI],
  response_types: ['code'],
  token_endpoint_auth_method: 'none', // public client — no client secret
});

// 2. Generate PKCE values
const codeVerifier = generators.codeVerifier();
const codeChallenge = generators.codeChallenge(codeVerifier);

// 3. Build authorization URL
const authUrl = client.authorizationUrl({
  scope: 'openid profile offline_access',
  code_challenge: codeChallenge,
  code_challenge_method: 'S256',
});

// 4. Start localhost server, open browser, exchange code for tokens
const tokenSet = await new Promise((resolve, reject) => {
  const server = http.createServer(async (req, res) => {
    try {
      const params = client.callbackParams(req);
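      // The openid scope means the token response carries an ID token, so use
      // callback(), which validates it (oauthCallback() is for plain OAuth 2.0)
      const tokens = await client.callback(REDIRECT_URI, params, {
        code_verifier: codeVerifier,
      });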
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end('Authenticated. You can close this tab.');
      server.close();
      resolve(tokens);
    } catch (err) {
      res.writeHead(500);
      res.end('Authentication failed.');
      server.close();
      reject(err);
    }
  });

  server.listen(REDIRECT_PORT, () => {
    console.log(`Listening on ${REDIRECT_URI}`);
    open(authUrl); // opens system browser
  });
});

console.log('Access token:', tokenSet.access_token);

Port Conflict Handling

Your localhost server might fail to bind if the port is already in use. Two strategies:

Fixed port with retry: Try a predefined list of ports (e.g., 6363, 6364, 6365). Register all of them as valid redirect URIs in your IdP.
Dynamic port: Bind to port 0 (OS assigns a free port). This requires your IdP to support wildcard or dynamic redirect URIs — most don't.
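
The fixed-port strategy can be sketched as follows. This is a minimal example under stated assumptions: listenOnFirstFreePort is a hypothetical helper name, and server is expected to be a Node.js http.Server such as the one created in the example above. It tries each candidate port in order and resolves with the first one that binds.

```javascript
// Hypothetical helper: bind `server` to the first free port in `ports`.
// Resolves with the chosen port, rejects if every candidate is in use.
function listenOnFirstFreePort(server, ports) {
  return new Promise((resolve, reject) => {
    const tryPort = (index) => {
      if (index >= ports.length) {
        reject(new Error(`No free port among ${ports.join(', ')}`));
        return;
      }
      const port = ports[index];
      const onListening = () => {
        server.removeListener('error', onError);
        resolve(port);
      };
      const onError = (err) => {
        // Drop the stale listener so it cannot fire for a later attempt.
        server.removeListener('listening', onListening);
        if (err.code === 'EADDRINUSE') {
          tryPort(index + 1); // port taken, try the next candidate
        } else {
          reject(err);
        }
      };
      server.once('listening', onListening);
      server.once('error', onError);
      server.listen(port);
    };
    tryPort(0);
  });
}

// Usage (inside an async context):
// const port = await listenOnFirstFreePort(server, [6363, 6364, 6365]);
```

Because the redirect URI contains the port, the CLI has to pick the port first and only then build the redirect URI and the authorization URL from it.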

Terraform's docs recommend configuring ports 10000-10010 as the redirect port range. The approach is simple, but it works.

When to Choose This Over Device Flow

Factor	Device Flow	Localhost PKCE
Browser on same machine	Not required	Required
UX	Copy code → switch to browser	Browser opens automatically
Headless server / SSH	Works	Doesn't work
Port conflicts	None	Possible
Phishing risk	Device code phishing (Storm-2372)	Localhost redirect harder to intercept

The industry pattern: default to localhost PKCE when a browser is detected, fall back to Device Flow when it isn't. Azure CLI and AWS CLI both do this.

6. What the Big CLIs Actually Do

These aren't theoretical alternatives. Here is how major CLI tools actually handle authentication today:

CLI Tool	Default Flow	Fallback	Notes
GitHub CLI (`gh auth login`)	Device code flow	`--with-token` for PATs	Displays code (clipboard copy via `--clipboard` flag). Uses `https://github.com/login/device`.
Azure CLI (`az login`)	Auth Code + PKCE (opens browser)	`--use-device-code`	Switched to browser-based as the default in recent versions. Windows defaults to WAM since v2.61.0.
AWS CLI (`aws sso login`)	Auth Code + PKCE (since v2.22.0)	`--use-device-code`	Switched from device code to PKCE as default.
Terraform (`terraform login`)	Auth Code + PKCE (localhost)	N/A	Uses localhost with configurable port range. No refresh tokens.
kubelogin (kubectl OIDC)	Auth Code + PKCE (opens browser)	N/A	Caches ID + refresh tokens locally.

The trend is clear: Auth Code + PKCE by default, Device Flow as fallback, PATs for automation. This dual-mode approach is the new industry standard.

7. Token Exchange and Personal Access Tokens

Two more alternatives complete the overview. Neither is a direct ROPC replacement for end-user login, but both solve scenarios where developers previously used ROPC.

Token Exchange / On-Behalf-Of (RFC 8693)

This solves a different problem: your API already received a user token (from Device Flow, Auth Code, etc.) and needs to call a downstream API while preserving the user's identity.

POST /oauth/token HTTP/1.1
Host: auth.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=urn:ietf:params:oauth:grant-type:token-exchange
&subject_token=eyJhbGciOiJSUzI1NiIs...
&subject_token_type=urn:ietf:params:oauth:token-type:access_token
&audience=https://downstream-api.example.com
&scope=read write

The authorization server issues a new token that represents the same user but is scoped for the downstream API. Supported by Entra ID (as "On-Behalf-Of"), Okta, Keycloak, and Auth0.

Use Token Exchange when: a microservice needs to call another service on behalf of the user who initiated the request. Don't use it as a standalone login mechanism — it requires an existing user token as input.
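
For illustration, the request body shown above can be assembled with the standard URLSearchParams class. This is a sketch, not a complete client: buildTokenExchangeBody is a hypothetical helper name, and client authentication (which most authorization servers require for this grant) is left out.

```javascript
// Hypothetical helper: build the form body for an RFC 8693 token exchange.
function buildTokenExchangeBody({ subjectToken, audience, scope }) {
  return new URLSearchParams({
    grant_type: 'urn:ietf:params:oauth:grant-type:token-exchange',
    subject_token: subjectToken, // the user token your API received
    subject_token_type: 'urn:ietf:params:oauth:token-type:access_token',
    audience, // the downstream API the new token is meant for
    scope,
  });
}

// Usage (inside an async context, Node.js 18+ with global fetch):
// const response = await fetch('https://auth.example.com/oauth/token', {
//   method: 'POST',
//   body: buildTokenExchangeBody({
//     subjectToken: incomingAccessToken,
//     audience: 'https://downstream-api.example.com',
//     scope: 'read write',
//   }),
// });
```

Passing a URLSearchParams object as the body lets fetch set the application/x-www-form-urlencoded content type and handle the percent-encoding.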

Personal Access Tokens (PATs)

When a full OAuth flow is more than you need (CI/CD pipelines, simple scripts, automation), PATs are the practical choice. The user generates a token through a web UI and pastes it into their CLI config or an environment variable.

GitHub, GitLab, npm, and Docker Hub all use this pattern. GitHub's fine-grained PATs even let you scope permissions per-repository.

Factor	PATs	OAuth Flow (Device/PKCE)
Setup complexity	Low — generate in web UI	Higher — implement flow in client
Token lifetime	Long-lived	Short-lived access + refresh
Revocation	Manual via web UI	Automatic expiration
MFA enforcement	At creation time only	At each authentication
Best for	Scripts, CI/CD, simple tools	Production CLIs, user-facing apps

Use PATs when: the "user" is a build pipeline or a one-off script that doesn't need interactive login. Don't use them as a general-purpose ROPC replacement in production CLIs — they lack automatic expiration and per-session MFA.

8. Identity Provider Support Matrix

Before you choose a flow, verify that your identity provider (IdP) supports it. Device Flow has broad support, with one notable exception.

IdP	Device Flow	Auth Code + PKCE	Token Exchange	Notes
Keycloak 26.2+	Native	Native	Native (GA since 26.2; preview in 26.0)	Enable "OAuth 2.0 Device Authorization Grant" per client. Configure code lifespan and polling interval.
Microsoft Entra ID	Native	Native	Native (OBO)	Restrict via Conditional Access. Use `--use-device-code` in Azure CLI.
Auth0	Native	Native	Native (Token Vault)	Enable Device Authorization grant on the application settings page.
Okta	Native	Native	Native	Enable grant type on the app + authorization server policy rule.
AWS Cognito	Not native	Native	Not native	Device Flow requires a Lambda + DynamoDB workaround.

If you use Cognito and need Device Flow, the AWS-provided workaround uses Lambda to implement the flow on top of Cognito. It works, but it requires significantly more infrastructure than native support. You should evaluate whether switching to a different IdP would reduce complexity enough to justify the effort.

9. Migration Checklist

Follow these steps to move from ROPC to a modern flow without breaking existing users.

Step 1: Audit

[ ] Search your codebase for grant_type=password
[ ] Identify every client that uses ROPC — CLI tools, SDKs, internal scripts, CI/CD pipelines
[ ] Document which of those need user-context tokens vs. service accounts (client credentials)
[ ] Check your IdP's ROPC deprecation timeline (some will force-disable it)

Step 2: Choose Your Flow

Use the decision tree from Section 3:

Headless + user present → Device Authorization Grant
Browser available → Auth Code + PKCE (localhost redirect)
Non-interactive automation → PATs or Client Credentials
Service-to-service user propagation → Token Exchange

Step 3: Implement

[ ] Register a new OAuth client in your IdP with the appropriate grant type enabled
[ ] For Device Flow: implement the polling loop with proper error handling (authorization_pending, slow_down, expired_token, access_denied)
[ ] For Auth Code + PKCE: implement the localhost redirect server with port fallback
[ ] Request offline_access scope to get refresh tokens — your CLI shouldn't re-authenticate on every invocation
[ ] Store tokens securely (OS keychain, encrypted file, not plaintext in ~/.config)

Step 4: Parallel Run

[ ] Ship the new flow alongside ROPC (e.g., --use-device-code flag)
[ ] Log ROPC usage to track migration progress
[ ] Communicate the deprecation timeline to your users. Give them at least one release cycle to switch

Step 5: Deprecate ROPC

[ ] Remove grant_type=password from client code
[ ] Disable ROPC on the IdP (Keycloak: uncheck "Direct Access Grants Enabled"; Entra ID: block via Conditional Access)
[ ] Verify no remaining clients are using ROPC via IdP logs

What's Next

Pick one client that uses grant_type=password and migrate it to Device Flow this week. Start with the ropc-alternative-flows-poc repository if you want a working reference: it has a Spring Boot 3.4 resource server, a Keycloak 26 realm, and a CLI client that demonstrates the full polling flow.

If your CLI runs on machines with browsers, implement the dual-mode pattern: Auth Code + PKCE by default, Device Flow via a --use-device-code flag. That's what Azure CLI and AWS CLI converged on, and it covers every deployment scenario.

ROPC was simple. Its replacements require more steps, but they are fundamentally safer. Your CLI stops handling user passwords and becomes what it should have been from the start: a token consumer that never touches credentials.

EU-Compliant Claude Code with Mistral: Setup Guide

Thomas Hochbichler — Sun, 08 Mar 2026 20:46:01 GMT

Series: EU-Compliant Claude Code with Mistral. Part 1: Setup Guide (this article) | Part 2: Testing the Limits (coming soon) | Part 3: Alternatives (coming soon)

A practical guide to routing Claude Code through Mistral's EU-hosted API — with configuration templates, model recommendations, and presets for cloud and local setups.

Reading time: ~14 minutes | Companion repository: claude-code-mistral

What this part covers:

What data Claude Code sends and why EU developers should care
Mistral.ai's compliance credentials
Architecture of the claude-code-router proxy
Step-by-step configuration with cloud and local presets
Model selection guide
Troubleshooting and smoke testing your setup

1. Introduction

Every time you run Claude Code, your source code leaves your machine. File contents, terminal output, directory structures, environment state — all of it streams to an LLM provider for processing. For most developers, that's a reasonable trade-off. For EU-based developers working with client code, personal data, or regulated infrastructure, it's a legal question.

The EU's regulatory framework for data protection has teeth. GDPR fines totalled over EUR 1.2 billion in 2025 alone. The AI Act imposes new obligations on providers of general-purpose AI models. NIS2 demands documented supply chain security assessments. Sending source code to a US-hosted AI provider without addressing these regulations isn't just risky — it's increasingly untenable.

This guide shows you how to keep using Claude Code — the tool you already know — while routing all requests through Mistral.ai's EU-hosted API. The result: your code stays in the EU, processed by a French company headquartered outside US jurisdiction, with SOC 2 Type II, ISO 27001, and ISO 27701 certifications.

The companion repository provides everything you need: a configuration template, cloud and local presets, and an automated setup script. Clone, run, and start coding in under five minutes.

2. The Problem: Your Code Leaves the EU

Before you configure anything, it's worth understanding exactly what data leaves your machine and why that matters under EU law.

Claude Code operates as an agentic tool. It doesn't just receive the prompt you type — it actively gathers context from your workspace. This includes:

Source code files: Full file contents, not snippets. The agent reads, writes, and edits files directly.
Directory structures: Project layout, import paths, and file relationships.
Terminal output: Command results, error messages, build output, and test results.
Environment state: Working directory, shell context, and system information.
Git metadata: Branch names, commit history, and diff output.

In an agentic workflow, this context gathering is automatic. Claude Code decides which files to read, which commands to run, and which context to include — often pulling in files you didn't explicitly reference. That's by design: it's what makes agentic coding assistants powerful.

Source code routinely contains personal data: hardcoded email addresses, user records in seed files, test fixtures with real names, API keys tied to individuals, and database connection strings. When an agentic tool processes your entire workspace, it processes all of this.

Why Anonymization Doesn't Solve This

You can't practically anonymize or sanitize this data before it reaches the LLM. Agentic workflows require full semantic context — complete files with valid import paths, working directory structures, and unmodified terminal output. Strip the personal data and you break the tool. Three specific reasons:

Semantic integrity: Code must compile, execute, and maintain valid cross-file references. Stripping personal data breaks functionality.
Automatic context gathering: The agent decides what context to read — intercepting and sanitizing this in real time isn't practical.
Regulatory opinion: The EDPB has set a high threshold for demonstrating true anonymization in LLM contexts, requiring rigorous case-by-case assessment (Opinion 28/2024). Personal data protections apply to data processed through language models even when the model itself doesn't store the data.

The Regulatory Picture

When source code containing personal data flows to a US-hosted provider, three EU regulations apply:

GDPR: Cross-border data transfers require a legal basis under Chapter V. The EU-US Data Privacy Framework (DPF) currently provides one, but it remains structurally fragile — the US CLOUD Act creates an unresolvable conflict with GDPR Article 48, and further legal challenges are expected. If you'd rather not monitor the evolving stability of the DPF, eliminating the cross-border transfer entirely is the most robust approach.

AI Act: Since August 2025, providers of general-purpose AI models face documentation and due diligence obligations (deployer obligations for high-risk systems follow in August 2026). Choosing a provider with documented compliance credentials simplifies your assessment.

NIS2: Organizations in regulated sectors must assess the cybersecurity practices of their service providers. An AI coding assistant that processes source code is a supply chain dependency — sending code to a third-party LLM without a documented risk assessment is a compliance gap.

The practical alternative is to ensure the data never leaves the jurisdiction in the first place.

Disclaimer: This article is a technical guide for configuring AI development tools. It's not legal advice. For questions about GDPR compliance, data processing obligations, or regulatory requirements specific to your organization, consult a qualified Data Protection Officer (DPO) or legal counsel.

3. Mistral.ai: Why This Provider

Mistral AI is a French company legally domiciled in Paris under EU jurisdiction.

EU Data Residency: Mistral hosts data in the EU by default. The API endpoint https://api.mistral.ai/v1 routes through EU infrastructure, with encrypted backups replicated across EU availability zones.

Certifications:

SOC 2 Type II: Independently audited security controls (trust.mistral.ai)
ISO 27001: Information security management system
ISO 27701: Privacy information management (GDPR-aligned)

Data Processing Agreement: A DPA is available at legal.mistral.ai, covering GDPR requirements, subprocessor management, Standard Contractual Clauses, and 30-day data deletion on termination.

CLOUD Act exposure: The US CLOUD Act applies to providers with US legal presence. Unlike AWS, Azure, or Google Cloud — all US-incorporated and fully within CLOUD Act scope — Mistral is a French-headquartered company with no US parent. However, Mistral does maintain a US office and uses US cloud providers for some infrastructure. Organizations with strict sovereignty requirements should review this carefully. For most EU teams using Mistral's EU-hosted API (api.mistral.ai), the practical risk profile is substantially lower than routing through a US-headquartered provider.

This combination — EU residency by default, certifications, a DPA, and a non-US corporate structure — makes Mistral a strong candidate for EU-compliant AI coding workflows.

Mistral also offers Vibe CLI, their own open-source AI coding assistant that uses Mistral models natively with no proxy required. We compare Claude Code routing vs. Vibe CLI in detail in Part 3.

4. Architecture: The claude-code-router Proxy

The routing approach relies on claude-code-router (CCR), an open-source local proxy with 29,000+ GitHub stars. CCR intercepts Claude Code's API calls and forwards them to alternative LLM providers.

Important: claude-code-router is a community project, not endorsed by Anthropic. However, the mechanism it uses — the ANTHROPIC_BASE_URL environment variable — is an officially supported Claude Code feature for pointing the CLI at alternative API endpoints.

How It Works

Claude Code sends requests in Anthropic Messages API format to localhost:3456.
CCR applies a transformer pipeline: converts Anthropic format to OpenAI-compatible format and strips cache_control fields (the cleancache transformer).
The transformed request forwards to https://api.mistral.ai/v1.
Mistral processes the request and returns a response.
CCR converts the response back to Anthropic format and returns it to Claude Code.

Why Two Transformers Are Required

Claude Code sends Anthropic-specific parameters that Mistral's API rejects with 422 errors:

cleancache (built-in): Strips cache_control: {"type": "ephemeral"} metadata from messages — part of Anthropic's prompt caching system.
stripreasoning (custom plugin): Strips the reasoning parameter (e.g., {"effort": "high", "enabled": false}) that Claude Code sends for extended thinking configuration.

You need both in the transformer pipeline. Without them, Mistral returns "Extra inputs are not permitted" validation errors. The stripreasoning plugin is included in the companion repository under plugins/strip-reasoning.js.
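
To make the effect of the two transformers concrete, here is a simplified JavaScript sketch of the stripping step. It is illustrative only: it is not the CCR plugin interface and not the contents of plugins/strip-reasoning.js, and the function names are hypothetical. It also assumes that cache_control can appear both on a message and on individual content parts.

```javascript
// Illustrative only: NOT the CCR plugin interface or the real plugin code.
// Returns a shallow copy of `value` without its cache_control property.
function omitCacheControl(value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return value; // strings and other non-objects pass through unchanged
  }
  const { cache_control, ...rest } = value;
  return rest;
}

// Returns a copy of the request body without the two Anthropic-specific
// fields that Mistral rejects; the input object is left untouched.
function stripAnthropicFields(body) {
  const { reasoning, ...rest } = body; // drop the top-level reasoning parameter
  const messages = (body.messages ?? []).map((message) => {
    const cleaned = omitCacheControl(message);
    if (Array.isArray(cleaned.content)) {
      // Assumption: cache_control may also sit on individual content parts.
      cleaned.content = cleaned.content.map(omitCacheControl);
    }
    return cleaned;
  });
  return { ...rest, messages };
}
```

Everything else in the request (model, messages, tools) passes through unchanged, which is why the two transformers are enough to get past Mistral's validation.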

Startup

CCR gives you two startup methods:

ccr code — All-in-one: starts the proxy, reads ~/.claude-code-router/config.json, sets environment variables, and launches Claude Code as a subprocess. This is the recommended approach.
ccr start + eval "$(ccr activate)" + claude — Manual: start the proxy server, export env vars in your shell, then run Claude Code normally. Useful for shell integration.

5. Step-by-Step Configuration Walkthrough

Tested with: claude-code-router v2.0.0 | Node.js 20+ | Claude Code latest
Models verified: Devstral 2 (2512), Codestral 2 (2501), Mistral Large 3, Mistral Small 3.1

Prefer automation? The companion repository includes a setup.sh script that handles every step below — install, configuration, and verification — in a single command. Clone it and skip this walkthrough entirely.

Prerequisites

node --version    # Must be >= 20.0.0
claude --version  # Must be installed
echo $MISTRAL_API_KEY  # Must be set

Get your API key at console.mistral.ai.

Install claude-code-router

npm install -g @musistudio/claude-code-router

Create the Configuration

Create ~/.claude-code-router/config.json with the following contents. Each field is explained inline:

{
  // Passthrough token — not a real secret, just used for
  // Claude Code to authenticate with the local proxy
  "APIKEY": "sk-mistral-router",

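  // Enable logging for troubleshooting (disable after verifying)
  "LOG": true,
  "LOG_LEVEL": "info",

  // Register the custom stripreasoning plugin (adjust the path to your setup)
  "transformers": [
    {"path": "/path/to/.claude-code-router/plugins/strip-reasoning.js"}
  ],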

  "Providers": [
    {
      "name": "mistral",
      // Full Mistral EU endpoint — CCR uses this URL directly
      "api_base_url": "https://api.mistral.ai/v1/chat/completions",
      // Environment variable — never hardcode your key
      "api_key": "$MISTRAL_API_KEY",
      "models": [
        "devstral-latest",
        "codestral-latest",
        "mistral-large-latest",
        "mistral-small-latest"
      ],
      "transformer": {
        // cleancache: strips Anthropic cache_control fields (422 fix)
        // stripreasoning: strips reasoning params Mistral rejects
        "use": ["cleancache", "stripreasoning"]
      }
    }
  ],

  "Router": {
    // Default coding model — best SWE-bench score
    "default": "mistral,devstral-latest",
    // Lightweight tasks — cost-effective small model
    "background": "mistral,mistral-small-latest",
    // Reasoning-heavy tasks — largest model
    "think": "mistral,mistral-large-latest",
    // Large context requests — 256K window
    "longContext": "mistral,mistral-large-latest",
    // Switch to longContext model above 60K tokens
    "longContextThreshold": 60000
  },

  // Visual confirmation of active model in terminal
  "StatusLine": {
    "enabled": true
  }
}

The configuration maps four task types to three models:

Route	Model	When It's Used
`default`	Devstral	Standard coding tasks (file editing, search, generation)
`background`	Mistral Small	Lightweight background tasks (indexing, summaries)
`think`	Mistral Large	Complex reasoning (plan mode, architecture decisions)
`longContext`	Mistral Large	Requests exceeding 60K tokens
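
The routing rules in this table can be modelled as a small function. The following JavaScript sketch is a simplified illustration, not claude-code-router's actual implementation: the request flags (tokenCount, isThinking, isBackground) and the order in which the rules are checked are assumptions made for the example.

```javascript
// Illustrative only: a simplified model of the Router section above,
// not claude-code-router's real routing code.
const router = {
  default: 'mistral,devstral-latest',
  background: 'mistral,mistral-small-latest',
  think: 'mistral,mistral-large-latest',
  longContext: 'mistral,mistral-large-latest',
  longContextThreshold: 60000,
};

// `request` uses hypothetical flags: tokenCount, isThinking, isBackground.
function pickRoute(request, config = router) {
  if (request.tokenCount > config.longContextThreshold) {
    return config.longContext; // above 60K tokens: switch to the long-context model
  }
  if (request.isThinking) return config.think;
  if (request.isBackground) return config.background;
  return config.default;
}

console.log(pickRoute({ tokenCount: 1200 }));  // mistral,devstral-latest
console.log(pickRoute({ tokenCount: 75000 })); // mistral,mistral-large-latest
```

Each route value has the form provider,model, so a single route can be pointed at a different provider without touching the others.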

Verify

ccr code
# Run a simple task, then check:
cat ~/.claude-code-router/logs/ccr-*.log | grep "api.mistral.ai"

The statusline should display the active model name (e.g., devstral-latest). Log entries should show requests to api.mistral.ai.

6. Model Selection Guide

Mistral offers four models suitable for AI coding workflows. All four support tool-calling — a critical requirement for Claude Code's agentic capabilities (file editing, search, command execution).

Devstral 2 (`devstral-latest`)

The primary coding model. 123B dense transformer with a 256K token context window.

SWE-bench Verified: 72.2% (also 61.3% on SWE-bench Multilingual)
Tool-calling: Full support, on par with best closed models
Pricing: $0.40/M input, $2.00/M output
Best for: Default route — everyday coding tasks

Devstral 2 is the recommended default: its combination of coding performance, tool-calling reliability, and cost makes it the strongest choice for everyday coding.

Codestral (`codestral-latest`)

A code-specialized model with 256K context and fill-in-the-middle (FIM) support.

FIM: State-of-the-art (HumanEvalFIM 85.9%)
Tool-calling: Full function calling and parallel function calling supported
Best for: Code completion workflows, FIM tasks

Mistral Large 3 (`mistral-large-2512`)

The flagship model. 675B total parameters (41B active) using Mixture of Experts architecture with a 256K context window.

Architecture: MoE — only 41B parameters active per inference, keeping latency manageable despite 675B total
Tool-calling: Native function calling and multi-tool orchestration
Best for: think and longContext routes — complex reasoning and large codebases

Alias caveat: mistral-large-latest may still point to Large 2.1 (128K context) rather than Large 3 (256K context). If you need the 256K window reliably, pin to mistral-large-2512 explicitly.

Mistral Small 3.1 (`mistral-small-2503`)

A 24B parameter model optimized for low-latency responses with 128K context.

Tool-calling: Full support with strong agentic capabilities for its size
Pricing: Significantly cheaper than larger models
Best for: background route — lightweight tasks where speed matters more than depth

Summary: Recommended Route Assignments

Route	Model	Rationale
`default`	`devstral-latest`	Best coding benchmark, full tool-calling, good cost
`background`	`mistral-small-latest`	Fastest, cheapest, sufficient for simple tasks
`think`	`mistral-large-latest`	Strongest reasoning for complex decisions
`longContext`	`mistral-large-latest`	Largest context window (256K)

7. Presets: One-Command Switching Between Cloud and Local

The manual configuration from Section 5 works, but claude-code-router's preset system offers a more portable approach. A preset is a directory containing a manifest.json — a self-contained configuration package you can install with a single command.

The companion repository includes three presets: one cloud preset and two local presets (Ollama and LM Studio).

Note: CCR's preset install command has two bugs (#1256): it does not create the preset directory before writing, and its "already installed" guard fires if the directory already exists. The workarounds below copy preset files directly — bypassing the installer for local presets — and use CCR's name-based reconfigure flow only where an interactive prompt is needed (cloud API key).

Cloud Preset (`presets/mistral-cloud/`)

Routes requests to Mistral's EU-hosted API. During installation, CCR prompts for your API key using a secure password input field — the key is never stored in the manifest file itself.

cp -r presets/mistral-cloud ~/.claude-code-router/presets/
# start coding session with mistral-cloud settings
ccr mistral-cloud

The cloud preset uses a schema field to define install-time prompts. The {{apiKey}} placeholder in the manifest gets replaced with your input during installation:

{
  "schema": [
    {
      "id": "apiKey",
      "type": "password",
      "label": "Mistral API Key",
      "prompt": "Enter your Mistral API key (from console.mistral.ai)"
    }
  ]
}

This approach keeps the preset file shareable — no secrets in the repository, no environment variables to configure first.

Local Preset — Ollama (`presets/mistral-ollama/`)

Routes requests to a local Ollama instance. Data never leaves your machine — no API key, no cloud dependency.

1. Install Ollama

Download and install Ollama from ollama.com. On macOS:

brew install ollama

2. Pull the model

ollama pull devstral-small:latest
# Downloads ~14 GB — requires at least 16 GB RAM (24 GB recommended)

Verify the model is available:

ollama list
# NAME                    ID              SIZE    MODIFIED
# devstral-small:latest   abc123...       14 GB   ...

3. Start Ollama (if not already running as a background service)

ollama serve
# Ollama is running on http://localhost:11434

4. Install and activate the preset

cp -r presets/mistral-ollama ~/.claude-code-router/presets/
# start coding session with mistral-ollama settings
ccr mistral-ollama

The preset targets http://localhost:11434/v1 with a dummy api_key of "ollama" — Ollama's OpenAI-compatible server requires no authentication. The schema is empty: no prompts during installation.

The preset omits think and longContext routes. A 24B model on consumer hardware has practical limits — all task types fall back to the default route.

Local Preset — LM Studio (`presets/mistral-lm-studio/`)

Routes requests to a local LM Studio server. LM Studio provides a GUI for browsing, downloading, and running quantized models — no command line required for model management.

1. Install LM Studio

Download from lmstudio.ai and install. LM Studio is available for macOS, Windows, and Linux.

2. Download the model

Open LM Studio and go to the Models tab. Search for devstral-small and download Devstral Small 2 (mistralai/devstral-small-2-2512). The quantized GGUF variant fits in ~14 GB of RAM.

3. Load the model and start the local server

Go to the Developer tab (or Local Server in newer versions)
Select mistralai/devstral-small-2-2512 from the model dropdown
Click Start Server

LM Studio's server starts on http://localhost:1234 and exposes an OpenAI-compatible API. The model identifier it reports is mistralai/devstral-small-2-2512 — this must match what's in the preset manifest, which it already does.

4. Install and activate the preset

cp -r presets/mistral-lm-studio ~/.claude-code-router/presets/
# start coding session with mistral-lm-studio settings
ccr mistral-lm-studio

Like the Ollama preset, schema is empty and think/longContext routes are omitted. The endpoint targets http://localhost:1234/v1/chat/completions with a dummy api_key of "lm-studio".

Model identifier note: The preset uses mistralai/devstral-small-2-2512 as the model ID. If you load a different model in LM Studio, update the models array and Router values in presets/mistral-lm-studio/manifest.json to match the identifier shown in LM Studio's server UI before copying the preset.
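If you would rather script that edit than make it by hand, here is a minimal sketch. It makes no claim about the manifest's exact layout: it walks the parsed JSON and swaps every string that equals the old model ID or has the Router-style form "provider,old-id". The function names replace_model_id and update_manifest are hypothetical helpers, not part of CCR.

```python
import json
from pathlib import Path


def replace_model_id(node, old_id, new_id):
    """Return a copy of a parsed JSON value with old_id swapped for new_id.

    Matches bare IDs ("old") and Router-style "provider,old" strings.
    """
    if isinstance(node, dict):
        return {key: replace_model_id(value, old_id, new_id) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_model_id(value, old_id, new_id) for value in node]
    if isinstance(node, str):
        if node == old_id:
            return new_id
        provider, sep, model = node.partition(",")
        if sep and model == old_id:
            return f"{provider},{new_id}"
    return node


def update_manifest(path, old_id, new_id):
    """Rewrite a manifest file in place (hypothetical helper)."""
    manifest_path = Path(path)
    manifest = json.loads(manifest_path.read_text())
    updated = replace_model_id(manifest, old_id, new_id)
    manifest_path.write_text(json.dumps(updated, indent=2) + "\n")
```

Call update_manifest("presets/mistral-lm-studio/manifest.json", "mistralai/devstral-small-2-2512", "<identifier shown in LM Studio>") before copying the preset, then review the diff to confirm only the intended values changed.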

Switching

Switching between presets is a single command:

# Start coding session with Mistral Cloud (the default in ~/.claude-code-router/config.json)
ccr code
# Start coding session with Mistral Ollama
ccr mistral-ollama
# Start coding session with Mistral LM-Studio
ccr mistral-lm-studio

Important: After changing a preset or config.json, restart the CCR server so the change takes effect:

ccr restart

Persistent Shell Integration

For automatic routing in every terminal session, add the activation to your shell profile:

# Add to ~/.zshrc or ~/.bashrc
export MISTRAL_API_KEY="your-key-here"
eval "$(ccr activate)"

With this in place, you can use claude directly — all requests route through the Mistral proxy automatically.

8. Troubleshooting Guide

This section covers the most common issues you'll encounter when routing Claude Code through Mistral via claude-code-router.

422 API Error: "Extra inputs are not permitted"

Symptom: 422 Unprocessable Entity with "Extra inputs are not permitted" for cache_control or reasoning fields.

Cause: Claude Code sends Anthropic-specific parameters that Mistral doesn't recognize:

cache_control: {"type": "ephemeral"} on messages (prompt caching)
reasoning: {"effort": "high", "enabled": false} on the request body (extended thinking config)

Fix: Make sure your config.json includes both transformers in the use array:

"transformer": {
  "use": ["cleancache", "stripreasoning"]
}

Also ensure the custom stripreasoning plugin is registered in the top-level transformers array:

"transformers": [
  {"path": "/path/to/.claude-code-router/plugins/strip-reasoning.js"}
]

The setup.sh script handles this automatically. If you configured manually, copy plugins/strip-reasoning.js from the repo to ~/.claude-code-router/plugins/ and add both config entries.
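To make the fix concrete, the following sketch shows the kind of cleanup the two transformers need to perform on an outgoing request body. It is an illustration in Python, not the code of cleancache or strip-reasoning.js. The function name strip_anthropic_fields is hypothetical, and the assumption that cache_control can appear both on a message and on its content parts is mine.

```python
import copy


def strip_anthropic_fields(body):
    """Return a copy of a request body without the fields Mistral rejects."""
    cleaned = copy.deepcopy(body)
    # Extended-thinking config sits on the request body itself.
    cleaned.pop("reasoning", None)
    for message in cleaned.get("messages", []):
        # Prompt-caching markers on the message ...
        message.pop("cache_control", None)
        # ... and, as an assumption, on structured content parts.
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict):
                    part.pop("cache_control", None)
    return cleaned
```

If a 422 still appears after both transformers are configured, comparing the rejected field name in the error message against this list is a quick way to see which of the two is not being applied.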

Missing API Key

Symptom: Authentication errors, or a message that MISTRAL_API_KEY is not set.

Fix: Set the environment variable before starting CCR:

export MISTRAL_API_KEY="your-key-here"
ccr code

For persistent configuration, add the export to ~/.zshrc or ~/.bashrc.

Node.js Version Error

Symptom: CCR fails to install or start with compatibility errors.

Fix: CCR requires Node.js 20+. Check with node --version and update via nvm:

nvm install 20
nvm use 20

Connection Timeout

Symptom: Requests hang or time out when reaching Mistral's API.

Fix:

Verify your API key is valid at console.mistral.ai.
Check network connectivity to api.mistral.ai.
If you're behind a corporate proxy, configure PROXY_URL in config.json.

Existing Configuration Conflict

Symptom: setup.sh warns about an existing configuration, or Claude Code behaves unexpectedly after setup.

Fix: The setup script creates a backup at ~/.claude-code-router/config.json.bak before overwriting. If you need to restore your previous configuration:

cp ~/.claude-code-router/config.json.bak ~/.claude-code-router/config.json

Model Deprecation and Alias Changes

Symptom: A model ID stops working or behaves differently than expected.

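Fix: Mistral updates model aliases over time. Aliases such as mistral-large-latest and mistral-small-latest point to newer versions as they are released. If you need consistent behavior, pin the route to a specific version ID. In the example below, think is pinned to mistral-large-2512, while default still follows the devstral-latest alias: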

"default": "mistral,devstral-latest",
"think": "mistral,mistral-large-2512"

Check Mistral's model documentation for current alias mappings.
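To see at a glance which routes still follow a floating alias, a small check like the one below can help. It is a sketch that assumes Router values use the "provider,model" form shown above and that aliases end in "-latest". The function name floating_routes is hypothetical.

```python
def floating_routes(router):
    """Return {route: model} for Router entries whose model ends in '-latest'."""
    flagged = {}
    for route, target in router.items():
        # Skip non-string settings that may live next to the routes.
        if not isinstance(target, str):
            continue
        _, _, model = target.partition(",")
        if model.endswith("-latest"):
            flagged[route] = model
    return flagged
```

For the two routes shown above, the check flags only default, because think already names a dated version.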

9. Smoke Test: Verifying Your Setup

Before using this setup for real work, run through this quick verification checklist.

1. Start the proxy

ccr code

Confirm: The statusline at the bottom of the terminal displays a Mistral model name (e.g., devstral-latest). If you see no statusline, check that StatusLine.enabled is true in your config.

2. Run a simple task

In the Claude Code session, type a straightforward request:

Create a file called hello.txt with the text "Hello from Mistral"

Confirm: Claude Code creates the file. The response completes without errors. The statusline shows the model that handled the request.

3. Check the logs

In a separate terminal:

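grep -F "api.mistral.ai" ~/.claude-code-router/logs/ccr-*.log
# Should print nothing:
grep -F "api.anthropic.com" ~/.claude-code-router/logs/ccr-*.log

Confirm: The first command shows requests to api.mistral.ai. The second prints nothing, meaning no requests went to api.anthropic.com.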
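The log check can also be scripted so that it produces counts instead of raw lines. The sketch below makes no assumption about the log line format and only looks for the two hostnames as substrings. The names count_hosts and read_log_lines are hypothetical helpers.

```python
from pathlib import Path

HOSTS = ("api.mistral.ai", "api.anthropic.com")


def count_hosts(lines, hosts=HOSTS):
    """Count how many lines mention each hostname."""
    counts = {host: 0 for host in hosts}
    for line in lines:
        for host in hosts:
            if host in line:
                counts[host] += 1
    return counts


def read_log_lines(log_dir):
    """Yield every line from the ccr-*.log files in log_dir."""
    for path in sorted(Path(log_dir).expanduser().glob("ccr-*.log")):
        with path.open(errors="replace") as handle:
            yield from handle
```

With counts = count_hosts(read_log_lines("~/.claude-code-router/logs")), the check passes when counts["api.mistral.ai"] is greater than zero and counts["api.anthropic.com"] is zero.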

Pass/Fail

Check	Expected
Statusline shows Mistral model	Model name visible in terminal footer
Simple task completes	File created, no errors
Logs show `api.mistral.ai`	All requests routed to EU endpoint
No `api.anthropic.com` in logs	Zero requests to Anthropic

If all four checks pass, your setup is working. You're routing Claude Code through Mistral's EU infrastructure.

If any check fails, refer to the Troubleshooting Guide above.

10. Conclusion and What's Next

You now have Claude Code routing through Mistral's EU-hosted API. Your source code stays within EU borders and is processed by a provider with SOC 2 Type II, ISO 27001, and ISO 27701 certifications, with substantially lower US CLOUD Act exposure than routing through a US-headquartered provider. The Claude Code workflow you already use (keybindings, MCP servers, skills, CLAUDE.md files) stays the same.

The companion repository at github.com/hochbichler/claude-code-mistral provides:

setup.sh: Automated setup in under 5 minutes
config.json: Pre-configured template with four-model routing
Three presets: Switch between Mistral's EU API, local Ollama, and local LM Studio with a single command

This is a technical configuration guide, not a compliance certification. Using an EU-hosted provider is one component of a broader data protection strategy. Talk to your DPO or legal counsel about your specific obligations.

Coming Next

Part 2: Testing the Limits (coming soon) — We put this setup through real-world coding tasks: tool-calling reliability per model, MCP server compatibility, skills evaluation, extended thinking behavior, and an honest assessment of what works and what breaks.

Part 3: Beyond Mistral (coming soon) — Alternative EU-compliant setups: Mistral Vibe CLI deep dive, other EU-hosted providers (Scaleway, OVHcloud, Aleph Alpha), self-hosted open-weight models, multi-provider routing, and enterprise deployment patterns.

Written by Thomas Hochbichler. I help development teams integrate AI coding tools into compliant workflows.

This article is a technical guide for configuring AI development tools. It's not legal advice. For questions about GDPR compliance, data processing obligations, or regulatory requirements specific to your organization, consult a qualified Data Protection Officer (DPO) or legal counsel.
