<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[hochbichler.com - Tech Log]]></title><description><![CDATA[Expert articles in the context of agentic AI and enterprise software development.]]></description><link>https://blog.hochbichler.com</link><image><url>https://cdn.hashnode.com/uploads/logos/69a98ea33728a9dc35843f0b/2d5568ec-1d1c-4eea-843a-72e1e0edf7bc.png</url><title>hochbichler.com - Tech Log</title><link>https://blog.hochbichler.com</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 24 Apr 2026 23:05:33 GMT</lastBuildDate><atom:link href="https://blog.hochbichler.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Build an MCP Server with Spring Boot 4]]></title><description><![CDATA[Build an MCP Server with Spring Boot 4
Every MCP tutorial starts the same way: "First, install Python." Or TypeScript. Or Go. If you are a Java developer with a Spring Boot stack, you have been waiting for a Java option.
Spring Boot 4 changes that. C...]]></description><link>https://blog.hochbichler.com/spring-boot-mcp-server</link><guid isPermaLink="true">https://blog.hochbichler.com/spring-boot-mcp-server</guid><category><![CDATA[spring-boot]]></category><category><![CDATA[spring ai]]></category><category><![CDATA[mcp]]></category><category><![CDATA[Java]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Thomas Hochbichler]]></dc:creator><pubDate>Fri, 13 Mar 2026 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1773595023547/8f888c7e-9c6b-4312-a269-5c69c2b4bc0f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-build-an-mcp-server-with-spring-boot-4">Build an MCP Server with Spring Boot 4</h1>
<p>Every MCP tutorial starts the same way: "First, install Python." Or TypeScript. Or Go. If you are a Java developer with a Spring Boot stack, you have been waiting for a Java option.</p>
<p><strong>Spring Boot 4</strong> changes that. Combined with Spring AI's <code>@McpTool</code> annotations, you can build an MCP server in Java that is just as concise as Python — with dependency injection, native image support, and the full Spring ecosystem behind it.</p>
<p><strong>TL;DR</strong>: We build a lightweight MCP server with Spring Boot 4 that monitors external Spring Boot applications via their Actuator endpoints. Connect it to Claude Code and ask "is the order service healthy?" in natural language. Full working code included.</p>
<p><strong>Companion code</strong>: <a target="_blank" href="https://github.com/thomas-hochbichler/spring-ai-mcp-actuator">spring-ai-mcp-actuator</a> — three independent Maven projects you can build and run in minutes.</p>
<h2 id="heading-what-is-mcp">What Is MCP?</h2>
<p>The <strong>Model Context Protocol (MCP)</strong> is an open standard for connecting AI applications to external tools and data. Think of it as a USB-C port for AI: one protocol, many connections.</p>
<p>The architecture is simple:</p>
<ul>
<li><strong>Client</strong>: the AI application (Claude Code, Claude Desktop, Cursor)</li>
<li><strong>Server</strong>: your service that exposes capabilities</li>
<li><strong>Three capability types</strong>: Tools (actions the AI can call), Resources (data the AI can read), Prompts (reusable templates)</li>
</ul>
<p>MCP is now governed by the <strong>Agentic AI Foundation (AAIF)</strong> under the <strong>Linux Foundation</strong> and adopted by Google, Microsoft, OpenAI, and Amazon. It is not a niche experiment anymore — it is becoming the default integration layer for AI tooling.</p>
<p>For the full specification, see <a target="_blank" href="https://modelcontextprotocol.io">modelcontextprotocol.io</a>. We will focus on building, not theory.</p>
<blockquote>
<p><strong>MCP vs Claude Code Skills</strong>: Skills (like <code>/article-reviewer</code>) are prompt-driven workflows that run inside Claude Code. MCP servers are standalone tool servers that follow an open protocol — any MCP client can connect to them, not just Claude Code. Think of Skills as internal scripts and MCP servers as external services.</p>
</blockquote>
<h2 id="heading-java-vs-python-the-verbosity-myth">Java vs Python: The Verbosity Myth</h2>
<p>Before we start, let me address a common assumption: that Java means more code. Here is a side-by-side comparison.</p>
<p><strong>Python (FastMCP):</strong></p>
<pre><code class="lang-python"><span class="hljs-meta">@mcp.tool()</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_health</span>(<span class="hljs-params">app_name: str = <span class="hljs-string">""</span></span>) -&gt; str:</span>
    <span class="hljs-string">"""Check the health of a monitored Spring Boot application"""</span>
    <span class="hljs-keyword">return</span> get_health(app_name)
</code></pre>
<p><strong>Java (Spring AI):</strong></p>
<pre><code class="lang-java"><span class="hljs-meta">@McpTool(description = "Check the health of a monitored Spring Boot application")</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">checkHealth</span><span class="hljs-params">(String appName)</span> </span>{
    <span class="hljs-keyword">return</span> getHealth(appName);
}
</code></pre>
<p>Four lines vs four lines. The difference is cosmetic. But with Spring Boot you also get dependency injection, Spring Security, Spring Data, and the entire Spring ecosystem. For free.</p>
<h2 id="heading-what-we-are-building">What We Are Building</h2>
<p>We will build a lightweight MCP server that monitors <strong>external</strong> Spring Boot applications via their Actuator endpoints. The MCP server itself does not run a web server — it communicates with Claude Code over STDIO and calls your apps' Actuator endpoints over HTTP.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1773595012250/acdcd28b-5f67-4091-bed1-048b6994d8e8.png" alt="Diagram" /></p>
<p>When we are done, you can open Claude Code and have conversations like this:</p>
<pre><code>You: Is localhost:<span class="hljs-number">8080</span> healthy?
Claude: [calls check-health(appName=<span class="hljs-string">"localhost:8080"</span>)]
        <span class="hljs-attr">localhost</span>:<span class="hljs-number">8080</span> (http:<span class="hljs-comment">//localhost:8080) is UP</span>
        {<span class="hljs-string">"status"</span>:<span class="hljs-string">"UP"</span>,<span class="hljs-string">"components"</span>:{<span class="hljs-string">"db"</span>:{<span class="hljs-string">"status"</span>:<span class="hljs-string">"UP"</span>,<span class="hljs-string">"details"</span>:
        {<span class="hljs-string">"database"</span>:<span class="hljs-string">"PostgreSQL"</span>,<span class="hljs-string">"validationQuery"</span>:<span class="hljs-string">"isValid()"</span>}},...}}

<span class="hljs-attr">You</span>: Check all apps
<span class="hljs-attr">Claude</span>: [calls check-health()]
        <span class="hljs-attr">localhost</span>:<span class="hljs-number">8080</span> (http:<span class="hljs-comment">//localhost:8080): UP</span>
        localhost:<span class="hljs-number">8081</span> (http:<span class="hljs-comment">//localhost:8081): UP</span>

You: What is the JVM memory usage on localhost:<span class="hljs-number">8080</span>?
Claude: [calls get-metric(appName=<span class="hljs-string">"localhost:8080"</span>, metricName=<span class="hljs-string">"jvm.memory.used"</span>)]
        <span class="hljs-attr">localhost</span>:<span class="hljs-number">8080</span> — jvm.memory.used: {<span class="hljs-string">"name"</span>:<span class="hljs-string">"jvm.memory.used"</span>,
        <span class="hljs-string">"measurements"</span>:[{<span class="hljs-string">"statistic"</span>:<span class="hljs-string">"VALUE"</span>,<span class="hljs-string">"value"</span>:<span class="hljs-number">1.34217728E8</span>}],
        <span class="hljs-string">"baseUnit"</span>:<span class="hljs-string">"bytes"</span>}
</code></pre><p>This is a practical pattern. Every Spring Boot app ships with Actuator. After this tutorial, you can point this MCP server at any running Spring Boot application and monitor it through natural language.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><strong>Java 21</strong> or later</li>
<li><strong>Spring Boot 4.0</strong> (GA, released November 2025)</li>
<li><strong>Spring AI 2.0.0-M2</strong> (current milestone as of March 2026)</li>
<li><strong>Claude Code</strong> installed (<a target="_blank" href="https://code.claude.com">code.claude.com</a>)</li>
<li>Basic familiarity with Spring Boot</li>
</ul>
<blockquote>
<p><strong>Note</strong>: Spring AI 2.0 is at milestone 2, not GA yet. APIs may change before the final release — no official GA date has been announced, but mid-2026 is a reasonable community estimate. The annotation-based approach shown here has been stable since M1.</p>
</blockquote>
<h2 id="heading-project-setup">Project Setup</h2>
<p>Go to <a target="_blank" href="https://start.spring.io">start.spring.io</a> and configure:</p>
<ul>
<li><strong>Project</strong>: Maven</li>
<li><strong>Language</strong>: Java</li>
<li><strong>Spring Boot</strong>: 4.0.x</li>
<li><strong>Group</strong>: <code>com.hochbichler</code></li>
<li><strong>Artifact</strong>: <code>mcp-actuator</code></li>
<li><strong>Java</strong>: 21</li>
<li><strong>Dependencies</strong>: Spring Web</li>
</ul>
<p>Download and unzip. Then add the Spring AI MCP Server dependency to your <code>pom.xml</code>:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">dependencyManagement</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">dependencies</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>org.springframework.ai<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-ai-bom<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">version</span>&gt;</span>2.0.0-M2<span class="hljs-tag">&lt;/<span class="hljs-name">version</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">type</span>&gt;</span>pom<span class="hljs-tag">&lt;/<span class="hljs-name">type</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">scope</span>&gt;</span>import<span class="hljs-tag">&lt;/<span class="hljs-name">scope</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">dependencies</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependencyManagement</span>&gt;</span>

<span class="hljs-tag">&lt;<span class="hljs-name">dependencies</span>&gt;</span>
    <span class="hljs-comment">&lt;!-- Spring AI MCP Server (STDIO transport, no web server) --&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>org.springframework.ai<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-ai-starter-mcp-server<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>

    <span class="hljs-comment">&lt;!-- RestClient for calling Actuator endpoints on target apps --&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>org.springframework.boot<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-boot-starter-web<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependencies</span>&gt;</span>
</code></pre>
<blockquote>
<p><strong>Watch out</strong>: older tutorials reference <code>spring-ai-mcp-server-spring-boot-starter</code>. That artifact name was renamed in Spring AI 1.0.0-M7. The correct name is <code>spring-ai-starter-mcp-server</code>.</p>
</blockquote>
<p>We include <code>spring-boot-starter-web</code> for <code>RestClient</code>, Spring's fluent synchronous HTTP client (available since Spring Framework 6.1). The starter also brings in an embedded web server, but it will not conflict with STDIO because we explicitly set <code>spring.main.web-application-type=none</code>, which disables it.</p>
<p>Now configure <code>application.properties</code>:</p>
<pre><code class="lang-properties"># MCP Server configuration
spring.ai.mcp.server.stdio=true
spring.ai.mcp.server.type=SYNC
spring.ai.mcp.server.annotation-scanner.enabled=true

# Application name
spring.application.name=mcp-actuator

# No web server — STDIO only
spring.main.web-application-type=none
</code></pre>
<p>Three MCP properties and one explicit web-type override. That is all the framework configuration you need.</p>
<ul>
<li><code>stdio=true</code> — use STDIO transport (Claude Code launches your JAR as a subprocess)</li>
<li><code>type=SYNC</code> — synchronous server (filters out any <code>Mono</code>/<code>Flux</code> return types)</li>
<li><code>annotation-scanner.enabled=true</code> — auto-discover <code>@McpTool</code> methods at startup</li>
<li><code>web-application-type=none</code> — no embedded web server (required when using <code>spring-boot-starter-web</code> alongside the STDIO transport)</li>
</ul>
<p>The target app URLs are passed as CLI arguments: <code>--apps=http://localhost:8080,http://localhost:8081</code>. We will parse those next.</p>
<h3 id="heading-app-registry-parsing-cli-arguments">App Registry: Parsing CLI Arguments</h3>
<p>Create an <code>AppRegistry</code> component that parses the <code>--apps</code> argument and stores the target applications:</p>
<pre><code class="lang-java"><span class="hljs-keyword">package</span> com.hochbichler.mcpactuator;

<span class="hljs-keyword">import</span> java.net.URI;
<span class="hljs-keyword">import</span> java.util.Collections;
<span class="hljs-keyword">import</span> java.util.LinkedHashMap;
<span class="hljs-keyword">import</span> java.util.Map;

<span class="hljs-keyword">import</span> org.springframework.beans.factory.annotation.Value;
<span class="hljs-keyword">import</span> org.springframework.stereotype.Component;

<span class="hljs-meta">@Component</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AppRegistry</span> </span>{

    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> Map&lt;String, String&gt; apps = <span class="hljs-keyword">new</span> LinkedHashMap&lt;&gt;();

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-title">AppRegistry</span><span class="hljs-params">(<span class="hljs-meta">@Value("${apps:}")</span> String appsArg)</span> </span>{
        <span class="hljs-keyword">if</span> (!appsArg.isBlank()) {
            <span class="hljs-keyword">for</span> (String url : appsArg.split(<span class="hljs-string">","</span>)) {
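                url = url.trim();
                <span class="hljs-keyword">if</span> (url.isEmpty()) {
                    <span class="hljs-keyword">continue</span>; <span class="hljs-comment">// skip empty entries, e.g. from a trailing comma</span>
                }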
                String name = extractAppName(url);
                apps.put(name, url);
            }
        }
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> String <span class="hljs-title">extractAppName</span><span class="hljs-params">(String url)</span> </span>{
        URI uri = URI.create(url);
        String host = uri.getHost();
        <span class="hljs-keyword">int</span> port = uri.getPort();
        <span class="hljs-keyword">return</span> host + (port &gt; <span class="hljs-number">0</span> ? <span class="hljs-string">":"</span> + port : <span class="hljs-string">""</span>);
    }

    <span class="hljs-function"><span class="hljs-keyword">public</span> Map&lt;String, String&gt; <span class="hljs-title">getApps</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> Collections.unmodifiableMap(apps);
    }

    <span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">getUrl</span><span class="hljs-params">(String appName)</span> </span>{
        <span class="hljs-keyword">return</span> apps.get(appName);
    }
}
</code></pre>
<p>Spring Boot maps <code>--apps=value</code> on the command line to the <code>apps</code> property. The <code>@Value("${apps:}")</code> annotation injects it with an empty default. The registry derives a name from each URL — <code>localhost:8080</code>, <code>localhost:8081</code>, etc. — and stores the mapping.</p>
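<p>To make the naming rule concrete, here is a minimal standalone sketch of the same idea using only the JDK. The class name <code>AppNameSketch</code> and the <code>orders.example.com</code> URL are illustrative assumptions, not part of the companion code.</p>
<pre><code class="lang-java">import java.net.URI;

public class AppNameSketch {

    // Derives a registry key from a base URL: the host, plus the port when one is given.
    public static String deriveName(String url) {
        URI uri = URI.create(url.trim());
        int port = uri.getPort(); // URI.getPort() returns -1 when the URL has no explicit port
        return port == -1 ? uri.getHost() : uri.getHost() + ":" + port;
    }

    public static void main(String[] args) {
        // Hypothetical value for --apps; note the space after the comma.
        String appsArg = "http://localhost:8080, https://orders.example.com";
        for (String url : appsArg.split(",")) {
            System.out.println(deriveName(url) + " = " + url.trim());
        }
    }
}
</code></pre>
<p>A URL without an explicit port is keyed by its host alone. Two URLs that resolve to the same name would overwrite each other in the registry map, so each monitored app needs a distinct host and port combination.</p>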
<p>Run the server with:</p>
<pre><code class="lang-bash">java -jar mcp-actuator.jar --apps=http://localhost:8080,http://localhost:8081
</code></pre>
<h2 id="heading-your-first-mcp-tool-health-check">Your First MCP Tool: Health Check</h2>
<p>Create a new class <code>ActuatorMcpTools.java</code>:</p>
<pre><code class="lang-java"><span class="hljs-keyword">package</span> com.hochbichler.mcpactuator;

<span class="hljs-keyword">import</span> java.util.Map;

<span class="hljs-keyword">import</span> org.springaicommunity.mcp.annotation.McpTool;
<span class="hljs-keyword">import</span> org.springaicommunity.mcp.annotation.McpToolParam;
<span class="hljs-keyword">import</span> org.springframework.stereotype.Component;
<span class="hljs-keyword">import</span> org.springframework.web.client.RestClient;
<span class="hljs-keyword">import</span> org.springframework.web.client.RestClientException;

<span class="hljs-meta">@Component</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ActuatorMcpTools</span> </span>{

    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> AppRegistry appRegistry;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> RestClient restClient;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-title">ActuatorMcpTools</span><span class="hljs-params">(AppRegistry appRegistry)</span> </span>{
        <span class="hljs-keyword">this</span>.appRegistry = appRegistry;
        <span class="hljs-keyword">this</span>.restClient = RestClient.create();
    }

    <span class="hljs-meta">@McpTool(
        name = "check-health",
        description = "Check the health of a monitored Spring Boot application. "
            + "Leave appName empty to check all apps.")</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">checkHealth</span><span class="hljs-params">(
            <span class="hljs-meta">@McpToolParam(description = "App name (e.g. localhost:8080) or leave empty to check all",
                          required = false)</span>
            String appName)</span> </span>{

        <span class="hljs-keyword">if</span> (appName == <span class="hljs-keyword">null</span> || appName.isBlank()) {
            <span class="hljs-keyword">return</span> checkAllApps();
        }

        String url = appRegistry.getUrl(appName);
        <span class="hljs-keyword">if</span> (url == <span class="hljs-keyword">null</span>) {
            <span class="hljs-keyword">return</span> <span class="hljs-string">"Unknown app: "</span> + appName
                + <span class="hljs-string">". Registered apps: "</span> + appRegistry.getApps().keySet();
        }

        <span class="hljs-keyword">return</span> fetchHealth(appName, url);
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> String <span class="hljs-title">checkAllApps</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">var</span> sb = <span class="hljs-keyword">new</span> StringBuilder();
        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">var</span> entry : appRegistry.getApps().entrySet()) {
            sb.append(fetchHealth(entry.getKey(), entry.getValue())).append(<span class="hljs-string">"\n"</span>);
        }
        <span class="hljs-keyword">return</span> sb.toString().trim();
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> String <span class="hljs-title">fetchHealth</span><span class="hljs-params">(String name, String url)</span> </span>{
        <span class="hljs-keyword">try</span> {
            String response = restClient.get()
                .uri(url + <span class="hljs-string">"/actuator/health"</span>)
                .retrieve()
                .body(String.class);
            <span class="hljs-keyword">return</span> name + <span class="hljs-string">" ("</span> + url + <span class="hljs-string">"): "</span> + response;
        } <span class="hljs-keyword">catch</span> (RestClientException e) {
            <span class="hljs-keyword">return</span> name + <span class="hljs-string">" ("</span> + url + <span class="hljs-string">"): DOWN — "</span> + e.getMessage();
        }
    }
}
</code></pre>
<blockquote>
<p><strong>Why <code>org.springaicommunity</code>?</strong> The <code>@McpTool</code> and <code>@McpToolParam</code> annotations are not yet included in the official Spring AI 2.0.0-M2 starters. They live in the <a target="_blank" href="https://github.com/spring-ai-community/mcp-annotations">spring-ai-community/mcp-annotations</a> incubating project (<code>org.springaicommunity:spring-ai-mcp-annotations</code>). Once they graduate into mainline Spring AI (expected in a later milestone), the package will change to <code>org.springframework.ai.mcp.annotation</code>. For now, add the community dependency to your <code>pom.xml</code>:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>org.springaicommunity<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-ai-mcp-annotations<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">version</span>&gt;</span>0.0.3<span class="hljs-tag">&lt;/<span class="hljs-name">version</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>
</code></pre>
</blockquote>
<p>That is it. The <code>@McpTool</code> annotation tells Spring AI to:</p>
<ol>
<li>Register this method as an MCP tool named <code>check-health</code></li>
<li>Generate a JSON schema from the method signature (including the optional <code>appName</code> parameter)</li>
<li>Make it callable by any connected MCP client</li>
</ol>
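<p>On the wire, steps 2 and 3 look roughly like the two JSON documents below, held here as Java text blocks so they can be printed or used in a test. The field names (<code>inputSchema</code>, <code>tools/call</code>, <code>params.name</code>, <code>params.arguments</code>) follow the MCP specification. The exact schema that Spring AI generates for <code>check-health</code> is an assumption and may contain additional fields; the class name is hypothetical.</p>
<pre><code class="lang-java">public class ToolWireFormatSketch {

    // Approximate tool definition as a client would see it when listing tools
    // (assumed shape; the generated schema may differ in detail).
    public static final String TOOL_DEFINITION = """
        {
          "name": "check-health",
          "description": "Check the health of a monitored Spring Boot application. Leave appName empty to check all apps.",
          "inputSchema": {
            "type": "object",
            "properties": {
              "appName": {
                "type": "string",
                "description": "App name (e.g. localhost:8080) or leave empty to check all"
              }
            },
            "required": []
          }
        }
        """;

    // JSON-RPC 2.0 request an MCP client sends to invoke the tool (the id value is arbitrary).
    public static final String CALL_REQUEST = """
        {
          "jsonrpc": "2.0",
          "id": 1,
          "method": "tools/call",
          "params": {
            "name": "check-health",
            "arguments": { "appName": "localhost:8080" }
          }
        }
        """;

    public static void main(String[] args) {
        System.out.println(TOOL_DEFINITION);
        System.out.println(CALL_REQUEST);
    }
}
</code></pre>
<p>The model never sees your Java method. It sees the name, the description, and the input schema, which is why the wording of those annotations matters.</p>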
<p>The <code>description</code> field is important. MCP clients show this to the AI model so it knows <strong>when</strong> to use the tool. Be specific.</p>
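<p>Notice the error handling: if a target app is down or unreachable, we catch the <code>RestClientException</code> and report it as <code>DOWN</code> instead of crashing. The same path covers an app that answers but reports itself unhealthy, because Actuator returns HTTP 503 for a <code>DOWN</code> status by default and <code>retrieve()</code> raises an exception for error responses. The MCP server stays healthy even when monitored apps are not.</p>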
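<p>The tool returns Actuator's full JSON and lets the model interpret it. If you prefer a compact one-line status per app, a minimal sketch could extract the top-level status first. The class and method names are hypothetical, and the regular expression assumes the top-level <code>status</code> is the first <code>status</code> field in the response, as in the sample output earlier. Production code would more likely use a JSON parser such as Jackson.</p>
<pre><code class="lang-java">import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HealthStatusSketch {

    // Matches the first "status" field in the document (assumed to be the top-level one).
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    // Returns "name: STATUS", or "name: UNKNOWN" when no status field is found.
    public static String summarize(String name, String healthJson) {
        Matcher matcher = STATUS.matcher(healthJson);
        String status = matcher.find() ? matcher.group(1) : "UNKNOWN";
        return name + ": " + status;
    }

    public static void main(String[] args) {
        String sample = "{\"status\":\"UP\",\"components\":{\"db\":{\"status\":\"UP\"}}}";
        System.out.println(summarize("localhost:8080", sample));
    }
}
</code></pre>
<p>Whether to summarize is a design choice: the raw JSON gives the model more detail to reason about, while a short status keeps responses small when many apps are registered.</p>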
<h2 id="heading-adding-a-metrics-tool">Adding a Metrics Tool</h2>
<p>Add this method to the same <code>ActuatorMcpTools</code> class:</p>
<pre><code class="lang-java"><span class="hljs-meta">@McpTool(
    name = "get-metric",
    description = "Get a specific metric from a monitored app. "
        + "Common metrics: jvm.memory.used, http.server.requests, "
        + "system.cpu.usage, process.uptime",
    annotations = @McpTool.McpAnnotations(
        readOnlyHint = true,
        destructiveHint = false
    ))</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">getMetric</span><span class="hljs-params">(
        <span class="hljs-meta">@McpToolParam(description = "App name (e.g. localhost:8080)",
                      required = true)</span>
        String appName,
        <span class="hljs-meta">@McpToolParam(description = "Metric name, e.g. jvm.memory.used",
                      required = true)</span>
        String metricName)</span> </span>{

    String url = appRegistry.getUrl(appName);
    <span class="hljs-keyword">if</span> (url == <span class="hljs-keyword">null</span>) {
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Unknown app: "</span> + appName
            + <span class="hljs-string">". Registered apps: "</span> + appRegistry.getApps().keySet();
    }

    <span class="hljs-keyword">try</span> {
        String response = restClient.get()
            .uri(url + <span class="hljs-string">"/actuator/metrics/"</span> + metricName)
            .retrieve()
            .body(String.class);
        <span class="hljs-keyword">return</span> appName + <span class="hljs-string">" — "</span> + metricName + <span class="hljs-string">": "</span> + response;
    } <span class="hljs-keyword">catch</span> (RestClientException e) {
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Failed to fetch "</span> + metricName + <span class="hljs-string">" from "</span> + appName
            + <span class="hljs-string">": "</span> + e.getMessage();
    }
}
</code></pre>
<p>A few things to notice:</p>
<p><strong><code>@McpToolParam</code></strong> adds metadata to each parameter. The <code>description</code> tells the AI model what format to use. The <code>required = true</code> flag means the client must provide this value. Both <code>appName</code> and <code>metricName</code> are required here — unlike <code>check-health</code>, which makes <code>appName</code> optional for the "check all" convenience.</p>
<p><strong><code>@McpTool.McpAnnotations</code></strong> was introduced in Spring AI 1.1 via the community annotations project and is available in Spring AI 2.0. The <code>readOnlyHint</code> tells the client that this tool does not change any state. The <code>destructiveHint = false</code> states explicitly that it performs no destructive updates; the MCP specification only treats that hint as meaningful when <code>readOnlyHint</code> is false, so here it serves as documentation. Both are hints, not guarantees: an MCP client can use them to decide whether a call needs user confirmation.</p>
<p><strong>The description lists common metric names.</strong> This is a practical trick: when the AI model reads the tool description, it knows which values are valid. Without this, the model has to guess or ask the user.</p>
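<p>Like the health tool, <code>get-metric</code> hands Actuator's raw JSON to the model. If you would rather return a readable number, a minimal sketch could pull out the first measurement and convert it, shown here with the <code>jvm.memory.used</code> sample from earlier. The class and helper names are hypothetical, and the regular expression stands in for a real JSON parser to keep the example dependency-free.</p>
<pre><code class="lang-java">import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetricValueSketch {

    // Captures the number after the first "value" key, including scientific notation.
    private static final Pattern VALUE = Pattern.compile(
            "\"value\"\\s*:\\s*(-?[0-9]+(?:\\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)");

    // Reads only the first measurement; metrics with several statistics need more care.
    public static double firstValue(String metricJson) {
        Matcher matcher = VALUE.matcher(metricJson);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no measurement value found");
        }
        return Double.parseDouble(matcher.group(1));
    }

    public static double bytesToMiB(double bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        String sample = "{\"name\":\"jvm.memory.used\","
                + "\"measurements\":[{\"statistic\":\"VALUE\",\"value\":1.34217728E8}],"
                + "\"baseUnit\":\"bytes\"}";
        System.out.println(bytesToMiB(firstValue(sample)) + " MiB");
    }
}
</code></pre>
<p>For the sample response this yields 128.0 MiB. The conversion only makes sense when <code>baseUnit</code> is <code>bytes</code>, so a fuller version would check that field before converting.</p>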
<p>Let us also add a tool to list all available metrics for a given app:</p>
<pre><code class="lang-java"><span class="hljs-meta">@McpTool(
    name = "list-metrics",
    description = "List all available metric names for a monitored app",
    annotations = @McpTool.McpAnnotations(readOnlyHint = true))</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">listMetrics</span><span class="hljs-params">(
        <span class="hljs-meta">@McpToolParam(description = "App name (e.g. localhost:8080)",
                      required = true)</span>
        String appName)</span> </span>{

    String url = appRegistry.getUrl(appName);
    <span class="hljs-keyword">if</span> (url == <span class="hljs-keyword">null</span>) {
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Unknown app: "</span> + appName
            + <span class="hljs-string">". Registered apps: "</span> + appRegistry.getApps().keySet();
    }

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">return</span> restClient.get()
            .uri(url + <span class="hljs-string">"/actuator/metrics"</span>)
            .retrieve()
            .body(String.class);
    } <span class="hljs-keyword">catch</span> (RestClientException e) {
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Failed to fetch metrics from "</span> + appName + <span class="hljs-string">": "</span> + e.getMessage();
    }
}
</code></pre>
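<p>One thing to keep in mind: <code>get-metric</code> appends the caller-supplied <code>metricName</code> directly to the Actuator URL, and that value comes from the model. A minimal sketch of an allow-list check that could run before the HTTP call is shown below. The class name, method name, and pattern are assumptions. The check deliberately rejects slashes and query strings, so Actuator tag filters such as <code>?tag=KEY:VALUE</code> would need separate handling.</p>
<pre><code class="lang-java">import java.util.regex.Pattern;

public class MetricNameSketch {

    // Letters, digits, dots, underscores and hyphens; must start with a letter or digit.
    private static final Pattern SAFE_NAME =
            Pattern.compile("[A-Za-z0-9][A-Za-z0-9._-]*");

    // Returns true only for names that cannot alter the path or query of the request.
    public static boolean isSafeMetricName(String name) {
        if (name == null) {
            return false;
        }
        return SAFE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSafeMetricName("jvm.memory.used"));
        System.out.println(isSafeMetricName("../env"));
    }
}
</code></pre>
<p>In the tool method, a failed check could return a short message such as "Invalid metric name" instead of calling the target app, mirroring how unknown app names are handled.</p>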
<h2 id="heading-exposing-app-info-as-an-mcp-resource">Exposing App Info as an MCP Resource</h2>
<p>Before adding the resource, it helps to understand why MCP distinguishes resources from tools at all — and why health status is <em>not</em> a good fit for a resource.</p>
<p>The official MCP specification draws a clear line: <strong>tools are model-controlled</strong>, <strong>resources are application-driven</strong>.</p>
<ul>
<li>A <strong>tool</strong> is something the AI <em>invokes</em> — it decides when to call it, picks the arguments, and acts on the result. Tools are designed for interaction: querying a database, calling an API, running a computation.</li>
<li>A <strong>resource</strong> is something the AI (or the host application) <em>reads</em> — it is a URI-addressable piece of context: a file, a schema, a configuration snapshot. The MCP spec says resources exist to "share data that provides context to language models".</li>
</ul>
<p>The key question when choosing between the two is: <strong>how often does this data change?</strong></p>
<ul>
<li><strong>Health status</strong> can change at any moment: an app can go from UP to DOWN while you are mid-conversation. If you expose health as a resource, Claude might read it once at the start and act on stale data. Health belongs as a <strong>tool</strong>: invoked on demand, always fresh.</li>
<li><strong>Build info</strong> (version number, artifact name, git commit) is written at compile time and never changes while the app is running. This is safe to read once as background context. App info belongs as a <strong>resource</strong>.</li>
</ul>
<p>The rule of thumb: expose data as a <strong>resource</strong> when it is stable during runtime (versions, registered apps, configuration). Expose it as a <strong>tool</strong> when it changes frequently or requires parameters to fetch a specific value.</p>
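<p>Written as code, the rule of thumb is a two-question check. This is only a sketch of the heuristic from this article, not something the MCP specification prescribes; the class, enum, and parameter names are made up for illustration.</p>
<pre><code class="lang-java">public class ExposureRuleSketch {

    public enum Exposure { TOOL, RESOURCE }

    // Resource only when the data is stable at runtime and needs no arguments to fetch.
    public static Exposure choose(boolean stableAtRuntime, boolean needsParameters) {
        if (!stableAtRuntime || needsParameters) {
            return Exposure.TOOL;
        }
        return Exposure.RESOURCE;
    }

    public static void main(String[] args) {
        System.out.println("health status: " + choose(false, false));
        System.out.println("single metric: " + choose(false, true));
        System.out.println("build info:    " + choose(true, false));
    }
}
</code></pre>
<p>Health status and individual metrics land on the tool side, build info on the resource side, which matches the split used in this tutorial.</p>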
<p>Add a new class <code>ActuatorMcpResources.java</code>:</p>
<pre><code class="lang-java"><span class="hljs-keyword">package</span> com.hochbichler.mcpactuator;

<span class="hljs-keyword">import</span> org.springaicommunity.mcp.annotation.McpResource;
<span class="hljs-keyword">import</span> org.springframework.stereotype.Component;
<span class="hljs-keyword">import</span> org.springframework.web.client.RestClient;
<span class="hljs-keyword">import</span> org.springframework.web.client.RestClientException;

<span class="hljs-meta">@Component</span>
<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ActuatorMcpResources</span> </span>{

    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> AppRegistry appRegistry;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> RestClient restClient;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-title">ActuatorMcpResources</span><span class="hljs-params">(AppRegistry appRegistry)</span> </span>{
        <span class="hljs-keyword">this</span>.appRegistry = appRegistry;
        <span class="hljs-keyword">this</span>.restClient = RestClient.create();
    }

    <span class="hljs-meta">@McpResource(
        uri = "apps://info",
        name = "App Registry",
        description = "Registered apps and their static build info from /actuator/info")</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">getAppInfo</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">var</span> sb = <span class="hljs-keyword">new</span> StringBuilder();
        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">var</span> entry : appRegistry.getApps().entrySet()) {
            String name = entry.getKey();
            String url = entry.getValue();
            sb.append(<span class="hljs-string">"=== "</span>).append(name).append(<span class="hljs-string">" ===\n"</span>);
            sb.append(<span class="hljs-string">"URL: "</span>).append(url).append(<span class="hljs-string">"\n"</span>);
            <span class="hljs-keyword">try</span> {
                String info = restClient.get()
                    .uri(url + <span class="hljs-string">"/actuator/info"</span>)
                    .retrieve()
                    .body(String.class);
                sb.append(info).append(<span class="hljs-string">"\n"</span>);
            } <span class="hljs-keyword">catch</span> (RestClientException e) {
                sb.append(<span class="hljs-string">"info: not available\n"</span>);
            }
            sb.append(<span class="hljs-string">"\n"</span>);
        }
        <span class="hljs-keyword">return</span> sb.toString().trim();
    }
}
</code></pre>
<p>When Claude Code connects, it can read <code>apps://info</code> to learn which apps are registered and what versions they are running — without you having to ask. That context is stable for the entire session. When you then ask "why is the order service slow?", Claude already knows the service exists and what version it is; it only needs to call the <code>check-health</code> or <code>get-metric</code> tools for the live data.</p>
<h2 id="heading-connecting-to-claude-code">Connecting to Claude Code</h2>
<p>Build the JAR:</p>
<pre><code class="lang-bash">./mvnw clean package -DskipTests
</code></pre>
<p>Make sure your target Spring Boot applications are running with Actuator enabled. For example, if you have two apps running on ports 8080 and 8081, add the MCP server to Claude Code:</p>
<pre><code class="lang-bash">claude mcp add --transport stdio spring-actuator \
  -- java -jar /absolute/path/to/target/mcp-actuator-0.0.1-SNAPSHOT.jar \
  --apps=http://localhost:8080,http://localhost:8081
</code></pre>
<blockquote>
<p><strong>Important</strong>: use the absolute path to your JAR. Relative paths break because Claude Code launches the process from a different working directory.</p>
</blockquote>
<p>Verify the connection inside Claude Code:</p>
<pre><code>/mcp
</code></pre><p>You should see <code>spring-actuator</code> listed with status "connected" and three tools: <code>check-health</code>, <code>get-metric</code>, <code>list-metrics</code>.</p>
<p>The architecture here is key: Claude Code launches the MCP server JAR as a subprocess (via STDIO). The MCP server itself does <strong>not</strong> run a web server — it is a lightweight process that calls the Actuator endpoints on your target apps over HTTP using <code>RestClient</code>. Your actual applications run independently and just need Actuator exposed.</p>
<p>Try it:</p>
<pre><code>You: Is the order service healthy?
You: What metrics are available on localhost:8080?
You: How much JVM memory is the order service using?
</code></pre><p>Claude Code calls your MCP tools, which fetch the Actuator data from the target apps over HTTP, and responds in natural language.</p>
<h3 id="heading-spring-boot-cold-start-and-mcptimeout">Spring Boot Cold Start and MCP_TIMEOUT</h3>
<p>You might hit a connection timeout on first launch. Spring Boot needs a few seconds to start, and Claude Code's default MCP timeout may be too short.</p>
<p>Fix it by setting the timeout before starting Claude Code:</p>
<pre><code class="lang-bash">MCP_TIMEOUT=10000 claude
</code></pre>
<p>This gives your server 10 seconds to start. We will address this properly in the native image section.</p>
<h2 id="heading-when-things-go-wrong">When Things Go Wrong</h2>
<p>Here are the issues I have run into and how to fix them:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Problem</td><td>Cause</td><td>Fix</td></tr>
</thead>
<tbody>
<tr>
<td>Tool does not appear in <code>/mcp</code></td><td>Method returns <code>Mono&lt;T&gt;</code> but server type is <code>SYNC</code></td><td>Change return type to a plain type, or set <code>spring.ai.mcp.server.type=ASYNC</code></td></tr>
<tr>
<td><code>Connection closed</code> error</td><td>Wrong JAR path or JAR does not exist</td><td>Use absolute path, run <code>ls</code> on the JAR to verify</td></tr>
<tr>
<td><code>ENOENT</code> on Windows</td><td>Windows cannot execute <code>java</code> directly via STDIO</td><td>Use <code>cmd /c java -jar ...</code> as the command</td></tr>
<tr>
<td>Tool exists but AI never calls it</td><td>Description is too vague</td><td>Make the description specific. List valid input values if possible</td></tr>
<tr>
<td><code>annotation-scanner</code> finds nothing</td><td>Class is missing <code>@Component</code></td><td>Add <code>@Component</code> to your tool class. It must be a Spring-managed bean</td></tr>
</tbody>
</table>
</div><p><strong>Where to find logs</strong>: Claude Code logs MCP communication. Run <code>/mcp</code> and check the server status. For Spring-side logs, add <code>logging.level.org.springframework.ai.mcp=DEBUG</code> to <code>application.properties</code>.</p>
<p><strong>The silent async trap</strong>: if you write a method that returns <code>Mono&lt;String&gt;</code> in a <code>SYNC</code> server, Spring AI drops it with a warning in the startup log. No error. The tool just does not show up. Check your startup logs if tools are missing.</p>
<h2 id="heading-beyond-stdio-http-transport-for-teams">Beyond STDIO: HTTP Transport for Teams</h2>
<p>STDIO works great for local development. But it requires Claude Code to launch your JAR as a subprocess. For team use or remote servers, switch to <strong>Streamable HTTP</strong> transport.</p>
<p><strong>Step 1</strong>: swap the dependency in <code>pom.xml</code>:</p>
<pre><code class="lang-xml"><span class="hljs-comment">&lt;!-- Replace spring-ai-starter-mcp-server with: --&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>org.springframework.ai<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-ai-starter-mcp-server-webmvc<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span>
</code></pre>
<p><strong>Step 2</strong>: update <code>application.properties</code>:</p>
<pre><code class="lang-properties"># Remove: spring.ai.mcp.server.stdio=true
# Add:
spring.ai.mcp.server.protocol=STREAMABLE
server.port=8081
</code></pre>
<p><strong>Step 3</strong>: start the app normally and connect Claude Code:</p>
<pre><code class="lang-bash">./mvnw spring-boot:run
claude mcp add --transport http spring-actuator http://localhost:8081/mcp
</code></pre>
<p><strong>When to use which</strong>:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>STDIO</td><td>Streamable HTTP</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Use when</strong></td><td>Local dev, single user</td><td>Team use, remote servers, CI/CD</td></tr>
<tr>
<td><strong>Startup</strong></td><td>Claude Code launches the JAR</td><td>You run the server independently</td></tr>
<tr>
<td><strong>Networking</strong></td><td>None (stdin/stdout pipes to a local subprocess)</td><td>HTTP, can run anywhere</td></tr>
<tr>
<td><strong>Trade-off</strong></td><td>Cold start delay</td><td>Need to manage a running server</td></tr>
</tbody>
</table>
</div><h2 id="heading-native-image-instant-startup-for-mcp-servers">Native Image: Instant Startup for MCP Servers</h2>
<p>The STDIO cold start problem has a clean solution: <strong>GraalVM native image</strong>.</p>
<p>A Spring Boot 4 native image compiles your app ahead of time into a standalone binary. The result: startup in ~100ms instead of 3-5 seconds. No JVM needed at runtime. For a minimal MCP server like ours, the binary is typically around 50-80 MB (larger apps with more dependencies can exceed 100 MB).</p>
<p>For an MCP server — a small, single-purpose tool — this is a perfect fit.</p>
<p>With GraalVM native support in your <code>pom.xml</code> (Spring Boot 4 includes the plugin by default; you only need to activate the <code>native</code> profile), compile the binary:</p>
<pre><code class="lang-bash">./mvnw -Pnative native:compile
</code></pre>
<p>This produces a binary at <code>target/mcp-actuator</code>. Add it to Claude Code without the <code>java -jar</code> wrapper:</p>
<pre><code class="lang-bash">claude mcp add --transport stdio spring-actuator \
  -- /absolute/path/to/target/mcp-actuator \
  --apps=http://localhost:8080,http://localhost:8081
</code></pre>
<p>No more <code>MCP_TIMEOUT</code> workaround. The server starts before Claude Code even finishes sending the initialization handshake.</p>
<p><strong>Trade-offs</strong>:</p>
<ul>
<li>Build time is significantly longer (2-5 minutes vs seconds for a regular JAR)</li>
<li>Reflection-based libraries may need GraalVM configuration hints</li>
<li>Spring Boot 4's improved AOT engine handles most cases automatically, but test your native build before relying on it</li>
<li>You need GraalVM installed locally or use a CI pipeline with native image support</li>
</ul>
<p>For local development, stick with the regular JAR and <code>MCP_TIMEOUT</code>. Use native image for the version you distribute to your team or deploy as a shared tool.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The architectural choices worth remembering:</p>
<ul>
<li><strong>STDIO for local dev, Streamable HTTP for teams</strong> — same tool code, different transport dependency</li>
<li><strong>Tools for live data, resources for stable context</strong> — health status changes every second; build info does not</li>
<li><strong>Native image eliminates the cold start problem</strong> — a 100ms startup means no more <code>MCP_TIMEOUT</code> hacks</li>
</ul>
<p>Every Spring Boot application already ships with Actuator — point this MCP server at any running instance and you can monitor it through natural language. To extend this further, add tools for <code>/actuator/env</code>, <code>/actuator/loggers</code>, or <code>/actuator/threaddump</code> using the same <code>@McpTool</code> pattern. Fork the companion code at <a target="_blank" href="https://github.com/thomas-hochbichler/spring-ai-mcp-actuator">spring-ai-mcp-actuator</a> and try it with your own services.</p>
]]></content:encoded></item><item><title><![CDATA[ROPC Is Dead: How to Get User Tokens Without It]]></title><description><![CDATA[ROPC Is Dead: How to Get User Tokens Without It
A practical migration guide for headless CLIs and APIs that need user-context tokens now that OAuth2 ROPC is prohibited.
Who this is for: Developers who use grant_type=password in CLIs, APIs, or scripts...]]></description><link>https://blog.hochbichler.com/ropc-is-dead-how-to-get-user-tokens-without-it</link><guid isPermaLink="true">https://blog.hochbichler.com/ropc-is-dead-how-to-get-user-tokens-without-it</guid><dc:creator><![CDATA[Thomas Hochbichler]]></dc:creator><pubDate>Tue, 10 Mar 2026 00:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1773165127500/3c6a80a0-ca03-499c-90d2-521127917968.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-ropc-is-dead-how-to-get-user-tokens-without-it">ROPC Is Dead: How to Get User Tokens Without It</h1>
<p><em>A practical migration guide for headless CLIs and APIs that need user-context tokens now that OAuth2 ROPC is prohibited.</em></p>
<p><strong>Who this is for</strong>: Developers who use <code>grant_type=password</code> in CLIs, APIs, or scripts and need to migrate to a supported OAuth flow.</p>
<p><strong>Reading time</strong>: ~18 minutes | <strong>Companion repository</strong>: <a target="_blank" href="https://github.com/thomas-hochbichler/ropc-alternative-flows-poc">ropc-alternative-flows-poc</a> <em>(Spring Boot 3.4 + Keycloak PoC)</em></p>
<p><strong>TL;DR</strong>: RFC 9700 (January 2025) says ROPC "MUST NOT be used" — the strongest prohibition the IETF has. OAuth 2.1 removes it entirely. If your CLI or headless API collects usernames and passwords to get tokens via <code>grant_type=password</code>, you need to migrate. The Device Authorization Grant (RFC 8628) is the primary replacement for headless scenarios. Auth Code + PKCE with a localhost redirect works when a browser is available. This guide walks through both with complete HTTP examples, a decision tree, and a migration checklist.</p>
<h2 id="heading-1-what-happened-to-ropc">1. What Happened to ROPC?</h2>
<p>RFC 9700, titled "Best Current Practice for OAuth 2.0 Security", was published in January 2025. Section 2.4 is direct:</p>
<blockquote>
<p><strong>"The resource owner password credentials grant MUST NOT be used."</strong></p>
</blockquote>
<p>That's IETF "MUST NOT" — not a suggestion, not a deprecation warning. It's the strongest prohibition in RFC vocabulary. The reasoning: ROPC exposes user credentials directly to the client application, bypasses MFA, enables credential stuffing, and eliminates any chance of the authorization server enforcing its own security policies.</p>
<p>OAuth 2.1 (draft-ietf-oauth-v2-1-15, current as of March 2026) goes even further. ROPC isn't deprecated; it's <em>removed entirely</em>. Section 1.3 lists three grant types: authorization code, refresh token, and client credentials. ROPC simply doesn't exist anymore.</p>
<p>This article won't repeat the security arguments. <a target="_blank" href="https://www.scottbrady.io/oauth/why-the-resource-owner-password-credentials-grant-type-is-not-authentication-nor-suitable-for-modern-applications">Scott Brady's breakdown</a> lists 9 specific problems, and <a target="_blank" href="https://workos.com/blog/oauth-best-practices">WorkOS's RFC 9700 summary</a> covers the standards context well. This guide focuses on what to use instead.</p>
<h2 id="heading-2-the-real-problem-user-tokens-without-a-browser">2. The Real Problem: User Tokens Without a Browser</h2>
<p>If you're reading this, you probably chose ROPC for a reason. You had a CLI, a headless API, or a batch script that needed to act on behalf of a specific user. <code>grant_type=password</code> was one HTTP request:</p>
<pre><code class="lang-http"><span class="hljs-keyword">POST</span> <span class="hljs-string">/oauth/token</span> HTTP/1.1
<span class="hljs-attribute">Host</span>: auth.example.com
<span class="hljs-attribute">Content-Type</span>: application/x-www-form-urlencoded

<span class="solidity">grant_type<span class="hljs-operator">=</span>password<span class="hljs-operator">&amp;</span>username<span class="hljs-operator">=</span>alice<span class="hljs-operator">&amp;</span>password<span class="hljs-operator">=</span>s3cret<span class="hljs-operator">&amp;</span>client_id<span class="hljs-operator">=</span>my<span class="hljs-operator">-</span>cli<span class="hljs-operator">&amp;</span>scope<span class="hljs-operator">=</span>openid</span>
</code></pre>
<p>One POST. One response. You have a token. No browser, no redirects, no local server. It was genuinely simple, and that simplicity is exactly why it was dangerous. The client application never needed to see Alice's password, but ROPC forced it to act as an intermediary that handles her credentials directly.</p>
<p>Now every recommended alternative seems to require a browser. If you're running a CLI over SSH into a headless server, "just add a redirect" isn't helpful.</p>
<p>The good news: the industry already solved this problem. GitHub CLI, Azure CLI, and AWS CLI all authenticate users from headless environments without ROPC. The patterns exist. You just need to know which one fits your scenario.</p>
<h2 id="heading-3-decision-tree-which-flow-replaces-ropc">3. Decision Tree: Which Flow Replaces ROPC?</h2>
<p>Not every ROPC migration looks the same. Your replacement flow depends on three questions: <em>does the token need to represent a specific user?</em>, <em>is a browser available on the machine running the CLI?</em>, and <em>is a user present to authenticate interactively?</em></p>
<pre><code>Does the token need to represent a specific user?
├── No → Client Credentials (grant_type=client_credentials)
│         You never needed ROPC. Service accounts work fine.
│
└── Yes → Is a browser available on the device running the CLI?
    ├── Yes → Authorization Code + PKCE with localhost redirect
    │         CLI opens browser, catches the callback on localhost.
    │         (Section 5)
    │
    └── No  → What kind of usage?
        ├── Interactive (user present) → Device Authorization Grant
        │   CLI shows a URL + code, user authenticates elsewhere.
        │   (Section 4)
        │
        └── Non-interactive (CI/CD, scripts) → Personal Access Tokens
            User generates a PAT via web UI, configures it in CLI.
            (Section 7)
</code></pre><p><strong>One more scenario</strong>: if your service already has a user token and needs to call a downstream API preserving user identity, that's <strong>Token Exchange</strong> (RFC 8693). See Section 6.</p>
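<p>The decision tree can also be read as a small function. The following is a minimal Java sketch, not part of the companion repository: the class name <code>FlowChooser</code>, the <code>Flow</code> enum, and the three boolean parameters are names invented here for illustration.</p>
<pre><code class="lang-java">public class FlowChooser {

    /** The replacement flows named in the decision tree. */
    enum Flow {
        CLIENT_CREDENTIALS,
        AUTH_CODE_PKCE,
        DEVICE_AUTHORIZATION,
        PERSONAL_ACCESS_TOKEN
    }

    /**
     * Walks the decision tree top to bottom.
     * All parameter names are hypothetical, chosen for this sketch.
     */
    static Flow choose(boolean needsUserContext, boolean browserOnDevice, boolean userPresent) {
        if (!needsUserContext) {
            return Flow.CLIENT_CREDENTIALS;   // service account, no user involved
        }
        if (browserOnDevice) {
            return Flow.AUTH_CODE_PKCE;       // browser plus localhost redirect
        }
        return userPresent
            ? Flow.DEVICE_AUTHORIZATION       // user enters a code on another device
            : Flow.PERSONAL_ACCESS_TOKEN;     // CI/CD, scripts
    }

    public static void main(String[] args) {
        // A person working over SSH on a headless server
        System.out.println(choose(true, false, true));
    }
}
</code></pre>
<p>Token Exchange is left out on purpose: it applies when a service already holds a user token, which is a different starting point from the three questions above.</p>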
<div class="hn-table">
<table>
<thead>
<tr>
<td>Scenario</td><td>Flow</td><td>Section</td></tr>
</thead>
<tbody>
<tr>
<td>Headless CLI, user present</td><td>Device Authorization Grant</td><td>4</td></tr>
<tr>
<td>CLI with browser available</td><td>Auth Code + PKCE (localhost)</td><td>5</td></tr>
<tr>
<td>Service-to-service, user context</td><td>Token Exchange / On-Behalf-Of</td><td>6</td></tr>
<tr>
<td>Scripts, CI/CD, automation</td><td>Personal Access Tokens</td><td>7</td></tr>
<tr>
<td>No user context needed</td><td>Client Credentials</td><td>N/A</td></tr>
</tbody>
</table>
</div><h2 id="heading-4-the-device-authorization-grant-your-primary-ropc-replacement">4. The Device Authorization Grant — Your Primary ROPC Replacement</h2>
<p>The <strong>Device Authorization Grant</strong> (RFC 8628) was designed precisely for devices that can't open a browser — smart TVs, IoT sensors, and yes, headless CLIs. It's what GitHub CLI uses when you run <code>gh auth login</code>.</p>
<p>Here is how it works at a high level: your CLI requests a short code from the authorization server and displays it to the user. The user then opens a browser on any other device (their phone, their laptop) and enters that code to authenticate. Meanwhile, the CLI keeps checking the token endpoint until the user finishes.</p>
<h3 id="heading-the-full-flow-step-by-step">The Full Flow: Step by Step</h3>
<p><strong>Step 1: Request device and user codes</strong></p>
<p>Your CLI sends a POST to the authorization server's device authorization endpoint:</p>
<pre><code class="lang-http"><span class="hljs-keyword">POST</span> <span class="hljs-string">/realms/my-realm/protocol/openid-connect/auth/device</span> HTTP/1.1
<span class="hljs-attribute">Host</span>: keycloak.example.com
<span class="hljs-attribute">Content-Type</span>: application/x-www-form-urlencoded

<span class="solidity">client_id<span class="hljs-operator">=</span>my<span class="hljs-operator">-</span>cli<span class="hljs-operator">-</span>app<span class="hljs-operator">&amp;</span>scope<span class="hljs-operator">=</span>openid profile offline_access</span>
</code></pre>
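<p>The request body above is shown unencoded for readability; on the wire, the spaces in <code>scope</code> have to be form-encoded. As a rough sketch of how a Java CLI might build this request with the JDK's <code>java.net.http</code> API, assuming a hypothetical helper class <code>DeviceAuthRequest</code> (the companion repository may do this differently):</p>
<pre><code class="lang-java">import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class DeviceAuthRequest {

    /** Encodes alternating key/value arguments as application/x-www-form-urlencoded. */
    static String formEncode(String... keyValues) {
        var body = new StringBuilder();
        for (int i = 0; keyValues.length > i + 1; i += 2) {
            if (i > 0) {
                body.append("&");
            }
            body.append(URLEncoder.encode(keyValues[i], StandardCharsets.UTF_8))
                .append("=")
                .append(URLEncoder.encode(keyValues[i + 1], StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Builds (but does not send) the Step 1 request for the given device endpoint. */
    static HttpRequest build(String deviceEndpoint, String clientId, String scope) {
        return HttpRequest.newBuilder(URI.create(deviceEndpoint))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(
                formEncode("client_id", clientId, "scope", scope)))
            .build();
    }

    public static void main(String[] args) {
        // prints client_id=my-cli-app&scope=openid+profile+offline_access
        System.out.println(formEncode(
            "client_id", "my-cli-app",
            "scope", "openid profile offline_access"));
    }
}
</code></pre>
<p>Sending the request is then a single <code>HttpClient.newHttpClient().send(...)</code> call; it is omitted here so the sketch runs without a Keycloak instance.</p>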
<p><strong>Step 2: Server responds with codes</strong></p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"device_code"</span>: <span class="hljs-string">"df1b060b-4e36-4bbe-98aa-5dcb11909f5f"</span>,
  <span class="hljs-attr">"user_code"</span>: <span class="hljs-string">"DRTD-NTJC"</span>,
  <span class="hljs-attr">"verification_uri"</span>: <span class="hljs-string">"https://keycloak.example.com/realms/my-realm/device"</span>,
  <span class="hljs-attr">"verification_uri_complete"</span>: <span class="hljs-string">"https://keycloak.example.com/realms/my-realm/device?user_code=DRTD-NTJC"</span>,
  <span class="hljs-attr">"expires_in"</span>: <span class="hljs-number">600</span>,
  <span class="hljs-attr">"interval"</span>: <span class="hljs-number">5</span>
}
</code></pre>
<p>Key fields:</p>
<ul>
<li><strong><code>device_code</code></strong>: Backend identifier for polling — never shown to the user</li>
<li><strong><code>user_code</code></strong>: Short, human-readable code like <code>DRTD-NTJC</code> — this is what the user types</li>
<li><strong><code>verification_uri</code></strong>: Where the user goes to authenticate</li>
<li><strong><code>interval</code></strong>: Minimum seconds between poll requests (respect this or get rate-limited)</li>
<li><strong><code>expires_in</code></strong>: The codes expire after this many seconds (typically 600)</li>
</ul>
<p><strong>Step 3: Display instructions to the user</strong></p>
<p>Your CLI prints something like:</p>
<pre><code>To sign in, open https://keycloak.example.com/realms/my-realm/device
and enter code: DRTD-NTJC

Waiting for authentication...
</code></pre><p>Some CLIs copy the code to the clipboard automatically (GitHub CLI does this). If <code>verification_uri_complete</code> is available, you can also display a QR code.</p>
<p><strong>Step 4: User authenticates on another device</strong></p>
<p>The user opens the URL on any device with a browser — their phone, a laptop, whatever. They enter the user code, log in with their credentials (including MFA if configured), and approve the authorization. This happens entirely on the authorization server's own login page. Your CLI never sees the password.</p>
<p><strong>Step 5: CLI polls the token endpoint</strong></p>
<p>While the user is authenticating, your CLI polls:</p>
<pre><code class="lang-http"><span class="hljs-keyword">POST</span> <span class="hljs-string">/realms/my-realm/protocol/openid-connect/token</span> HTTP/1.1
<span class="hljs-attribute">Host</span>: keycloak.example.com
<span class="hljs-attribute">Content-Type</span>: application/x-www-form-urlencoded

<span class="solidity">grant_type<span class="hljs-operator">=</span>urn:ietf:params:oauth:grant<span class="hljs-operator">-</span><span class="hljs-keyword">type</span>:device_code
<span class="hljs-operator">&amp;</span>device_code<span class="hljs-operator">=</span>df1b060b<span class="hljs-number">-4e36</span><span class="hljs-operator">-</span>4bbe<span class="hljs-operator">-</span>98aa<span class="hljs-operator">-</span>5dcb11909f5f
<span class="hljs-operator">&amp;</span>client_id<span class="hljs-operator">=</span>my<span class="hljs-operator">-</span>cli<span class="hljs-operator">-</span>app</span>
</code></pre>
<p>Before the user completes login, you get:</p>
<pre><code class="lang-json">HTTP/<span class="hljs-number">1.1</span> <span class="hljs-number">400</span> Bad Request
{ <span class="hljs-attr">"error"</span>: <span class="hljs-string">"authorization_pending"</span> }
</code></pre>
<p>This is expected — keep polling at the <code>interval</code> rate.</p>
<p><strong>Step 6: Receive tokens</strong></p>
<p>After the user authorizes:</p>
<pre><code class="lang-json">HTTP/<span class="hljs-number">1.1</span> <span class="hljs-number">200</span> OK
{
  <span class="hljs-attr">"access_token"</span>: <span class="hljs-string">"eyJhbGciOiJSUzI1NiIs..."</span>,
  <span class="hljs-attr">"refresh_token"</span>: <span class="hljs-string">"eyJhbGciOiJIUzI1NiIs..."</span>,
  <span class="hljs-attr">"token_type"</span>: <span class="hljs-string">"Bearer"</span>,
  <span class="hljs-attr">"expires_in"</span>: <span class="hljs-number">3600</span>,
  <span class="hljs-attr">"scope"</span>: <span class="hljs-string">"openid profile offline_access"</span>
}
</code></pre>
<p>You now have a user-context token — representing the specific user who authenticated — without ever touching their password.</p>
<h3 id="heading-handling-polling-errors">Handling Polling Errors</h3>
<p>Your polling loop needs to handle four error codes:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Error</td><td>Meaning</td><td>What to Do</td></tr>
</thead>
<tbody>
<tr>
<td><code>authorization_pending</code></td><td>User hasn't finished authenticating</td><td>Keep polling at the same interval</td></tr>
<tr>
<td><code>slow_down</code></td><td>You're polling too frequently</td><td>Add 5 seconds to your interval</td></tr>
<tr>
<td><code>expired_token</code></td><td>The codes expired (user took too long)</td><td>Restart the entire flow from Step 1</td></tr>
<tr>
<td><code>access_denied</code></td><td>User denied the authorization</td><td>Stop polling, show an error message</td></tr>
</tbody>
</table>
</div><p>On connection timeouts, use exponential backoff. Don't send rapid repeated requests to the endpoint.</p>
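<p>The error table translates fairly directly into code. The sketch below covers only the decision logic of the polling loop, with the HTTP call and the timeout backoff left out; <code>DevicePolling</code>, <code>Action</code>, and <code>Decision</code> are hypothetical names used for illustration, and treating unknown errors like <code>access_denied</code> is an assumption of this sketch.</p>
<pre><code class="lang-java">public class DevicePolling {

    enum Action { KEEP_POLLING, RESTART_FLOW, STOP }

    /** Result of handling one error response: what to do next and how long to wait. */
    record Decision(Action action, int intervalSeconds) {}

    /** Maps a token endpoint error code to the next step of the polling loop. */
    static Decision onError(String error, int intervalSeconds) {
        return switch (error) {
            case "authorization_pending" -> new Decision(Action.KEEP_POLLING, intervalSeconds);
            // slow_down: add 5 seconds for this and all later requests
            case "slow_down" -> new Decision(Action.KEEP_POLLING, intervalSeconds + 5);
            case "expired_token" -> new Decision(Action.RESTART_FLOW, intervalSeconds);
            // access_denied, and (by assumption) anything unrecognized, ends the attempt
            default -> new Decision(Action.STOP, intervalSeconds);
        };
    }

    public static void main(String[] args) {
        int interval = 5; // taken from the "interval" field of the Step 2 response
        String[] responses = { "authorization_pending", "slow_down", "authorization_pending" };
        for (String error : responses) {
            Decision d = onError(error, interval);
            interval = d.intervalSeconds();
            System.out.println(error + " -> " + d.action() + ", next poll in " + interval + "s");
        }
    }
}
</code></pre>
<p>Note that the increased interval is carried forward: after one <code>slow_down</code>, every later poll in this run waits 10 seconds instead of 5.</p>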
<h3 id="heading-ropc-vs-device-flow-side-by-side">ROPC vs. Device Flow — Side by Side</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Property</td><td>ROPC</td><td>Device Flow</td></tr>
</thead>
<tbody>
<tr>
<td>Credentials exposed to client</td><td>Yes — client sees the password</td><td>No — user authenticates on IdP's page</td></tr>
<tr>
<td>MFA support</td><td>No</td><td>Yes — IdP handles MFA natively</td></tr>
<tr>
<td>SSO support</td><td>No</td><td>Yes — same IdP session across apps</td></tr>
<tr>
<td>Phishing resistance</td><td>None</td><td>Higher — codes entered on trusted domain</td></tr>
<tr>
<td>Browser on same device</td><td>Not required</td><td>Not required</td></tr>
<tr>
<td>User context in token</td><td>Yes</td><td>Yes</td></tr>
<tr>
<td>Complexity</td><td>1 HTTP request</td><td>Polling loop + user interaction</td></tr>
</tbody>
</table>
</div><p>You trade simplicity for security. One HTTP request becomes a polling loop. But your CLI never sees a password, MFA works without any extra configuration, and the authorization server stays in control of the login experience.</p>
<h3 id="heading-security-note-device-code-phishing-storm-2372">Security Note: Device Code Phishing (Storm-2372)</h3>
<p>The Device Authorization Grant is not immune to phishing attacks. In February 2025, Microsoft reported that a Russia-linked group called <strong>Storm-2372</strong> used device code flows to steal tokens. The attack works like this:</p>
<ol>
<li>Attacker generates a legitimate device code</li>
<li>Sends the user_code to victims via WhatsApp, Signal, or Teams</li>
<li>Victim enters the code on the real IdP login page</li>
<li>Attacker's device receives the token via polling</li>
</ol>
<p><strong>Mitigations</strong>:</p>
<ul>
<li>Restrict device code flow to applications that truly need it (Conditional Access policies in Entra ID, client-level toggles in Keycloak)</li>
<li>Educate users: device codes should only come from actions <em>you</em> initiated</li>
<li>Deploy anomaly detection for unusual device code patterns</li>
<li>Monitor token usage after device code grants</li>
</ul>
<p>This does not make Device Flow worse than ROPC. ROPC hands the password directly to the client, which is a bigger risk. But you should be aware of this attack vector when adopting Device Flow.</p>
<h3 id="heading-companion-poc-spring-boot-34-keycloak">Companion PoC: Spring Boot 3.4 + Keycloak</h3>
<blockquote>
<p><strong>Hands-on reference</strong>: The <a target="_blank" href="https://github.com/thomas-hochbichler/ropc-alternative-flows-poc">ropc-alternative-flows-poc</a> repository demonstrates a complete Device Authorization Flow implementation using <strong>Spring Boot 3.4</strong> and <strong>Keycloak 26</strong>. It includes a headless CLI client that polls for tokens, a protected resource server, Keycloak realm configuration, and Docker Compose for local development. Clone it and run the full flow locally in under 10 minutes.</p>
</blockquote>
<h2 id="heading-5-auth-code-pkce-with-localhost-redirect">5. Auth Code + PKCE with Localhost Redirect</h2>
<p>When a browser is available on the same machine (for example, a developer workstation rather than a headless server), <strong>Authorization Code + PKCE with a localhost redirect</strong> provides a better user experience. The CLI opens the system browser, the user logs in, and the browser redirects back to a temporary local server that receives the authorization code.</p>
<p>This is what Azure CLI (<code>az login</code>) and AWS CLI (<code>aws sso login</code>) use by default.</p>
<h3 id="heading-how-it-works">How It Works</h3>
<ol>
<li>CLI generates a PKCE <code>code_verifier</code> (a random string of 43-128 characters) and derives the <code>code_challenge</code> from it (the base64url-encoded SHA-256 hash of the verifier)</li>
<li>CLI starts a temporary HTTP server on <code>http://localhost:&lt;port&gt;</code></li>
<li>CLI opens the system browser to the authorization URL</li>
<li>User authenticates on the IdP's login page</li>
<li>IdP redirects to <code>http://localhost:&lt;port&gt;?code=&lt;auth_code&gt;&amp;state=&lt;state&gt;</code></li>
<li>Local server catches the redirect, extracts the authorization code</li>
<li>CLI exchanges authorization code + <code>code_verifier</code> for tokens</li>
<li>Local server shuts down</li>
</ol>
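<p>Step 1 of the list above needs nothing beyond the JDK. Here is a minimal Java sketch of the two PKCE values; the class name <code>Pkce</code> and its method names are made up for this example, and a library such as the one used in the Node.js example does the same work for you.</p>
<pre><code class="lang-java">import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

public class Pkce {

    /** 32 random bytes, base64url-encoded without padding: a 43-character verifier. */
    static String newCodeVerifier() {
        byte[] bytes = new byte[32];
        new SecureRandom().nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** S256 challenge: BASE64URL(SHA-256(ASCII(code_verifier))), without padding. */
    static String codeChallenge(String codeVerifier) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(codeVerifier.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to ship SHA-256
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        String verifier = newCodeVerifier();
        System.out.println("code_verifier:  " + verifier);
        System.out.println("code_challenge: " + codeChallenge(verifier));
    }
}
</code></pre>
<p>The CLI keeps the verifier in memory, sends only the challenge in the authorization URL, and presents the verifier when it exchanges the authorization code in step 7.</p>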
<h3 id="heading-code-example-nodejs-with-openid-client-5x">Code Example: Node.js with openid-client 5.x</h3>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { Issuer, generators } <span class="hljs-keyword">from</span> <span class="hljs-string">'openid-client'</span>; <span class="hljs-comment">// v5.7.0</span>
<span class="hljs-keyword">import</span> http <span class="hljs-keyword">from</span> <span class="hljs-string">'node:http'</span>;
<span class="hljs-keyword">import</span> open <span class="hljs-keyword">from</span> <span class="hljs-string">'open'</span>; <span class="hljs-comment">// v10.1.0</span>

<span class="hljs-keyword">const</span> REDIRECT_PORT = <span class="hljs-number">6363</span>;
<span class="hljs-keyword">const</span> REDIRECT_URI = <span class="hljs-string">`http://localhost:<span class="hljs-subst">${REDIRECT_PORT}</span>`</span>;

<span class="hljs-comment">// 1. Discover IdP endpoints via OIDC discovery</span>
<span class="hljs-keyword">const</span> issuer = <span class="hljs-keyword">await</span> Issuer.discover(<span class="hljs-string">'https://keycloak.example.com/realms/my-realm'</span>);
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> issuer.Client({
  <span class="hljs-attr">client_id</span>: <span class="hljs-string">'my-cli-app'</span>,
  <span class="hljs-attr">redirect_uris</span>: [REDIRECT_URI],
  <span class="hljs-attr">response_types</span>: [<span class="hljs-string">'code'</span>],
  <span class="hljs-attr">token_endpoint_auth_method</span>: <span class="hljs-string">'none'</span>, <span class="hljs-comment">// public client — no client secret</span>
});

<span class="hljs-comment">// 2. Generate PKCE values</span>
<span class="hljs-keyword">const</span> codeVerifier = generators.codeVerifier();
<span class="hljs-keyword">const</span> codeChallenge = generators.codeChallenge(codeVerifier);

<span class="hljs-comment">// 3. Build authorization URL</span>
<span class="hljs-keyword">const</span> authUrl = client.authorizationUrl({
  <span class="hljs-attr">scope</span>: <span class="hljs-string">'openid profile offline_access'</span>,
  <span class="hljs-attr">code_challenge</span>: codeChallenge,
  <span class="hljs-attr">code_challenge_method</span>: <span class="hljs-string">'S256'</span>,
});

<span class="hljs-comment">// 4. Start localhost server, open browser, exchange code for tokens</span>
<span class="hljs-keyword">const</span> tokenSet = <span class="hljs-keyword">await</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function">(<span class="hljs-params">resolve, reject</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> server = http.createServer(<span class="hljs-keyword">async</span> (req, res) =&gt; {
    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">const</span> params = client.callbackParams(req);
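      <span class="hljs-comment">// The 'openid' scope makes the IdP return an ID token, so use client.callback();</span>
      <span class="hljs-comment">// in openid-client v5, client.oauthCallback() rejects responses that contain one.</span>
      <span class="hljs-keyword">const</span> tokens = <span class="hljs-keyword">await</span> client.callback(REDIRECT_URI, params, {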
        <span class="hljs-attr">code_verifier</span>: codeVerifier,
      });
      res.writeHead(<span class="hljs-number">200</span>, { <span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'text/html'</span> });
      res.end(<span class="hljs-string">'&lt;h1&gt;Authenticated. You can close this tab.&lt;/h1&gt;'</span>);
      server.close();
      resolve(tokens);
    } <span class="hljs-keyword">catch</span> (err) {
      res.writeHead(<span class="hljs-number">500</span>);
      res.end(<span class="hljs-string">'Authentication failed.'</span>);
      server.close();
      reject(err);
    }
  });

  server.listen(REDIRECT_PORT, <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Listening on <span class="hljs-subst">${REDIRECT_URI}</span>`</span>);
    open(authUrl); <span class="hljs-comment">// opens system browser</span>
  });
});

<span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Access token:'</span>, tokenSet.access_token);
</code></pre>
<h3 id="heading-port-conflict-handling">Port Conflict Handling</h3>
<p>Your localhost server might fail to bind if the port is already in use. Two strategies:</p>
<ul>
<li><strong>Fixed port with retry</strong>: Try a predefined list of ports (e.g., 6363, 6364, 6365). Register all of them as valid redirect URIs in your IdP.</li>
<li><strong>Dynamic port</strong>: Bind to port 0 (OS assigns a free port). This requires your IdP to support wildcard or dynamic redirect URIs — most don't.</li>
</ul>
<p>Terraform's docs recommend configuring ports 10000-10010 as the redirect port range. It works, even if the approach is simple.</p>
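<p>A minimal sketch of the fixed-port strategy, assuming a Node.js server like the one created with <code>http.createServer</code> in the example above. The helper name <code>listenOnFirstFreePort</code> and the port list are illustrative and not part of any library; every port in the list still has to be registered as a redirect URI in your IdP.</p>

```javascript
// Try each candidate port in order and resolve with the first one that binds.
// `server` is a Node.js http/net server that has not started listening yet.
function listenOnFirstFreePort(server, ports) {
  return new Promise((resolve, reject) => {
    const remaining = [...ports];

    const tryNext = () => {
      if (remaining.length === 0) {
        reject(new Error(`No free port among: ${ports.join(', ')}`));
        return;
      }
      const port = remaining.shift();

      const onError = (err) => {
        server.removeListener('listening', onListening);
        if (err.code === 'EADDRINUSE') {
          tryNext(); // port is taken, move on to the next candidate
        } else {
          reject(err); // other errors (e.g. EACCES) are not handled here
        }
      };
      const onListening = () => {
        server.removeListener('error', onError);
        resolve(port);
      };

      server.once('error', onError);
      server.once('listening', onListening);
      server.listen(port);
    };

    tryNext();
  });
}

// Usage (assumed port list):
//   const port = await listenOnFirstFreePort(server, [6363, 6364, 6365]);
//   const redirectUri = `http://localhost:${port}`;
```

<p>Build the redirect URI from the port the promise resolves with before you generate the authorization URL, so the value sent to the IdP matches the port that was actually bound.</p>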
<h3 id="heading-when-to-choose-this-over-device-flow">When to Choose This Over Device Flow</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Factor</td><td>Device Flow</td><td>Localhost PKCE</td></tr>
</thead>
<tbody>
<tr>
<td>Browser on same machine</td><td>Not required</td><td>Required</td></tr>
<tr>
<td>UX</td><td>Copy code → switch to browser</td><td>Browser opens automatically</td></tr>
<tr>
<td>Headless server / SSH</td><td>Works</td><td>Doesn't work</td></tr>
<tr>
<td>Port conflicts</td><td>None</td><td>Possible</td></tr>
<tr>
<td>Phishing risk</td><td>Device code phishing (Storm-2372)</td><td>Localhost redirect harder to intercept</td></tr>
</tbody>
</table>
</div><p><strong>The industry pattern</strong>: default to localhost PKCE when a browser is detected, fall back to Device Flow when it isn't. Azure CLI and AWS CLI both do this.</p>
<h2 id="heading-6-what-the-big-clis-actually-do">6. What the Big CLIs Actually Do</h2>
<p>These aren't theoretical alternatives. Here is how major CLI tools actually handle authentication today:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>CLI Tool</td><td>Default Flow</td><td>Fallback</td><td>Notes</td></tr>
</thead>
<tbody>
<tr>
<td><strong>GitHub CLI</strong> (<code>gh auth login</code>)</td><td>Device code flow</td><td><code>--with-token</code> for PATs</td><td>Displays code (clipboard copy via <code>--clipboard</code> flag). Uses <code>https://github.com/login/device</code>.</td></tr>
<tr>
<td><strong>Azure CLI</strong> (<code>az login</code>)</td><td>Auth Code + PKCE (opens browser)</td><td><code>--use-device-code</code></td><td>Switched <em>to</em> browser-based as the default in recent versions. Windows defaults to WAM since v2.61.0.</td></tr>
<tr>
<td><strong>AWS CLI</strong> (<code>aws sso login</code>)</td><td>Auth Code + PKCE (since v2.22.0)</td><td><code>--use-device-code</code></td><td>Switched from device code to PKCE as default.</td></tr>
<tr>
<td><strong>Terraform</strong> (<code>terraform login</code>)</td><td>Auth Code + PKCE (localhost)</td><td>N/A</td><td>Uses localhost with configurable port range. No refresh tokens.</td></tr>
<tr>
<td><strong>kubelogin</strong> (kubectl OIDC)</td><td>Auth Code + PKCE (opens browser)</td><td>N/A</td><td>Caches ID + refresh tokens locally.</td></tr>
</tbody>
</table>
</div><p>The trend is clear: <strong>Auth Code + PKCE by default, Device Flow as fallback, PATs for automation</strong>. This dual-mode approach is the new industry standard.</p>
<h2 id="heading-7-token-exchange-and-personal-access-tokens">7. Token Exchange and Personal Access Tokens</h2>
<p>Two more alternatives complete the overview. Neither is a direct ROPC replacement for end-user login, but both solve scenarios where developers previously used ROPC.</p>
<h3 id="heading-token-exchange-on-behalf-of-rfc-8693">Token Exchange / On-Behalf-Of (RFC 8693)</h3>
<p>This solves a different problem: your API already received a user token (from Device Flow, Auth Code, etc.) and needs to call a downstream API while preserving the user's identity.</p>
<pre><code class="lang-http"><span class="hljs-keyword">POST</span> <span class="hljs-string">/oauth/token</span> HTTP/1.1
<span class="hljs-attribute">Host</span>: auth.example.com
<span class="hljs-attribute">Content-Type</span>: application/x-www-form-urlencoded

<span class="solidity">grant_type<span class="hljs-operator">=</span>urn:ietf:params:oauth:grant<span class="hljs-operator">-</span><span class="hljs-keyword">type</span>:token<span class="hljs-operator">-</span>exchange
<span class="hljs-operator">&amp;</span>subject_token<span class="hljs-operator">=</span>eyJhbGciOiJSUzI1NiIs...
<span class="hljs-operator">&amp;</span>subject_token_type<span class="hljs-operator">=</span>urn:ietf:params:oauth:token<span class="hljs-operator">-</span><span class="hljs-keyword">type</span>:access_token
<span class="hljs-operator">&amp;</span>audience<span class="hljs-operator">=</span>https:<span class="hljs-comment">//downstream-api.example.com</span>
<span class="hljs-operator">&amp;</span>scope<span class="hljs-operator">=</span>read write</span>
</code></pre>
<p>The authorization server issues a new token that represents the same user but is scoped for the downstream API. Supported by Entra ID (as "On-Behalf-Of"), Okta, Keycloak, and Auth0.</p>
<p><strong>Use Token Exchange when</strong>: a microservice needs to call another service on behalf of the user who initiated the request. <strong>Don't use it as</strong> a standalone login mechanism — it requires an existing user token as input.</p>
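<p>The request above can be assembled in code roughly as follows. This is a sketch under assumptions: <code>buildTokenExchangeBody</code> is a hypothetical helper, the subject token is a placeholder, and most IdPs additionally require the calling service to authenticate as a client, which is omitted here.</p>

```javascript
// Build the form-encoded body for an RFC 8693 token exchange request.
// URLSearchParams is a global in Node.js and browsers and handles the encoding.
function buildTokenExchangeBody({ subjectToken, audience, scope }) {
  const body = new URLSearchParams({
    grant_type: 'urn:ietf:params:oauth:grant-type:token-exchange',
    subject_token: subjectToken,
    subject_token_type: 'urn:ietf:params:oauth:token-type:access_token',
    audience,
  });
  if (scope) {
    body.set('scope', scope); // space-separated scopes are encoded as "read+write"
  }
  return body;
}

const exchangeBody = buildTokenExchangeBody({
  subjectToken: 'incoming-user-access-token', // placeholder for the token your API received
  audience: 'https://downstream-api.example.com',
  scope: 'read write',
});

console.log(exchangeBody.toString());
```

<p>Passing the returned <code>URLSearchParams</code> object as the body of a <code>fetch</code> POST sends it as <code>application/x-www-form-urlencoded</code>, which matches the raw request shown earlier.</p>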
<h3 id="heading-personal-access-tokens-pats">Personal Access Tokens (PATs)</h3>
<p>When a full OAuth flow is more than you need (CI/CD pipelines, simple scripts, automation), PATs are the practical choice. The user generates a token through a web UI and pastes it into their CLI config or an environment variable.</p>
<p>GitHub, GitLab, npm, and Docker Hub all use this pattern. GitHub's fine-grained PATs even let you scope permissions per-repository.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Factor</td><td>PATs</td><td>OAuth Flow (Device/PKCE)</td></tr>
</thead>
<tbody>
<tr>
<td>Setup complexity</td><td>Low — generate in web UI</td><td>Higher — implement flow in client</td></tr>
<tr>
<td>Token lifetime</td><td>Long-lived</td><td>Short-lived access + refresh</td></tr>
<tr>
<td>Revocation</td><td>Manual via web UI</td><td>Automatic expiration</td></tr>
<tr>
<td>MFA enforcement</td><td>At creation time only</td><td>At each authentication</td></tr>
<tr>
<td>Best for</td><td>Scripts, CI/CD, simple tools</td><td>Production CLIs, user-facing apps</td></tr>
</tbody>
</table>
</div><p><strong>Use PATs when</strong>: the "user" is a build pipeline or a one-off script that doesn't need interactive login. <strong>Don't use them as</strong> a general-purpose ROPC replacement in production CLIs — they lack automatic expiration and per-session MFA.</p>
<h2 id="heading-8-identity-provider-support-matrix">8. Identity Provider Support Matrix</h2>
<p>Before you choose a flow, verify that your identity provider (IdP) supports it. Device Flow has broad support, with one notable exception.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>IdP</td><td>Device Flow</td><td>Auth Code + PKCE</td><td>Token Exchange</td><td>Notes</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Keycloak 26.2+</strong></td><td>Native</td><td>Native</td><td>Native (GA since 26.2; preview in 26.0)</td><td>Enable "OAuth 2.0 Device Authorization Grant" per client. Configure code lifespan and polling interval.</td></tr>
<tr>
<td><strong>Microsoft Entra ID</strong></td><td>Native</td><td>Native</td><td>Native (OBO)</td><td>Restrict via Conditional Access. Use <code>--use-device-code</code> in Azure CLI.</td></tr>
<tr>
<td><strong>Auth0</strong></td><td>Native</td><td>Native</td><td>Native (Token Vault)</td><td>Enable Device Authorization grant on the application settings page.</td></tr>
<tr>
<td><strong>Okta</strong></td><td>Native</td><td>Native</td><td>Native</td><td>Enable grant type on the app + authorization server policy rule.</td></tr>
<tr>
<td><strong>AWS Cognito</strong></td><td><strong>Not native</strong></td><td>Native</td><td>Not native</td><td>Device Flow requires a <a target="_blank" href="https://aws.amazon.com/blogs/security/implement-oauth-2-0-device-grant-flow-by-using-amazon-cognito-and-aws-lambda/">Lambda + DynamoDB workaround</a>.</td></tr>
</tbody>
</table>
</div><p>If you use Cognito and need Device Flow, the AWS-provided workaround uses Lambda to implement the flow on top of Cognito. It works, but it requires significantly more infrastructure than native support. You should evaluate whether switching to a different IdP would reduce complexity enough to justify the effort.</p>
<h2 id="heading-9-migration-checklist">9. Migration Checklist</h2>
<p>Follow these steps to move from ROPC to a modern flow without breaking existing users.</p>
<h3 id="heading-step-1-audit">Step 1: Audit</h3>
<ul>
<li>[ ] Search your codebase for <code>grant_type=password</code></li>
<li>[ ] Identify every client that uses ROPC — CLI tools, SDKs, internal scripts, CI/CD pipelines</li>
<li>[ ] Document which of those need user-context tokens vs. service accounts (client credentials)</li>
<li>[ ] Check your IdP's ROPC deprecation timeline (some will force-disable it)</li>
</ul>
<h3 id="heading-step-2-choose-your-flow">Step 2: Choose Your Flow</h3>
<p>Use the decision tree from Section 3:</p>
<ul>
<li>Headless + user present → Device Authorization Grant</li>
<li>Browser available → Auth Code + PKCE (localhost redirect)</li>
<li>Non-interactive automation → PATs or Client Credentials</li>
<li>Service-to-service user propagation → Token Exchange</li>
</ul>
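<p>The same decision can be written down as a small function, for example to pick a default flow at CLI startup. This is only a sketch: the function name, the input flags, and the order in which the checks run are assumptions for illustration.</p>

```javascript
// Map the four bullets above to a flow name.
// All property names and return values are hypothetical labels.
function chooseFlow({ userPresent, browserAvailable, propagatesUserIdentity }) {
  if (propagatesUserIdentity) return 'token-exchange'; // service-to-service user propagation
  if (!userPresent) return 'pat-or-client-credentials'; // non-interactive automation
  if (browserAvailable) return 'auth-code-pkce'; // localhost redirect
  return 'device-authorization-grant'; // headless, but a user is present
}

console.log(chooseFlow({ userPresent: true, browserAvailable: false, propagatesUserIdentity: false }));
```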
<h3 id="heading-step-3-implement">Step 3: Implement</h3>
<ul>
<li>[ ] Register a new OAuth client in your IdP with the appropriate grant type enabled</li>
<li>[ ] For Device Flow: implement the polling loop with proper error handling (<code>authorization_pending</code>, <code>slow_down</code>, <code>expired_token</code>, <code>access_denied</code>)</li>
<li>[ ] For Auth Code + PKCE: implement the localhost redirect server with port fallback</li>
<li>[ ] Request <code>offline_access</code> scope to get refresh tokens — your CLI shouldn't re-authenticate on every invocation</li>
<li>[ ] Store tokens securely (OS keychain, encrypted file, not plaintext in <code>~/.config</code>)</li>
</ul>
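<p>For the Device Flow polling loop, the four error codes listed above map to distinct reactions. The sketch below follows RFC 8628, which asks clients to add 5 seconds to the polling interval on <code>slow_down</code>; the function name and the <code>action</code> labels are assumptions, not part of any library.</p>

```javascript
// Decide what the polling loop should do after a token endpoint error.
// Returns the next action and the interval to use for the next request.
function nextPollStep(errorCode, intervalSeconds) {
  switch (errorCode) {
    case 'authorization_pending':
      // The user has not finished in the browser yet: keep polling.
      return { action: 'retry', intervalSeconds };
    case 'slow_down':
      // Keep polling, but back off by 5 seconds for all later requests.
      return { action: 'retry', intervalSeconds: intervalSeconds + 5 };
    case 'expired_token':
      // The device code expired: request a new one and show a new user code.
      return { action: 'restart', intervalSeconds };
    case 'access_denied':
      // The user rejected the request: stop.
      return { action: 'abort', intervalSeconds };
    default:
      // Unknown errors are treated as fatal in this sketch.
      return { action: 'abort', intervalSeconds };
  }
}

console.log(nextPollStep('slow_down', 5));
```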
<h3 id="heading-step-4-parallel-run">Step 4: Parallel Run</h3>
<ul>
<li>[ ] Ship the new flow alongside ROPC (e.g., <code>--use-device-code</code> flag)</li>
<li>[ ] Log ROPC usage to track migration progress</li>
<li>[ ] Communicate the deprecation timeline to your users. Give them at least one release cycle to switch</li>
</ul>
<h3 id="heading-step-5-deprecate-ropc">Step 5: Deprecate ROPC</h3>
<ul>
<li>[ ] Remove <code>grant_type=password</code> from client code</li>
<li>[ ] Disable ROPC on the IdP (Keycloak: uncheck "Direct Access Grants Enabled"; Entra ID: block via Conditional Access)</li>
<li>[ ] Verify no remaining clients are using ROPC via IdP logs</li>
</ul>
<h2 id="heading-whats-next">What's Next</h2>
<p>Pick one client that uses <code>grant_type=password</code> and migrate it to Device Flow this week. Start with the <a target="_blank" href="https://github.com/thomas-hochbichler/ropc-alternative-flows-poc">ropc-alternative-flows-poc</a> repository if you want a working reference: it has a Spring Boot 3.4 resource server, a Keycloak 26 realm, and a CLI client that demonstrates the full polling flow.</p>
<p>If your CLI runs on machines with browsers, implement the dual-mode pattern: Auth Code + PKCE by default, Device Flow via a <code>--use-device-code</code> flag. That's what Azure CLI and AWS CLI converged on, and it covers every deployment scenario.</p>
<p>ROPC was simple. Its replacements require more steps, but they are fundamentally safer. Your CLI stops handling user passwords and becomes what it should have been from the start: a token consumer that never touches credentials.</p>
<p>If this article saved you time, consider buying me a coffee:</p>
<p><a target="_blank" href="https://buymeacoffee.com/thomas.hochbichler"><img src="https://img.buymeacoffee.com/button-api/?text=Buy%20me%20a%20coffee&amp;emoji=&amp;slug=thomas.hochbichler&amp;button_colour=FFDD00&amp;font_colour=000000&amp;font_family=Cookie&amp;outline_colour=000000&amp;coffee_colour=ffffff" alt="Buy Me a Coffee" /></a></p>
]]></content:encoded></item><item><title><![CDATA[EU-Compliant Claude Code with Mistral: Setup Guide]]></title><description><![CDATA[Series: EU-Compliant Claude Code with Mistral Part 1: Setup Guide (this article) | Part 2: Testing the Limits (coming soon) | Part 3: Alternatives (coming soon)

A practical guide to routing Claude Co]]></description><link>https://blog.hochbichler.com/eu-compliant-claude-code-with-mistral-setup-guide</link><guid isPermaLink="true">https://blog.hochbichler.com/eu-compliant-claude-code-with-mistral-setup-guide</guid><category><![CDATA[claude-code]]></category><category><![CDATA[MistralAI]]></category><category><![CDATA[mistral]]></category><category><![CDATA[#gdpr]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Thomas Hochbichler]]></dc:creator><pubDate>Sun, 08 Mar 2026 20:46:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/69a98ea33728a9dc35843f0b/00a019e4-3fd4-490b-a6ff-9f62e9367c8f.svg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p><strong>Series: EU-Compliant Claude Code with Mistral</strong><br /><strong>Part 1: Setup Guide</strong> (this article) | Part 2: Testing the Limits <em>(coming soon)</em> | Part 3: Alternatives <em>(coming soon)</em></p>
</blockquote>
<p><em>A practical guide to routing Claude Code through Mistral's EU-hosted API — with configuration templates, model recommendations, and presets for cloud and local setups.</em></p>
<p><strong>Reading time</strong>: ~14 minutes | <strong>Companion repository</strong>: <a href="https://github.com/thomas-hochbichler/claude-code-mistral">claude-code-mistral</a></p>
<p><strong>What this part covers:</strong></p>
<ul>
<li><p>What data Claude Code sends and why EU developers should care</p>
</li>
<li><p>Mistral.ai's compliance credentials</p>
</li>
<li><p>Architecture of the claude-code-router proxy</p>
</li>
<li><p>Step-by-step configuration with cloud and local presets</p>
</li>
<li><p>Model selection guide</p>
</li>
<li><p>Troubleshooting and smoke testing your setup</p>
</li>
</ul>
<hr />
<h2>1. Introduction</h2>
<p>Every time you run Claude Code, your source code leaves your machine. File contents, terminal output, directory structures, environment state — all of it streams to an LLM provider for processing. For most developers, that's a reasonable trade-off. For EU-based developers working with client code, personal data, or regulated infrastructure, it's a legal question.</p>
<p>The EU's regulatory framework for data protection has teeth. GDPR fines totalled over EUR 1.2 billion in 2025 alone. The AI Act imposes new obligations on providers of general-purpose AI models. NIS2 demands documented supply chain security assessments. Sending source code to a US-hosted AI provider without addressing these regulations isn't just risky — it's increasingly untenable.</p>
<p>This guide shows you how to keep using Claude Code — the tool you already know — while routing all requests through Mistral.ai's EU-hosted API. The result: your code stays in the EU, processed by a French company headquartered outside US jurisdiction, with SOC 2 Type II, ISO 27001, and ISO 27701 certifications.</p>
<p>The companion repository provides everything you need: a configuration template, cloud and local presets, and an automated setup script. Clone, run, and start coding in under five minutes.</p>
<hr />
<h2>2. The Problem: Your Code Leaves the EU</h2>
<p>Before you configure anything, it's worth understanding exactly what data leaves your machine and why that matters under EU law.</p>
<p>Claude Code operates as an agentic tool. It doesn't just receive the prompt you type — it actively gathers context from your workspace. This includes:</p>
<ul>
<li><p><strong>Source code files</strong>: Full file contents, not snippets. The agent reads, writes, and edits files directly.</p>
</li>
<li><p><strong>Directory structures</strong>: Project layout, import paths, and file relationships.</p>
</li>
<li><p><strong>Terminal output</strong>: Command results, error messages, build output, and test results.</p>
</li>
<li><p><strong>Environment state</strong>: Working directory, shell context, and system information.</p>
</li>
<li><p><strong>Git metadata</strong>: Branch names, commit history, and diff output.</p>
</li>
</ul>
<p>In an agentic workflow, this context gathering is automatic. Claude Code decides which files to read, which commands to run, and which context to include — often pulling in files you didn't explicitly reference. That's by design: it's what makes agentic coding assistants powerful.</p>
<p>Source code routinely contains personal data: hardcoded email addresses, user records in seed files, test fixtures with real names, API keys tied to individuals, and database connection strings. When an agentic tool processes your entire workspace, it processes all of this.</p>
<h3>Why Anonymization Doesn't Solve This</h3>
<p>You can't practically anonymize or sanitize this data before it reaches the LLM. Agentic workflows require full semantic context — complete files with valid import paths, working directory structures, and unmodified terminal output. Strip the personal data and you break the tool. Three specific reasons:</p>
<ol>
<li><p><strong>Semantic integrity</strong>: Code must compile, execute, and maintain valid cross-file references. Stripping personal data breaks functionality.</p>
</li>
<li><p><strong>Automatic context gathering</strong>: The agent decides what context to read — intercepting and sanitizing this in real time isn't practical.</p>
</li>
<li><p><strong>Regulatory opinion</strong>: The EDPB has set a high threshold for demonstrating true anonymization in LLM contexts, requiring rigorous case-by-case assessment (Opinion 28/2024). Personal data protections apply to data processed through language models even when the model itself doesn't store the data.</p>
</li>
</ol>
<h3>The Regulatory Picture</h3>
<p>When source code containing personal data flows to a US-hosted provider, three EU regulations apply:</p>
<p><strong>GDPR</strong>: Cross-border data transfers require a legal basis under Chapter V. The EU-US Data Privacy Framework (DPF) currently provides one, but it remains structurally fragile — the US CLOUD Act creates an unresolvable conflict with GDPR Article 48, and further legal challenges are expected. If you'd rather not monitor the evolving stability of the DPF, eliminating the cross-border transfer entirely is the most robust approach.</p>
<p><strong>AI Act</strong>: Since August 2025, providers of general-purpose AI models face documentation and due diligence obligations (deployer obligations for high-risk systems follow in August 2026). Choosing a provider with documented compliance credentials simplifies your assessment.</p>
<p><strong>NIS2</strong>: Organizations in regulated sectors must assess the cybersecurity practices of their service providers. An AI coding assistant that processes source code is a supply chain dependency — sending code to a third-party LLM without a documented risk assessment is a compliance gap.</p>
<p>The practical alternative is to ensure the data never leaves the jurisdiction in the first place.</p>
<blockquote>
<p><strong>Disclaimer</strong>: This article is a technical guide for configuring AI development tools. It's not legal advice. For questions about GDPR compliance, data processing obligations, or regulatory requirements specific to your organization, consult a qualified Data Protection Officer (DPO) or legal counsel.</p>
</blockquote>
<hr />
<h2>3. Mistral.ai: Why This Provider</h2>
<p>Mistral AI is a French company legally domiciled in Paris under EU jurisdiction.</p>
<p><strong>EU Data Residency</strong>: Mistral hosts data in the EU by default. The API endpoint <code>https://api.mistral.ai/v1</code> routes through EU infrastructure, with encrypted backups replicated across EU availability zones.</p>
<p><strong>Certifications</strong>:</p>
<ul>
<li><p><strong>SOC 2 Type II</strong>: Independently audited security controls (<a href="https://trust.mistral.ai/resources">trust.mistral.ai</a>)</p>
</li>
<li><p><strong>ISO 27001</strong>: Information security management system</p>
</li>
<li><p><strong>ISO 27701</strong>: Privacy information management (GDPR-aligned)</p>
</li>
</ul>
<p><strong>Data Processing Agreement</strong>: A DPA is available at <a href="https://legal.mistral.ai/terms/data-processing-addendum">legal.mistral.ai</a>, covering GDPR requirements, subprocessor management, Standard Contractual Clauses, and 30-day data deletion on termination.</p>
<p><strong>CLOUD Act exposure</strong>: The US CLOUD Act applies to providers with US legal presence. Unlike AWS, Azure, or Google Cloud — all US-incorporated and fully within CLOUD Act scope — Mistral is a French-headquartered company with no US parent. However, Mistral does maintain a US office and uses US cloud providers for some infrastructure. Organizations with strict sovereignty requirements should review this carefully. For most EU teams using Mistral's EU-hosted API (<code>api.mistral.ai</code>), the practical risk profile is substantially lower than routing through a US-headquartered provider.</p>
<p>This combination — EU residency by default, certifications, a DPA, and a non-US corporate structure — makes Mistral a strong candidate for EU-compliant AI coding workflows.</p>
<p>Mistral also offers <strong>Vibe CLI</strong>, their own open-source AI coding assistant that uses Mistral models natively with no proxy required. We compare Claude Code routing vs. Vibe CLI in detail in <a href="part-3-alternatives.md">Part 3</a>.</p>
<hr />
<h2>4. Architecture: The claude-code-router Proxy</h2>
<p>The routing approach relies on <a href="https://github.com/musistudio/claude-code-router">claude-code-router</a> (CCR), an open-source local proxy with 29,000+ GitHub stars. CCR intercepts Claude Code's API calls and forwards them to alternative LLM providers.</p>
<p><strong>Important</strong>: claude-code-router is a community project, not endorsed by Anthropic. However, the mechanism it uses — the <code>ANTHROPIC_BASE_URL</code> environment variable — is an officially supported Claude Code feature for pointing the CLI at alternative API endpoints.</p>
<h3>How It Works</h3>
<img src="https://cdn.hashnode.com/uploads/covers/69a98ea33728a9dc35843f0b/267e6340-e36a-44ad-ac89-dd738be3da0a.png" alt="" style="display:block;margin:0 auto" />

<ol>
<li><p>Claude Code sends requests in Anthropic Messages API format to <code>localhost:3456</code>.</p>
</li>
<li><p>CCR applies a transformer pipeline: converts Anthropic format to OpenAI-compatible format and strips <code>cache_control</code> fields (the <code>cleancache</code> transformer).</p>
</li>
<li><p>The transformed request forwards to <code>https://api.mistral.ai/v1</code>.</p>
</li>
<li><p>Mistral processes the request and returns a response.</p>
</li>
<li><p>CCR converts the response back to Anthropic format and returns it to Claude Code.</p>
</li>
</ol>
<h3>Why Two Extra Transformers Are Required</h3>
<p>Claude Code sends Anthropic-specific parameters that Mistral's API rejects with 422 errors:</p>
<ol>
<li><p><code>cleancache</code> (built-in): Strips <code>cache_control: {"type": "ephemeral"}</code> metadata from messages — part of Anthropic's prompt caching system.</p>
</li>
<li><p><code>stripreasoning</code> (custom plugin): Strips the <code>reasoning</code> parameter (e.g., <code>{"effort": "high", "enabled": false}</code>) that Claude Code sends for extended thinking configuration.</p>
</li>
</ol>
<p>You need both in the transformer pipeline. Without them, Mistral returns <code>"Extra inputs are not permitted"</code> validation errors. The <code>stripreasoning</code> plugin is included in the companion repository under <code>plugins/strip-reasoning.js</code>.</p>
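<p>To make the effect of the two transformers concrete, here is a sketch of the transformation itself. It does not use CCR's plugin interface and is not the code from <code>plugins/strip-reasoning.js</code>; the function names and the sample request are assumptions for illustration.</p>

```javascript
// Return a deep copy of `value` without any `cache_control` properties.
function withoutCacheControl(value) {
  if (Array.isArray(value)) return value.map(withoutCacheControl);
  if (value !== null && typeof value === 'object') {
    const copy = {};
    for (const [key, child] of Object.entries(value)) {
      if (key !== 'cache_control') copy[key] = withoutCacheControl(child);
    }
    return copy;
  }
  return value;
}

// Drop the top-level `reasoning` parameter, then strip cache metadata.
function stripAnthropicOnlyFields(requestBody) {
  const { reasoning, ...rest } = requestBody;
  return withoutCacheControl(rest);
}

// Hypothetical request body shaped like the fields described above.
const sampleRequest = {
  model: 'devstral-latest',
  reasoning: { effort: 'high', enabled: false },
  messages: [
    {
      role: 'user',
      content: [{ type: 'text', text: 'Hello', cache_control: { type: 'ephemeral' } }],
    },
  ],
};

console.log(JSON.stringify(stripAnthropicOnlyFields(sampleRequest), null, 2));
```

<p>The helper returns a new object and leaves the original request untouched, which keeps the sketch free of side effects.</p>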
<h3>Startup</h3>
<p>CCR gives you two startup methods:</p>
<ul>
<li><p><code>ccr code</code> — All-in-one: starts the proxy, reads ~/.claude-code-router/config.json, sets environment variables, and launches Claude Code as a subprocess. This is the recommended approach.</p>
</li>
<li><p><code>ccr start</code> <strong>+</strong> <code>eval "$(ccr activate)"</code> <strong>+</strong> <code>claude</code> — Manual: start the proxy server, export env vars in your shell, then run Claude Code normally. Useful for shell integration.</p>
</li>
</ul>
<hr />
<h2>5. Step-by-Step Configuration Walkthrough</h2>
<blockquote>
<p><strong>Tested with</strong>: claude-code-router v2.0.0 | Node.js 20+ | Claude Code latest<br /><strong>Models verified</strong>: Devstral 2 (2512), Codestral 2 (2501), Mistral Large 3, Mistral Small 3.1</p>
</blockquote>
<blockquote>
<p><strong>Prefer automation?</strong> The <a href="https://github.com/hochbichler/claude-code-mistral">companion repository</a> includes a <code>setup.sh</code> script that handles every step below — install, configuration, and verification — in a single command. Clone it and skip this walkthrough entirely.</p>
</blockquote>
<h3>Prerequisites</h3>
<pre><code class="language-bash">node --version    # Must be &gt;= 20.0.0
claude --version  # Must be installed
[ -n "$MISTRAL_API_KEY" ] &amp;&amp; echo "MISTRAL_API_KEY is set"  # Must be set (checks without printing the secret)
</code></pre>
<p>Get your API key at <a href="https://console.mistral.ai">console.mistral.ai</a>.</p>
<h3>Install claude-code-router</h3>
<pre><code class="language-bash">npm install -g @musistudio/claude-code-router
</code></pre>
<h3>Create the Configuration</h3>
<p>Create <code>~/.claude-code-router/config.json</code> with the following contents. Each field is explained inline:</p>
<pre><code class="language-json">{
  // Passthrough token — not a real secret, just used for
  // Claude Code to authenticate with the local proxy
  "APIKEY": "sk-mistral-router",

  // Enable logging for troubleshooting (disable after verifying)
  "LOG": true,
  "LOG_LEVEL": "info",

  "Providers": [
    {
      "name": "mistral",
      // Full Mistral EU endpoint — CCR uses this URL directly
      "api_base_url": "https://api.mistral.ai/v1/chat/completions",
      // Environment variable — never hardcode your key
      "api_key": "$MISTRAL_API_KEY",
      "models": [
        "devstral-latest",
        "codestral-latest",
        "mistral-large-latest",
        "mistral-small-latest"
      ],
      "transformer": {
        // cleancache: strips Anthropic cache_control fields (422 fix)
        // stripreasoning: strips reasoning params Mistral rejects
        "use": ["cleancache", "stripreasoning"]
      }
    }
  ],

  "Router": {
    // Default coding model — best SWE-bench score
    "default": "mistral,devstral-latest",
    // Lightweight tasks — cost-effective small model
    "background": "mistral,mistral-small-latest",
    // Reasoning-heavy tasks — largest model
    "think": "mistral,mistral-large-latest",
    // Large context requests — 256K window
    "longContext": "mistral,mistral-large-latest",
    // Switch to longContext model above 60K tokens
    "longContextThreshold": 60000
  },

  // Visual confirmation of active model in terminal
  "StatusLine": {
    "enabled": true
  }
}
</code></pre>
<p>The configuration maps four routes to three models (Mistral Large serves both <code>think</code> and <code>longContext</code>):</p>
<table>
<thead>
<tr>
<th>Route</th>
<th>Model</th>
<th>When It's Used</th>
</tr>
</thead>
<tbody><tr>
<td><code>default</code></td>
<td>Devstral</td>
<td>Standard coding tasks (file editing, search, generation)</td>
</tr>
<tr>
<td><code>background</code></td>
<td>Mistral Small</td>
<td>Lightweight background tasks (indexing, summaries)</td>
</tr>
<tr>
<td><code>think</code></td>
<td>Mistral Large</td>
<td>Complex reasoning (plan mode, architecture decisions)</td>
</tr>
<tr>
<td><code>longContext</code></td>
<td>Mistral Large</td>
<td>Requests exceeding 60K tokens</td>
</tr>
</tbody></table>
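<p>The routing idea behind this table can be sketched as a simple lookup. The <code>router</code> object mirrors the <code>Router</code> section of the configuration above; the <code>selectRoute</code> function, its input flags, and the order of the checks are assumptions for illustration, and CCR's actual router may evaluate requests differently.</p>

```javascript
// Router values copied from the configuration above.
const router = {
  default: 'mistral,devstral-latest',
  background: 'mistral,mistral-small-latest',
  think: 'mistral,mistral-large-latest',
  longContext: 'mistral,mistral-large-latest',
  longContextThreshold: 60000,
};

// Hypothetical request description: CCR derives these signals internally.
function selectRoute(config, { tokenCount, isBackground = false, wantsThinking = false }) {
  if (tokenCount > config.longContextThreshold) return config.longContext;
  if (isBackground) return config.background;
  if (wantsThinking) return config.think;
  return config.default;
}

// Split a "provider,model" route string into its two parts.
function parseRoute(route) {
  const [provider, model] = route.split(',');
  return { provider, model };
}

console.log(parseRoute(selectRoute(router, { tokenCount: 75000 })));
```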
<h3>Verify</h3>
<pre><code class="language-bash">ccr code
# Run a simple task, then check:
grep -h "api.mistral.ai" ~/.claude-code-router/logs/ccr-*.log
</code></pre>
<p>The statusline should display the active model name (e.g., <code>devstral-latest</code>). Log entries should show requests to <code>api.mistral.ai</code>.</p>
<hr />
<h2>6. Model Selection Guide</h2>
<p>Mistral offers four models suitable for AI coding workflows. All four support tool-calling — a critical requirement for Claude Code's agentic capabilities (file editing, search, command execution).</p>
<h3>Devstral (<code>devstral-latest</code>)</h3>
<p>The primary coding model. 123B dense transformer with a 256K token context window.</p>
<ul>
<li><p><strong>SWE-bench Verified</strong>: 72.2% (also 61.3% on SWE-bench Multilingual)</p>
</li>
<li><p><strong>Tool-calling</strong>: Full support, on par with the best closed models</p>
</li>
<li><p><strong>Pricing</strong>: $0.40/M input, $2.00/M output</p>
</li>
<li><p><strong>Best for</strong>: Default route — everyday coding tasks</p>
</li>
</ul>
<p>Devstral 2 is the recommended default. Its combination of coding performance, tool-calling reliability, and cost makes it the strongest choice for the <code>default</code> route.</p>
<h3>Codestral (<code>codestral-latest</code>)</h3>
<p>A code-specialized model with 256K context and fill-in-the-middle (FIM) support.</p>
<ul>
<li><p><strong>FIM</strong>: State-of-the-art (HumanEvalFIM 85.9%)</p>
</li>
<li><p><strong>Tool-calling</strong>: Full function calling and parallel function calling supported</p>
</li>
<li><p><strong>Best for</strong>: Code completion workflows, FIM tasks</p>
</li>
</ul>
<h3>Mistral Large 3 (<code>mistral-large-2512</code>)</h3>
<p>The flagship model. 675B total parameters (41B active) using a Mixture of Experts architecture with a 256K context window.</p>
<ul>
<li><p><strong>Architecture</strong>: MoE — only 41B parameters active per inference, keeping latency manageable despite 675B total</p>
</li>
<li><p><strong>Tool-calling</strong>: Native function calling and multi-tool orchestration</p>
</li>
<li><p><strong>Best for</strong>: <code>think</code> and <code>longContext</code> routes — complex reasoning and large codebases</p>
</li>
</ul>
<p><strong>Alias caveat</strong>: <code>mistral-large-latest</code> may still point to Large 2.1 (128K context) rather than Large 3 (256K context). If you need the 256K window reliably, pin to <code>mistral-large-2512</code> explicitly.</p>
<h3>Mistral Small 3.1 (<code>mistral-small-2503</code>)</h3>
<p>A 24B parameter model optimized for low-latency responses with 128K context.</p>
<ul>
<li><p><strong>Tool-calling</strong>: Full support with strong agentic capabilities for its size</p>
</li>
<li><p><strong>Pricing</strong>: Significantly cheaper than larger models</p>
</li>
<li><p><strong>Best for</strong>: <code>background</code> route — lightweight tasks where speed matters more than depth</p>
</li>
</ul>
<h3>Summary: Recommended Route Assignments</h3>
<table>
<thead>
<tr>
<th>Route</th>
<th>Model</th>
<th>Rationale</th>
</tr>
</thead>
<tbody><tr>
<td><code>default</code></td>
<td><code>devstral-latest</code></td>
<td>Best coding benchmark, full tool-calling, good cost</td>
</tr>
<tr>
<td><code>background</code></td>
<td><code>mistral-small-latest</code></td>
<td>Fastest, cheapest, sufficient for simple tasks</td>
</tr>
<tr>
<td><code>think</code></td>
<td><code>mistral-large-latest</code></td>
<td>Strongest reasoning for complex decisions</td>
</tr>
<tr>
<td><code>longContext</code></td>
<td><code>mistral-large-latest</code></td>
<td>Largest context window (256K on Large 3; pin <code>mistral-large-2512</code> if the alias still resolves to Large 2.1)</td>
</tr>
</tbody></table>
<hr />
<h2>7. Presets: One-Command Switching Between Cloud and Local</h2>
<p>The manual configuration from Section 5 works, but claude-code-router's preset system offers a more portable approach. A preset is a directory containing a <code>manifest.json</code> — a self-contained configuration package you can install with a single command.</p>
<p>The companion repository includes three presets: one for Mistral's EU API, one for local Ollama, and one for local LM Studio.</p>
<blockquote>
<p><strong>Note:</strong> CCR's <code>preset install &lt;path&gt;</code> command has two bugs (<a href="https://github.com/musistudio/claude-code-router/issues/1256">#1256</a>): it does not create the preset directory before writing, and its "already installed" guard fires if the directory already exists. The workarounds below copy preset files directly — bypassing the installer for local presets — and use CCR's name-based reconfigure flow only where an interactive prompt is needed (cloud API key).</p>
</blockquote>
<h3>Cloud Preset (<code>presets/mistral-cloud/</code>)</h3>
<p>Routes requests to Mistral's EU-hosted API. During installation, CCR prompts for your API key using a secure <code>password</code> input field — the key is never stored in the manifest file itself.</p>
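<pre><code class="language-bash"># Make sure the preset directory exists, then copy the preset into it
mkdir -p ~/.claude-code-router/presets
cp -r presets/mistral-cloud ~/.claude-code-router/presets/
# Start a coding session with the mistral-cloud settings
ccr mistral-cloud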
</code></pre>
<p>The cloud preset uses a <code>schema</code> field to define install-time prompts. The <code>{{apiKey}}</code> placeholder in the manifest gets replaced with your input during installation:</p>
<pre><code class="language-json">{
  "schema": [
    {
      "id": "apiKey",
      "type": "password",
      "label": "Mistral API Key",
      "prompt": "Enter your Mistral API key (from console.mistral.ai)"
    }
  ]
}
</code></pre>
<p>This approach keeps the preset file shareable — no secrets in the repository, no environment variables to configure first.</p>
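<p>As a rough mental model of that substitution, the Bash sketch below fills a <code>{{apiKey}}</code> placeholder in a template string. The helper name <code>fill_api_key</code> and the sample <code>"api_key"</code> line are made up for illustration; CCR performs its own substitution, and this is not its implementation.</p>
<pre><code class="language-bash"># Hypothetical helper, for illustration only (not CCR's code).
# Replaces every {{apiKey}} placeholder in a template string with the given key.
fill_api_key() {
  local template="$1" key="$2"
  local placeholder='{{apiKey}}'
  local result
  # Quoting the pattern and the replacement makes both literal text
  result=${template//"$placeholder"/"$key"}
  printf '%s\n' "$result"
}
</code></pre>
<p>For example, <code>fill_api_key '"api_key": "{{apiKey}}"' sk-test</code> prints <code>"api_key": "sk-test"</code>. The manifest in the repository only ever contains the placeholder, which is why it stays safe to share.</p>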
<h3>Local Preset — Ollama (<code>presets/mistral-ollama/</code>)</h3>
<p>Routes requests to a local <a href="https://ollama.com">Ollama</a> instance. Data never leaves your machine — no API key, no cloud dependency.</p>
<p><strong>1. Install Ollama</strong></p>
<p>Download and install Ollama from <a href="https://ollama.com/download">ollama.com</a>. On macOS:</p>
<pre><code class="language-bash">brew install ollama
</code></pre>
<p><strong>2. Pull the model</strong></p>
<pre><code class="language-bash">ollama pull devstral-small:latest
# Downloads ~14 GB — requires at least 16 GB RAM (24 GB recommended)
</code></pre>
<p>Verify the model is available:</p>
<pre><code class="language-bash">ollama list
# NAME                    ID              SIZE    MODIFIED
# devstral-small:latest   abc123...       14 GB   ...
</code></pre>
<p><strong>3. Start Ollama</strong> (if not already running as a background service)</p>
<pre><code class="language-bash">ollama serve
# Ollama is running on http://localhost:11434
</code></pre>
<p><strong>4. Install and activate the preset</strong></p>
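<pre><code class="language-bash"># Make sure the preset directory exists, then copy the preset into it
mkdir -p ~/.claude-code-router/presets
cp -r presets/mistral-ollama ~/.claude-code-router/presets/
# Start a coding session with the mistral-ollama settings
ccr mistral-ollama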
</code></pre>
<p>The preset targets <code>http://localhost:11434/v1</code> with a dummy <code>api_key</code> of <code>"ollama"</code> — Ollama's OpenAI-compatible server requires no authentication. The <code>schema</code> is empty: no prompts during installation.</p>
<p>The preset omits <code>think</code> and <code>longContext</code> routes. A 24B model on consumer hardware has practical limits — all task types fall back to the <code>default</code> route.</p>
<h3>Local Preset — LM Studio (<code>presets/mistral-lm-studio/</code>)</h3>
<p>Routes requests to a local <a href="https://lmstudio.ai">LM Studio</a> server. LM Studio provides a GUI for browsing, downloading, and running quantized models — no command line required for model management.</p>
<p><strong>1. Install LM Studio</strong></p>
<p>Download from <a href="https://lmstudio.ai">lmstudio.ai</a> and install. LM Studio is available for macOS, Windows, and Linux.</p>
<p><strong>2. Download the model</strong></p>
<p>Open LM Studio and go to the <strong>Models</strong> tab. Search for <code>devstral-small</code> and download <strong>Devstral Small 2</strong> (<code>mistralai/devstral-small-2-2512</code>). The quantized GGUF variant fits in ~14 GB of RAM.</p>
<p><strong>3. Load the model and start the local server</strong></p>
<ul>
<li><p>Go to the <strong>Developer</strong> tab (called <strong>Local Server</strong> in older versions)</p>
</li>
<li><p>Select <code>mistralai/devstral-small-2-2512</code> from the model dropdown</p>
</li>
<li><p>Click <strong>Start Server</strong></p>
</li>
</ul>
<p>LM Studio's server starts on <code>http://localhost:1234</code> and exposes an OpenAI-compatible API. The model identifier it reports is <code>mistralai/devstral-small-2-2512</code> — this must match what's in the preset manifest, which it already does.</p>
<p><strong>4. Install and activate the preset</strong></p>
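<pre><code class="language-bash"># Make sure the preset directory exists, then copy the preset into it
mkdir -p ~/.claude-code-router/presets
cp -r presets/mistral-lm-studio ~/.claude-code-router/presets/
# Start a coding session with the mistral-lm-studio settings
ccr mistral-lm-studio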
</code></pre>
<p>Like the Ollama preset, <code>schema</code> is empty and <code>think</code>/<code>longContext</code> routes are omitted. The endpoint targets <code>http://localhost:1234/v1/chat/completions</code> with a dummy <code>api_key</code> of <code>"lm-studio"</code>.</p>
<blockquote>
<p><strong>Model identifier note</strong>: The preset uses <code>mistralai/devstral-small-2-2512</code> as the model ID. If you load a different model in LM Studio, update the <code>models</code> array and <code>Router</code> values in <code>presets/mistral-lm-studio/manifest.json</code> to match the identifier shown in LM Studio's server UI before copying the preset.</p>
</blockquote>
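<p>If you are unsure which identifier the server reports, you can ask it directly. The sketch below assumes <code>curl</code> is installed and that the server exposes the OpenAI-style <code>GET /v1/models</code> endpoint. <code>extract_model_ids</code> and <code>list_model_ids</code> are hypothetical helper names, and the text matching is deliberately simple rather than a full JSON parser.</p>
<pre><code class="language-bash"># Read an OpenAI-style model list (JSON) on stdin and print one model ID per line.
extract_model_ids() {
  grep -o '"id" *: *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Fetch the model list from an OpenAI-compatible base URL and print the IDs.
# Assumes the server answers GET on the base URL followed by /models.
list_model_ids() {
  local base_url="${1%/}"   # drop one trailing slash, if present
  curl -fsS "$base_url/models" | extract_model_ids
}
</code></pre>
<p>With the server running, <code>list_model_ids http://localhost:1234/v1</code> should print one identifier per line, and that is the value the manifest needs. The same call against <code>http://localhost:11434/v1</code> should work for the Ollama preset, assuming your Ollama version supports that endpoint.</p>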
<h3>Switching</h3>
<p>Switching between presets is a single command:</p>
<pre><code class="language-bash"># Mistral Cloud: the default configuration in ~/.claude-code-router/config.json
ccr code
# Local Ollama preset
ccr mistral-ollama
# Local LM Studio preset
ccr mistral-lm-studio
</code></pre>
<blockquote>
<p><strong>Important</strong>: After changing a preset or <code>config.json</code>, restart the CCR server so the change takes effect:</p>
</blockquote>
<pre><code class="language-bash">ccr restart
</code></pre>
<h3>Persistent Shell Integration</h3>
<p>For automatic routing in every terminal session, add the activation to your shell profile:</p>
<pre><code class="language-bash"># Add to ~/.zshrc or ~/.bashrc
export MISTRAL_API_KEY="your-key-here"
eval "$(ccr activate)"
</code></pre>
<p>With this in place, you can use <code>claude</code> directly — all requests route through the Mistral proxy automatically.</p>
<hr />
<h2>8. Troubleshooting Guide</h2>
<p>This section covers the most common issues you'll encounter when routing Claude Code through Mistral via claude-code-router.</p>
<h3>422 API Error: "Extra inputs are not permitted"</h3>
<p><strong>Symptom</strong>: <code>422 Unprocessable Entity</code> with <code>"Extra inputs are not permitted"</code> for <code>cache_control</code> or <code>reasoning</code> fields.</p>
<p><strong>Cause</strong>: Claude Code sends Anthropic-specific parameters that Mistral doesn't recognize:</p>
<ul>
<li><p><code>cache_control: {"type": "ephemeral"}</code> on messages (prompt caching)</p>
</li>
<li><p><code>reasoning: {"effort": "high", "enabled": false}</code> on the request body (extended thinking config)</p>
</li>
</ul>
<p><strong>Fix</strong>: Make sure your <code>config.json</code> includes both transformers in the <code>use</code> array:</p>
<pre><code class="language-json">"transformer": {
  "use": ["cleancache", "stripreasoning"]
}
</code></pre>
<p>Also ensure the custom <code>stripreasoning</code> plugin is registered in the top-level <code>transformers</code> array:</p>
<pre><code class="language-json">"transformers": [
  {"path": "/path/to/.claude-code-router/plugins/strip-reasoning.js"}
]
</code></pre>
<p>The <code>setup.sh</code> script handles this automatically. If you configured manually, copy <code>plugins/strip-reasoning.js</code> from the repo to <code>~/.claude-code-router/plugins/</code> and add both config entries.</p>
<h3>Missing API Key</h3>
<p><strong>Symptom</strong>: Authentication errors or <code>MISTRAL_API_KEY not set</code>.</p>
<p><strong>Fix</strong>: Set the environment variable before starting CCR:</p>
<pre><code class="language-bash">export MISTRAL_API_KEY="your-key-here"
ccr code
</code></pre>
<p>For persistent configuration, add the export to <code>~/.zshrc</code> or <code>~/.bashrc</code>.</p>
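<p>A small guard in your shell profile or a wrapper script can catch the missing key before CCR starts. This is a minimal sketch; <code>require_api_key</code> is a hypothetical helper name, not part of CCR.</p>
<pre><code class="language-bash"># Hypothetical guard: succeed only if MISTRAL_API_KEY is set and non-empty.
# Reports the key length, never the key itself.
require_api_key() {
  if [ -z "${MISTRAL_API_KEY:-}" ]; then
    echo "MISTRAL_API_KEY is not set"
    return 1
  fi
  echo "MISTRAL_API_KEY is set (${#MISTRAL_API_KEY} characters)"
}
</code></pre>
<p>Call <code>require_api_key</code> before <code>ccr code</code>. It returns a non-zero status when the variable is empty or unset, so a wrapper script can stop early instead of failing later with an authentication error.</p>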
<h3>Node.js Version Error</h3>
<p><strong>Symptom</strong>: CCR fails to install or start with compatibility errors.</p>
<p><strong>Fix</strong>: CCR requires Node.js 20+. Check with <code>node --version</code> and update via nvm:</p>
<pre><code class="language-bash">nvm install 20
nvm use 20
</code></pre>
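<p>To check the requirement in a script rather than by eye, you can compare the major version number. The sketch below assumes <code>node</code> is on your <code>PATH</code>; <code>node_major</code> and <code>check_node</code> are hypothetical helper names.</p>
<pre><code class="language-bash"># Print the major version from a string such as "v20.11.1".
node_major() {
  local version="${1#v}"          # drop the leading "v"
  printf '%s\n' "${version%%.*}"  # keep everything before the first dot
}

# Compare the installed Node.js against the minimum this guide states (20).
check_node() {
  local major
  major="$(node_major "$(node --version)")"
  if [ "$major" -ge 20 ]; then
    echo "Node.js $major: OK"
  else
    echo "Node.js $major: too old, CCR needs 20 or newer"
    return 1
  fi
}
</code></pre>
<p>Run <code>check_node</code> after <code>nvm use 20</code> to confirm that the shell you start CCR from actually picked up the new version.</p>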
<h3>Connection Timeout</h3>
<p><strong>Symptom</strong>: Requests hang or time out when reaching Mistral's API.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><p>Verify your API key is valid at <a href="https://console.mistral.ai">console.mistral.ai</a>.</p>
</li>
<li><p>Check network connectivity to <code>api.mistral.ai</code>.</p>
</li>
<li><p>If you're behind a corporate proxy, configure <code>PROXY_URL</code> in <code>config.json</code>.</p>
</li>
</ol>
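<p>The first two checks can be combined into one probe. The sketch below assumes <code>curl</code> is installed, that <code>MISTRAL_API_KEY</code> is exported, and that Mistral's <code>GET /v1/models</code> endpoint answers <code>200</code> for a valid key and <code>401</code> for a rejected one. <code>diagnose_status</code> and <code>probe_mistral</code> are hypothetical helper names.</p>
<pre><code class="language-bash"># Translate an HTTP status code from the probe into a short diagnosis.
diagnose_status() {
  case "$1" in
    200) echo "reachable, key accepted" ;;
    401) echo "reachable, key rejected" ;;
    000) echo "no HTTP response: check DNS, firewall, or proxy" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Send one authenticated request with a 10 second limit and diagnose the result.
# curl reports 000 as the status when it never receives an HTTP response.
probe_mistral() {
  local code
  code="$(curl -sS -o /dev/null -m 10 -w '%{http_code}' \
    -H "Authorization: Bearer $MISTRAL_API_KEY" \
    https://api.mistral.ai/v1/models)"
  diagnose_status "$code"
}
</code></pre>
<p>If <code>probe_mistral</code> reports no HTTP response while a browser on the same machine reaches the site, a corporate proxy is the likely cause, which points to the <code>PROXY_URL</code> setting.</p>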
<h3>Existing Configuration Conflict</h3>
<p><strong>Symptom</strong>: <code>setup.sh</code> warns about an existing configuration, or Claude Code behaves unexpectedly after setup.</p>
<p><strong>Fix</strong>: The setup script creates a backup at <code>~/.claude-code-router/config.json.bak</code> before overwriting. If you need to restore your previous configuration:</p>
<pre><code class="language-bash">cp ~/.claude-code-router/config.json.bak ~/.claude-code-router/config.json
</code></pre>
<h3>Model Deprecation and Alias Changes</h3>
<p><strong>Symptom</strong>: A model ID stops working or behaves differently than expected.</p>
<p><strong>Fix</strong>: Mistral updates model aliases over time. <code>mistral-large-latest</code> and <code>mistral-small-latest</code> will point to newer versions as they're released. If you need consistent behavior, pin to specific version IDs:</p>
<pre><code class="language-json">"default": "mistral,devstral-latest",
"think": "mistral,mistral-large-2512"
</code></pre>
<p>Check Mistral's model documentation for current alias mappings.</p>
<hr />
<h2>9. Smoke Test: Verifying Your Setup</h2>
<p>Before using this setup for real work, run through this quick verification checklist.</p>
<h3>1. Start the proxy</h3>
<pre><code class="language-bash">ccr code
</code></pre>
<p>Confirm: The statusline at the bottom of the terminal displays a Mistral model name (e.g., <code>devstral-latest</code>). If you see no statusline, check that <code>StatusLine.enabled</code> is <code>true</code> in your config.</p>
<h3>2. Run a simple task</h3>
<p>In the Claude Code session, type a straightforward request:</p>
<pre><code class="language-plaintext">Create a file called hello.txt with the text "Hello from Mistral"
</code></pre>
<p>Confirm: Claude Code creates the file. The response completes without errors. The statusline shows the model that handled the request.</p>
<h3>3. Check the logs</h3>
<p>In a separate terminal:</p>
<pre><code class="language-bash">grep -hF "api.mistral.ai" ~/.claude-code-router/logs/ccr-*.log
grep -hF "api.anthropic.com" ~/.claude-code-router/logs/ccr-*.log   # should print nothing
</code></pre>
<p>Confirm: Log entries show requests to <code>api.mistral.ai</code>. No requests to <code>api.anthropic.com</code>.</p>
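<p>Both log checks can be folded into a single pass/fail helper. This is a minimal sketch that only counts lines mentioning each host; it assumes the <code>ccr-*.log</code> naming used above and makes no assumption about the rest of the log format. <code>count_host</code> and <code>check_routing</code> are hypothetical helper names.</p>
<pre><code class="language-bash"># Print how many log lines under directory $1 mention host $2.
# cat merges the files first so grep prints one total instead of one count per file.
count_host() {
  cat "$1"/ccr-*.log | grep -cF "$2"
}

# Pass only if Mistral was contacted and Anthropic was not.
check_routing() {
  local dir="$1" mistral anthropic
  mistral="$(count_host "$dir" api.mistral.ai)"
  anthropic="$(count_host "$dir" api.anthropic.com)"
  if [ "$mistral" -eq 0 ]; then
    echo "FAIL: no log lines mention api.mistral.ai"
    return 1
  fi
  if [ "$anthropic" -ne 0 ]; then
    echo "FAIL: $anthropic log line(s) mention api.anthropic.com"
    return 1
  fi
  echo "PASS: $mistral log line(s) mention api.mistral.ai, none mention api.anthropic.com"
}
</code></pre>
<p>Run it as <code>check_routing ~/.claude-code-router/logs</code> after the simple task from step 2, so the logs contain at least one request.</p>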
<h3>Pass/Fail</h3>
<table>
<thead>
<tr>
<th>Check</th>
<th>Expected</th>
</tr>
</thead>
<tbody><tr>
<td>Statusline shows Mistral model</td>
<td>Model name visible in terminal footer</td>
</tr>
<tr>
<td>Simple task completes</td>
<td>File created, no errors</td>
</tr>
<tr>
<td>Logs show <code>api.mistral.ai</code></td>
<td>All requests routed to EU endpoint</td>
</tr>
<tr>
<td>No <code>api.anthropic.com</code> in logs</td>
<td>Zero requests to Anthropic</td>
</tr>
</tbody></table>
<p>If all four checks pass, your setup is working. You're routing Claude Code through Mistral's EU infrastructure.</p>
<p>If any check fails, refer to the <a href="#8-troubleshooting-guide">Troubleshooting Guide</a> above.</p>
<hr />
<h2>10. Conclusion and What's Next</h2>
<p>You now have Claude Code routing through Mistral's EU-hosted API. Your source code stays within EU borders, processed by a provider with SOC 2 Type II, ISO 27001, and ISO 27701 certifications, with substantially lower US CLOUD Act exposure than routing through a US-headquartered provider. The same Claude Code workflow is preserved: keybindings, MCP servers, skills, and CLAUDE.md files.</p>
<p>The companion repository at <a href="https://github.com/hochbichler/claude-code-mistral">github.com/hochbichler/claude-code-mistral</a> provides:</p>
<ul>
<li><p><code>setup.sh</code>: Automated setup in under 5 minutes</p>
</li>
<li><p><code>config.json</code>: Pre-configured template with four-model routing</p>
</li>
<li><p><strong>Three presets</strong>: Switch between Mistral's EU API, local Ollama, and local LM Studio with a single command</p>
</li>
</ul>
<p>This is a technical configuration guide, not a compliance certification. Using an EU-hosted provider is one component of a broader data protection strategy. Talk to your DPO or legal counsel about your specific obligations.</p>
<h3>Coming Next</h3>
<p><a href="part-2-testing.md"><strong>Part 2: Testing the Limits</strong></a> <em>(coming soon)</em> — We put this setup through real-world coding tasks: tool-calling reliability per model, MCP server compatibility, skills evaluation, extended thinking behavior, and an honest assessment of what works and what breaks.</p>
<p><a href="part-3-alternatives.md"><strong>Part 3: Beyond Mistral</strong></a> <em>(coming soon)</em> — Alternative EU-compliant setups: Mistral Vibe CLI deep dive, other EU-hosted providers (Scaleway, OVHcloud, Aleph Alpha), self-hosted open-weight models, multi-provider routing, and enterprise deployment patterns.</p>
<hr />
<p><em>Written by <a href="https://github.com/thomas-hochbichler">Thomas Hochbichler</a>. I help development teams integrate AI coding tools into compliant workflows.</em></p>
<p><em>This article is a technical guide for configuring AI development tools. It's not legal advice. For questions about GDPR compliance, data processing obligations, or regulatory requirements specific to your organization, consult a qualified Data Protection Officer (DPO) or legal counsel.</em></p>
]]></content:encoded></item></channel></rss>