Putting Apple's on-device Foundation Models into three native Mac apps

Over the last few weeks I added on-device AI to all three of my macOS apps - a Homebrew GUI (Tappie), a dual-pane file manager (EmpiricCommander), and a Docker and Kubernetes GUI (Zenithal). They run on Apple's Foundation Models framework: the same small language model that powers Apple Intelligence, running entirely on the device. No server, no API key, no per-token bill, and nothing leaves the Mac.

The interesting part was not the API. It was deciding what a small on-device model is actually allowed to do in a tool people use on their real files and containers. Here is what worked, the rule I applied everywhere, and the features I cut because the model could not do them honestly.

One rule everywhere: the model proposes, the app disposes

The framing that made all of this safe is simple. The model never performs an action and never produces a fact the app then trusts. It only ever proposes something, and the proposal is constrained to a set of options the app already knows how to validate. The user confirms through the same UI they would have used by hand. If the model returns garbage, the worst case is a proposal that fails validation and is discarded - never a renamed-into-oblivion folder or a hallucinated security finding.

Concretely, every feature is double-gated. The code is guarded by an #available check for macOS 26, which the compiler enforces and the system evaluates at runtime, and the UI only appears when SystemLanguageModel.default.availability reports that the model is present (Apple silicon, macOS 26, Apple Intelligence turned on). When the model is not there, the app behaves exactly as it always did. AI is strictly additive over the deterministic features that were already shipping.
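As a rough illustration of that double gate (a sketch, not the code that ships in the apps): the helper name onDeviceModelIsAvailable and the showSmartFilter flag are made up for this example, while SystemLanguageModel.default.availability is the framework property mentioned above. The canImport guard is an extra I added so the snippet still builds against SDKs that do not have the framework.

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Returns true only when the on-device model can be used right now.
/// (Hypothetical helper name, for illustration.)
func onDeviceModelIsAvailable() -> Bool {
    #if canImport(FoundationModels)
    // Gate 1: OS version. The compiler rejects use of the API outside this check.
    if #available(macOS 26.0, iOS 26.0, *) {
        // Gate 2: the model itself (hardware, Apple Intelligence enabled, assets ready).
        if case .available = SystemLanguageModel.default.availability {
            return true
        }
    }
    #endif
    return false
}

// Hypothetical call site: AI controls are only added when both gates pass,
// so the rest of the app is untouched when they do not.
let showSmartFilter = onDeviceModelIsAvailable()
print(showSmartFilter ? "AI features shown" : "AI features hidden")
```

Everything else in the app can then treat the AI surface as optional UI keyed off one boolean.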

Tappie: plain language into a real filter

Tappie is a GUI over Homebrew. It already had an advanced filter with structured predicates - installed state, outdated, cask vs formula, license, tap, and so on. The AI feature, Smart Filter, does one thing: it turns a phrase like "casks with updates, MIT licensed" into that existing structured filter.

It does not return a list of packages. It returns a filter object, which the deterministic engine then evaluates and previews. The resolved filter shows up as an editable chip, so the user sees exactly what was applied and can adjust it. The model is a parser from English to a schema I already trusted - not the thing deciding which packages match.
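The shape of that split can be sketched as follows. This is an assumption-laden miniature, not Tappie's real schema: PackageFilter, Package, validated, and apply are hypothetical names, the package data is invented, and the model's output is represented by a hand-written proposal value.

```swift
// Hypothetical package record and filter schema; field names are illustrative.
enum PackageKind { case formula, cask }

struct Package {
    let name: String
    let kind: PackageKind
    let license: String
    let isOutdated: Bool
}

/// The only thing the model is asked to fill in. nil means "no constraint".
struct PackageFilter: Equatable {
    var kind: PackageKind? = nil
    var license: String? = nil
    var outdatedOnly: Bool = false
}

/// Discards proposals that mention values the app has never seen.
func validated(_ proposal: PackageFilter, knownLicenses: Set<String>) -> PackageFilter? {
    if let license = proposal.license, !knownLicenses.contains(license) { return nil }
    return proposal
}

/// Deterministic evaluation: the model plays no part in deciding what matches.
func apply(_ filter: PackageFilter, to packages: [Package]) -> [Package] {
    packages.filter { pkg in
        (filter.kind == nil || pkg.kind == filter.kind) &&
        (filter.license == nil || pkg.license == filter.license) &&
        (!filter.outdatedOnly || pkg.isOutdated)
    }
}

// Invented sample data.
let packages = [
    Package(name: "alpha", kind: .cask, license: "GPL-3.0", isOutdated: true),
    Package(name: "bravo", kind: .cask, license: "MIT", isOutdated: true),
    Package(name: "charlie", kind: .formula, license: "MIT", isOutdated: true),
    Package(name: "delta", kind: .cask, license: "MIT", isOutdated: false),
]
let knownLicenses = Set(packages.map { $0.license })

// Stand-in for what the model might return for "casks with updates, MIT licensed".
let proposal = PackageFilter(kind: .cask, license: "MIT", outdatedOnly: true)

let matches = validated(proposal, knownLicenses: knownLicenses)
    .map { apply($0, to: packages).map { $0.name } } ?? []
print(matches)  // ["bravo"]

// A proposal naming a license the app does not know is dropped, not guessed at.
let bogus = PackageFilter(license: "Totally-Real-License")
print(validated(bogus, knownLicenses: knownLicenses) == nil)  // true
```

Because the filter is a plain value, it is also what gets rendered as the editable chip: the user edits the same struct the model filled in.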

EmpiricCommander: classify into a closed set, never compose the dangerous part

The batch rename feature is where this mattered most. Renaming files in bulk is exactly the kind of operation where a clever-but-wrong AI suggestion does real damage. So before adding any AI, I rebuilt the rename engine itself. It used to take raw modes (find/replace, regex, sequential). I replaced those with first-class operations: add prefix or suffix, remove text, change case, number sequentially, change extension - plus a regex escape hatch for power users.

The AI command bar only classifies a plain-language instruction into that closed set of operations and extracts literal arguments (the prefix string, the casing, the start number). It never composes a regular expression. That is a deliberate boundary: the model cannot author the one construct that could silently mangle a thousand filenames. A destructive rule is structurally impossible because the model has no path to produce one - it can only fill in arguments to operations the engine already validates and previews.
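A closed set like this maps naturally onto an enum with no regex case. The sketch below is my own reduction of the idea, not EmpiricCommander's engine: RenameOperation, isValid, and preview are hypothetical names, and the specific validation rules (no path separators, no collisions) are assumptions about what a sensible engine would check.

```swift
import Foundation

enum LetterCase { case lower, upper }

/// The closed set a model-side classifier may choose from. There is no regex case.
enum RenameOperation {
    case addPrefix(String)
    case addSuffix(String)
    case removeText(String)
    case changeCase(LetterCase)
    case numberSequentially(start: Int)
    case changeExtension(String)
}

/// Splits "photo.jpg" into ("photo", "jpg"); dotfiles keep their whole name as the stem.
func splitName(_ filename: String) -> (stem: String, ext: String) {
    guard let dot = filename.lastIndex(of: "."), dot != filename.startIndex else {
        return (filename, "")
    }
    return (String(filename[..<dot]), String(filename[filename.index(after: dot)...]))
}

/// Literal arguments are checked before anything is previewed.
func isValid(_ op: RenameOperation) -> Bool {
    func safe(_ s: String) -> Bool { !s.isEmpty && !s.contains("/") && !s.contains(":") }
    switch op {
    case .addPrefix(let s), .addSuffix(let s), .removeText(let s), .changeExtension(let s):
        return safe(s)
    case .numberSequentially(let start):
        return start >= 0
    case .changeCase:
        return true
    }
}

/// Returns the proposed names, or nil if the plan is rejected. Nothing touches disk.
func preview(_ op: RenameOperation, on filenames: [String]) -> [String]? {
    guard isValid(op) else { return nil }
    let renamed = filenames.enumerated().map { index, name -> String in
        let (stem, ext) = splitName(name)
        let dotExt = ext.isEmpty ? "" : "." + ext
        switch op {
        case .addPrefix(let p): return p + name
        case .addSuffix(let s): return stem + s + dotExt
        case .removeText(let t): return stem.replacingOccurrences(of: t, with: "") + dotExt
        case .changeCase(.lower): return stem.lowercased() + dotExt
        case .changeCase(.upper): return stem.uppercased() + dotExt
        case .numberSequentially(let start): return stem + "-" + String(start + index) + dotExt
        case .changeExtension(let e): return stem + "." + e
        }
    }
    // Refuse plans that would produce an empty name or make two files collide.
    guard renamed.allSatisfy({ !$0.isEmpty }),
          Set(renamed).count == renamed.count else { return nil }
    return renamed
}

let files = ["IMG_001.JPG", "IMG_002.JPG"]
print(preview(.addPrefix("trip-"), on: files) ?? [])   // ["trip-IMG_001.JPG", "trip-IMG_002.JPG"]
print(preview(.addPrefix("../x/"), on: files) == nil)  // true: path separator rejected
print(preview(.removeText("1"), on: ["a1.txt", "a.txt"]) == nil)  // true: names would collide
```

The point of the enum is that a bad classification can only pick the wrong operation or a wrong literal, both of which show up in the preview before anything is renamed.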

The second feature is read-only by construction: AI Summary in the file preview. Select text files - code, config, JSON, CSV, Markdown - and get a short private summary of each without opening them. The input is capped to the model's context window and the app discloses when it truncated. Read-only means there is nothing to undo and nothing to get wrong beyond the summary text itself.
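The cap-and-disclose step might look like the sketch below. It assumes a character budget as a conservative stand-in for the real limit, which is measured in tokens; SummaryInput, prepareForSummary, and the budget values are hypothetical.

```swift
/// What gets handed to the model, plus a flag the UI can surface.
struct SummaryInput {
    let text: String
    let wasTruncated: Bool
}

/// Caps the input at a character budget (assumption: a rough proxy for tokens).
func prepareForSummary(_ contents: String, characterBudget: Int) -> SummaryInput {
    let budget = max(0, characterBudget)
    guard contents.count > budget else {
        return SummaryInput(text: contents, wasTruncated: false)
    }
    return SummaryInput(text: String(contents.prefix(budget)), wasTruncated: true)
}

let long = prepareForSummary(String(repeating: "x", count: 10), characterBudget: 4)
print(long.text, long.wasTruncated)    // xxxx true

let short = prepareForSummary("hello", characterBudget: 100)
print(short.text, short.wasTruncated)  // hello false
```

Carrying wasTruncated alongside the text keeps the disclosure honest: the UI does not have to re-derive whether anything was cut.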

Zenithal: explaining failures and triaging scans

Zenithal manages Docker and Kubernetes. Two AI surfaces shipped. The first is "Explain this error" on a failed Docker build: it takes the build output and returns a plain-language explanation of what went wrong and how to fix it. Pure text-in, text-out, no actions taken.

The second is more interesting and was my first use of guided generation with @Generable: "Triage with AI" over a Trivy or Grype vulnerability scan. The model produces a structured, prioritized summary of the findings. The catch is obvious - a model summarizing a security report could invent a CVE that sounds plausible. So there is an anti-hallucination guard: every CVE the model cites is validated against the actual set of findings in the scan. Anything it made up is dropped before it reaches the UI. The model reorders and explains what is really there; it is not allowed to add to the list.
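The guard itself is small. Here is a minimal sketch of the idea with plain structs so it runs anywhere; in the real feature the model's output comes from guided generation, and Finding, TriageItem, dropUnknownCVEs, and the CVE identifiers below are all invented for illustration.

```swift
/// One entry from the scanner's report (hypothetical, simplified shape).
struct Finding {
    let id: String
    let severity: String
}

/// One entry from the model's prioritized summary (hypothetical shape).
struct TriageItem {
    let cveID: String
    let explanation: String
}

/// Keeps the model's ordering, but only for IDs that exist in the scan.
/// Invented IDs and repeats are dropped before anything reaches the UI.
func dropUnknownCVEs(_ proposed: [TriageItem], scan: [Finding]) -> [TriageItem] {
    let known = Set(scan.map { $0.id.uppercased() })
    var seen = Set<String>()
    return proposed.filter { item -> Bool in
        let id = item.cveID.uppercased()
        return known.contains(id) && seen.insert(id).inserted
    }
}

// Deliberately fake identifiers.
let scan = [
    Finding(id: "CVE-0000-0001", severity: "LOW"),
    Finding(id: "CVE-0000-0002", severity: "CRITICAL"),
]
let modelOutput = [
    TriageItem(cveID: "CVE-0000-0002", explanation: "Fix first."),
    TriageItem(cveID: "CVE-0000-9999", explanation: "Plausible, but not in the scan."),
    TriageItem(cveID: "cve-0000-0002", explanation: "Repeat of the first item."),
    TriageItem(cveID: "CVE-0000-0001", explanation: "Low priority."),
]

let triaged = dropUnknownCVEs(modelOutput, scan: scan)
print(triaged.map { $0.cveID })  // ["CVE-0000-0002", "CVE-0000-0001"]
```

The filter is a set-membership check against the scanner's own output, so the model can change the order and the wording but never the contents of the list.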

What I cut, and why

The honest part. A small on-device model is not a frontier model, and pretending otherwise would have shipped features that lie to users.

Cost estimation in Zenithal. I wanted the model to estimate resource or cloud cost from a config. It is bad at arithmetic, the way small models are, and a confidently wrong dollar figure is worse than no figure. Cut.
Auto-organize in EmpiricCommander. A "let the AI sort your folder" idea. It never reached the bar where I trusted the proposals enough to suggest moving real files around, so it did not ship.
Non-English content. The model does not support every language yet. Romanian text, for instance, cannot be summarized. Rather than return nonsense, the app detects this and surfaces a clear message. A feature that says "I can't do this here" beats one that quietly produces garbage.
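For the language case, one way to structure the check is to detect the text's language first and refuse before the model is ever called. This is a sketch of that approach, not a description of how the apps do it: using NLLanguageRecognizer for detection is my assumption, the supported set is a hypothetical hand-written list rather than anything read from the framework, and treating an undetected language as a refusal is a design choice I picked for the example.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

enum SummaryDecision: Equatable {
    case summarize
    case unsupportedLanguage(String)
    case unknownLanguage
}

/// Best-effort language detection; returns nil where NaturalLanguage is unavailable.
func detectLanguageCode(of text: String) -> String? {
    #if canImport(NaturalLanguage)
    return NLLanguageRecognizer.dominantLanguage(for: text)?.rawValue
    #else
    return nil
    #endif
}

/// Pure decision step: refuse clearly instead of sending unsupported text to the model.
func decide(languageCode: String?, supported: Set<String>) -> SummaryDecision {
    guard let code = languageCode else { return .unknownLanguage }
    return supported.contains(code) ? .summarize : .unsupportedLanguage(code)
}

// Hypothetical supported set; the real list should come from the framework.
let supported: Set<String> = ["en"]

print(decide(languageCode: "en", supported: supported))  // summarize
print(decide(languageCode: "ro", supported: supported))  // unsupportedLanguage("ro")

// Hypothetical call site combining detection and decision.
let decision = decide(languageCode: detectLanguageCode(of: "Some file contents"),
                      supported: supported)
print(decision)
```

Keeping the decision as an enum gives the UI a specific case to turn into a clear message, rather than a failed or nonsensical summary.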

Why native turned out to be the unlock

I did not set out to make a platform argument, but one fell out of the work. The Foundation Models framework is only reachable from a native process. An Electron app is a browser, and a browser has no direct path to the on-device model. A Java-based file manager has the same problem. So "private, on-device, free AI" is not a feature those apps are choosing not to build - it is one they cannot reach without shipping native code alongside the app or bolting on a cloud backend, and the cloud route defeats the privacy and cost story entirely.

The backend impact of all of this was, satisfyingly, zero. No API to run, no inference bill, no new data leaving the device. If anything it shrank the privacy surface: there is nothing new to add to a data-processing record because there is no processing off the device.

The apps

All three are native SwiftUI, and the AI features above are free updates to existing versions. The on-device features need a Mac with Apple Intelligence (macOS 26); everything else works as before, down to the older macOS versions each app already supported.

Zenithal - Docker and Kubernetes GUI for macOS.
EmpiricCommander - dual-pane file manager for macOS, iPad, and iPhone.
Tappie - Homebrew package manager GUI for macOS (free).