Personal Project

VisionClaw — JARVIS in my glasses

A real-time AI assistant running on my Oakley Meta HSTN smart glasses. It sees what I see, hears what I say, and takes action on my behalf.

[Image: a man wearing Meta smart glasses asks the AI assistant to dim a lamp]
The pitch: ask out loud, the glasses see what you see, the AI handles it.

What it is

I wanted to know how close we are to Tony Stark's JARVIS — an assistant that actually lives with you, sees what you're looking at, and can do things on your behalf without you reaching for a phone. Turns out: pretty close.

VisionClaw runs as an iOS app on my iPhone, paired to my Oakley Meta HSTN glasses. The glasses stream their camera and microphone to the phone. The phone forwards them in real time to Google's Gemini Live API. Gemini sees what I see, hears what I say, and talks back — out loud, through the glasses' speakers.
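To make that concrete, here is a minimal sketch of the phone-side bridge, assuming the Live API's v1beta BidiGenerateContent WebSocket endpoint and message shapes as documented at the time (these have shifted between preview releases, so verify before copying); the model name is an assumption, and the glasses-side capture code is elided.

```swift
import UIKit

// Minimal sketch of the phone-side bridge to Gemini Live over WebSocket.
// Endpoint, model name, and message shapes are assumptions based on the
// v1beta BidiGenerateContent docs at the time; check the current docs.
final class GeminiLiveBridge {
    private var socket: URLSessionWebSocketTask?

    func connect(apiKey: String) {
        let url = URL(string: "wss://generativelanguage.googleapis.com/ws/"
            + "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
            + "?key=\(apiKey)")!
        socket = URLSession.shared.webSocketTask(with: url)
        socket?.resume()

        // First message configures the session: which model, spoken replies.
        send("""
        {"setup": {"model": "models/gemini-2.0-flash-exp",
                   "generation_config": {"response_modalities": ["AUDIO"]}}}
        """)
        listen()
    }

    // Forward a chunk of mic audio from the glasses (16-bit PCM, 16 kHz).
    func sendAudio(_ pcm: Data) {
        sendChunk(mimeType: "audio/pcm;rate=16000", data: pcm)
    }

    // Forward a camera frame from the glasses as a JPEG still.
    func sendFrame(_ image: UIImage) {
        guard let jpeg = image.jpegData(compressionQuality: 0.5) else { return }
        sendChunk(mimeType: "image/jpeg", data: jpeg)
    }

    private func sendChunk(mimeType: String, data: Data) {
        send("""
        {"realtime_input": {"media_chunks": [
            {"mime_type": "\(mimeType)", "data": "\(data.base64EncodedString())"}]}}
        """)
    }

    private func send(_ json: String) {
        socket?.send(.string(json)) { _ in }
    }

    private func listen() {
        socket?.receive { [weak self] result in
            // Replies carry base64 PCM to speak through the glasses;
            // parsing and playback are sketched later in this post.
            if case .failure = result { return }   // socket closed
            self?.listen()                         // keep the receive loop alive
        }
    }
}
```

Raw JSON over a WebSocket keeps the dependency footprint at zero; a real app would want a proper Codable layer rather than string templates.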

I can ask things like:

- "What am I looking at?", and it describes whatever the camera sees.
- "Dim that lamp," and it routes the action to the local agent.

The voice loop is the whole experience. No screen, no typing. You just talk and it answers. The first time it actually worked I sat in my car for ten minutes asking it questions like a kid with a new toy.

[Image: the VisionClaw app installed on my iPhone home screen]
Installed on my iPhone after a Cmd+R from Xcode.

How it works

The interesting part is that none of this is "AI on the glasses." The Meta glasses are a sensor and a speaker — the brain lives on the phone (and in the cloud).

[Architecture diagram: the glasses feed Gemini Live for real-time perception, which delegates to OpenClaw for execution]
Two layers: Gemini Live handles perception and conversation, a local agent handles real-world actions.

The pipeline:

1. The glasses capture video and mic audio and stream both to the iPhone through the Meta Wearables DAT SDK.
2. The iOS app forwards audio chunks and camera frames to the Gemini Live API as they arrive.
3. Gemini streams spoken replies back, and the app plays them through the glasses' speakers.
4. When a request calls for a real-world action, the app hands it to the local OpenClaw agent to execute (the hand-off is sketched below).
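The hand-off between the two layers is the part the diagram glosses over. Here is a hedged sketch of what it can look like, assuming Gemini Live is configured with tool declarations and emits a tool call when a request needs hands; the agent endpoint, port, and JSON shapes are hypothetical, since the post doesn't pin down OpenClaw's API.

```swift
import Foundation

// Hedged sketch of the action layer. When Gemini decides a request needs a
// real-world action, it emits a tool call; we forward it to the local
// OpenClaw agent. The agent URL, port, and JSON shapes are hypothetical.
struct ToolCall: Codable {
    let name: String            // e.g. "dim_lamp" (illustrative only)
    let args: [String: String]
}

func forwardToAgent(_ call: ToolCall) async throws -> String {
    var request = URLRequest(url: URL(string: "http://127.0.0.1:8765/run")!) // hypothetical agent port
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(call)
    let (body, _) = try await URLSession.shared.data(for: request)
    // The agent's result goes back to Gemini as the tool response, so the
    // model can confirm out loud that the lamp actually dimmed.
    return String(decoding: body, as: UTF8.self)
}
```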

The stack

Hardware: Oakley Meta HSTN
Glasses SDK: Meta Wearables DAT
AI: Gemini Live API
Agent: OpenClaw (local)
App: Swift / iOS 17
Build: Xcode 15

What I actually built

The app itself is the open-source VisionClaw project — credit where it's due. What I did was get the whole stack working end-to-end on my hardware:

- Pairing the Oakley Meta HSTN glasses with the iPhone and pulling the camera and mic streams out through the Meta Wearables DAT SDK.
- Forwarding those streams to the Gemini Live API and playing its spoken replies back through the glasses (the playback side is sketched below).
- Wiring Gemini's action requests to a local OpenClaw agent.
- Building the app in Xcode and deploying it to my own phone.
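For the playback half of that loop, a sketch under two assumptions: Gemini Live returns 16-bit PCM (24 kHz at the time of writing; check the current docs), and the glasses present themselves to iOS as a Bluetooth headset, so routing the engine's output to them is the system's job, not the app's.

```swift
import AVFoundation

// Sketch of the playback side. Gemini's replies arrive as 16-bit PCM;
// AVAudioPlayerNode wants float buffers, so convert, then let the OS
// route the output to the glasses (they appear as a Bluetooth headset).
final class ReplyPlayer {
    private let engine = AVAudioEngine()
    private let player = AVAudioPlayerNode()
    private let format = AVAudioFormat(standardFormatWithSampleRate: 24_000, channels: 1)!

    init() throws {
        engine.attach(player)
        engine.connect(player, to: engine.mainMixerNode, format: format)
        try engine.start()
        player.play()
    }

    // Queue one chunk of reply audio for playback.
    func enqueue(_ pcm: Data) {
        let sampleCount = pcm.count / 2                      // 2 bytes per Int16 sample
        guard let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                            frameCapacity: AVAudioFrameCount(sampleCount)) else { return }
        buffer.frameLength = AVAudioFrameCount(sampleCount)
        pcm.withUnsafeBytes { raw in
            let samples = raw.bindMemory(to: Int16.self)
            for i in 0..<sampleCount {                       // Int16 -> Float32 in [-1, 1]
                buffer.floatChannelData![0][i] = Float(samples[i]) / 32768.0
            }
        }
        player.scheduleBuffer(buffer, completionHandler: nil)
    }
}
```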

It's not a product. It's me proving to myself — and now, to you — that the pieces are here. The "ambient AI" thing isn't five years away. You can wire it together this weekend.

Why this matters

Smart glasses have been a punchline for a decade. What changed is the model on the other end of the wire. Gemini Live is the first time real-time multimodal voice + vision feels like talking to a person, not querying a database. The glasses are just the form factor that finally makes it natural — no phone in your hand, no screen between you and the world.

I think a lot of "AI products" people are trying to build right now look like apps because we don't have a better paradigm yet. This is what comes next.