A real-time AI assistant running on my Oakley Meta HSTN smart glasses. It sees what I see, hears what I say, and takes actions on my behalf.
I wanted to know how close we are to Tony Stark's JARVIS — an assistant that actually lives with you, sees what you're looking at, and can do things on your behalf without you reaching for a phone. Turns out: pretty close.
VisionClaw runs as an iOS app on my iPhone, paired to my Oakley Meta HSTN glasses. The glasses stream their camera and microphone to the phone. The phone forwards them in real time to Google's Gemini Live API. Gemini sees what I see, hears what I say, and talks back — out loud, through the glasses' speakers.
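The forwarding loop is conceptually simple: one task pulls captured frames (audio chunks, camera stills) off the glasses feed and pushes them upstream to the model session, while responses flow back the other way. Here is a minimal sketch in Python with hypothetical names throughout — the real app is an iOS/Swift project, and the Gemini Live session is stood in for by a plain queue:

```python
import asyncio

async def relay(glasses_feed: asyncio.Queue, upstream: asyncio.Queue) -> None:
    # Forward every captured frame (audio chunk or JPEG still) to the
    # model session. A None frame is the "capture stopped" sentinel.
    while True:
        frame = await glasses_feed.get()
        if frame is None:
            await upstream.put(None)
            return
        await upstream.put(frame)

async def main() -> list:
    glasses_feed: asyncio.Queue = asyncio.Queue()
    upstream: asyncio.Queue = asyncio.Queue()   # stand-in for the Gemini Live session

    # Simulate the glasses producing a few frames, then stopping.
    for frame in (b"audio-1", b"jpeg-1", b"audio-2", None):
        glasses_feed.put_nowait(frame)

    asyncio.create_task(relay(glasses_feed, upstream))

    # Drain what "Gemini" would receive, in order.
    received = []
    while True:
        frame = await upstream.get()
        if frame is None:
            break
        received.append(frame)
    return received

print(asyncio.run(main()))  # → [b'audio-1', b'jpeg-1', b'audio-2']
```

In the real stack the upstream side is a WebSocket to the Gemini Live API and the downstream side is audio played through the glasses' speakers, but the shape is the same: two queues and a relay in the middle.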
I can just ask it things, out loud.
The voice loop is the whole experience. No screen, no typing. You just talk and it answers. The first time it actually worked I sat in my car for ten minutes asking it questions like a kid with a new toy.
The interesting part is that none of this is "AI on the glasses." The Meta glasses are a sensor and a speaker — the brain lives on the phone (and in the cloud).
The pipeline:

- Capture: the glasses stream their camera and microphone to the iPhone.
- Forward: the app relays the audio and video frames in real time to the Gemini Live API.
- Respond: Gemini answers with voice, played back through the glasses' speakers.
- `execute` tool call: when I ask for an action, Gemini decides what to do, calls `execute(task: "…")`, and OpenClaw uses its 56+ skills to actually carry it out — messaging apps, web search, smart home, notes, reminders.

The app itself is the open-source VisionClaw project — credit where it's due. What I did was get the whole stack working end-to-end on my hardware.
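The tool-call handoff can be sketched the same way. Assuming Gemini's function-calling response reduces to a name plus arguments (the dict shape below is illustrative, not the exact wire format, and `run_skill` is a hypothetical stand-in for OpenClaw's skill routing):

```python
def run_skill(task: str) -> str:
    # Stand-in for OpenClaw: pick a "skill" by keyword and pretend to run it.
    # The skill names here are invented for illustration.
    skills = {"message": "messaging", "search": "web-search", "light": "smart-home"}
    for keyword, skill in skills.items():
        if keyword in task.lower():
            return f"[{skill}] done: {task}"
    return f"[notes] saved: {task}"

def handle_tool_call(call: dict) -> str:
    # Gemini's function-calling response, reduced to {"name": ..., "args": {...}}.
    if call["name"] != "execute":
        raise ValueError(f"unknown tool: {call['name']}")
    return run_skill(call["args"]["task"])

print(handle_tool_call({"name": "execute", "args": {"task": "turn on the light"}}))
# → [smart-home] done: turn on the light
```

The design point is that the model never touches the phone's capabilities directly: it emits a single generic `execute` call, and the app-side dispatcher decides which skill actually runs.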
It's not a product. It's me proving to myself — and now, to you — that the pieces are here. The "ambient AI" thing isn't five years away. You can wire it together this weekend.
Smart glasses have been a punchline for a decade. What changed is the model on the other end of the wire. Gemini Live is the first time real-time multimodal voice + vision feels like talking to a person, not querying a database. The glasses are just the form factor that finally makes it natural — no phone in your hand, no screen between you and the world.
I think a lot of "AI products" people are trying to build right now look like apps because we don't have a better paradigm yet. This is what comes next.