← Back to portfolio
Personal Project

Vitals

Pull, derive, coach. Three layers — a sync layer for biometrics and message signals, a derivation layer that turns 3.4 million raw sensor samples into trustworthy analytics, and a persona contract that turns any LLM into a quantitative coach who consults both.

Architecture diagram: Oura Ring, Renpho scale, and iMessage flow through pluggable adapters into a local SQLite. A derivation layer materializes analytics tables on top. A model-agnostic persona contract (coach.md) drives any LLM — Claude, GPT, Gemini, or a local 9B — to query the derived tables before answering.
Sources on the left, adapters and derived tables in the middle, persona + LLM on the right.
1,527
Days biometric
3.4M
HR samples
6
Derivation modules
10.8 yrs
Message signals
100%
Local

What I built

Three layers, joined at a SQLite file.

Layer 1 — a pull layer. Adapter classes for Oura (biometrics, sleep, workouts, sub-minute HR during efforts), Renpho (body composition since 2020), and an iMessage roll-up that turns ten years of chat.db into per-day behavioral signals — late-night message volume, reply latency percentiles, sleep/alcohol/conflict/intimacy keyword counts, max 5-minute burst. Each adapter is idempotent on a natural key with a 2-day overlap on re-pulls. Launchd runs the sync twice a day. No webhooks; nothing leaves the machine.

Layer 2 — a derivation layer. Six analysis modules that materialize trustworthy derived tables on top of the raw samples — personal HR zones (versioned, because fitness drifts), per-day time-in-zone with cadence-aware weights, Banister TRIMP and Coggan CTL/ATL/TSB curves, HR-only session reconstruction independent of the wearable's own workout tags, cardiac-drift analysis on long steady-state efforts, recovery-load regressions with block-bootstrap CIs and BH-FDR correction. The agent never reads raw HR samples directly — it consults the derived tables. This is the layer I didn't know I was missing until I built it.

Layer 3 — a persona contract. A single markdown file (coach.md) that tells whatever LLM you load it into: your role is a quantitative fitness coach; query the derivation tables before answering; cite specific values and rolling baselines; coach with a recommendation, don't just narrate the numbers; cross-reference across tables when relevant. The contract is model-agnostic — it's the system prompt for OpenAI, the system instruction for Gemini, the --system flag for an Ollama-served local 9B, or the CLAUDE.md when working in Claude Code. The LLM is interchangeable; the contract pointed at the derived tables is the actual artifact.

Why I built it

Even with the pull layer and the persona contract, the agent was wrong.

When I asked it "is my training elite?" it answered confidently: "99.8% moderate intensity, 0% hard time, you live entirely in the gray zone." All three numbers were wrong by factors of five to fifty. Not roughly wrong — categorically wrong. And it was confident.

The proximate cause was a timezone bug. I'd joined heart-rate samples (stored in UTC with a Z suffix) against workout windows (stored in local time with ±HH:MM offsets) by string comparison. ISO 8601 strings with different offsets don't compare moments in time — they're just strings, and the lexicographic compare is meaningless across offsets. The join silently dropped most samples inside workout windows. The agent dutifully ratified the corrupt analysis and I had to retract three paragraphs of confident coaching to myself.

The deeper cause was that the only signal cheap enough for an agent to grab on demand was Oura's pre-computed intensity field — a coarse per-workout label that turns out to be ground-truth-disconnected. The pull layer gives an agent raw sensor data. The persona contract instructs the agent to query before answering. Neither layer is sufficient when the cheapest query returns a misleading number, because that's the answer the model will use.

What was missing was the layer in the middle. A derivation tier that takes 3.4 million raw samples and produces a small set of trustworthy tables — proper zone-time, proper training load, HRV gates anchored on personal baselines — that the agent reads instead of reaching for whatever pre-computed shortcut the upstream vendor happened to ship. That's Layer 2. It exists because Layer 3 was demonstrably wrong without it.

The derivation layer is the unlock. The pull layer gives the agent data. The persona contract gives it instructions. Neither makes the agent correct. The derived tables are what make the agent's answers trustworthy — they're the single source of truth the contract points at. Strip the model out of the loop and the derivation layer is what describes what the agent is for.

How it works

Layer 1 — the pull layer

Each source — Oura, Renpho, iMessage — is its own subpackage exposing a sync_* function with a shared shape: read the cursor from sync_state, pull with a 2-day overlap because upstream APIs sometimes back-fill late, upsert into the source's table keyed on a natural key (day, document id, or (timestamp, source)), update sync_state. Re-running a sync is always safe and idempotent. The contract is convention, not an ABC — a new source is a new subpackage plus three lines in cli.py.

The raw API JSON is kept in a raw_json column on every table. If the schema misses a field I want later, json_extract(raw_json, '$.path') gets it without a re-sync. Heart-rate samples come in at the API's native cadence — about 45 seconds during workouts, 12 minutes at rest, which works out to roughly 3.4 million rows over four years on a single Oura account. Well within SQLite's comfort zone, queryable with no index gymnastics.

Layer 2 — the derivation layer

Six analysis modules in src/health/analysis/, each materializing its own derived table:

Plus an HRV autoregulation gate (hrv_swc.py) implementing the Plews-Buchheit smallest-worthwhile-change framework. Daily green / yellow / red verdict against a personal 60-day baseline (travel days excluded — TZ shifts perturb HRV mechanically), with explicit detection of the "rMSSD-CV rising while mean falling" pattern that flags non-functional overreaching. Backtested over my prior 90 days, the gate would have told me to back off on 44 of them. I trained over 100 TRIMP on every single one. Adherence: zero. Building the gate didn't change my training, but it made the cost legible — that's what derivation layers do.

Layer 3 — the persona contract

The whole "agent" is a markdown file. An abbreviated excerpt:

# Vitals — Coach Persona

You are a fitness coach and quantitative health analyst.

When asked anything about recovery, training, body
composition, or how those metrics interact:

1. Query the derivation tables first — daily_load,
   hr_session, hrv_verdict, zone_config, protocol_state.
   Never re-aggregate raw sensor data on the fly.
2. Be quantitative. Cite specific values, % changes,
   day-over-day deltas. No "your sleep seems okay" —
   give the score, the 7d trend, the rolling baseline.
3. Coach, don't just report. After the numbers, give
   a concrete recommendation.
4. Cross-reference. Sleep, HRV, body temperature, and
   message-derived stress signals interact. Surface
   lagged correlations across tables when relevant.

That's the full contract — it fits in a screenshot. Loaded into Claude Code it lives as CLAUDE.md. Pasted into the OpenAI Responses API as a system message, the same words work. Fed to a local Qwen 3.5 9B via the local LLM stack as the system prompt, ditto. The contract names which derived tables to consult and forbids re-computation on the fly; the derivation layer ensures the answer to that consultation is trustworthy. The contract plus the derivation layer plus a model: that's the agent.

Cross-domain joins

Because every source lands in the same SQLite — and the derivation layer materializes its outputs to the same place — every cross-domain question is a one-line query the agent runs and cites. The figure below plots a random sample of 200 days from the corpus — late-night messages sent vs. next-day HRV. Each dot is one day. The lower-right cluster is the kind of pattern that's invisible to a single-domain app: high late-night phone activity followed by depressed next-day autonomic recovery. The agent finds these when asked; it doesn't need a separate analytics dashboard.

Scatter plot: late-night messages sent (x-axis) vs next-day HRV in ms (y-axis). 200 sampled days across 2024–2026. Most points cluster at zero late-night messages with HRV spread across 20–110. High-late-night-message days (10+) skew toward lower HRV values.
Sampled cross-domain join: messages × biometrics. Real anonymized data, no dates or contact identifiers.

Local and pull-based

The whole stack is sync httpx, SQLite, pandas/numpy, and launchd. No webhooks, no daemon, no VPS, no cloud function. The trade-off is real — there's no real-time signal — and it's the right one, because the data is daily-grain anyway and the cloud APIs deliver intraday HR at sub-minute resolution only during detected workouts. Real-time would be theatrical, not useful. Two cron firings a day catches everything that matters and the entire system runs in user-space on one laptop.

The stack

Language
Python 3.11
Storage
SQLite (local)
Compute
pandas · numpy
HTTP
httpx · sync
Scheduler
launchd · 2× daily
Reference sources
Oura v2 · Renpho · iMessage
Agent
Any LLM via coach.md

What it gets me

Privacy

Nothing on this page is identifying. The scatter is real but anonymized — no dates, no contact IDs, no names. The corpus and the agent's outputs stay on my machine, encrypted, never synced anywhere. The open-source repo ships with no data either. If you want a vitals corpus of your body and your messages, you point the adapters at your own accounts on your own laptop. That's the whole architecture.