Turn vague business metrics into evaluable sub-metrics that autonomous agents can optimize
"If you can't evaluate it, you can't auto-research it." — Karpathy
"Revenue" is not auto-researchable. But "checkout conversion rate at stage 3" is. The metric-decomposer is the systematic process for getting from the first to the second. Decomposition creates evaluability.
Top-level metrics like "Revenue" or "MAU" fail the MART test (Measurable, Actionable, Relevant, Timely): they are measurable, but not actionable or timely enough for an agent to optimize. The skill decomposes them into leaf metrics that pass all four checks.
Each technique trades one hard metric for several easier ones. Pick the technique that matches your metric's structure.
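Funnel: sequential stages with a conversion rate between each pair of stages.
Stock-flow: accumulating quantities split into inflows and outflows.
P×Q: value = unit rate × volume. The relationship is multiplicative, so the two effects do not simply add.
Additive: growth = weighted sum of segment growth rates, weighted by each segment's share of the total.
Multiplicative: growth of a product ≈ sum of factor growth rates (exact for log growth rates, approximate for small percentage changes).
Mix-rate: separates "each segment improved" from "segment mix shifted."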
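The last technique above, separating rate changes from mix changes, can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: the `Segment` class, the `mix_rate_decomposition` function, the midpoint weighting, and every number are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    share: float  # fraction of total volume in this segment (shares sum to 1)
    rate: float   # segment-level rate, e.g. conversion rate


def mix_rate_decomposition(before, after):
    """Split the change in the overall rate into a rate effect and a mix effect.

    Midpoint weights are used (an assumption of this sketch), so the two
    effects add up to the total change exactly, with no interaction term.
    Both dicts must contain the same segment names.
    """
    rate_effect = 0.0
    mix_effect = 0.0
    for name in before:
        b, a = before[name], after[name]
        rate_effect += (a.rate - b.rate) * (a.share + b.share) / 2
        mix_effect += (a.share - b.share) * (a.rate + b.rate) / 2
    return rate_effect, mix_effect


def overall_rate(segments):
    """Overall rate is the share-weighted average of segment rates."""
    return sum(s.share * s.rate for s in segments.values())


# Made-up data: both segments improve, but traffic shifts toward mobile.
before = {"desktop": Segment(0.5, 0.10), "mobile": Segment(0.5, 0.02)}
after = {"desktop": Segment(0.2, 0.11), "mobile": Segment(0.8, 0.03)}

total_change = overall_rate(after) - overall_rate(before)
rate_effect, mix_effect = mix_rate_decomposition(before, after)

print(f"total change: {total_change:+.4f}")  # -0.0140
print(f"rate effect:  {rate_effect:+.4f}")   # +0.0100
print(f"mix effect:   {mix_effect:+.4f}")    # -0.0240
```

In this example the overall rate falls even though every segment's rate rose, because the mix effect outweighs the rate effect. An agent optimizing only the overall rate would be penalized for a traffic shift it did not cause, which is why the two effects are worth tracking as separate leaves.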
A marketplace combines a funnel (supply side) with P×Q (demand side). The top-level metric decomposes into five leaves, each scored for auto-research readiness.
Every leaf metric lands in one of four quadrants. The skill's job is to push metrics from Q4 toward Q1 through decomposition (horizontal) and tooling (vertical).
Agent CAN produce output but you CANNOT verify quality. Goodhart's Law lives here. Do not auto-optimize.
Fully autonomous. Define objective, set boundaries, let the agent run. This is the sweet spot.
Metrics design itself, causal reasoning, business judgment. Agent gathers info; human decides.
Agent scaffolds and calculates. Human provides judgment, reviews, and approves before execution.
Identify the top-level metric and the business question it serves
Rate on Measurable, Actionable, Relevant, Timely. Low scores = decompose.
Decision tree: funnel, stock-flow, P×Q, additive, multiplicative, or mix-rate
Break the metric into sub-metrics. Verify that the identity holds: recombining the sub-metrics must reproduce the original metric M. Recurse if needed.
For unmeasurable leaves, find proxies and document causal assumptions
6 dimensions → score 0-1 → classify Q1/Q2/Q3/Q4 per leaf
Screen against 6 failure modes: proxy gaps, non-linear funnels, circularity, over-decomposition, Simpson's Paradox, interaction effects
Final output: tree with evaluability scores. Q1 leaves get program.md configs for auto-research.
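The identity check in the decomposition step can be sketched as follows for a funnel. This is an illustrative sketch only: the stage names, the counts, the average order value, and the helper functions `funnel_leaves` and `recombine` are all hypothetical and not part of the skill.

```python
import math

# Hypothetical stage counts for one period; all numbers are made up.
stage_counts = {
    "visits": 10_000,
    "product_views": 4_000,
    "cart_adds": 1_000,
    "purchases": 250,
}
average_order_value = 40.0
revenue = 10_000.0  # top-level metric, as reported by an independent source


def funnel_leaves(counts):
    """Return the entry volume and the conversion rate between adjacent stages."""
    names = list(counts)
    entry = counts[names[0]]
    rates = {
        f"{prev}->{nxt}": counts[nxt] / counts[prev]
        for prev, nxt in zip(names, names[1:])
    }
    return entry, rates


def recombine(entry, rates, aov):
    """Multiply the leaves back together to rebuild the top-level metric."""
    return entry * math.prod(rates.values()) * aov


entry, rates = funnel_leaves(stage_counts)
rebuilt = recombine(entry, rates, average_order_value)

# The decomposition is only valid if the leaves reproduce the original metric.
identity_holds = math.isclose(rebuilt, revenue, rel_tol=1e-9)
print(rates)
print(identity_holds)
```

If `identity_holds` is false, the tree is missing a term (for example refunds, or users who skip a stage), and the leaves should not be handed to an agent until the gap is explained.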
Proxy gaps: the proxy scores well on MART, but optimizing it doesn't improve the real outcome.
Non-linear funnels: users skip stages or loop back, breaking the conversion rate math.
Circularity: price depends on quantity (volume discounts), creating feedback loops.
Over-decomposition: 50 leaf metrics = 50 optimization targets = high false discovery rate.
Simpson's Paradox: every segment's metric moves one way while the overall metric moves the other, because the segment mix shifted.
Interaction effects: optimizing one leaf degrades an adjacent leaf, nullifying the gains.
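The arithmetic behind the 50-leaf warning above can be made concrete. The sketch below computes the chance of at least one false positive across many leaves (strictly the family-wise error rate, which is related to but not the same as the false discovery rate). It assumes each leaf is tested independently at the same significance level; the function name and the 0.05 default are assumptions for illustration.

```python
def prob_any_false_positive(num_leaves, alpha=0.05):
    """Probability that at least one of num_leaves independent tests
    reports an improvement that is really noise."""
    return 1 - (1 - alpha) ** num_leaves


p_5 = prob_any_false_positive(5)
p_50 = prob_any_false_positive(50)

print(f"5 leaves:  {p_5:.3f}")   # 0.226
print(f"50 leaves: {p_50:.3f}")  # 0.923
```

With 50 leaves, a spurious "win" somewhere in the tree is close to certain under these assumptions, so it is usually better to stop decomposing once the leaves pass the checks than to keep splitting.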