The 135K-trade backtest, after fees and stability checks

Our flagship backtest looked great at +76% per trade. Then we applied fees, split it by category, and checked temporal stability. Here's the version we can defend.

May 13, 2026 · ohh.bet research · 3 min read · backtest, calibration, fees, methodology

We pitched ohh.bet on a backtest: 135,356 historical $1K+ whale buys on resolved Polymarket markets, with a clean cheap-band edge. The headline cell on the cohort table ($100K+ whales buying at price 0.20–0.40) claimed +76% average ROI per trade at a 62% hit rate.

Six weeks of running the live detector and one fresh audit later, that number isn't defensible as published. Here's what we found when we asked harder questions of the same data.

The original headline was n=5

In the v1 backtest function, the p_20_40 × whale_100k+ bucket appeared with n=21 — barely above our HAVING COUNT >= 20 floor. When we re-ran the cohort with stricter outcome normalization in v2, that bucket dropped to n=5. Below the floor entirely. The number that motivated the entire site was based on five trades.
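To make the floor concrete, here is a minimal sketch of a minimum-sample filter in Python. The tuple layout and the function name `buckets_above_floor` are hypothetical, not the site's actual query or schema. Only the threshold of 20 and the two counts (21 and 5) come from the text above.

```python
from collections import Counter

MIN_TRADES = 20  # the floor quoted above (HAVING COUNT >= 20)


def buckets_above_floor(trades, min_trades=MIN_TRADES):
    """Return {bucket: n} for buckets holding at least min_trades trades.

    `trades` is an iterable of (price_band, size_tier) pairs. This layout
    is an assumption made for illustration only.
    """
    counts = Counter(trades)
    return {bucket: n for bucket, n in counts.items() if n >= min_trades}


# Counts taken from the text: n=21 under v1, n=5 under v2.
v1 = [("p_20_40", "whale_100k+")] * 21
v2 = [("p_20_40", "whale_100k+")] * 5

print(buckets_above_floor(v1))  # bucket survives with n=21
print(buckets_above_floor(v2))  # empty dict: bucket is below the floor
```

A bucket sitting one trade above the floor passes the filter just as easily as one with thousands of trades, which is why the n=21 cell made it onto the cohort table at all.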

The realistic "mini-whale" cohort, $25K–$100K buys at price 0.20–0.40, has n=141, hits 34%, and yields +8.1% gross ROI per trade. Net of a conservative 200 bps fee, that is +7.4%. Still positive expectancy. Not the +76% we advertised.

What 200 bps does to the numbers

We apply 200 bps conservatively: take it off winners (Polymarket's settlement fee), leave losers at -100%. The cohort-wide impact:

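| Band | Size tier | n | Hit % | Gross ROI | Net ROI | Δ |
|---|---|---|---|---|---|---|
| 0.20-0.40 | mini-whale | 141 | 34.0% | +8.10% | +7.42% | -0.68pp |
| 0.20-0.40 | retail | 12,977 | 34.8% | +14.84% | +14.14% | -0.70pp |
| 0.40-0.60 | mini-whale | 849 | 47.5% | -6.53% | -7.48% | -0.95pp |
| 0.60-0.80 | mini-whale | 639 | 58.7% | -14.20% | -15.37% | -1.17pp |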

Two things stand out. Fees are a steady, predictable drag of roughly 0.7–1.2pp, growing with hit rate because only winners pay. And the cheap band is the only place the trade survives fees: the mini-whale rows above 0.40 are already losing before fees show up.
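The Δ column in the fee table is consistent with one simple reading of "take it off winners": each winning trade gives up 2% of its stake, so the expected drag is the fee times the hit rate. This is an inference from the published figures, not a confirmed description of the backtest code, and the function name `net_roi` is hypothetical. A short Python sketch under that assumption:

```python
FEE = 0.02  # 200 bps, charged on winning trades only


def net_roi(gross_roi_pct, hit_rate_pct, fee=FEE):
    """Net ROI in percent, assuming each winner loses `fee` of its stake.

    Losers stay at -100%, so the expected drag is fee * hit rate.
    """
    drag_pp = fee * hit_rate_pct  # e.g. 0.02 * 34.0 = 0.68 percentage points
    return gross_roi_pct - drag_pp


# (band, size tier, hit %, gross ROI %) copied from the fee table.
rows = [
    ("0.20-0.40", "mini-whale", 34.0, 8.10),
    ("0.20-0.40", "retail", 34.8, 14.84),
    ("0.40-0.60", "mini-whale", 47.5, -6.53),
    ("0.60-0.80", "mini-whale", 58.7, -14.20),
]

for band, tier, hit, gross in rows:
    net = net_roi(gross, hit)
    print(f"{band} {tier}: net {net:+.2f}%, drag {net - gross:+.2f}pp")
```

Under this assumption the computed values agree with the Net ROI and Δ columns to two decimals, and it shows why the drag grows with hit rate: the more often a cohort wins, the more often it pays.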

Temporal stability: most of the edge is in the late half

The v2 backtest splits each cohort by traded date at the median (2026-04-22 for the 135K-row corpus). If the edge is real and stable, both halves should look similar. They don't:

| Sub-band × size | Early half | Late half | Δ |
|---|---|---|---|
| 0.20-0.40 × mini-whale | n=44, hit 22.7%, -29.2% | n=97, hit 39.2%, +25.0% | +54.2pp |
| 0.20-0.40 × big_10k_25k | n=223, hit 20.2%, -32.9% | n=298, hit 36.9%, +23.3% | +56.2pp |
| 0.20-0.40 × retail | n=8,738, +7.2% | n=4,239, +30.6% | +23.4pp |

The cheap-band edge isn't uniform across the backtest window: the late half is doing most of the work, and for the two larger size tiers the early half was outright negative. Could be a regime shift. Could be small-sample tail behavior in the late half. Either way, publishing the pooled number without disclosing this Δ stability swing would have been misleading.
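One way to see how a pooled number can hide the split is to recombine the two halves as a trade-weighted average. The sketch below uses only the n and ROI figures from the stability table. The function names `pooled_roi` and `stability_delta` are hypothetical, and weighting by trade count is an assumption about how the pooled figure is built.

```python
def pooled_roi(n_early, roi_early, n_late, roi_late):
    """Trade-weighted average ROI (percent) across the two halves."""
    total = n_early + n_late
    return (n_early * roi_early + n_late * roi_late) / total


def stability_delta(roi_early, roi_late):
    """Late-half ROI minus early-half ROI, in percentage points."""
    return roi_late - roi_early


# (n early, ROI early %, n late, ROI late %) copied from the stability table.
halves = {
    "0.20-0.40 x mini-whale": (44, -29.2, 97, 25.0),
    "0.20-0.40 x big_10k_25k": (223, -32.9, 298, 23.3),
    "0.20-0.40 x retail": (8738, 7.2, 4239, 30.6),
}

for name, (n_e, r_e, n_l, r_l) in halves.items():
    pooled = pooled_roi(n_e, r_e, n_l, r_l)
    delta = stability_delta(r_e, r_l)
    print(f"{name}: pooled {pooled:+.1f}%, delta {delta:+.1f}pp")
```

For the mini-whale and retail rows the pooled value lands close to the +8.10% and +14.84% gross figures in the fee table, up to rounding of the published half-level numbers. A modest positive average can therefore sit on top of one losing half and one strongly winning half.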

That's why /transparency now shows gross, net, and Δ stability columns side by side. The Δ for the live band is above +50pp and flagged in amber. You should see it before you trust the headline.

What the cohort actually says

The defensible reading:

  1. The cheap-band thesis is directionally real, but the magnitudes were oversold in our original pitch.
  2. Mini-whale buys (≈ $25K–$100K) at 0.20–0.40 carry positive expected value historically, at roughly +7% net per trade (+8% gross). Not +76%.
  3. The expensive band (>0.60) is reliably net-negative, even at high hit rates. This part of the thesis stands cleanly.
  4. The temporal Δ stability swing is the biggest open question. If late-half outperformance was driven by a one-off regime, the live calibration we're running now will surface it within a few months.
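Point 3 can look counterintuitive, so here is the arithmetic behind it as a small Python sketch. It assumes a binary share that pays $1 on a win and $0 on a loss, bought at a single price and with fees ignored. The function names are hypothetical, and the 0.70 entry price is an illustrative value inside the 0.60–0.80 band, not a cohort average, so the output will not reproduce the table figures.

```python
def roi_per_trade(hit_rate, price):
    """Expected gross ROI for buying at `price` a share that pays $1 on a win."""
    return hit_rate / price - 1.0


def breakeven_hit_rate(price):
    """Hit rate at which expected gross ROI is zero: equal to the entry price."""
    return price


# Illustration only: 58.7% is the hit rate from the 0.60-0.80 mini-whale row,
# and 0.70 is an assumed single entry price inside that band.
price = 0.70
print(f"break-even hit rate at {price:.2f}: {breakeven_hit_rate(price):.0%}")
print(f"ROI at a 58.7% hit rate: {roi_per_trade(0.587, price):+.1%}")
```

Under these assumptions, a buyer paying 0.70 needs to win more than 70% of the time just to break even, so a 58.7% hit rate still loses money. A high hit rate is only an edge when it exceeds the price paid.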

What we changed

After this audit:

  • The about, methodology, and landing pages dropped the "+15–80% EV per trade" language. The backtest is now framed as historical motivation, not a forward promise.
  • /transparency shows Net ROI alongside Gross, and a Δ stability column.
  • Wave 31 narrowed the live band to 0.20–0.30 — the only sub-band with positive live expectancy so far.
  • Wave 34 added a wallet-reputation gate on top, so the detector only auto-trades on trigger wallets with prior resolved-trade alpha.

The strategy may still work. The number you saw before this post couldn't be defended. We took it down.


See the live numbers on /transparency. The methodology is at /methodology. If you spot another number on the site we should re-audit, we want to hear about it.
