Win rate confidence interval calculator
A win rate from 20 trades is a guess with an error bar wider than the number itself. A trader who has gone 16-and-4 will tell you they have an 80% win rate, and they're not wrong about the observed outcome; but the underlying rate that produced it could plausibly be anywhere from roughly 58% to 92%. The math behind that range is the binomial proportion confidence interval, and it's the error bar that decides whether your win-rate number means anything.
The calculator below runs the Wilson score interval — a more accurate version of the textbook "± 1.96 × √(p(1−p)/n)" that doesn't break down at small samples or extreme proportions. Type your wins and total trades. Read the band. Decide whether you have data or noise.
Win rate confidence calculator
(Interactive widget: enter your wins and total trades; it displays the observed rate, the standard error, and the ± half-width of the Wilson band.)
Why a number isn't enough without an error bar
Win rate is the most-cited and least useful trading metric on its own. The number "80%" looks decisive, but it's a single sample from a probability distribution. The distribution is what matters — and at small sample sizes, the distribution is wider than most traders realise.
A real 65%-win-rate strategy will produce a stretch of 55% or worse over 20 trades by pure chance roughly a quarter of the time. A trader living through one of those stretches concludes the strategy "stopped working" and ditches it. The strategy was fine; the sample was too small to distinguish a normal cold streak from a broken edge. The mirror image is just as common: a coin-flip strategy will hit 70% or better over 20 trades by chance about 6% of the time, and the trader living through it sizes up.
The fix is to read the win-rate number alongside its confidence interval — the band the true rate could plausibly be hiding in given your sample. At 20 trades, that band is about ±18 percentage points. At 100, ±8. At 500, ±3.5. The data narrows toward the truth slowly: doubling the sample shrinks the band by roughly √2, not 2. To halve the error you need 4× the trades. To quarter it, 16×.
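The √n scaling is easy to check directly. This minimal sketch uses the textbook half-width at an assumed 80% observed rate; the function name is illustrative, not the calculator's code:

```python
import math

def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the textbook interval for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# How the band narrows with sample size at an assumed 80% observed rate
for n in (20, 100, 500, 2000):
    print(f"n={n:5d}  ±{wald_half_width(0.80, n) * 100:.1f} pp")
```

At 20 trades this prints ±17.5 points; at 500, ±3.5. Going from 500 to 2,000 trades, a 4× increase, only halves the band.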
How big a sample you actually need
The right answer depends on what decision the win rate has to support:
- "Does this strategy work at all?" Roughly 100 trades for a strategy clearly above the break-even rate. The interval at 100 is tight enough to rule out coin-flip outcomes if the rate looks meaningfully above 50%.
- "Should I switch from strategy A to B?" 200-300 trades each, ideally compared on a paired-trade basis (run both strategies on the same period rather than comparing two separate samples). Small differences in true rate (75% vs 70%) need many more trades than that.
- "Should I scale risk?" 500+ trades. The interval at 500 is around ±3.5%, narrow enough that scaling decisions can rest on the headline number without major mis-sizing risk.
- "Is this strategy genuinely degrading or just having a cold week?" Depends on the size of the change. A 5-percentage-point drop on a high-winrate strategy needs 600+ trades after the suspected break to confirm. That's most of a year at typical retail frequency.
For a typical retail crypto trader taking 1-3 trades a day, hitting 500 trades is a year of consistent trading. There is no version of "take 20 trades and decide" that the math supports.
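The "600+ trades" figure above can be reproduced with a standard one-sample power calculation: detect a drop from a known baseline at 95% confidence with roughly 80% power, using the normal approximation. This is a sketch under those assumptions, with an illustrative helper name, not the calculator's internals:

```python
import math

def trades_to_confirm_drop(p_before: float, drop: float,
                           z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Trades needed after a suspected break to confirm a drop of `drop`
    from a known baseline win rate, at 95% confidence with ~80% power.
    One-sample normal-approximation sketch; helper name is illustrative."""
    p_after = p_before - drop
    numerator = (z_alpha * math.sqrt(p_before * (1 - p_before))
                 + z_beta * math.sqrt(p_after * (1 - p_after))) ** 2
    return math.ceil(numerator / drop ** 2)

# A 5-point drop on a 75% strategy needs roughly 600 fresh trades
n = trades_to_confirm_drop(0.75, 0.05)
```

For a 75% baseline and a 5-point drop, this lands just above 600, which is where the "most of a year at typical retail frequency" figure comes from.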
What the Wilson interval is and why it's better
The textbook formula for the confidence interval of a sample proportion (the Wald interval) is p ± z × √(p(1-p)/n). It works fine when the sample is large and the proportion is near 50%. It breaks down at small n (the assumption that the binomial distribution looks roughly normal stops holding) and at extreme proportions (a 100% win rate produces a zero-width interval, which is obviously wrong).
The Wilson score interval, first published in 1927 by E.B. Wilson in the Journal of the American Statistical Association, is a more honest formula. At small n it's wider; at extreme proportions it's asymmetric in the right direction. For 16 wins out of 20, the simple formula reports a symmetric 80% ± 18%; Wilson reports roughly 58-92%, stretched further below the observed rate and pulled in above it, which is where the optimism actually lives.
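The formula is short enough to implement directly. A minimal sketch (the function name is mine, not the calculator's):

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for an observed win rate of wins/n."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

lo, hi = wilson_interval(16, 20)    # about (0.58, 0.92)
lo100, _ = wilson_interval(20, 20)  # lower bound ~0.84: a perfect record
                                    # still gets a real, nonzero-width band
```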
The difference shows up most where it matters most: small samples and high observed rates, which is exactly the data retail traders most often have.
How to read the band you get
The verdict labels above are calibrated to the kind of decision the band can support, not the band's absolute width. A 60% lower bound on a 75% observed rate at 100 trades is the band saying "you probably have a real edge above coin-flip, but the size of the edge isn't yet pinned down." That's enough to keep trading, not enough to scale up. At 500 trades the band might tighten to 70-80%, at which point sizing decisions become defensible.
The band never tells you "this strategy is broken" — it tells you "the data so far is consistent with rates between X and Y." If the lower bound is below your break-even rate, that's the signal to either trade more before deciding or to look at other metrics (expectancy, profit factor) for confirmation.
For the long-form version of this argument — why a 16-and-4 record is essentially a guess with a wider error bar than people realise — see the post on how many trades before a win rate is real.
FAQ
Why does the math break at very small samples?
The simple "±1.96 × √(p(1-p)/n)" formula assumes the binomial distribution looks roughly normal — which holds when n is large and p isn't too close to 0 or 1. At small n or extreme proportions neither holds. The formula gives intervals that are too narrow and too symmetric to be honest. Wilson's 1927 score interval handles both edge cases gracefully and is the safer default.
Does this work for backtests too?
Mathematically yes — samples are samples. Practically, backtest samples have systematic biases (overfitting, look-ahead, optimistic execution) that live samples don't. A reasonable rule is to discount backtest sample size by 50-70% when projecting forward — a 1,000-trade backtest is roughly equivalent to 300-500 live trades for confidence-interval purposes.
How do I compare two strategies with this?
The cleanest method is paired evaluation — run both strategies on the same period and compute the per-trade difference. The standard error of the difference is much smaller than the standard error of either rate alone, so you need fewer trades to reach significance. About 100-200 paired trades is usually enough to distinguish strategies whose true rates differ by 5+ percentage points.
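The paired calculation can be sketched with the standard paired-proportions normal approximation, where only discordant pairs (one strategy won, the other lost) carry information about the difference. Names here are illustrative:

```python
import math

def paired_rate_diff(n10: int, n01: int, n_pairs: int) -> tuple[float, float]:
    """Win-rate difference between strategies A and B over paired trades.

    n10 = pairs where A won and B lost; n01 = the reverse. Only these
    discordant pairs move the difference. Standard paired-proportions
    normal approximation; helper name is illustrative.
    """
    diff = (n10 - n01) / n_pairs
    se = math.sqrt((n10 + n01) - (n10 - n01) ** 2 / n_pairs) / n_pairs
    return diff, se

# 150 paired trades: A won 30 pairs that B lost, B won 15 that A lost
d, se = paired_rate_diff(30, 15, 150)  # d = 0.10, se ~ 0.044
```

A 10-point observed difference with a standard error near 4.4 points is right at the edge of 95% significance, which is why 100-200 pairs is usually enough for gaps of 5+ points.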
What confidence level is this calculator using?
95%, the default for almost all financial-significance work. That means: if you repeated the same experiment many times, about 95% of the intervals you computed would contain the true rate. For a stricter band (99%), the interval would be roughly 30% wider; for a looser band (90%), about 16% narrower. 95% is the right default for trading decisions.
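Those width ratios come straight from the normal quantiles, since interval width scales linearly with z. A quick check using only the standard library:

```python
from statistics import NormalDist

# z for a two-sided interval at confidence level cl
z = {cl: NormalDist().inv_cdf((1 + cl) / 2) for cl in (0.90, 0.95, 0.99)}

print(f"99% band vs 95%: {z[0.99] / z[0.95]:.2f}x as wide")  # 1.31
print(f"90% band vs 95%: {z[0.90] / z[0.95]:.2f}x as wide")  # 0.84
```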
Should I worry about my win rate dropping over time?
Only if the drop is larger than the confidence-interval width and persists across a fresh sample. Inside the band, the rate is fluctuating exactly as it should given finite data. Outside the band — across enough new trades to be statistically meaningful — that's a real shift worth investigating. Most "the strategy stopped working" panic happens inside the band.
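One conservative way to operationalise "outside the band, across a fresh sample" is to compute a Wilson band for the historical record and a separate one for the fresh sample, and flag a shift only when the two bands no longer overlap. A sketch, with names of my own choosing:

```python
import math

def wilson(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for wins out of n."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

def looks_broken(hist_wins: int, hist_n: int,
                 fresh_wins: int, fresh_n: int) -> bool:
    """Flag a real shift only when the fresh sample's entire band sits
    below the historical band. Conservative by design."""
    hist_lo, _ = wilson(hist_wins, hist_n)
    _, fresh_hi = wilson(fresh_wins, fresh_n)
    return fresh_hi < hist_lo
```

A 65% strategy over 500 trades followed by an 8-of-20 cold streak does not trigger the flag; a 6-of-20 streak does. That gap is exactly the point: most cold weeks live inside the band.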
More calm tools like this one
Retired Today is a small platform of no-noise apps. This calculator is one of the tools; there are more.