Retired Blog

Trade tagging — the minimum useful taxonomy

The most common reason a trader's journal stops producing useful post-mortems is the tag list. The journal opens with two or three tags, balloons over six months to fourteen, and ends up as a junk drawer where no two trades share the same combination and no patterns can possibly emerge from the dataset. The opposite failure is just as common: the trader tags the asset and the direction and stops there, then wonders why a year of records hasn't told them which setups make money.

The minimum useful trade-tagging taxonomy is four tags. Setup type, session, conviction level, and a plan-followed binary. Anything fewer and the post-mortem can't separate strategy failures from execution failures. Anything more and the dataset stops aggregating — every trade becomes a unique combinatorial snowflake and the per-tag sample sizes drop below statistical usefulness within a quarter.

Figure: The four-tag minimum useful taxonomy. Four tags: enough signal, minimum noise. Setup type (what triggered you in): trend-continuation, counter-trend, breakout, mean-revert. Session (what regime you were in): asian, london, new-york, overlap. Conviction (how well it matched your rules): 1 = A-grade, 2 = B-grade, 3 = marginal. Plan-followed (did you execute the plan): yes, no. No emotion tags. No indicator tags. No "lessons learned" notes. Four columns, queryable.

Each of the four answers a different question on review. Setup type tells you what kind of trade you took. Session tells you under what regime. Conviction tells you how cleanly the entry matched your own rules. Plan-followed tells you whether the trade was executed as planned. Drop any of the four and a class of post-mortem question becomes unanswerable. Add a fifth and you start fragmenting the dataset.
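One way to keep the four columns queryable is to give each tag a closed vocabulary, so a typo or an improvised value can't quietly create a new bucket. A minimal Python sketch, assuming the example setup and session values from the taxonomy above; the class and field names are hypothetical, not taken from any journal tool:

```python
from dataclasses import dataclass
from enum import Enum


class Setup(Enum):
    TREND_CONTINUATION = "trend-continuation"
    COUNTER_TREND = "counter-trend"
    BREAKOUT = "breakout"
    MEAN_REVERT = "mean-revert"


class Session(Enum):
    ASIAN = "asian"
    LONDON = "london"
    NEW_YORK = "new-york"
    OVERLAP = "overlap"


@dataclass(frozen=True)
class TaggedTrade:
    setup: Setup
    session: Session
    conviction: int       # 1 = A-grade, 2 = B-grade, 3 = marginal
    plan_followed: bool   # True only if entry, stop, target and size matched the plan
    r_multiple: float     # the outcome being explained, deliberately not a tag

    def __post_init__(self):
        # Reject anything outside the fixed 1/2/3 conviction scale.
        if self.conviction not in (1, 2, 3):
            raise ValueError(f"conviction must be 1, 2 or 3, got {self.conviction}")


# Enum lookup by value rejects unknown tags: Setup("pin-bar") raises ValueError.
trade = TaggedTrade(Setup("breakout"), Session("london"),
                    conviction=1, plan_followed=True, r_multiple=1.5)
```

Adding a fifth setup value then becomes a deliberate code change rather than something that happens by accident while typing.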

Why each tag earns its slot

Setup type is the load-bearing one. You can't tell whether your edge is real until you can isolate trades by what kind of pattern got you in. A blended dataset where breakouts and mean-reversion trades sit in the same bucket reports an average that's flattened to noise — the breakouts might be making 0.6R per trade, the mean-reversions might be losing 0.3R, and the headline number is "barely positive overall." The breakdown reveals that you have one strategy that works, one strategy that doesn't, and the second one is leaking the first one's profits.
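To make the flattening concrete, here is the arithmetic in a few lines of Python. The trade counts are an assumption (an even split of 20 breakouts and 20 mean-reversions, each sitting exactly at the per-trade averages quoted above); the point is only that the blended number hides the split.

```python
from collections import defaultdict


def avg_r_by_setup(trades):
    """trades: iterable of (setup, r_multiple) pairs. Returns {setup: mean R}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for setup, r in trades:
        totals[setup] += r
        counts[setup] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Illustrative numbers only: 20 breakouts at +0.6R, 20 mean-reversions at -0.3R.
trades = [("breakout", 0.6)] * 20 + [("mean-revert", -0.3)] * 20
blended = sum(r for _, r in trades) / len(trades)

print(f"blended: {blended:+.2f}R")  # blended: +0.15R
print({k: round(v, 2) for k, v in avg_r_by_setup(trades).items()})
# {'breakout': 0.6, 'mean-revert': -0.3}
```

The headline +0.15R looks like a weak but positive edge; the per-setup split shows one profitable strategy subsidising a losing one.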

The tag values shouldn't be infinite. Pick three to five categories and don't add a sixth without strong evidence. Trend-continuation, counter-trend, breakout, and mean-revert cover most discretionary systems. If you trade something more specific — pin-bar reversal at support, volume-spike entries, news-event setups — those go on the same axis, but each new value cuts the per-bucket sample size, so be conservative. The Engulfing pattern checker and Doji classifier are useful for normalising what counts as which setup, since "I think it was a kind of engulfing" doesn't tag cleanly.

Session is the regime tag. Markets behave differently across the Asian, London, and New York sessions — the London session post covers why crypto's largest moves cluster in the 3am-9am ET window. A strategy that's profitable overall might be unprofitable in one specific session because the market structure during that session doesn't suit it. Without the session tag, the strategy looks fine in aggregate; with it, you can see "London is positive, Asian is positive, New York is consistently negative — stop trading New York."

If you're not in crypto and your asset's session structure is different (forex pairs have their own regional rhythms; equities have RTH/AH/pre-market), use whatever the equivalent regimes are. The point is to capture the time-of-day variable as a tag because it routinely contains hidden information that's invisible in the aggregate.
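If your journal records entry timestamps, the session tag can be derived rather than typed by hand. A minimal sketch, assuming hypothetical UTC hour boundaries for the four session values; the post doesn't define exact cut-offs, so treat these as placeholders for your own market's regimes:

```python
from datetime import datetime, timezone

# Hypothetical session boundaries in UTC hours, [start, end). Placeholders only:
# adjust to your asset, and note that exchange-local sessions shift against UTC
# when daylight saving changes.
SESSION_BOUNDS_UTC = [
    ("london", 7, 12),
    ("overlap", 12, 16),   # London and New York both open
    ("new-york", 16, 21),
]


def session_tag(entry_time: datetime) -> str:
    """Map a timezone-aware entry timestamp to exactly one session tag."""
    hour = entry_time.astimezone(timezone.utc).hour
    for name, start, end in SESSION_BOUNDS_UTC:
        if start <= hour < end:
            return name
    return "asian"  # every remaining hour, including the wrap past midnight


print(session_tag(datetime(2024, 3, 5, 13, 30, tzinfo=timezone.utc)))  # overlap
```

Keeping the boundaries in one table means an equities trader could swap in pre-market, RTH and after-hours ranges without touching the rest of the journal.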

Conviction level is the entry-quality tag, and it's the one most traders skip. The 1/2/3 scale is intentional — A-grade trades that hit every rule on your checklist, B-grade trades that meet most rules but have a softer signal, C-grade trades that you took on a gut nudge with marginal rule-fit. The number is set at the moment of entry, not after the fact, and it should be visible in your position-sizing decision.

The post-mortem question this tag enables: what's my average R on conviction-1 vs conviction-3 trades? If the gap is small (say, 0.7R vs 0.5R), your rule discrimination is weak: the rules don't separate good trades from mediocre ones, so filtering down to conviction-1 entries won't buy much and the checklist itself needs sharpening. If the gap is large (1.2R vs -0.4R), your rules are doing real work, but you're systematically diluting your edge by taking the C-grade trades. Both diagnoses are actionable; without the tag, neither is visible.
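The conviction comparison is just two bucket averages and their difference. A sketch that takes (conviction, R) pairs; the SMALL_GAP_R threshold is a hypothetical placeholder, since the post gives example gaps rather than a cut-off:

```python
from statistics import mean

# Hypothetical cut-off between a "small" and a "large" gap, in R. The post's
# examples are a 0.2R gap (small) and a 1.6R gap (large); 0.5R is arbitrary.
SMALL_GAP_R = 0.5


def conviction_gap(trades):
    """trades: list of (conviction, r_multiple). Returns (avg R at 1, avg R at 3, gap)."""
    a_grade = [r for c, r in trades if c == 1]
    marginal = [r for c, r in trades if c == 3]
    if not a_grade or not marginal:
        raise ValueError("need at least one conviction-1 and one conviction-3 trade")
    hi, lo = mean(a_grade), mean(marginal)
    return hi, lo, hi - lo


def read_gap(gap):
    """Translate the gap into the diagnosis it points to."""
    if gap < SMALL_GAP_R:
        return "small gap: weak rule discrimination"
    return "large gap: conviction-3 trades dilute the edge"


hi, lo, gap = conviction_gap([(1, 1.2), (1, 1.2), (3, -0.4), (3, -0.4)])
print(read_gap(gap))  # large gap: conviction-3 trades dilute the edge
```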

Plan-followed is the binary that quietly carries the most weight. Did you take the trade exactly as your written plan specified — entry, stop, target, size — or did you change something? Yes or no. No gradient.

The diagnostic this enables is the difference between a strategy failure and an execution failure. A strategy that loses money even on plan-followed=yes trades is broken. A strategy that wins on plan-followed=yes but loses overall because plan-followed=no trades drag it down isn't broken — the trader is. The fix is completely different: strategy failures need rule changes; execution failures need behavioural commitments. Mixing the two without the binary tag means you'll redesign the strategy when the actual issue is that you cancelled six stops mid-trade.
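The strategy-versus-execution split reduces to two averages. A minimal sketch, assuming each trade is a (plan_followed, R) pair for a single strategy; the function name and the labels it returns are illustrative, not part of any journal tool:

```python
from statistics import mean


def diagnose(trades):
    """trades: list of (plan_followed, r_multiple) pairs for one strategy."""
    if not trades:
        raise ValueError("no trades to diagnose")
    followed = [r for pf, r in trades if pf]
    if not followed:
        return "undetermined: no plan-followed=yes trades yet"
    if mean(followed) <= 0:
        # Loses even when executed exactly as written: the rules lack an edge.
        return "strategy failure"
    if mean(r for _, r in trades) <= 0:
        # Profitable as designed, negative overall: the deviations cause the loss.
        return "execution failure"
    return "no failure flagged"


sabotaged = [(True, 0.8)] * 10 + [(False, -1.5)] * 10
broken = [(True, -0.2)] * 10 + [(False, -0.5)] * 5
print(diagnose(sabotaged))  # execution failure
print(diagnose(broken))     # strategy failure
```

The order of the checks matters: the plan-followed=yes average is tested first, because a strategy with no edge is broken regardless of how well it was executed.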

What you don't tag

The temptation to add more tags is constant. Fight it. Specifically:

Don't tag emotion. "I was greedy on this one." "Felt fearful." Emotion tags are post-hoc — you can't honestly tag yourself as "greedy" while placing the trade because greed doesn't feel like greed in real time, it feels like opportunity. The label gets applied after the trade closes badly and serves as a kind of confessional, which is satisfying but produces zero actionable signal. Instead of an emotion tag, use the plan-followed binary. If the plan said "size at 2% risk" and you sized at 4%, plan-followed=no captures the behaviour. The internal weather that produced it doesn't.

Don't tag individual indicators. "RSI > 70," "MACD bearish cross," "above the 200-EMA" are all too granular. They proliferate (you'll have 30 within a year) and they don't aggregate (no two trades share the same exact indicator combination). If your setup is an indicator pattern, that's the setup-type tag, not a separate indicator tag.

Don't tag the asset. BTC vs ETH vs SOL is already a column in your trade log. Tagging it as well duplicates the data and inflates the appearance of "tag-rich" entries without adding analytical signal.

Don't make "lessons learned" a tag. Lessons go in a notes field, not a tag column, because they're free text and they're per-trade, not per-category. A tag is something you'd want to filter on. "I should have moved my stop sooner" is feedback, not a category.

Don't add tags retroactively. If you didn't tag at the time, don't go back and add tags weeks later: your memory of the trade is contaminated by the outcome, and you'll downgrade the conviction of every losing trade simply because it lost. Tags set at entry are signal; tags set at review are post-hoc rationalisation.

What the four tags let you see

Try the live filter below. The dataset is fifty hypothetical trades with all four tags applied at entry. Click through the filters and watch the stats update — the point is to see how the per-tag breakdowns surface patterns the aggregate hides.

Filter the dataset · 50 example trades (filters: setup, session, conviction, plan-followed). Unfiltered: 50 trades, 52% win rate, +0.23R average per trade, +11.5R total.
Filter the dataset by tag to see how stats shift. Try conviction=3 alone — that's the marginal-entry bucket. Then try plan-followed=no. Notice which produces the biggest swing.
Numbers are hypothetical, generated from distributions typical of a trader with a 52% win rate. The same patterns show up in real journals: the per-bucket numbers shift, but the structure of the question doesn't.
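To run the same kind of filter on your own log, the four numbers it reports (trade count, win rate, average R, total R) take only a few lines. A sketch in Python, assuming a win is any trade with R above zero, and using a made-up four-trade log rather than the fifty-trade dataset:

```python
from dataclasses import dataclass


@dataclass
class Trade:
    setup: str
    session: str
    conviction: int
    plan_followed: bool
    r: float  # outcome in R-multiples


def bucket_stats(trades, **filters):
    """Keep trades whose tags match every tag=value filter, then summarise them."""
    subset = [t for t in trades
              if all(getattr(t, tag) == value for tag, value in filters.items())]
    n = len(subset)
    if n == 0:
        return {"trades": 0}
    wins = sum(1 for t in subset if t.r > 0)  # assumption: R > 0 counts as a win
    total = sum(t.r for t in subset)
    return {"trades": n, "win_rate": wins / n, "avg_r": total / n, "total_r": total}


# Made-up sample log, for illustration only.
log = [
    Trade("breakout", "london", 1, True, 2.0),
    Trade("breakout", "new-york", 3, False, -1.0),
    Trade("mean-revert", "asian", 2, True, -1.0),
    Trade("trend-continuation", "london", 1, True, 1.5),
]
print(bucket_stats(log))                        # every trade
print(bucket_stats(log, plan_followed=False))   # the deviated-from-plan bucket
print(bucket_stats(log, setup="breakout", conviction=1))
```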

The two filters that produce the most useful comparisons:

Conviction = 1 vs the unfiltered set. If your A-grade trades dramatically outperform the average, your rules genuinely separate clean entries from marginal ones — and your problem isn't strategy design, it's that you're still taking the marginal trades. The fix is a checklist that prevents conviction-3 entries from happening at all, not a redesign of the rules.

Plan-followed = yes vs plan-followed = no. This is the execution-vs-strategy diagnostic. A strategy that's profitable on plan-followed=yes trades but unprofitable overall isn't a strategy problem — it's a behavioural one. The trader is sabotaging trades they themselves designed correctly. The fix is a written entry checklist read out loud before each trade, plus a hard rule against modifying stops mid-position. The strategy can stay as it is.

A strategy that also loses money on plan-followed=yes trades is broken. Don't try to fix it with discipline: discipline can't save a system with no edge. The redesign tools you'd want here are the break-even reward-to-risk calculator and the win-rate confidence interval calculator. Together they show whether an edge is mathematically possible (the strategy can work given the observed win rate and R:R) or simply absent (no edge is possible at those numbers).
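The break-even arithmetic is short enough to write out. Assuming every loss costs exactly 1R and every win pays the same reward-to-risk b, expectancy per trade is p·b − (1 − p), minus any per-trade costs expressed in R; setting that to zero gives the break-even R:R. The function names are hypothetical, and a real calculator may model costs differently:

```python
def expectancy_r(win_rate, reward_to_risk, cost_r=0.0):
    """Expected R per trade: wins pay reward_to_risk R, losses cost 1R,
    and cost_r is an optional per-trade cost (fees, slippage) in R."""
    return win_rate * reward_to_risk - (1 - win_rate) - cost_r


def breakeven_reward_to_risk(win_rate, cost_r=0.0):
    """Smallest R:R at which expectancy is zero for the given win rate.
    Solves p*b - (1 - p) - c = 0 for b."""
    if not 0 < win_rate <= 1:
        raise ValueError("win_rate must be in (0, 1]")
    return (1 - win_rate + cost_r) / win_rate


print(breakeven_reward_to_risk(0.40))  # 1.5, i.e. a 40% win rate needs R:R above 1.5
print(expectancy_r(0.40, 1.2) < 0)     # True: no edge at these numbers
```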

How many trades before the per-tag stats mean anything

The minimum useful sample per tag bucket is around 30 trades — that's the threshold below which any per-bucket statistic is dominated by noise. Below 30 you can spot direction (this bucket looks worse than that one) but you can't trust the magnitude. The win-rate sample-size post covers the underlying confidence-interval math, but the rule of thumb is fine for routine post-mortems.

The four-tag system splits trades into 4 setups × 4 sessions × 3 conviction levels × 2 plan-followed values = 96 possible cells, but real trades concentrate in maybe 24 of them. At 50 trades total, that's about two trades per occupied cell. You can still query single-axis filters meaningfully (50 trades / 4 setups ≈ 12 per setup, which is marginal), but two-axis filters fall apart fast.

That's actually the design constraint. The four-tag minimum is also a four-tag maximum for most retail traders, because going to five tags pushes per-cell sample sizes into territory where nothing aggregates. If you trade 200 times a year and want to do quarterly post-mortems, you have ~50 trades to slice; four axes with two to four values each is the most that still produces useful subsets.
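The cell counts are easy to check. A few lines of arithmetic, assuming a hypothetical three-value timeframe tag as the fifth axis and taking the post's rough figure of 24 occupied cells:

```python
from math import prod

# Axis sizes: setup, session, conviction, plan-followed.
four_tag_cells = prod([4, 4, 3, 2])
# Hypothetical fifth axis: a timeframe tag with three values (1h, 4h, daily).
five_tag_cells = prod([4, 4, 3, 2, 3])

quarter_trades = 200 // 4   # ~200 trades a year, reviewed quarterly
occupied_cells = 24         # rough estimate of cells real trades actually use
MIN_BUCKET = 30             # rule-of-thumb minimum per bucket

print(four_tag_cells, five_tag_cells)   # 96 288
print(quarter_trades / occupied_cells)  # about 2 trades per occupied cell
print(quarter_trades / 4)               # 12.5 per setup on a single-axis split
print(occupied_cells * MIN_BUCKET)      # 720 trades to reach 30 in each occupied cell
```

A fifth axis triples the possible cells, while the quarterly trade count stays where it was.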

The trade log on the dashboard supports the four tags as columns — setup, session, conviction (1/2/3), plan-followed (yes/no) — and the analytics view groups by each axis automatically. The point of the standardised vocabulary isn't to be opinionated about your strategy, it's to keep your dataset queryable while you scale.
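The post doesn't describe how the dashboard stores tags, so the following is a hypothetical stand-in: a plain SQLite table (via Python's built-in sqlite3 module) where CHECK constraints hold each tag to the standardised vocabulary, and one GROUP BY per axis gives the per-tag view. Column names and sample rows are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trades (
        id            INTEGER PRIMARY KEY,
        asset         TEXT NOT NULL,   -- already a column, so not a tag
        r_multiple    REAL NOT NULL,   -- the outcome, not a tag
        setup         TEXT NOT NULL CHECK (setup IN
                        ('trend-continuation', 'counter-trend', 'breakout', 'mean-revert')),
        session       TEXT NOT NULL CHECK (session IN
                        ('asian', 'london', 'new-york', 'overlap')),
        conviction    INTEGER NOT NULL CHECK (conviction IN (1, 2, 3)),
        plan_followed INTEGER NOT NULL CHECK (plan_followed IN (0, 1))
    )
""")
rows = [
    ("BTC", 2.0, "breakout", "london", 1, 1),
    ("ETH", -1.0, "breakout", "new-york", 3, 0),
    ("SOL", -1.0, "mean-revert", "asian", 2, 1),
]
conn.executemany(
    "INSERT INTO trades (asset, r_multiple, setup, session, conviction, plan_followed) "
    "VALUES (?, ?, ?, ?, ?, ?)", rows)

# Per-axis breakdown: swap `setup` for session, conviction or plan_followed.
for setup, n, avg_r in conn.execute(
        "SELECT setup, COUNT(*), AVG(r_multiple) FROM trades GROUP BY setup ORDER BY setup"):
    print(setup, n, round(avg_r, 2))
# breakout 2 0.5
# mean-revert 1 -1.0

# An off-vocabulary value is rejected instead of silently creating a new bucket.
try:
    conn.execute("INSERT INTO trades (asset, r_multiple, setup, session, conviction, plan_followed) "
                 "VALUES ('BTC', 1.0, 'pin-bar', 'london', 1, 1)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```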

FAQ

Why isn't profit/loss already a tag?

P/L is the outcome — the column you're trying to explain — not a tag. A tag is a pre-trade categorisation that lets you filter outcomes. Tagging "winner" or "loser" is circular: it tells you which trades won and which lost, but you already have that information from the R-multiple itself. Useful tags describe characteristics that exist before the trade resolves.

What if I only trade one setup type?

Drop the setup-type tag and keep the other three. The point is to tag axes where there's variation; if every trade is the same kind of setup, that column would just say "trend" 200 times and provide no analytical signal. Three tags can be enough if your strategy genuinely has only one setup. Most retail discretionary traders take more than one setup type, even when they think they don't.

How do I score conviction in real time without it becoming arbitrary?

Use a checklist: three to five rules per setup type. Conviction = 1 means every rule is met, conviction = 2 means most are met but one is borderline, conviction = 3 means it's a discretionary judgement call. In effect, the score comes from how many rules pass, mapped onto the 1/2/3 scale. Without a checklist, conviction tagging becomes "how good did this feel", which lets mood and recent results leak into the dataset: exactly the bias a rule-based score is meant to keep out.
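One way to make the checklist score mechanical. The mapping below (all rules pass gives 1, exactly one fails or is borderline gives 2, two or more fail gives 3) is an assumed reading of the answer above, and the function is hypothetical:

```python
def conviction_from_checklist(rule_results):
    """rule_results: one boolean per checklist rule for this setup type
    (False = failed or borderline). Assumed mapping, not a fixed standard:
    all pass -> 1, exactly one fails -> 2, two or more fail -> 3."""
    if not 3 <= len(rule_results) <= 5:
        raise ValueError("expected a checklist of three to five rules")
    failed = sum(1 for passed in rule_results if not passed)
    if failed == 0:
        return 1
    if failed == 1:
        return 2
    return 3


# Scored at the moment of entry, before the outcome is known.
print(conviction_from_checklist([True, True, True, True]))  # 1
print(conviction_from_checklist([True, True, False]))       # 2
```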

Should I tag the timeframe (1h, 4h, daily)?

Only if you trade multiple timeframes. If your system is daily-only or 4h-only, the timeframe is constant and not a useful tag. If you trade across timeframes, add it as a fifth tag — but be aware that you'll need to either generate more trades or accept smaller per-bucket samples. The four-tag minimum stretches to five only when the variation in the fifth axis is meaningful.

What's the right place to log the tags — the trade log, a spreadsheet, or notes?

Wherever you'll actually do it consistently. The mechanism matters less than the consistency. A trade log app with native columns is easiest because the analytics view comes for free. A spreadsheet works but requires a pivot table to do per-tag aggregations. Free-form notes don't aggregate at all and shouldn't be used for tags — only for the per-trade lessons. The trade log on the dashboard supports the four-tag schema directly.

How often should I review the tagged trades?

Monthly works for most active traders. Quarterly is the floor: review less often than that and patterns shift before you've looked at them. The review is short: filter by each axis, look at the per-bucket average R, win rate, and sample size. If a bucket looks meaningfully different from the rest, that's the post-mortem question to answer. If everything looks similar, the strategy is consistent across regimes and no rule change is indicated.
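The monthly pass can be scripted so that the sample size always sits next to the averages. A sketch, assuming each trade is a dict with the four tag keys plus an "r" key, and using the ~30-trade rule of thumb to flag thin buckets; the function name is hypothetical:

```python
from collections import defaultdict

MIN_BUCKET = 30  # rule-of-thumb minimum sample per bucket


def review(trades, axes=("setup", "session", "conviction", "plan_followed")):
    """Return {(axis, value): stats} with avg R, win rate, sample size and a
    'thin' flag for buckets below MIN_BUCKET trades."""
    report = {}
    for axis in axes:
        buckets = defaultdict(list)
        for t in trades:
            buckets[t[axis]].append(t["r"])
        for value, rs in buckets.items():
            n = len(rs)
            report[(axis, value)] = {
                "n": n,
                "avg_r": sum(rs) / n,
                "win_rate": sum(r > 0 for r in rs) / n,
                "thin": n < MIN_BUCKET,
            }
    return report


# Made-up three-trade log; every bucket will be flagged as thin.
log = [
    {"setup": "breakout", "session": "london", "conviction": 1, "plan_followed": True, "r": 2.0},
    {"setup": "breakout", "session": "asian", "conviction": 3, "plan_followed": False, "r": -1.0},
    {"setup": "mean-revert", "session": "london", "conviction": 2, "plan_followed": True, "r": -1.0},
]
for (axis, value), s in review(log).items():
    note = " (thin: direction only)" if s["thin"] else ""
    print(f"{axis}={value}: n={s['n']} avg={s['avg_r']:+.2f}R win={s['win_rate']:.0%}{note}")
```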

Won't different traders need different tag systems?

The four-tag minimum is meant to be the floor that fits everyone. Some traders will need a fifth tag for a specific axis their strategy depends on (timeframe, asset class, market regime). The principle is the same: tag axes where your trades vary along that dimension, and stop adding axes when per-bucket samples drop below ~30. The four covered here are the ones almost every retail discretionary trader benefits from regardless of their specific setup.

Tools that go with this

  • R-multiple calculator — converts dollar outcomes into the unit that aggregates cleanly across tag buckets.
  • Win-rate confidence interval — answers "is the gap between buckets signal or noise" given the per-bucket sample size.
  • Break-even reward-to-risk — once you've split conviction = 1 from conviction = 3, this tells you whether either bucket's win rate × R:R clears the cost-floor.
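The R-multiple conversion itself is a single division, shown here as a pair of hypothetical helpers. Initial risk is assumed to be the entry-to-stop distance times position size, ignoring fees:

```python
def initial_risk(entry, stop, quantity):
    """Money lost if the stop is hit: |entry - stop| x position size (fees ignored)."""
    if entry == stop:
        raise ValueError("stop cannot equal entry")
    return abs(entry - stop) * quantity


def r_multiple(pnl, risk):
    """Express a dollar outcome in R: +2.0 means the trade made twice what it risked."""
    if risk <= 0:
        raise ValueError("risk must be positive")
    return pnl / risk


small = r_multiple(500, initial_risk(100, 95, 50))  # $250 risked, +$500 closed
large = r_multiple(-1000, 1000)                      # $1,000 risked, stopped out
print(small, large)  # 2.0 -1.0
```

Two trades of very different size land on the same scale, which is what lets per-tag averages mix them fairly.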
Sources
  • Brett Steenbarger (2009). The Daily Trading Coach. Wiley. Chapter on trade journaling argues the same case for axis-based tagging from a discretionary-trader perspective.
  • Wikipedia. Trade journal. General-purpose summary of journal categories and their uses.
  • Tversky, A. & Kahneman, D. (1971). Belief in the Law of Small Numbers. Psychological Bulletin, 76(2). The sample-size limit on per-bucket inference comes from this paper's broader framework.
  • W3C / MDN documentation on <details>. Reference for the FAQ collapsibles used in the post.