Can wishlist-to-sales conversion benchmarks be validated?

Not rigorously at scale. Pre-launch wishlist counts are visible only to the developer in Steamworks — they are not public — so no third party can assemble a representative, held-out sample of (wishlists at launch, realized revenue) pairs. Published wishlist benchmarks rely on self-reported numbers from developers who chose to share, which is a selection-biased sample.

Why is 'the median Steam game makes $X' misleading?

The figure depends entirely on which games you count. Revenue datasets are survivorship-biased: games that sold almost nothing rarely register in owner-count or review-based revenue estimates at all, so they silently drop out of the denominator. A median computed over games-with-measurable-revenue is much higher than a median over all launches including the dead-on-arrival tail.

How does Steam Launch Forecaster avoid the wishlist-data problem?

It forecasts from public Steam store-page features (genre, price, tags, platforms, release timing, demo) rather than wishlist counts, and validates the resulting P10-P90 revenue cones against realized week-1 revenue on a held-out set of 1,560 launches the model never saw during training — 82% empirical coverage against an 80% target.

Data essay · Published 2026-05-22 · 9 min read

Steam wishlist benchmarks: a reality check

Most wishlist-to-sales rules of thumb can’t be validated on held-out data — and the “median Steam game makes $X” figures quietly drop the failures from the denominator. Here’s what 10,416 launches actually show, bias included.

TL;DR

You can’t validate wishlist conversion benchmarks at scale. Pre-launch wishlist counts live in Steamworks, visible only to the developer. No third party can assemble a representative held-out sample of (wishlists at launch → revenue) pairs, so “20–40% of wishlists convert in week one” is, at population scale, essentially unfalsifiable. The numbers that circulate are self-reported by devs who chose to share, which makes them a selection-biased slice.
“The median Steam game makes $X” depends entirely on which games you count. Revenue datasets are survivorship-biased: games that sold almost nothing leave little owner-count or review trace, so they fall out of the denominator. Across the 10,416 launches we have measurable revenue estimates for, the distribution is brutally right-skewed, but that population already excludes the dead-on-arrival tail, so its median is not the median Steam launch.
That’s why we don’t forecast from wishlists. We predict from public store-page features and validate the P10–P90 cone against realized week-1 revenue on 1,560 held-out launches (82% coverage, 80% target). The denominator is stated; the model is publicly testable.

1. The wishlist benchmark you can’t check

Open any indie-launch guide and you’ll meet some version of the wishlist rule: a game converts roughly 20–40% of its launch wishlists into week-one sales, or you need N wishlists to hit the Popular Upcoming list. These heuristics are genuinely useful as priors. The problem is what happens when you try to check one.

To validate “X% of wishlists convert” honestly, you’d need a representative sample of games with both their pre-launch wishlist count and their realized revenue. But Steam does not publish wishlist counts — they appear only in the developer’s own Steamworks dashboard. So every public wishlist benchmark is built from numbers developers chose to share. Devs who had a great launch share more often than devs who flopped; devs with clean analytics share more than those without. That’s textbook selection bias, and it pushes every wishlist-conversion average upward by an unknown amount.

This isn’t a knock on the people who publish these benchmarks — it’s the best anyone can do with private data. It is a reason to hold the numbers loosely.
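
The selection effect described in this section can be illustrated with a toy simulation. This is a minimal sketch, not a model of real Steam data: the uniform 2–40% range of "true" conversion rates and the rule that a developer's chance of sharing rises with their conversion rate are both made-up assumptions, and `simulate_selection_bias` is a hypothetical helper.

```python
import random
import statistics


def simulate_selection_bias(n_games=20_000, seed=0):
    """Return (mean conversion over all games, mean over games that 'share')."""
    rng = random.Random(seed)
    all_rates = []
    shared_rates = []
    for _ in range(n_games):
        # ASSUMPTION: true week-one conversion is uniform between 2% and 40%.
        rate = rng.uniform(0.02, 0.40)
        all_rates.append(rate)
        # ASSUMPTION: the better the launch, the likelier the dev publishes it.
        if rng.random() < rate / 0.40:
            shared_rates.append(rate)
    return statistics.mean(all_rates), statistics.mean(shared_rates)


true_mean, shared_mean = simulate_selection_bias()
print(f"all games: {true_mean:.1%}  self-reported sample: {shared_mean:.1%}")
```

Under these assumptions the self-reported average sits above the population average even though no individual developer misreports anything. In the real world the sharing rule is unknown, which is why the size of the upward push is unknown too.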

2. Survivorship bias eats the “median game”

The second benchmark you’ll see everywhere is some “the median Steam game earns about $___” figure. Here the trap is the denominator.

Third-party revenue estimates come from owner counts (e.g. SteamSpy-style owners × price) or from review-count multipliers (the Boxleiter method). Both require a signal — enough owners to estimate, or enough reviews to multiply. A game that sells a few hundred copies generates almost no reliable signal, so it never enters the dataset. The failures don’t lower the median; they vanish from it.

Our own corpus is honest about this. We have measurable revenue estimates for 10,416 launches. Within that population the spread is enormous — roughly:

more than a third of launches sit under ~$100k of estimated revenue,
the top decile is ~100× the bottom decile (and the bottom decile of this set is already well above the true floor — see below),
and the top 1% pulls the mean far above the median — the familiar Steam power law.

But here’s the part most benchmarks skip: those 10,416 are the games that left a measurable trace. The much larger tail of launches that sold close to nothing isn’t in the denominator — for us or for anyone else estimating from owners/reviews. So our distribution’s median is the median of games that registered revenue, which is a far rosier population than “all Steam launches.” We’d rather say that out loud than quote you a median that quietly excludes the failures.
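
To make the vanishing-denominator mechanism concrete, here is a rough sketch. Everything numeric in it is an illustrative assumption rather than a value from our corpus: the lognormal unit-sales distribution, the price points, the 30-sales-per-review multiplier, and the 10-review minimum signal. `review_based_revenue` and `compare_medians` are hypothetical names.

```python
import random
import statistics
from typing import Optional

SALES_PER_REVIEW = 30.0  # ASSUMPTION: illustrative Boxleiter-style multiplier
MIN_REVIEWS = 10         # ASSUMPTION: below this, no estimate is attempted


def review_based_revenue(reviews: int, price_usd: float) -> Optional[float]:
    """Review-multiplier revenue estimate, or None when the signal is too thin."""
    if reviews < MIN_REVIEWS:
        return None
    return reviews * SALES_PER_REVIEW * price_usd


def compare_medians(n_launches: int = 50_000, seed: int = 1):
    """Return (median true revenue, median of estimable games, share estimable)."""
    rng = random.Random(seed)
    true_revenue = []
    estimated = []
    for _ in range(n_launches):
        price = rng.choice([4.99, 9.99, 14.99, 19.99])
        # ASSUMPTION: right-skewed unit sales with made-up lognormal parameters.
        units = rng.lognormvariate(5.5, 2.0)
        true_revenue.append(units * price)
        reviews = int(units / SALES_PER_REVIEW)
        estimate = review_based_revenue(reviews, price)
        if estimate is not None:
            estimated.append(estimate)
    share = len(estimated) / n_launches
    return statistics.median(true_revenue), statistics.median(estimated), share


all_median, visible_median, visible_share = compare_medians()
print(f"median over all launches:      ${all_median:,.0f}")
print(f"median over estimable launches: ${visible_median:,.0f}")
print(f"share of launches estimable:    {visible_share:.0%}")
```

The games that fall below the review threshold are never assigned a low number; they simply produce no number, so the median of what remains is computed over a smaller and richer population.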

3. So we stopped guessing wishlists

Because pre-launch wishlist data can’t be assembled into a clean, representative training set, we built the forecaster to not depend on it. The model takes public Steam store-page inputs — genre and tags, price, platforms, demo presence, release timing — and outputs a calibrated P10–P90 revenue cone rather than a single number. Wishlists, when you supply your own, drive an independent Boxleiter cross-check, not the core estimate.

The payoff is that the model is checkable. We hold out 1,560 launches at training time, fit on the rest, then grade the cones against realized week-1 revenue. Current result: 82% of held-out launches land inside their P10–P90 cone, against an 80% target. The methodology — and the wrong predictions — are public in the Q2 calibration report and on the methodology page.
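
For readers who want to see what "coverage" means mechanically, the grading step can be sketched as below. This is not the forecaster's actual code: `cone_coverage` and `coverage_standard_error` are hypothetical helpers, the toy cones are invented, and the standard error uses a plain binomial approximation that assumes held-out launches are independent.

```python
import math
from typing import Sequence, Tuple


def cone_coverage(cones: Sequence[Tuple[float, float]],
                  realized: Sequence[float]) -> float:
    """Share of realized values that fall inside their (P10, P90) interval."""
    if len(cones) != len(realized) or not realized:
        raise ValueError("need one cone per realized value, and at least one")
    hits = sum(1 for (p10, p90), y in zip(cones, realized) if p10 <= y <= p90)
    return hits / len(realized)


def coverage_standard_error(coverage: float, n: int) -> float:
    """Binomial standard error of a coverage rate (ASSUMES independent launches)."""
    return math.sqrt(coverage * (1.0 - coverage) / n)


# Invented toy data: four (P10, P90) cones and four realized week-1 revenues.
toy_cones = [(10.0, 100.0), (5.0, 50.0), (1.0, 20.0), (100.0, 1000.0)]
toy_realized = [50.0, 60.0, 10.0, 500.0]
toy_coverage = cone_coverage(toy_cones, toy_realized)

# The reported figures from this article: 82% coverage on n = 1,560.
reported_se = coverage_standard_error(0.82, 1560)
print(f"toy coverage: {toy_coverage:.0%}, SE at 82% / n=1560: {reported_se:.4f}")
```

Plugging in the reported 82% and n=1,560 gives a standard error of roughly one percentage point under that independence assumption, which is the scale to keep in mind when comparing the result with the 80% target.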

4. Why even the reviews rule is hard to grade

You might think the reviews-to-revenue rule (the Boxleiter method — estimate sales from review count) escapes all this, since review counts are public. It doesn’t, for a subtle reason worth naming: most large-scale Steam revenue figures — ours included — are themselves derived from owner counts or review counts. So “checking” a reviews-based rule against a reviews-or-owners-derived revenue number is partly circular: the yardstick and the thing being measured share a source. That isn’t a flaw unique to us — it’s structural to third-party Steam revenue data, which has no clean ground truth.

That circularity is exactly why we treat Boxleiter as a cross-check, never the answer, and why the only grade we trust is held-out coverage against realized outcomes (the 82% above). The rule’s own popularizer has observed that roughly a quarter of games miss it by more than 30%, which is consistent with treating it as a loose central guide, not a per-game oracle.
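
The circularity argument in this section can be shown in its most extreme form with a few lines of code. This sketch assumes the "ground truth" is produced by the very same review multiplier as the prediction, which is the fully circular case; real datasets mix owner-based and review-based sources, so the overlap is only partial. The multiplier of 30 and the two example games are invented for illustration.

```python
def boxleiter_sales(reviews: int, multiplier: float = 30.0) -> float:
    """Review-count sales estimate (ASSUMPTION: illustrative multiplier)."""
    return reviews * multiplier


# Two invented games with identical review counts but very different real sales.
games = [
    {"name": "game_a", "reviews": 120, "actual_sales": 2_000},
    {"name": "game_b", "reviews": 120, "actual_sales": 9_000},
]

results = []
for game in games:
    prediction = boxleiter_sales(game["reviews"])
    # A "ground truth" that was itself derived from review counts.
    proxy_truth = boxleiter_sales(game["reviews"])
    results.append({
        "name": game["name"],
        "circular_error": abs(prediction - proxy_truth) / proxy_truth,
        "real_error": abs(prediction - game["actual_sales"]) / game["actual_sales"],
    })

for row in results:
    print(row["name"],
          f"error vs proxy: {row['circular_error']:.0%}",
          f"error vs actual sales: {row['real_error']:.0%}")
```

Graded against the review-derived proxy, the rule looks perfect for both games by construction; graded against the invented actual sales, it misses both by well over 30%. The first grade says nothing about the second.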

5. What to ask of any launch benchmark

Before you budget against a number, ask three things:

What population? All launches, or only games with measurable revenue? (If they don’t say, assume the rosy version.)
Measured or self-reported? Especially for wishlist figures — self-reported samples skew toward winners.
Tested on held-out data, or just asserted? A benchmark you can’t falsify is marketing, not measurement.

We don’t think wishlist benchmarks are useless — they’re fine priors. We think they’re quoted with more confidence than the data can carry. The honest move is to forecast from inputs you can actually validate, publish the denominator, and show how often you’re wrong.

Forecast your own launch — on testable inputs

Enter your Steam app ID for a calibrated P10/P50/P90 revenue cone, a Boxleiter cross-check, and five real comp launches. Free, no signup.


Figures are estimates from third-party owner counts (SteamSpy-style) and review-based proxies, and describe only launches with measurable revenue — a survivorship-biased subset of all Steam releases, as discussed above. Held-out calibration: 82% P10–P90 coverage on n=1,560. Method: /methodology. Related: Steam launch lever benchmarks — the causal follow-up on what actually moves week-1 revenue.