Steam Launch Forecaster

Data essay · Published 2026-05-22 · 9 min read

Steam wishlist benchmarks: a reality check

Most wishlist-to-sales rules of thumb can’t be validated on held-out data — and the “median Steam game makes $X” figures quietly drop the failures from the denominator. Here’s what 10,416 launches actually show, bias included.


1. The wishlist benchmark you can’t check

Open any indie-launch guide and you’ll meet some version of the wishlist rule: a game converts roughly 20–40% of its launch wishlists into week-one sales, or you need N wishlists to hit the Popular Upcoming list. These heuristics are genuinely useful as priors. The problem is what happens when you try to check one.

To validate “X% of wishlists convert” honestly, you’d need a representative sample of games with both their pre-launch wishlist count and their realized revenue. But Steam does not publish wishlist counts — they appear only in the developer’s own Steamworks dashboard. So every public wishlist benchmark is built from numbers developers chose to share. Devs who had a great launch share more often than devs who flopped; devs with clean analytics share more than those without. That’s textbook selection bias, and it pushes every wishlist-conversion average upward by an unknown amount.
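To make the mechanism concrete, here is a minimal simulation sketch of sharing bias. Everything in it is synthetic: the Beta-distributed conversion rates and the `p_share` rule (the chance a developer publishes their numbers) are assumptions chosen for illustration, not estimates from real Steam data.

```python
import random
import statistics


def simulate_sharing_bias(n_games=20_000, seed=0):
    """Compare the true mean conversion rate with the mean among games that 'share'."""
    rng = random.Random(seed)
    all_rates, shared_rates = [], []
    for _ in range(n_games):
        # Hypothetical true conversion rate; Beta(2, 6) has a mean of 25%.
        rate = rng.betavariate(2, 6)
        all_rates.append(rate)
        # Assumption: the chance of publishing numbers rises with the conversion rate.
        p_share = 0.02 + 0.5 * rate
        if rng.random() < p_share:
            shared_rates.append(rate)
    return statistics.mean(all_rates), statistics.mean(shared_rates), len(shared_rates)


true_mean, shared_mean, n_shared = simulate_sharing_bias()
print(f"true mean conversion (all games):   {true_mean:.1%}")
print(f"mean among games that shared:       {shared_mean:.1%}")
print(f"games that shared: {n_shared} of 20000")
```

Under these assumptions the shared-only average sits several points above the true one. In practice the size of that gap is unknown, because the sharing rule itself cannot be observed.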

This isn’t a knock on the people who publish these benchmarks — it’s the best anyone can do with private data. It is a reason to hold the numbers loosely.

2. Survivorship bias eats the “median game”

The second benchmark you’ll see everywhere is some “the median Steam game earns about $___” figure. Here the trap is the denominator.

Third-party revenue estimates come from owner counts (e.g. SteamSpy-style owners × price) or from review-count multipliers (the Boxleiter method). Both require a signal — enough owners to estimate, or enough reviews to multiply. A game that sells a few hundred copies generates almost no reliable signal, so it never enters the dataset. The failures don’t lower the median; they vanish from it.

Our own corpus is honest about this. We have measurable revenue estimates for 10,416 launches. Within that population the spread is enormous — roughly:

But here’s the part most benchmarks skip: those 10,416 are the games that left a measurable trace. The much larger tail of launches that sold close to nothing isn’t in the denominator — for us or for anyone else estimating from owners/reviews. So our distribution’s median is the median of games that registered revenue, which is a far rosier population than “all Steam launches.” We’d rather say that out loud than quote you a median that quietly excludes the failures.
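A small sketch of how a detection floor moves the median. The log-normal parameters and the `floor_usd` threshold below are hypothetical and are not fitted to our corpus; the point is only the direction of the effect.

```python
import random
import statistics


def median_with_and_without_floor(n_launches=50_000, floor_usd=5_000, seed=1):
    """Return (median of all launches, median of measurable launches, measurable share)."""
    rng = random.Random(seed)
    # Hypothetical revenue distribution; the parameters are illustrative only.
    revenues = [rng.lognormvariate(8.0, 2.5) for _ in range(n_launches)]
    # Assumption: launches below the floor leave too little signal to be estimated.
    measurable = [r for r in revenues if r >= floor_usd]
    return (
        statistics.median(revenues),
        statistics.median(measurable),
        len(measurable) / n_launches,
    )


all_median, measurable_median, share = median_with_and_without_floor()
print(f"median, all launches:        ${all_median:,.0f}")
print(f"median, measurable launches: ${measurable_median:,.0f}")
print(f"share of launches that are measurable: {share:.0%}")
```

The measurable-only median is necessarily at or above the floor, however many launches sit below it. That is why a published median says little until the population behind it is stated.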

3. So we stopped guessing wishlists

Because pre-launch wishlist data can’t be assembled into a clean, representative training set, we built the forecaster to not depend on it. The model takes public Steam store-page inputs — genre and tags, price, platforms, demo presence, release timing — and outputs a calibrated P10–P90 revenue cone rather than a single number. Wishlists, when you supply your own, drive an independent Boxleiter cross-check, not the core estimate.

The payoff is that the model is checkable. We hold out 1,560 launches at training time, fit on the rest, then grade the cones against realized week-one revenue. Current result: 82% of held-out launches land inside their P10–P90 cone, against the 80% coverage a well-calibrated P10–P90 interval should achieve. The methodology and the wrong predictions are both public in the Q2 calibration report and on the methodology page.
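The coverage grade itself is simple to compute. The sketch below is not our production code; the function names and the toy numbers are made up for illustration. It counts how many realized values fall inside their interval and attaches a binomial standard error to the result.

```python
import math


def cone_coverage(actuals, p10s, p90s):
    """Fraction of realized values that fall inside their [P10, P90] interval."""
    if not actuals or not (len(actuals) == len(p10s) == len(p90s)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, p10s, p90s))
    return hits / len(actuals)


def coverage_standard_error(coverage, n):
    """Binomial standard error of an observed coverage rate on n held-out items."""
    return math.sqrt(coverage * (1 - coverage) / n)


# Toy held-out set (hypothetical week-one revenues and cones, in USD).
actuals = [1_200, 50_000, 800, 15_000, 300]
p10s = [500, 10_000, 1_000, 5_000, 100]
p90s = [5_000, 40_000, 9_000, 30_000, 2_000]

print(cone_coverage(actuals, p10s, p90s))  # 3 of 5 inside -> 0.6
print(round(coverage_standard_error(0.82, 1_560), 4))
```

Plugging in the reported figures (82% on n=1,560) gives a standard error of about one percentage point, which is the scale on which to read the gap between 82% and 80%.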

4. Why even the reviews rule is hard to grade

You might think the reviews-to-revenue rule (the Boxleiter method — estimate sales from review count) escapes all this, since review counts are public. It doesn’t, for a subtle reason worth naming: most large-scale Steam revenue figures — ours included — are themselves derived from owner counts or review counts. So “checking” a reviews-based rule against a reviews-or-owners-derived revenue number is partly circular: the yardstick and the thing being measured share a source. That isn’t a flaw unique to us — it’s structural to third-party Steam revenue data, which has no clean ground truth.

That circularity is exactly why we treat Boxleiter as a cross-check, never the answer, and why the only grade we trust is held-out coverage against realized outcomes: the 82% above. The person who popularized the Boxleiter method has noted that roughly a quarter of games miss the rule by more than 30%, which is consistent with treating it as a loose central guide, not a per-game oracle.
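The circularity can be shown with a toy simulation. The `MULTIPLIER` value and the per-game scatter are assumptions for illustration only; in reality the `true_sales` column is exactly what nobody outside Valve can observe.

```python
import random
import statistics

MULTIPLIER = 30  # hypothetical sales-per-review multiplier, for illustration only


def median_abs_pct_error(predicted, reference):
    """Median of |predicted - reference| / reference across games."""
    return statistics.median(abs(p - r) / r for p, r in zip(predicted, reference))


rng = random.Random(2)
reviews = [rng.randint(10, 5_000) for _ in range(5_000)]

# Assumption: each game's real sales-per-review ratio scatters around the multiplier.
true_sales = [n * MULTIPLIER * rng.lognormvariate(0, 0.4) for n in reviews]

# The rule's prediction, and a "yardstick" that was itself derived from review counts.
rule_estimate = [n * MULTIPLIER for n in reviews]
review_derived_yardstick = [n * MULTIPLIER for n in reviews]

circular_error = median_abs_pct_error(rule_estimate, review_derived_yardstick)
honest_error = median_abs_pct_error(rule_estimate, true_sales)
print(f"error vs review-derived yardstick: {circular_error:.1%}")
print(f"error vs (unobservable) true sales: {honest_error:.1%}")
```

Graded against a yardstick built from the same review counts, the rule looks perfect. Graded against the simulated truth, it shows the per-game scatter that was put in. Real third-party datasets sit somewhere between these two extremes, which is the reason to be cautious about any reported accuracy for the rule.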

5. What to ask of any launch benchmark

Before you budget against a number, ask three things:

  1. What population? All launches, or only games with measurable revenue? (If they don’t say, assume the rosy version.)
  2. Measured or self-reported? Especially for wishlist figures — self-reported samples skew toward winners.
  3. Tested on held-out data, or just asserted? A benchmark you can’t falsify is marketing, not measurement.

We don’t think wishlist benchmarks are useless — they’re fine priors. We think they’re quoted with more confidence than the data can carry. The honest move is to forecast from inputs you can actually validate, publish the denominator, and show how often you’re wrong.


Forecast your own launch — on testable inputs

Enter your Steam app ID for a calibrated P10/P50/P90 revenue cone, a Boxleiter cross-check, and five real comp launches. Free, no signup.


Figures are estimates from third-party owner counts (SteamSpy-style) and review-based proxies, and describe only launches with measurable revenue — a survivorship-biased subset of all Steam releases, as discussed above. Held-out calibration: 82% P10–P90 coverage on n=1,560. Method: /methodology.