When does the 2026 FIFA World Cup start?

The 2026 FIFA World Cup kicks off on June 11, 2026, with the opening match at Estadio Azteca in Mexico City.

Where is the 2026 World Cup Final?

The Final will be held at MetLife Stadium in East Rutherford, New Jersey (New York metro area) on July 19, 2026.

How many teams are in the 2026 World Cup?

48 teams will compete in the 2026 World Cup, the first edition played under the expanded format (up from 32 teams).

How many host cities are there?

16 host cities across 3 countries: 11 in the USA, 3 in Mexico, and 2 in Canada.

How many matches will be played?

104 matches will be played over 39 days, from June 11 to July 19, 2026.


Honest Ceilings

The clearest way to be trusted on analytics accuracy is to be the first one to say what we can't claim. This page is the receipt for that posture — locked numbers on what the model does, and an explicit list of where it falls short.

Currently running model: current

1. What we can claim

These numbers are computed from the calibrated walk-forward backtest of the currently-shipped model (model-backtest.json) — the exact same live source that drives /methodology and /track-record, so the three surfaces can never disagree — together with SHA256-stamped analytics reports in our public results table.

Model version	current
Brier score (all-time, lower is better)	0.58
1X2 accuracy (all-time)	53%
Log-loss (all-time)	0.00
Graded fixture count	1,520,107
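
As a rough illustration of how the three headline metrics in the table are typically computed for 1X2 forecasts, here is a minimal Python sketch. It assumes the multi-class (sum-of-squares) Brier convention, under which a uniform 1/3-1/3-1/3 forecast scores about 0.667; the page does not state which convention it uses. The function names and the sample data are hypothetical and are not taken from model-backtest.json.

```python
import math

# Outcome encoding (an assumption for this sketch): 0 = home win, 1 = draw, 2 = away win.

def brier_score(probs, outcomes):
    """Multi-class Brier: mean over fixtures of sum_k (p_k - o_k)^2. Lower is better."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        total += sum((p_k - (1.0 if k == y else 0.0)) ** 2 for k, p_k in enumerate(p))
    return total / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log of the probability assigned to the realised outcome."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        total += -math.log(max(p[y], eps))  # eps guards against log(0)
    return total / len(probs)

def accuracy(probs, outcomes):
    """Share of fixtures where the highest-probability outcome was the realised one."""
    hits = 0
    for p, y in zip(probs, outcomes):
        predicted = max(range(len(p)), key=lambda k: p[k])
        hits += 1 if predicted == y else 0
    return hits / len(probs)

# Hypothetical forecasts (home, draw, away) and realised outcomes.
sample_probs = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5), (0.4, 0.35, 0.25)]
sample_outcomes = [0, 2, 1]

print(brier_score(sample_probs, sample_outcomes))
print(log_loss(sample_probs, sample_outcomes))
print(accuracy(sample_probs, sample_outcomes))
```

Under this convention any forecaster that assigns non-zero probability to more than one outcome has a strictly positive log-loss, so the two scores are best read side by side.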

On the subset of fixtures that overlap with FiveThirtyEight's published SPI projections we beat their Brier and log-loss. The comparison is dataset-aligned, point-by-point, and re-runnable — see the head-to-head page for the breakdown: /reports/benchmark.

2. What we cannot claim

We do not beat Pinnacle's closing line. Closing lines from a sharp, high-limit market are the single hardest benchmark in soccer probability modelling. They are informed by SPX-class statistical models plus live information from deep market liquidity (sharps, syndicates, late-breaking news, lineup leaks). We have no realistic claim of out-predicting that aggregate. Anyone who tells you their public model consistently beats Pinnacle close should be asked to lock the reports pre-kickoff with a hash.
Accuracy in low-data leagues is notably worse. The headline accuracy is league-weighted, and our worst buckets sit several points below it.
We don't catch upsets. True upsets are by definition unpredictable from historical patterns. The model returned ~5% on Argentina-Saudi Arabia in 2022; that is not a bug, that is what 5% means.
We are not a betting product. No expected value, no stake-sizing, no “picks”. Probabilities only.
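
The idea of locking a report pre-kickoff with a hash, mentioned in the Pinnacle item above, can be sketched as follows. This is a minimal illustration using Python's standard hashlib, not a description of how the reports on this site are actually stamped; the report fields and the canonical-JSON serialisation are assumptions made for the example.

```python
import hashlib
import json

def stamp_report(report: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON serialisation of the report.

    Sorting keys and fixing separators makes the digest independent of
    dict ordering, so the same content always yields the same hash.
    """
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_report(report: dict, published_hash: str) -> bool:
    """Check that a report still matches the hash published before kickoff."""
    return stamp_report(report) == published_hash

# Hypothetical report; field names are illustrative only.
report = {
    "fixture": "Team A v Team B",
    "locked_at": "2026-06-10T18:00:00Z",
    "probs": {"home": 0.5, "draw": 0.3, "away": 0.2},
}
locked_hash = stamp_report(report)

# Any edit after the fact changes the digest and fails verification.
tampered = dict(report, probs={"home": 0.7, "draw": 0.2, "away": 0.1})
print(verify_report(report, locked_hash))    # True
print(verify_report(tampered, locked_hash))  # False
```

The hash only proves the content was not changed after publication; proving it was published before kickoff also needs a trusted timestamp, such as a public commit or post.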

3. Known structural limitations

Bookmaker odds are not used as model input. This is a deliberate choice (the SPX commitment). Using closing odds as a feature would inflate every backtest number while being structurally impossible to reproduce live (the closing line doesn't exist 24h before kickoff). Our reports are generated from team-level signal and locked before the market converges — that is the entire integrity story.
In-play refresh coverage varies by league. For top-coverage competitions we issue half-time and late-phase updates. Lower divisions and some continental competitions don't carry the live data feeds required, so we publish the pre-kickoff analytics report only.
Team strength is the unit of measurement, not player. Lineups and injury context feed in, but the model is team-level today. Finer-grained player modelling is on the roadmap.
Some contextual signals are still maturing. Where a contextual signal is sparse or unavailable, the model relies on the rest of the stack rather than guessing. These gaps close as data coverage improves.
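
The point about closing odds in the first item above is a point-in-time constraint: a feature may only be used if it existed when the report was locked. A minimal sketch of such a guard follows. The 24-hour lead is taken from the example in the text, while the function name, the feature names, and the (value, available_at) layout are hypothetical and not the site's actual pipeline.

```python
from datetime import datetime, timedelta

def features_available_at_lock(features, kickoff, lock_lead=timedelta(hours=24)):
    """Keep only features whose data already existed at lock time.

    `features` maps a feature name to a (value, available_at) pair.
    Anything that becomes available after kickoff - lock_lead is dropped,
    so a backtest cannot use information the live model would not have had.
    """
    lock_time = kickoff - lock_lead
    return {
        name: value
        for name, (value, available_at) in features.items()
        if available_at <= lock_time
    }

kickoff = datetime(2026, 6, 11, 20, 0)
features = {
    # Hypothetical team-level signal computed days in advance.
    "team_strength_diff": (0.42, datetime(2026, 6, 8, 9, 0)),
    # A closing line only exists right before kickoff, so it is excluded.
    "closing_odds_home": (1.85, datetime(2026, 6, 11, 19, 59)),
}

print(features_available_at_lock(features, kickoff))
```

Applying the same filter in both backtest and live generation is what keeps the two sets of numbers comparable.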

4. How we calibrate honesty

These ceilings are not vibes; they are checkable. The following live surfaces hold us to them:

Calibration plot: /methodology renders predicted-vs-actual rates with 95% Wilson CI bars. If we say 70%, the dot for the 70% bucket should sit on the diagonal. If it doesn't, the page shows you it doesn't.
Dated reports in the repo: every retrain writes a markdown report to docs/backtest-reports/. The headline numbers above come from the live calibrated backtest of the shipped model; the per-league worst-buckets are pulled from the newest dated report that matches the running model version. You can grep the file in the repo to verify the table.
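
The calibration check described above can be sketched in a few lines of Python: bucket the predicted probabilities, compute the observed hit rate per bucket, and attach a 95% Wilson score interval. This is an illustrative sketch only; the ten equal-width buckets, the function names, and the sample data are assumptions, not the code behind /methodology.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def calibration_buckets(preds, hits, n_bins=10):
    """Group predictions into equal-width buckets and summarise each one.

    Returns rows of (bucket_low, bucket_high, count, observed_rate, ci_low, ci_high).
    Empty buckets are skipped.
    """
    counts = [0] * n_bins
    wins = [0] * n_bins
    for p, h in zip(preds, hits):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bucket
        counts[i] += 1
        wins[i] += 1 if h else 0
    rows = []
    for i in range(n_bins):
        if counts[i] == 0:
            continue
        lo, hi = wilson_interval(wins[i], counts[i])
        rows.append((i / n_bins, (i + 1) / n_bins, counts[i], wins[i] / counts[i], lo, hi))
    return rows

# Hypothetical data: 100 predictions of 0.72, of which 70 came true.
preds = [0.72] * 100
hits = [True] * 70 + [False] * 30
for row in calibration_buckets(preds, hits):
    print(row)
```

A bucket reads as well calibrated when the interval around its observed rate covers the bucket's mean predicted probability, which is the "dot on the diagonal" test stated above.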

Methodology: /methodology · Benchmark vs 538: /reports/benchmark.