How the Model Works
We publish this page because honest analytics deserve an honest explanation. You should be able to read the approach, check our backtest numbers, and decide for yourself whether the forecasts on this site are worth your attention.
1. What we're modelling
For every qualifying fixture we estimate three probabilities: a home win (H), a draw (D), and an away win (A), plus the expected number of goals each team will score. We are not predicting exact scores, we are not issuing “picks”, and we never guarantee outcomes. The right way to read us is: “if this match were played 100 times, the model thinks the home team would win roughly this many.”
2. Which matches we predict
We deliberately do not predict every match in the world. A fixture must clear five filters before it ships:
- Curated league whitelist — 27 competitions in three tiers, all with strong API-Football coverage:
  - Internationals: FIFA World Cup, World Cup qualifiers (UEFA / CONMEBOL / CONCACAF / AFC / CAF), UEFA Champions League, Europa League, Conference League, FIFA Club World Cup, Nations League, Copa América, international friendlies.
  - Top European leagues (with backtest-tuned per-league weights): Premier League, La Liga, Bundesliga, Serie A, Ligue 1.
  - Other club leagues (use default ensemble weights; per-league tuning to come): MLS, Saudi Pro League, Brasileirão Série A, Liga MX, Eredivisie, Primeira Liga (Portugal), Belgian Pro League, Bundesliga 2, EFL Championship, Liga Profesional Argentina, Liga Pro (Ecuador), Primera A (Colombia), Primera División (Chile), Allsvenskan (Sweden), Veikkausliiga (Finland), J1 League (Japan), K League 1 (South Korea), A-League (Australia).
- Time window — the cron looks 24-72 hours ahead. Predictions appear ~24 hours before kickoff and lock at that moment.
- Training-pool minimum — at least 8 historical matches across both teams must be available. New clubs, mid-season promoted sides, or teams returning from a long break sometimes fail this filter.
- Data-quality floor — every fixture gets a 0-80 data score. Below 18, the model refuses to publish at all (the “suppress” verdict). Section 6 below covers this gate in detail.
- Lock integrity — once a prediction is hashed and locked, it cannot be regenerated by an automated process. This is the trust layer: yesterday's prediction stays exactly as published, even if today's model would have output something different.
In a typical cron run we evaluate 1,300+ candidate fixtures across the world and ship locked predictions for ~30-50 of them. The rest fail the league or data-quality filter, which is the right outcome — we'd rather publish fewer, better forecasts than spray noise.
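The filter chain above can be pictured as a short sequence of checks. The sketch below is illustrative only: the thresholds (24-72 hours, 8 matches, score 18) come from the text, while the `Candidate` fields, the reason labels, and the order of the checks are assumptions, not the production code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Thresholds quoted in the text; all names here are hypothetical.
MIN_LEAD = timedelta(hours=24)
MAX_LEAD = timedelta(hours=72)
MIN_TRAINING_MATCHES = 8
SUPPRESS_BELOW = 18


@dataclass
class Candidate:
    league: str
    kickoff: datetime
    training_matches: int  # historical matches available across both teams
    data_score: int        # data-quality score from the gate in section 6
    locked: bool = False   # True once a prediction has been hashed and locked


def rejection_reason(c: Candidate, whitelist: set, now: datetime) -> Optional[str]:
    """Return the first filter the fixture fails, or None if it may ship."""
    if c.league not in whitelist:
        return "league"
    if not (MIN_LEAD <= c.kickoff - now <= MAX_LEAD):
        return "time_window"
    if c.training_matches < MIN_TRAINING_MATCHES:
        return "training_pool"
    if c.data_score < SUPPRESS_BELOW:
        return "data_quality"
    if c.locked:
        return "already_locked"  # never regenerate a locked prediction
    return None


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
fixture = Candidate("Premier League", now + timedelta(hours=48), 12, 40)
print(rejection_reason(fixture, {"Premier League"}, now))  # None means "ship it"
```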
3. The SPX AI Quant Prediction Model — three models, not one
Single-model forecasters get fooled by the same blind spot every time. We fit three independent statistical models on the same training pool, then blend their probabilities with weights that are tuned per league from three years of historical fixtures.
- Dixon-Coles (Poisson) — attack/defence multipliers fit by maximum likelihood with time-decayed weights and the classic τ correction for low-score correlation.
- Elo with Davidson 3-way extension — ratings updated after every match (margin-aware K-factor), then projected to home/draw/away probabilities via the Davidson formula that handles ties explicitly rather than splitting them artificially.
- xG-Poisson — same Poisson shape as Dixon-Coles but trained on rolling expected-goals averages instead of actual goals. Captures chance quality, which is a steadier signal than goals scored because lucky or unlucky finishing tends to regress to the mean.
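To make the first two components concrete, here is a minimal sketch of how goal rates and ratings can be turned into home/draw/away probabilities. The τ correction follows the form given in Dixon & Coles (1997) and the tie term follows Davidson (1970); the function names, the goal truncation at 10, the 400-point rating scale, and every parameter value are assumptions for illustration, not the fitted production values.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    return math.exp(-rate) * rate**k / math.factorial(k)


def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles low-score correction; lam/mu are home/away goal rates."""
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def dixon_coles_hda(lam: float, mu: float, rho: float, max_goals: int = 10):
    """Sum the corrected score grid into (home, draw, away) probabilities."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise for the truncated grid
    return home / total, draw / total, away / total


def davidson_hda(rating_home: float, rating_away: float, nu: float, home_adv: float = 0.0):
    """Davidson three-way probabilities from Elo-style ratings (scale assumed)."""
    a = 10 ** ((rating_home + home_adv) / 400)
    b = 10 ** (rating_away / 400)
    tie = nu * math.sqrt(a * b)  # nu controls how common draws are
    denom = a + b + tie
    return a / denom, tie / denom, b / denom


print(dixon_coles_hda(lam=1.5, mu=1.1, rho=-0.1))
print(davidson_hda(1600, 1500, nu=0.8, home_adv=60))
```

With a negative ρ the correction moves probability toward 0-0 and 1-1, which is why the draw share rises relative to plain independent Poisson.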
Each fixture is scored by all three. The published probability is a weighted blend; the weights come from a grid search that minimises log-loss league by league.
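A weighted blend plus a log-loss grid search can be sketched in a few lines. This is a simplified illustration under stated assumptions: the 0.05 step is a guess based on the granularity of the published weights, and the function names and toy data are hypothetical.

```python
import math


def blend(components, weights):
    """components: [(H, D, A), ...] per model; weights: one weight per model."""
    mixed = [sum(w * p[i] for w, p in zip(weights, components)) for i in range(3)]
    total = sum(mixed)
    return tuple(m / total for m in mixed)


def log_loss(forecasts, outcomes):
    """Mean negative log-probability of the outcome that happened (0=H, 1=D, 2=A)."""
    eps = 1e-12  # guard against log(0)
    return -sum(math.log(max(f[o], eps)) for f, o in zip(forecasts, outcomes)) / len(outcomes)


def grid_search(fixtures, outcomes, step=0.05):
    """Try every (dc, elo, xg) weight triple on the grid that sums to 1."""
    n = round(1 / step)
    best_weights, best_loss = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1 - i):
            weights = (i / n, j / n, (n - i - j) / n)
            loss = log_loss([blend(f, weights) for f in fixtures], outcomes)
            if loss < best_loss:
                best_weights, best_loss = weights, loss
    return best_weights, best_loss


# Toy league: each fixture carries the three component forecasts in DC, Elo, xG order.
uniform = (1 / 3, 1 / 3, 1 / 3)
fixtures = [[(0.8, 0.1, 0.1), uniform, uniform]] * 4
outcomes = [0, 0, 0, 0]  # home win every time
print(grid_search(fixtures, outcomes))
```

In this toy case only the first component carries information, so the search puts all the weight on it. On real data the search would be run separately for each league's fixtures.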
4. Per-league optimal weights
Different leagues reward different signals. Italian football is a tactical chess match where chance quality (xG) outpredicts goals; the Premier League is high-variance and rewards Elo's rating-based view; La Liga and the Bundesliga are dominated by a few super-clubs that the Poisson model captures cleanly. The grid search picks up on these differences automatically.
| League | DC | Elo | xG |
|---|---|---|---|
| FIFA World Cup | 0.00 | 0.20 | 0.80 |
| Premier League | 0.00 | 0.45 | 0.55 |
| La Liga | 0.90 | 0.10 | 0.00 |
| Bundesliga | 0.40 | 0.25 | 0.35 |
| Serie A | 0.00 | 0.00 | 1.00 |
| Ligue 1 | 0.35 | 0.25 | 0.40 |
| Other competitions | 0.70 | 0.15 | 0.15 |
These are the currently optimal weights from the latest backtest. They are re-evaluated as we accumulate more data.
Production note: the runtime currently uses a no-xG variant of these weights. The xG slot is redistributed (~75% to Dixon-Coles, ~25% to Elo) because per-fixture xG enrichment from API-Football's statistics endpoint isn't yet wired into the live training pool. Each prediction page's footer shows the exact weights used to produce that specific forecast, so the published numbers always match what shipped. The xG-tuned variants above will become the production weights once xG enrichment lands.
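As a rough illustration of that redistribution, the sketch below moves the xG weight onto the other two models using an exact 75/25 split. The split is quoted only approximately in the note above, so treat the outputs as indicative; the function name is hypothetical and the footer of each prediction page remains the source of truth.

```python
def drop_xg(dc: float, elo: float, xg: float, share_to_dc: float = 0.75):
    """Reassign the xG weight: share_to_dc goes to Dixon-Coles, the rest to Elo."""
    return (dc + xg * share_to_dc, elo + xg * (1 - share_to_dc), 0.0)


# Tuned weights from the table, in (DC, Elo, xG) order.
tuned = {
    "Premier League": (0.00, 0.45, 0.55),
    "Serie A": (0.00, 0.00, 1.00),
    "Other competitions": (0.70, 0.15, 0.15),
}
for league, weights in tuned.items():
    print(league, tuple(round(w, 4) for w in drop_xg(*weights)))
```

Under this assumption the Premier League row becomes roughly 0.41 Dixon-Coles and 0.59 Elo, and Serie A becomes 0.75 and 0.25.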
5. The AI contextual reviewer
The quant ensemble is blind to things humans care about: injuries, rotations, tournament state, stylistic matchups, and breaking news. After the three models produce their blend, we hand the baseline plus a structured context document (recent form, opponent strength, round, venue, news context, and the individual component probabilities) to our AI reviewer and ask for an adjusted forecast with reasoning.
Crucially, the reviewer sees where the three quant models agree and where they disagree. When the three models converge, the reviewer is instructed to make only small adjustments — high agreement is itself a signal. When they diverge, the reviewer leans on the recent-form context to break the tie.
The final published probability is a 55/45 weighted ensemble — 55% quant, 45% LLM. The weights are deliberately conservative: the statistical models are the anchor, the LLM nudges. This prevents the reviewer from overreacting to narrative while still capturing context the equations miss.
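The 55/45 step is simple arithmetic. A minimal sketch, assuming the blend is a straight linear mix of the two probability triples (the text gives the weights but not the exact formula) and using made-up inputs:

```python
QUANT_WEIGHT = 0.55  # from the text; the reviewer gets the remaining 0.45


def final_forecast(quant, reviewer, quant_weight: float = QUANT_WEIGHT):
    """Linear mix of two (H, D, A) triples, renormalised to sum to 1."""
    mixed = [quant_weight * q + (1 - quant_weight) * r for q, r in zip(quant, reviewer)]
    total = sum(mixed)
    return tuple(m / total for m in mixed)


quant = (0.50, 0.27, 0.23)     # hypothetical quant ensemble output
reviewer = (0.44, 0.28, 0.28)  # hypothetical reviewer adjustment
print(tuple(round(p, 4) for p in final_forecast(quant, reviewer)))
```

Here a six-point reviewer cut on the home side moves the published home probability by under three points (0.50 to about 0.473), which is the "nudge, not override" behaviour described above.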
6. The data-quality gate
Not every match has enough data to warrant a confident forecast. We score every fixture out of 80 based on:
- Effective training sample per team (30 pts)
- Recent form availability for both sides (25 pts)
- Head-to-head history (10 pts)
- Lineups published before kickoff (10 pts)
- Competition tier (5 pts)
Matches scoring ≥48 publish at up to 5★ confidence. Scores of 32–47 publish with a 3★ cap and a “moderate data” note. Scores of 18–31 publish free as experimental. Anything under 18 is suppressed entirely: we would rather show nothing than a misleading number.
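The scoring and the four verdict bands can be expressed as a small lookup. The point caps and thresholds below are the ones listed above; the key names, verdict labels, and the clamping behaviour are assumptions made for the sketch.

```python
# Maximum points per criterion, as listed above (they sum to 80).
MAX_POINTS = {
    "training_sample": 30,
    "recent_form": 25,
    "head_to_head": 10,
    "lineups": 10,
    "competition_tier": 5,
}


def data_score(points: dict) -> int:
    """Sum the awarded points, clamping each criterion to its allowed range."""
    return sum(max(0, min(points.get(name, 0), cap)) for name, cap in MAX_POINTS.items())


def verdict(score: int) -> str:
    if score >= 48:
        return "full"          # up to 5 stars
    if score >= 32:
        return "moderate"      # 3-star cap, "moderate data" note
    if score >= 18:
        return "experimental"  # published free
    return "suppress"          # nothing is shown


example = {"training_sample": 22, "recent_form": 20, "head_to_head": 4, "lineups": 0, "competition_tier": 5}
print(data_score(example), verdict(data_score(example)))
```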
7. The SHA256 integrity lock
Every prediction's canonical JSON is hashed with SHA256 and the hash is stored alongside the prediction. The hash appears at the bottom of every prediction page and can be re-verified by anyone hitting our public endpoint:
GET /api/predictions/verify?id=<fixtureId>
This is the proof layer. If a prediction turns out to be wrong after kickoff, we can't quietly change it and claim we were right — the hash would no longer match. If it turns out to be right, you know we didn't have insider information tweaked in at the last minute.
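For readers who want to see why a hash pins a prediction down, here is a minimal sketch. The canonical form used in production is not specified on this page, so the sorted-key, compact-separator JSON below is an assumption, as are the field names in the example record.

```python
import hashlib
import json


def canonical_json(prediction: dict) -> bytes:
    """One possible canonical form: sorted keys, no extra whitespace, UTF-8."""
    return json.dumps(
        prediction, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def prediction_hash(prediction: dict) -> str:
    return hashlib.sha256(canonical_json(prediction)).hexdigest()


def matches(prediction: dict, published_hash: str) -> bool:
    """Recompute the hash and compare it with the one shown on the page."""
    return prediction_hash(prediction) == published_hash


# Hypothetical record; real predictions carry more fields.
record = {"fixtureId": 12345, "home": 0.47, "draw": 0.27, "away": 0.26}
locked = prediction_hash(record)

tampered = dict(record, home=0.60)
print(matches(record, locked), matches(tampered, locked))  # True False
```

Any change to a locked value, however small, produces a different digest, which is what makes a silent after-the-fact edit detectable.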
8. Backtest results (real numbers, no cherry-picking)
Top-5 European leagues + 4 World Cup tournaments
Walk-forward backtest on 9,024 fixtures covering 5 seasons (2021-2025) of the Premier League, La Liga, Bundesliga, Serie A, and Ligue 1, plus all four World Cups available in our data source (2010, 2014, 2018, 2022). All three models were refit every 10 matches using only data available before each fixture kicked off — no future information leakage. Per-league weights chosen by grid search to minimise log-loss.
| Model | Accuracy | Log-loss |
|---|---|---|
| Dixon-Coles only | 49.2% | 1.0688 |
| Elo only | 32.9% | 1.0925 |
| xG-Poisson only | 50.2% | 1.0272 |
| SPX AI Quant Prediction Model | 50.5% | 1.0103 |
Lower log-loss = better-calibrated probabilities. Uniform-prior baseline is ≈1.10 (ln 3 ≈ 1.0986 for three equally likely outcomes). Always-home baseline accuracy on this dataset is ≈42-43%. World Cup matches drag the overall numbers slightly because international knockout football is genuinely harder to predict; see the FIFA World Cup 2022 results below.
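The walk-forward idea and the uniform baseline can be shown with a toy model. In the sketch below the "model" is just smoothed outcome frequencies, standing in for the three real models; the refit interval of 10 is from the text, and everything else (names, smoothing, data) is an assumption for illustration.

```python
import math
from collections import Counter

OUTCOMES = ("H", "D", "A")


def log_loss(forecasts, results):
    """Mean negative log-probability assigned to what actually happened."""
    return -sum(math.log(f[r]) for f, r in zip(forecasts, results)) / len(results)


def fit_frequencies(history):
    """Toy stand-in model: outcome frequencies with add-one smoothing."""
    counts = Counter(history)
    total = len(history) + len(OUTCOMES)
    return {o: (counts[o] + 1) / total for o in OUTCOMES}


def walk_forward(results, refit_every=10):
    """Forecast each match using a model fit only on strictly earlier matches."""
    forecasts = []
    model = fit_frequencies([])
    for i in range(len(results)):
        if i % refit_every == 0:
            model = fit_frequencies(results[:i])  # no future information
        forecasts.append(model)
    return forecasts


results = list("HHDAHHADHH" "HDHAAHHDHA")
forecasts = walk_forward(results)
uniform = {o: 1 / 3 for o in OUTCOMES}

print("toy model log-loss:", round(log_loss(forecasts, results), 4))
print("uniform baseline:  ", round(log_loss([uniform] * len(results), results), 4))
print("always-home accuracy:", results.count("H") / len(results))
```

The uniform forecast always scores ln 3 regardless of the results, which is where the ≈1.10 reference value comes from.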
FIFA World Cup 2022 — international football
All 64 matches predicted using pre-tournament training data only. International football is materially harder than club football — sample sizes per team are small and upsets define the tournament.
| Metric | Result |
|---|---|
| Stats-only accuracy | 42.2% |
| Ensemble accuracy | 43.8% |
| Log-loss vs uniform | 1.19 vs 1.10 |
| Model-called 'chalk' picks | hit rate ~70% |
| True upsets (Saudi, Japan, Morocco) | largely missed — as disclosed |
These numbers will improve when real-time injury feeds and live lineup data are wired into the AI reviewer for matches during the tournament window. The 2022 backtest was run without those signals; 2026 matches will have them.
9. What this model deliberately does not do
- Predict exact scores (under a Poisson model even the single most likely scoreline usually carries only a small probability, which makes exact-score calls a fool's errand).
- Reliably catch upsets. Argentina 1-2 Saudi Arabia sat at 5% in our model. Famous upsets are, by definition, unpredictable from historical patterns.
- Provide betting tips or recommend wagers. This is analytical forecasting only.
- Incorporate insider information. We use only publicly available data from our sports-data partners.
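The exact-score point is easy to check numerically. The sketch below uses two independent Poisson distributions (no τ correction) and hypothetical goal rates of 1.5 and 1.1, chosen only for illustration, to find the most likely scoreline and its probability.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    return math.exp(-rate) * rate**k / math.factorial(k)


def most_likely_score(lam: float, mu: float, max_goals: int = 8):
    """Return the most probable (home, away) scoreline and its probability."""
    grid = {
        (x, y): poisson_pmf(x, lam) * poisson_pmf(y, mu)
        for x in range(max_goals + 1)
        for y in range(max_goals + 1)
    }
    score = max(grid, key=grid.get)
    return score, grid[score]


score, prob = most_likely_score(lam=1.5, mu=1.1)
print(score, round(prob, 4))
```

With these rates the likeliest result is 1-1 at roughly 12%, so even the best single guess would miss about seven times in eight.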
10. Read further
- Dixon & Coles (1997). “Modelling Association Football Scores and Inefficiencies in the Football Betting Market.”
- Davidson (1970). “On Extending the Bradley-Terry Model to Accommodate Ties in Paired Comparison Experiments.”
- Elo (1978). The Rating of Chess Players, Past and Present.
- Constantinou & Fenton (2012). “pi-football: A Bayesian network model for forecasting Association Football match outcomes.”
- Hubáček et al. (2019). “Learning to predict soccer results.”