Future-only ranking
v0.1.0 — public preview
FUTURE-TS ranks models only on labels that were unavailable when the model was frozen — with strict task cards, pretraining manifests, sealed-runner execution, and multi-budget context scoring.
The contract
FUTURE-TS separates clean future-only evidence from accidental leakage, catalog convenience, and post-hoc tuning.
Ranked labels must be unavailable at freeze time. Task cards carry issue times, target timestamps, and availability metadata.
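The freeze-time rule can be sketched as a filter over task cards. This is a minimal sketch that assumes each card exposes an issue time, a target timestamp, and an `available_at` timestamp marking when the label became observable. The `TaskCard` class, its field names, and the `rankable` helper are hypothetical, not the FUTURE-TS schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TaskCard:
    # Hypothetical fields; the real task-card schema may differ.
    task_id: str
    issue_time: datetime        # when the forecasting task was issued
    target_timestamp: datetime  # the timestamp being forecast
    available_at: datetime      # when the ground-truth label became observable


def rankable(cards: list[TaskCard], freeze_time: datetime) -> list[TaskCard]:
    """Keep only cards whose label was still unavailable when the model was frozen."""
    return [c for c in cards if c.available_at > freeze_time]


def utc(y: int, m: int, d: int) -> datetime:
    return datetime(y, m, d, tzinfo=timezone.utc)


cards = [
    TaskCard("old", utc(2024, 1, 1), utc(2024, 1, 2), utc(2024, 1, 3)),
    TaskCard("new", utc(2025, 6, 1), utc(2025, 6, 2), utc(2025, 6, 3)),
]
frozen = utc(2025, 1, 1)
print([c.task_id for c in rankable(cards, frozen)])  # ['new']
```

The strict `>` excludes a label that became available exactly at freeze time; whether FUTURE-TS treats that boundary the same way is an assumption here.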
benchmarks/v1 rejects submissions that lack a pretraining manifest, so declared-clean and undeclared models are never conflated.
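The rejection rule amounts to a gate that runs before any scoring. This sketch assumes a submission is a plain dict with an optional `manifest` entry. The `validate_submission` helper, the `ManifestMissing` exception, and the example model and corpus names are hypothetical.

```python
from typing import Any


class ManifestMissing(ValueError):
    """Raised when a submission arrives without a pretraining manifest."""


def validate_submission(submission: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical gate: a submission without a manifest is rejected outright
    # instead of being scored as "undeclared" next to declared-clean models.
    # An empty manifest is treated the same as a missing one (assumption).
    manifest = submission.get("manifest")
    if not manifest:
        raise ManifestMissing(f"{submission.get('model', '<unknown>')}: no manifest")
    return submission


# Hypothetical submissions for illustration only.
ok = validate_submission({"model": "example/model-a",
                          "manifest": {"pretraining_corpora": ["corpus-x"]}})
try:
    validate_submission({"model": "example/model-b"})
except ManifestMissing as err:
    print("rejected:", err)  # rejected: example/model-b: no manifest
```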
The local sealed runner stamps platform timestamps, applies resource caps, and supports network isolation on Linux.
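A minimal Linux sketch of the timestamp and resource-cap parts of such a runner follows. It uses Python's `resource` limits applied in the child process and UTC timestamps taken by the parent. The `run_sealed` function, its default caps, and the choice of `RLIMIT_CPU` and `RLIMIT_AS` are assumptions, not the FUTURE-TS runner's actual mechanism. Network isolation (for example, running the child in a separate Linux network namespace) is left out because it usually needs extra privileges.

```python
import resource
import subprocess
import sys
from datetime import datetime, timezone


def _cap(limit_id: int, value: int) -> None:
    # Set soft and hard limits together, never above an existing hard cap.
    _, hard = resource.getrlimit(limit_id)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit_id, (value, value))


def run_sealed(cmd: list[str], cpu_seconds: int = 60,
               memory_bytes: int = 4 * 1024**3, wall_seconds: float = 120.0) -> dict:
    """Run cmd under CPU and address-space caps, stamping UTC start/end times."""

    def apply_caps() -> None:
        # Runs in the child between fork and exec (POSIX only).
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    started = datetime.now(timezone.utc)
    proc = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=wall_seconds, preexec_fn=apply_caps)
    finished = datetime.now(timezone.utc)
    return {
        "started_at": started.isoformat(),
        "finished_at": finished.isoformat(),
        "returncode": proc.returncode,
        "stdout": proc.stdout,
    }


report = run_sealed([sys.executable, "-c", "print('forecast ok')"])
print(report["returncode"], report["stdout"].strip())  # 0 forecast ok
```

`preexec_fn` is not safe when the parent process runs multiple threads, so a production runner would more plausibly set limits through a wrapper process or a container runtime.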
The empirical run evaluates zero-shot, few-shot, and s16 budgets at 96, 192, and 288 observations.
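Read literally, this describes a 3 × 3 grid of nine evaluation cells per model. The sketch enumerates that grid and assumes that 96, 192, and 288 are context lengths in observations, that each cell yields one score, and that the headline score is their unweighted mean. The source does not state the aggregation rule, and `toy_score` is a hypothetical stand-in for a real evaluation call.

```python
from itertools import product
from statistics import mean
from typing import Callable

BUDGETS = ("zero-shot", "few-shot", "s16")  # budget labels as named in the docs
CONTEXT_LENGTHS = (96, 192, 288)            # assumed: observations of history


def evaluation_grid() -> list[tuple[str, int]]:
    """Every (budget, context length) cell a model is scored on."""
    return list(product(BUDGETS, CONTEXT_LENGTHS))


def aggregate(score_cell: Callable[[str, int], float]) -> float:
    # Assumption: the headline score is an unweighted mean over all cells.
    return mean(score_cell(b, n) for b, n in evaluation_grid())


def toy_score(budget: str, context: int) -> float:
    # Hypothetical scorer: longer context helps a little, budget is ignored.
    return context / 1000


print(len(evaluation_grid()))          # 9
print(round(aggregate(toy_score), 3))  # 0.192
```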
Canonical public run
The current artifact set excludes internal evaluator-only entries and reports 47 scored public models. Treat it as useful evidence, not a permanent, universal ranking.
| Rank | Model | Tier | Score |
|---|---|---|---|
| 01 | Datadog/Toto-2.0-1B | T1 | 0.2369 |
| 02 | Datadog/Toto-2.0-313m ◆ | T2 | 0.2214 |
| 03 | NX-AI/TiRex ◆ | T3 | 0.2024 |
| 04 | Salesforce/moirai-2.0-R-small ◆ | T4 | 0.1860 |
| 05 | amazon/chronos-2 ◆ | T4 | 0.1851 |
| 06 | NX-AI/TiRex-1.1-gifteval | T4 | 0.1833 |
| 07 | google/timesfm-2.5-200m-pytorch | T4 | 0.1772 |
| 08 | google/timesfm-2.0-500m-pytorch ◆ | T4 | 0.1706 |
| 09 | Datadog/Toto-2.0-22m | T4 | 0.1640 |
| 10 | Datadog/Toto-2.0-4m ◆ | T4 | 0.1557 |
| 11 | cisco-ai/cisco-time-series-model-1.0 ◆ | T5 | 0.1437 |
| 12 | Salesforce/moirai-1.1-R-large | T6 | 0.1277 |
| 13 | amazon/chronos-t5-large | T7 | 0.1068 |
| 14 | Salesforce/moirai-1.1-R-base | T7 | 0.1067 |
| 15 | amazon/chronos-bolt-mini | T7 | 0.1030 |
| 16 | Salesforce/moirai-1.1-R-small ◆ | T8 | 0.0882 |
| 17 | mldi-lab/Kairos_50m | T9 | 0.0746 |
| 18 | ibm-research/granite-timeseries-flowstate-r1.1 ◆ | T9 | 0.0738 |
| 19 | amazon/chronos-bolt-small | T9 | 0.0645 |
| 20 | thuml/sundial-base-128m | T9 | 0.0618 |
| 21 | Datadog/Toto-Open-Base-1.0 | T9 | 0.0574 |
| 22 | amazon/chronos-bolt-base ◆ | T9 | 0.0518 |
| 23 | mldi-lab/Kairos_23m | T9 | 0.0464 |
| 24 | Maple728/TimeMoE-50M ◆ | T9 | 0.0406 |
| 25 | bytedance-research/Timer-S1 | T9 | 0.0369 |
| 26 | Salesforce/moirai-1.0-R-large | T10 | 0.0268 |
| 27 | amazon/chronos-bolt-tiny ◆ | T10 | 0.0268 |
| 28 | Salesforce/moirai-1.0-R-base | T10 | 0.0220 |
| 29 | NeoQuasar/Kronos-base | T10 | 0.0207 |
| 30 | ibm-research/granite-timeseries-flowstate-r1 | T10 | 0.0176 |
| 31 | Salesforce/moirai-1.0-R-small | T11 | -0.0331 |
| 32 | Maple728/TimeMoE-200M ◆ | T12 | -0.0452 |
| 33 | qcw2333/YingLong_300m | T12 | -0.0546 |
| 34 | mldi-lab/Kairos_10m | T12 | -0.0566 |
| 35 | ibm-research/granite-timeseries-ttm-v1 | T12 | -0.0648 |
| 36 | ibm-research/granite-timeseries-ttm-r2 ◆ | T13 | -0.1112 |
| 37 | qcw2333/YingLong_110m ◆ | T14 | -0.1403 |
| 38 | qcw2333/YingLong_50m | T15 | -0.2224 |
| 39 | qcw2333/YingLong_6m | T16 | -0.3691 |
| 40 | time-series-foundation-models/Lag-Llama | T17 | -0.6597 |
| 41 | ibm-research/ttm-r3 ◆ | T18 | -0.7236 |
| 42 | Salesforce/moirai-moe-1.0-R-base | T19 | -1.0577 |
| 43 | Salesforce/moirai-moe-1.0-R-small | T19 | -1.0644 |
| 44 | AutonLab/MOMENT-1-large | T20 | -1.2300 |
| 45 | ibm-research/granite-timeseries-patchtst-fm-r1 | T21 | -1.6944 |
| 46 | AutonLab/MOMENT-1-small | T22 | -7.0476 |
| 47 | AutonLab/MOMENT-1-base | T23 | -1281.3842 |
◆ Pareto-optimal across budget and cost. Tiers group models whose rank intervals overlap, so models within the same tier are not statistically separable.
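Both legend annotations can be sketched in a few lines. The Pareto check assumes two objectives per model, a score to maximize and a cost to minimize. The tiering rule assumes each model has a rank interval (for example, from bootstrap resampling) and opens a new tier whenever a model's best plausible rank is worse than the worst plausible rank of the current tier's leader. Neither rule is confirmed as the exact FUTURE-TS procedure, and the `Entry` fields, model names, and numbers below are invented for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    score: float  # higher is better
    cost: float   # lower is better (hypothetical cost unit)
    rank_lo: int  # best plausible rank (hypothetical interval)
    rank_hi: int  # worst plausible rank


def dominates(a: Entry, b: Entry) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    return (a.score >= b.score and a.cost <= b.cost
            and (a.score > b.score or a.cost < b.cost))


def pareto_front(entries: list[Entry]) -> set[str]:
    """Names of entries that no other entry dominates."""
    return {e.name for e in entries
            if not any(dominates(o, e) for o in entries if o is not e)}


def assign_tiers(entries: list[Entry]) -> dict[str, int]:
    """Greedy tiering over entries sorted by score (assumed rule)."""
    tiers: dict[str, int] = {}
    tier, leader = 0, None
    for e in sorted(entries, key=lambda x: x.score, reverse=True):
        if leader is None or e.rank_lo > leader.rank_hi:
            tier, leader = tier + 1, e  # no overlap with the leader: new tier
        tiers[e.name] = tier
    return tiers


models = [
    Entry("big", 0.24, 10.0, 1, 1),
    Entry("mid", 0.22, 3.0, 2, 3),
    Entry("small", 0.21, 1.0, 2, 4),
    Entry("slow", 0.20, 12.0, 3, 4),
]
print(sorted(pareto_front(models)))  # ['big', 'mid', 'small']
print(assign_tiers(models))          # {'big': 1, 'mid': 2, 'small': 2, 'slow': 2}
```

Comparing each model against the tier leader rather than the previous model keeps tiers from chaining indefinitely through pairwise overlaps, which is consistent with the table above placing the top two models in separate tiers.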
Scope boundary
v0.1.0 is a runnable local benchmark package and sealed-runner MVP. Hosted attestation, immutable submission windows, repeated live waves, wider task coverage, and stronger manifest evidence are future work.