v0.1.0 — public preview

Future-only evaluation for time-series foundation models.

FUTURE-TS ranks models only on labels that were unavailable when the model was frozen — with strict task cards, pretraining manifests, sealed-runner execution, and multi-budget context scoring.

Timeline: observed data, model freeze, future-only horizon.
25 strict-surface tasks
15 empirical v2 tasks
47 scored public models
3 context budgets

The contract

Evaluation as a program, not a static bundle.

FUTURE-TS separates clean future-only evidence from accidental leakage, catalog convenience, and post-hoc tuning.

01 Future-only ranking

Ranked labels must be unavailable at freeze time. Task cards carry issue times, target timestamps, and availability metadata.
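A minimal sketch of the future-only rule follows. It assumes that each task-card label records when its value first became available. The `TaskCardLabel` class, its field names, and `rankable` are hypothetical illustrations, not the FUTURE-TS schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TaskCardLabel:
    # Hypothetical fields; the real task-card schema may differ.
    series_id: str
    issue_time: datetime    # when the forecast is issued
    target_time: datetime   # timestamp the label describes
    available_at: datetime  # when the label value was first published


def rankable(labels, model_freeze):
    """Keep only labels that were not yet available at the model's freeze time."""
    return [lab for lab in labels if lab.available_at > model_freeze]


utc = timezone.utc
freeze = datetime(2025, 1, 1, tzinfo=utc)
labels = [
    # Published before the freeze: excluded from ranking.
    TaskCardLabel("a", datetime(2024, 12, 30, tzinfo=utc),
                  datetime(2024, 12, 31, tzinfo=utc), datetime(2024, 12, 31, tzinfo=utc)),
    # Published after the freeze: eligible for ranking.
    TaskCardLabel("b", datetime(2025, 1, 2, tzinfo=utc),
                  datetime(2025, 1, 3, tzinfo=utc), datetime(2025, 1, 3, tzinfo=utc)),
]
print([lab.series_id for lab in rankable(labels, freeze)])  # ['b']
```

The filter keys on availability rather than on the target timestamp. A label whose target time falls before the freeze but whose value was only published afterwards is still unseen by the model.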

02 Strict manifest mode

benchmarks/v1 rejects any submission without a pretraining manifest, so models with declared-clean pretraining data are never conflated with models that declare nothing.
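A strict-mode check might look like the sketch below. The field names mirror the declarations listed under Model entries (model identity, artifact URI, pretraining sources). `check_submission`, `ManifestError`, and the dict layout are hypothetical, and the real benchmarks/v1 manifest format may differ.

```python
# Hypothetical required fields, mirroring the declarations in the submission steps.
REQUIRED_FIELDS = ("model_id", "artifact_uri", "pretraining_sources")


class ManifestError(ValueError):
    """Raised when a submission fails strict manifest mode."""


def check_submission(submission: dict) -> dict:
    """Reject submissions with no manifest or with undeclared required fields."""
    manifest = submission.get("manifest")
    if manifest is None:
        raise ManifestError("strict mode: submission has no pretraining manifest")
    missing = [field for field in REQUIRED_FIELDS if field not in manifest]
    if missing:
        raise ManifestError("manifest is missing: " + ", ".join(missing))
    return manifest


try:
    check_submission({"model_id": "example/model"})
except ManifestError as err:
    print(err)  # strict mode: submission has no pretraining manifest
```

The check tests whether each field is present, not whether it is non-empty. An explicitly empty `pretraining_sources` list is still a declaration, whereas an absent key is not.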

03 Sealed execution

The local sealed runner stamps platform timestamps, applies resource caps, and supports network isolation on Linux.
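The sketch below shows two of these ideas using only the Python standard library. It records host-side UTC timestamps around the run and applies a CPU-time cap in the child process, which works on POSIX only. `run_sealed` and its parameters are hypothetical, and the sketch does not reproduce the FUTURE-TS runner. Network isolation is left out. On Linux, one option is to prefix the command with util-linux `unshare --map-root-user --net`, which requires unprivileged user namespaces or root.

```python
import resource
import subprocess
import sys
from datetime import datetime, timezone


def run_sealed(cmd, cpu_seconds=60, timeout=120):
    """Run cmd under a CPU-time cap and record host-side UTC timestamps (POSIX only)."""

    def apply_caps():
        # Executed in the child process just before exec; never raise the hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        cap = cpu_seconds if hard == resource.RLIM_INFINITY else min(cpu_seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (cap, cap))

    started = datetime.now(timezone.utc)
    # `timeout` is a wall-clock limit; subprocess.run kills the child when it expires.
    proc = subprocess.run(cmd, preexec_fn=apply_caps, capture_output=True,
                          text=True, timeout=timeout)
    finished = datetime.now(timezone.utc)
    return {
        "started_utc": started.isoformat(),
        "finished_utc": finished.isoformat(),
        "returncode": proc.returncode,
        "stdout": proc.stdout,
    }


record = run_sealed([sys.executable, "-c", "print('ok')"], cpu_seconds=10)
print(record["returncode"], record["stdout"].strip())  # 0 ok
```

The timestamps come from the host rather than from the submitted code. A submission therefore cannot claim its own run time.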

04 Multi-budget context

The empirical run evaluates zero-shot, few-shot, and s16 budgets at 96, 192, and 288 observations.
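One way to read the budgets is as the most recent 96, 192, or 288 observations before the forecast issue time. The sketch below assumes exactly that. It also assumes that a series too short for a budget is skipped for that budget. `context_windows` is a hypothetical helper.

```python
CONTEXT_BUDGETS = (96, 192, 288)  # observations, as listed above


def context_windows(history, budgets=CONTEXT_BUDGETS):
    """Map each budget to the most recent `budget` observations of `history`.

    Budgets longer than the available history are skipped (an assumption).
    """
    return {b: history[-b:] for b in budgets if len(history) >= b}


history = list(range(300))  # toy series: value i at step i
windows = context_windows(history)
print({b: (len(w), w[0]) for b, w in windows.items()})
# {96: (96, 204), 192: (192, 108), 288: (288, 12)}
```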

Canonical public run

Empirical v2 — public catalog, three budgets.

The current artifact set excludes internal evaluator-only entries and reports 47 scored public models. Treat it as useful evidence, not a permanent, universal ranking.

Rank  Model                                           Tier       Score
01    Datadog/Toto-2.0-1B                             T1        0.2369
02    Datadog/Toto-2.0-313m                           T2        0.2214
03    NX-AI/TiRex                                     T3        0.2024
04    Salesforce/moirai-2.0-R-small                   T4        0.1860
05    amazon/chronos-2                                T4        0.1851
06    NX-AI/TiRex-1.1-gifteval                        T4        0.1833
07    google/timesfm-2.5-200m-pytorch                 T4        0.1772
08    google/timesfm-2.0-500m-pytorch                 T4        0.1706
09    Datadog/Toto-2.0-22m                            T4        0.1640
10    Datadog/Toto-2.0-4m                             T4        0.1557
11    cisco-ai/cisco-time-series-model-1.0            T5        0.1437
12    Salesforce/moirai-1.1-R-large                   T6        0.1277
13    amazon/chronos-t5-large                         T7        0.1068
14    Salesforce/moirai-1.1-R-base                    T7        0.1067
15    amazon/chronos-bolt-mini                        T7        0.1030
16    Salesforce/moirai-1.1-R-small                   T8        0.0882
17    mldi-lab/Kairos_50m                             T9        0.0746
18    ibm-research/granite-timeseries-flowstate-r1.1  T9        0.0738
19    amazon/chronos-bolt-small                       T9        0.0645
20    thuml/sundial-base-128m                         T9        0.0618
21    Datadog/Toto-Open-Base-1.0                      T9        0.0574
22    amazon/chronos-bolt-base                        T9        0.0518
23    mldi-lab/Kairos_23m                             T9        0.0464
24    Maple728/TimeMoE-50M                            T9        0.0406
25    bytedance-research/Timer-S1                     T9        0.0369
26    Salesforce/moirai-1.0-R-large                   T10       0.0268
27    amazon/chronos-bolt-tiny                        T10       0.0268
28    Salesforce/moirai-1.0-R-base                    T10       0.0220
29    NeoQuasar/Kronos-base                           T10       0.0207
30    ibm-research/granite-timeseries-flowstate-r1    T10       0.0176
31    Salesforce/moirai-1.0-R-small                   T11      -0.0331
32    Maple728/TimeMoE-200M                           T12      -0.0452
33    qcw2333/YingLong_300m                           T12      -0.0546
34    mldi-lab/Kairos_10m                             T12      -0.0566
35    ibm-research/granite-timeseries-ttm-v1          T12      -0.0648
36    ibm-research/granite-timeseries-ttm-r2          T13      -0.1112
37    qcw2333/YingLong_110m                           T14      -0.1403
38    qcw2333/YingLong_50m                            T15      -0.2224
39    qcw2333/YingLong_6m                             T16      -0.3691
40    time-series-foundation-models/Lag-Llama         T17      -0.6597
41    ibm-research/ttm-r3                             T18      -0.7236
42    Salesforce/moirai-moe-1.0-R-base                T19      -1.0577
43    Salesforce/moirai-moe-1.0-R-small               T19      -1.0644
44    AutonLab/MOMENT-1-large                         T20      -1.2300
45    ibm-research/granite-timeseries-patchtst-fm-r1  T21      -1.6944
46    AutonLab/MOMENT-1-small                         T22      -7.0476
47    AutonLab/MOMENT-1-base                          T23   -1281.3842

Tiers group models whose rank intervals overlap, so models in the same tier are not statistically separable. Pareto optimality is judged across context budget and cost.
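The exact tiering procedure is not specified here. The greedy sketch below assumes that each model already has a rank interval, for example from bootstrap resampling. It visits models in score order and opens a new tier when a model's best plausible rank is worse than the worst plausible rank of the current tier's first model. `Ranked` and `assign_tiers` are hypothetical, and the toy inputs are not taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ranked:
    model: str
    score: float
    rank_lo: int  # best plausible rank (hypothetical, e.g. from a bootstrap)
    rank_hi: int  # worst plausible rank


def assign_tiers(models):
    """Greedy tiering: in score order, open a new tier when a model's rank
    interval no longer overlaps the interval of the current tier's first model."""
    tiers, leader, tier = {}, None, 0
    for m in sorted(models, key=lambda r: r.score, reverse=True):
        if leader is None or m.rank_lo > leader.rank_hi:
            tier += 1
            leader = m
        tiers[m.model] = f"T{tier}"
    return tiers


toy = [  # toy inputs, not taken from the leaderboard
    Ranked("model-a", 0.30, 1, 1),
    Ranked("model-b", 0.20, 2, 3),
    Ranked("model-c", 0.19, 2, 4),
    Ranked("model-d", 0.10, 5, 5),
]
print(assign_tiers(toy))
# {'model-a': 'T1', 'model-b': 'T2', 'model-c': 'T2', 'model-d': 'T3'}
```

Anchoring each tier on its first model stops overlaps from chaining indefinitely. Without that anchor, a long run of models could all land in one tier even when its two ends are clearly separable.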

Model entries

Submit through the sealed-runner path.

  1. Declare model identity, artifact URI, and pretraining sources.
  2. Implement a deterministic script that maps each task window to predictions.
  3. Open a pull request and pass sealed-runner smoke validation.
  4. A reviewer triggers the full run and publishes the report.
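Step 2 could be met by something as small as the seasonal-naive baseline below. The JSON input layout (a list of windows with `id`, `context`, and `horizon`), the `season=24` default, and the function names are assumptions for illustration. They are not the FUTURE-TS submission interface.

```python
import json
import sys


def seasonal_naive(context, horizon, season=24):
    """Deterministic baseline: repeat the last `season` observations."""
    if not context:
        raise ValueError("context window is empty")
    season = min(season, len(context))  # short windows: repeat the whole window
    last = context[-season:]
    return [last[i % season] for i in range(horizon)]


def main(in_path, out_path):
    # Hypothetical input: a JSON list of {"id": str, "context": [float], "horizon": int}.
    with open(in_path) as f:
        windows = json.load(f)
    preds = {w["id"]: seasonal_naive(w["context"], w["horizon"]) for w in windows}
    with open(out_path, "w") as f:
        json.dump(preds, f, sort_keys=True)  # sorted keys give a stable output order


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as `python predict.py windows.json predictions.json`. For example, `seasonal_naive([1, 2, 3, 4], 5, season=2)` returns `[3, 4, 3, 4, 3]`. Determinism comes from having no random state and from writing keys in sorted order, so reruns on the same input produce the same file.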

Scope boundary

Public preview — the methodology can still change.

v0.1.0 is a runnable local benchmark package and sealed-runner MVP. Hosted attestation, immutable submission windows, repeated live waves, wider task coverage, and stronger manifest evidence are future work.