v0.1.0 — public preview

Future-only evaluation for time-series foundation models.

FUTURE-TS ranks models only on labels that were unavailable when the model was frozen. It backs that rule with strict task cards, pretraining manifests, sealed-runner execution, and multi-budget context scoring.


[Figure: timeline marking the observed history, the freeze point, and the future-only horizon]

25 strict-surface tasks

15 empirical v2 tasks

47 scored public models

3 context budgets

The contract

Evaluation as a program, not a static bundle.

FUTURE-TS separates clean future-only evidence from accidental leakage, catalog convenience, and post-hoc tuning.

Future-only ranking

Ranked labels must be unavailable at freeze time. Task cards carry issue times, target timestamps, and availability metadata.
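
As a minimal sketch of the future-only rule, the check below compares a label's availability time against a model's freeze time. The `TaskLabel` class and its field names are assumptions made for illustration; the actual task-card schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TaskLabel:
    """Hypothetical subset of a task card; field names are assumptions."""
    issue_time: datetime    # when the forecast is issued
    target_time: datetime   # timestamp the label refers to
    available_at: datetime  # when the label first became observable


def is_future_only(label: TaskLabel, freeze_time: datetime) -> bool:
    """Return True if the label was not yet available at freeze time."""
    return label.available_at > freeze_time


# Illustrative timestamps only, not taken from any real task card.
freeze = datetime(2025, 1, 1, tzinfo=timezone.utc)
leaked = TaskLabel(
    issue_time=datetime(2024, 12, 1, tzinfo=timezone.utc),
    target_time=datetime(2024, 12, 15, tzinfo=timezone.utc),
    available_at=datetime(2024, 12, 20, tzinfo=timezone.utc),
)
clean = TaskLabel(
    issue_time=datetime(2025, 1, 10, tzinfo=timezone.utc),
    target_time=datetime(2025, 1, 17, tzinfo=timezone.utc),
    available_at=datetime(2025, 1, 18, tzinfo=timezone.utc),
)
rankable = [lbl for lbl in (leaked, clean) if is_future_only(lbl, freeze)]
```

Under this sketch only `clean` would count toward a ranking, because `leaked` became observable before the freeze.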

Strict manifest mode

benchmarks/v1 rejects manifestless submissions, so declared-clean and undeclared models are never conflated.
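
The strict-mode behavior could look roughly like the following. This is a sketch, not the package's validator: `check_manifest`, `ManifestError`, and the `pretraining_sources` key are hypothetical names chosen for the example.

```python
class ManifestError(ValueError):
    """Raised when a submission lacks a pretraining manifest in strict mode."""


def check_manifest(submission: dict, strict: bool = True) -> str:
    """Classify a submission as 'declared' or 'undeclared'.

    In strict mode a missing manifest is an error, so undeclared models
    can never end up in the same ranking as declared ones.
    """
    sources = submission.get("pretraining_sources")  # assumed key name
    if sources is None:
        if strict:
            model_id = submission.get("model_id", "<unknown>")
            raise ManifestError(f"{model_id}: no pretraining manifest")
        return "undeclared"
    return "declared"
```

Keeping the non-strict path explicit makes the two populations distinguishable wherever manifestless entries are still allowed.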

Sealed execution

The local sealed runner stamps platform timestamps, applies resource caps, and supports network isolation on Linux.
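
To make the idea of stamped, resource-capped execution concrete, here is a small Unix-only sketch built on the standard `resource` and `subprocess` modules. It is not the FUTURE-TS runner: `run_capped` and its default limits are assumptions, and network isolation is deliberately left out.

```python
import resource
import subprocess
import sys
from datetime import datetime, timezone


def _cap(kind: int, value: int) -> None:
    """Set the soft limit for `kind`, never exceeding the existing hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, hard))


def run_capped(cmd, cpu_seconds=60, memory_bytes=4 * 1024**3, timeout=120):
    """Run `cmd` in a child process with CPU and address-space caps.

    Returns platform-side UTC timestamps taken around the run, plus the
    child's return code and captured stdout.
    """
    def apply_caps():
        # Executed in the child just before exec, so the caps bind only the child.
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    started = datetime.now(timezone.utc)
    proc = subprocess.run(
        cmd, preexec_fn=apply_caps, capture_output=True, text=True, timeout=timeout
    )
    finished = datetime.now(timezone.utc)
    return {
        "started_at": started.isoformat(),
        "finished_at": finished.isoformat(),
        "returncode": proc.returncode,
        "stdout": proc.stdout,
    }


if __name__ == "__main__":
    print(run_capped([sys.executable, "-c", "print('ok')"]))
```

The timestamps are taken by the caller rather than by the submitted code, which is the property the sealed runner relies on.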

Multi-budget context

The empirical run evaluates zero-shot, few-shot, and s16 budgets at 96, 192, and 288 observations.
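
One plausible reading of the three context lengths is that each model sees only the most recent 96, 192, or 288 observations before the forecast is issued. The sketch below illustrates that truncation; `context_windows` is a hypothetical helper, and how the benchmark pairs lengths with the zero-shot, few-shot, and s16 budgets is not assumed here.

```python
from typing import Sequence

CONTEXT_LENGTHS = (96, 192, 288)  # observation counts named in the text


def context_windows(history: Sequence[float],
                    lengths: Sequence[int] = CONTEXT_LENGTHS) -> dict:
    """Return the trailing `n` observations of `history` for each length `n`.

    If the history is shorter than a requested length, the whole history
    is returned for that length.
    """
    return {n: list(history[-n:]) for n in lengths}


# Toy series 0..299, used only to show which observations each window keeps.
windows = context_windows(list(range(300)))
```

Scoring the same task under several windows shows how much a model's accuracy depends on the amount of context it is given.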

Canonical public run

Empirical v2 — public catalog, three budgets.

The current artifact set excludes internal evaluator-only entries and reports 47 scored public models. Treat it as useful evidence, not a permanent universal ranking.

Rank	Model	Tier	Score
01	Datadog/Toto-2.0-1B	T1	0.2369
02	Datadog/Toto-2.0-313m ◆	T2	0.2214
03	NX-AI/TiRex ◆	T3	0.2024
04	Salesforce/moirai-2.0-R-small ◆	T4	0.1860
05	amazon/chronos-2 ◆	T4	0.1851
06	NX-AI/TiRex-1.1-gifteval	T4	0.1833
07	google/timesfm-2.5-200m-pytorch	T4	0.1772
08	google/timesfm-2.0-500m-pytorch ◆	T4	0.1706
09	Datadog/Toto-2.0-22m	T4	0.1640
10	Datadog/Toto-2.0-4m ◆	T4	0.1557
11	cisco-ai/cisco-time-series-model-1.0 ◆	T5	0.1437
12	Salesforce/moirai-1.1-R-large	T6	0.1277
13	amazon/chronos-t5-large	T7	0.1068
14	Salesforce/moirai-1.1-R-base	T7	0.1067
15	amazon/chronos-bolt-mini	T7	0.1030
16	Salesforce/moirai-1.1-R-small ◆	T8	0.0882
17	mldi-lab/Kairos_50m	T9	0.0746
18	ibm-research/granite-timeseries-flowstate-r1.1 ◆	T9	0.0738
19	amazon/chronos-bolt-small	T9	0.0645
20	thuml/sundial-base-128m	T9	0.0618
21	Datadog/Toto-Open-Base-1.0	T9	0.0574
22	amazon/chronos-bolt-base ◆	T9	0.0518
23	mldi-lab/Kairos_23m	T9	0.0464
24	Maple728/TimeMoE-50M ◆	T9	0.0406
25	bytedance-research/Timer-S1	T9	0.0369
26	Salesforce/moirai-1.0-R-large	T10	0.0268
27	amazon/chronos-bolt-tiny ◆	T10	0.0268
28	Salesforce/moirai-1.0-R-base	T10	0.0220
29	NeoQuasar/Kronos-base	T10	0.0207
30	ibm-research/granite-timeseries-flowstate-r1	T10	0.0176
31	Salesforce/moirai-1.0-R-small	T11	-0.0331
32	Maple728/TimeMoE-200M ◆	T12	-0.0452
33	qcw2333/YingLong_300m	T12	-0.0546
34	mldi-lab/Kairos_10m	T12	-0.0566
35	ibm-research/granite-timeseries-ttm-v1	T12	-0.0648
36	ibm-research/granite-timeseries-ttm-r2 ◆	T13	-0.1112
37	qcw2333/YingLong_110m ◆	T14	-0.1403
38	qcw2333/YingLong_50m	T15	-0.2224
39	qcw2333/YingLong_6m	T16	-0.3691
40	time-series-foundation-models/Lag-Llama	T17	-0.6597
41	ibm-research/ttm-r3 ◆	T18	-0.7236
42	Salesforce/moirai-moe-1.0-R-base	T19	-1.0577
43	Salesforce/moirai-moe-1.0-R-small	T19	-1.0644
44	AutonLab/MOMENT-1-large	T20	-1.2300
45	ibm-research/granite-timeseries-patchtst-fm-r1	T21	-1.6944
46	AutonLab/MOMENT-1-small	T22	-7.0476
47	AutonLab/MOMENT-1-base	T23	-1281.3842

◆ Pareto-optimal across budget & cost · tiers group models whose rank intervals overlap (not statistically separable).
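
The tier rule in the note above can be sketched as interval merging: models whose rank intervals overlap, directly or through a chain of overlaps, share a tier. This is an assumed reading, not the published procedure, and the model names and intervals below are toy values rather than leaderboard data.

```python
def assign_tiers(rank_intervals: dict) -> dict:
    """Map each model to a tier number, starting at 1.

    `rank_intervals` maps a model name to a (low, high) rank interval.
    A new tier starts whenever an interval begins after every interval
    in the current tier has ended.
    """
    ordered = sorted(rank_intervals.items(), key=lambda item: item[1])
    tiers = {}
    tier = 0
    tier_high = None  # highest upper bound seen in the current tier
    for name, (low, high) in ordered:
        if tier_high is None or low > tier_high:
            tier += 1
            tier_high = high
        else:
            tier_high = max(tier_high, high)
        tiers[name] = tier
    return tiers


# Toy intervals: a and b overlap at rank 2, c and d overlap at rank 6.
toy = {"model-a": (1, 2), "model-b": (2, 4), "model-c": (5, 6), "model-d": (6, 6)}
toy_tiers = assign_tiers(toy)
```

With chained merging, two models in the same tier need not overlap each other directly; a stricter rule would require pairwise overlap.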


Model entries

Submit through the sealed-runner path.

01 Declare model identity, artifact URI, and pretraining sources.
02 Implement a deterministic script that maps a task window to predictions.
03 Open a pull request and pass sealed-runner smoke validation.
04 A reviewer triggers the full run and publishes the report.
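
For step 02, a deterministic window-to-predictions script might look like the sketch below. The `predict` entry point and the `history` and `horizon` keys are hypothetical; the submission guide defines the real interface. A last-value forecast stands in for an actual model call.

```python
def predict(window: dict) -> list:
    """Return one forecast per horizon step for a single task window.

    Deterministic by construction: no randomness, no clock access, and
    no network calls, so the same window always yields the same output.
    """
    history = window["history"]  # assumed key: past observations
    horizon = window["horizon"]  # assumed key: number of steps to forecast
    if not history:
        raise ValueError("window has no observations")
    last_value = float(history[-1])
    return [last_value] * horizon


# Toy window for illustration.
example = {"history": [1.0, 2.0, 3.0], "horizon": 2}
forecast = predict(example)
```

A real entry would replace the last-value rule with model inference and fix any seeds so that repeated runs on the same window agree.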

Scope boundary

Public preview — the methodology can still change.

v0.1.0 is a runnable local benchmark package and sealed-runner MVP. Hosted attestation, immutable submission windows, repeated live waves, wider task coverage, and stronger manifest evidence are future work.
