A benchmark by ServiceNow Research

Dr-CiK: A Testbed for Foresight-Driven Agents

Real forecasts depend on context that no one hands you. Dr-CiK asks whether agents can find the right evidence in a sea of distractors — and use it to predict the future.

Yihong Tang1,2,✉, Andrew Robert Williams2,3,4, Arjun Ashok2,3,4, Vincent Zhihao Zheng1, Lijun Sun1, Alexandre Drouin2,5,4, Issam H. Laradji2,6, Étienne Marcotte2, Valentina Zantedeschi2,5,✉

1McGill University  ·  2ServiceNow Research  ·  3Université de Montréal  ·  4Mila – Québec AI Institute  ·  5Université Laval  ·  6University of British Columbia

✉ Corresponding authors

Figure: Overview of Context-Aided Forecasting via Deep Research. An agent must search a document space, distill forecast-useful evidence, and forecast from it, all while resisting distractors.

Motivation

Forecasting needs context you have to go find

Time-series forecasting in the real world rarely depends on history alone. A traffic forecast hinges on a planned road closure; a demand forecast hinges on an upcoming promotion; a sensor forecast hinges on a maintenance window. That context lives in documents — reports, tickets, notes — scattered across noisy, heterogeneous sources, mixed in with material that looks relevant but is not.

Existing context-aided forecasting benchmarks hand the model the right context up front. That leaves the central question untouched: can an agent identify the right context on its own? Dr-CiK is built to answer it.

CiK: Context is Key

When it comes to forecasting, the right external context often matters more than a better model. Quality context substantially improves forecasts.

Dr: Deep Research

Finding that context in a large corpus — and distilling it into forecast-useful evidence while rejecting distractors — demands genuine deep research.

The benchmark

What an agent has to do

Each task pairs a time series with a corpus of supporting and distractor documents, and the agent works through four steps to produce an evidence-grounded forecast.

1. Retrieve. Search the document space for context relevant to the series being forecast.

2. Filter. Reject distractors: confounders, noise, profile mismatches, temporal mismatches, and misleading time-series claims.

3. Distill. Turn the retrieved context into concise, forecast-useful evidence.

4. Forecast. Produce a forecast grounded in that evidence, which is then judged against ground truth.
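
The four steps can be sketched as a toy pipeline. This is a minimal illustration, not the benchmark's API or any agent evaluated on it: the `Document` class, the keyword-overlap retriever, the "mentions the series" filter, the first-sentence distiller, and the halving rule in `forecast` are all hypothetical stand-ins chosen to keep the example short. A real agent would typically emit a probabilistic forecast rather than the point forecast used here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Document:
    """Hypothetical document record; the real corpus schema may differ."""
    doc_id: str
    text: str


def retrieve(query_terms: set[str], corpus: list[Document], k: int = 3) -> list[Document]:
    """Step 1: rank documents by keyword overlap with the query (toy retriever)."""
    def overlap(doc: Document) -> int:
        return len(query_terms & set(doc.text.lower().split()))

    ranked = sorted(corpus, key=overlap, reverse=True)
    return [doc for doc in ranked[:k] if overlap(doc) > 0]


def filter_distractors(docs: list[Document], series_id: str) -> list[Document]:
    """Step 2: keep only documents that mention the target series (toy rule)."""
    return [doc for doc in docs if series_id.lower() in doc.text.lower()]


def distill(docs: list[Document]) -> list[str]:
    """Step 3: reduce each kept document to its first sentence."""
    return [doc.text.split(".")[0].strip() for doc in docs]


def forecast(history: list[float], evidence: list[str], horizon: int) -> list[float]:
    """Step 4: mean forecast, halved if the evidence mentions a closure (assumed rule)."""
    level = mean(history)
    if any("closure" in item.lower() for item in evidence):
        level *= 0.5
    return [level] * horizon


# Toy corpus: one supporting document, one profile-mismatch distractor, one noise document.
corpus = [
    Document("d1", "Planned closure on road-7 next week. Crews will repave two lanes."),
    Document("d2", "Planned closure on road-9 next week. Unrelated to other roads."),
    Document("d3", "Cafeteria menu updated for spring."),
]

found = retrieve({"road-7", "closure", "traffic"}, corpus)
kept = filter_distractors(found, "road-7")
evidence = distill(kept)
prediction = forecast([100.0, 120.0, 110.0], evidence, horizon=2)
print(evidence, prediction)
```

In this toy run the retriever surfaces both closure notices, and only the filter step stops the road-9 notice from influencing the road-7 forecast, which is the failure mode the distractor documents are designed to probe.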

279 forecasting tasks  ·  10,342 documents  ·  5 distractor types  ·  6,975 distractor documents

Of the 10,342 documents, 3,367 are supporting and 6,975 are distractors — exactly five per distractor type per task. Tasks span synthetic and human-authored sources across domains like infrastructure, healthcare, transportation, and systems observability. Ground-truth evidence and future values are retained for evaluation.
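
The corpus counts above are mutually consistent, which a few lines of arithmetic confirm. The snippet uses only the figures stated on this page; the derived distractor fraction and the average number of supporting documents per task are computed here and are not reported in the source.

```python
TASKS = 279
DISTRACTOR_TYPES = 5
DISTRACTORS_PER_TYPE_PER_TASK = 5
SUPPORTING_DOCS = 3_367

# Five distractors for each of the five types, in every task.
distractor_docs = TASKS * DISTRACTOR_TYPES * DISTRACTORS_PER_TYPE_PER_TASK
total_docs = SUPPORTING_DOCS + distractor_docs

# Derived quantities (not stated in the source).
distractor_fraction = distractor_docs / total_docs
supporting_per_task = SUPPORTING_DOCS / TASKS

print(distractor_docs)                 # 6975
print(total_docs)                      # 10342
print(f"{distractor_fraction:.1%}")    # 67.4%
print(f"{supporting_per_task:.1f}")    # 12.1
```

So roughly two of every three documents are distractors: each task has 25 of them alongside about 12 supporting documents on average.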

Key findings

Today's agents struggle to find the future's context

~40%

Context pays off. With ground-truth context, the best forecaster cuts scaled CRPS by roughly 40% versus no context — the prize is real.

<5%

But evidence is missed. Most deep-research agents recover under 5% of the ground-truth supporting evidence in a task.

>80%

And distractors win. Agents are frequently misled: a large majority of the documents they cite are distractors, and retrieved context can make forecasts worse than the no-context baseline.
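
The three headline numbers correspond to three kinds of measurement, sketched below under stated assumptions. `crps_from_samples` is the standard sample-based CRPS estimator; the benchmark reports a scaled variant whose scaling is not described on this page, so no scaling is applied here. `evidence_recall` and `distractor_share` assume that evidence is matched by document ID, which may differ from the benchmark's actual scoring. All function names and toy values are hypothetical.

```python
from itertools import product


def crps_from_samples(samples: list[float], observed: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    accuracy = sum(abs(x - observed) for x in samples) / n
    spread = sum(abs(a - b) for a, b in product(samples, repeat=2)) / (n * n)
    return accuracy - 0.5 * spread


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the score relative to `baseline`."""
    return (baseline - improved) / baseline


def evidence_recall(cited: set[str], supporting: set[str]) -> float:
    """Share of ground-truth supporting documents that the agent cited."""
    return len(cited & supporting) / len(supporting)


def distractor_share(cited: set[str], distractors: set[str]) -> float:
    """Share of the agent's citations that are distractor documents."""
    return len(cited & distractors) / len(cited)


# Toy forecast samples for a single observed value of 1.0 (illustrative only).
no_context = crps_from_samples([0.0, 4.0], observed=1.0)
with_context = crps_from_samples([0.0, 2.0], observed=1.0)
gain = relative_reduction(no_context, with_context)

# Toy citation sets (illustrative only).
supporting = {"s1", "s2", "s3", "s4"}
distractors = {"x1", "x2", "x3", "x4", "x5"}
cited = {"s1", "x1", "x2", "x3", "x4"}

recall = evidence_recall(cited, supporting)
share = distractor_share(cited, distractors)
print(no_context, with_context, gain, recall, share)
```

Because CRPS is lower-is-better, a positive `relative_reduction` means context helped, and a negative one means the retrieved context made the forecast worse than using no context at all.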

Leaderboard

Results

Two leaderboards: forecasting and deep-research quality. The official ranking is on the hidden test set (80 tasks, labels withheld, scored by us); the paper's 240-task results are kept as reference.

Task showcase

Look inside a task

Each task is a time series, a forecast target, ground-truth evidence, and a corpus of supporting and distractor documents. Explore a few interactively.

Citation

Cite Dr-CiK

BibTeX
@article{tang2026dr,
  title={Dr-CiK: A Testbed for Foresight-Driven Agents},
  author={Tang, Yihong and Williams, Andrew Robert and Ashok, Arjun and Zheng, Vincent Zhihao and Sun, Lijun and Drouin, Alexandre and Laradji, Issam H and Marcotte, {\'E}tienne and Zantedeschi, Valentina},
  journal={arXiv preprint arXiv:2605.27904},
  year={2026}
}