CVPR 2026 • Procedural Planning • Differentiable Viterbi

ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos

A principled framework that explicitly integrates procedural knowledge into the learning process for procedural planning in instructional videos.

Luigi Seminara^1,†, Davide Moltisanti^2,*, Antonino Furnari^1,*

(1) University of Catania • (2) University of Bath
* Equal advising • † Work done while visiting University of Bath

arXiv (soon) • Code • Issues

📢 News

February 2026: We released the ViterbiPlanNet codebase and features.
February 2026: ViterbiPlanNet was accepted to CVPR 2026.

Overview

Procedural planning aims to predict the sequence of actions that transforms an initial visual state into a desired goal state. ViterbiPlanNet introduces a Differentiable Viterbi Layer (DVL) that embeds a Procedural Knowledge Graph (PKG) directly within Viterbi decoding. Because the layer is differentiable, graph-based decoding can run during training, so the full system is trainable end-to-end.

Key features:

Explicit Procedural Knowledge: integrates a PKG to guide planning.
Differentiable Viterbi Layer (DVL): smooth relaxations replace non-differentiable operations.
State-of-the-art performance: strong results on CrossTask, COIN, and NIV with fewer parameters than diffusion-based and LLM-based planners.
Sample efficiency: structure-aware training improves robustness and efficiency.
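
To illustrate the idea behind a differentiable Viterbi layer, here is a minimal sketch, not the code from this repository. It assumes the usual relaxation in which `max` becomes a temperature-scaled log-sum-exp and `argmax` becomes a softmax. The function names, the toy `transitions` matrix (standing in for PKG edge scores), and the `emissions` values are all hypothetical. It is written in plain Python so it runs without dependencies; a trainable version would use an autodiff framework.

```python
import math

def smooth_max(values, temperature):
    """Temperature-scaled log-sum-exp: a smooth upper bound on max(values)."""
    peak = max(values)
    total = sum(math.exp((v - peak) / temperature) for v in values)
    return peak + temperature * math.log(total)

def soft_argmax(values, temperature):
    """Softmax weights: a smooth replacement for a one-hot argmax."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def soft_viterbi(emissions, transitions, temperature=1.0):
    """Relaxed Viterbi. Returns (smoothed best score, soft per-step assignments).

    emissions[t][j]:   score of action j at step t (finite values assumed).
    transitions[i][j]: score of moving from action i to action j
                       (hypothetical stand-in for knowledge-graph edges).
    """
    num_steps, num_actions = len(emissions), len(emissions[0])
    delta = [list(emissions[0])]  # delta[t][j]: smoothed best score ending in j
    weights = []                  # weights[t-1][j][i]: soft backpointer j -> i
    for t in range(1, num_steps):
        row, back = [], []
        for j in range(num_actions):
            incoming = [delta[-1][i] + transitions[i][j] for i in range(num_actions)]
            row.append(emissions[t][j] + smooth_max(incoming, temperature))
            back.append(soft_argmax(incoming, temperature))
        delta.append(row)
        weights.append(back)
    score = smooth_max(delta[-1], temperature)
    # Backward pass: push soft assignments through the soft backpointers.
    marginals = [soft_argmax(delta[-1], temperature)]
    for back in reversed(weights):
        nxt = marginals[0]
        prev = [sum(nxt[j] * back[j][i] for j in range(num_actions))
                for i in range(num_actions)]
        marginals.insert(0, prev)
    return score, marginals

# Toy problem with 3 actions and 3 steps. The made-up transition scores
# favour the order 0 -> 1 -> 2 and penalise every other move.
transitions = [[-5.0, 0.0, -5.0],
               [-5.0, -5.0, 0.0],
               [-5.0, -5.0, -5.0]]
emissions = [[2.0, 0.0, 0.0],
             [0.0, 1.0, 1.2],   # step 1 alone would slightly prefer action 2
             [0.0, 0.0, 2.0]]

score, marginals = soft_viterbi(emissions, transitions, temperature=0.01)
plan = [max(range(3), key=row.__getitem__) for row in marginals]
```

At a low temperature the soft assignments are close to one-hot and the relaxed decoder behaves like standard Viterbi: the transition scores steer step 1 towards action 1 even though its emission score is lower. Raising the temperature smooths the assignments, which is what lets gradients reach every action score.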

Why this code

We establish and open-source a standardized evaluation benchmark that unifies data splits and metric implementations. This allows fair and rigorous comparisons and addresses key inconsistencies in prior work.

Metrics include Success Rate (SR), mean Accuracy (mAcc), and mean Intersection over Union (mIoU).
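
For readers new to these metrics, the sketch below shows one common way to define them for fixed-length plans. It is an illustration under stated assumptions, not the benchmark's implementation: the function names are hypothetical, and details such as whether mIoU is averaged per plan (as here) or computed over a batch are the kind of choice that can differ between codebases.

```python
def success_rate(predictions, targets):
    """Fraction of plans that match the ground truth at every step."""
    hits = sum(1 for p, t in zip(predictions, targets) if list(p) == list(t))
    return hits / len(targets)

def mean_accuracy(predictions, targets):
    """Step-wise accuracy, pooled over all steps of all plans."""
    correct = sum(a == b
                  for p, t in zip(predictions, targets)
                  for a, b in zip(p, t))
    total = sum(len(t) for t in targets)
    return correct / total

def mean_iou(predictions, targets):
    """Order-agnostic overlap of predicted and true action sets, averaged per plan."""
    ious = [len(set(p) & set(t)) / len(set(p) | set(t))
            for p, t in zip(predictions, targets)]
    return sum(ious) / len(ious)

# Three toy plans of length 3 (action ids are arbitrary).
predictions = [[1, 2, 3], [4, 5, 6], [7, 9, 8]]
targets     = [[1, 2, 3], [4, 5, 7], [7, 8, 9]]

sr = success_rate(predictions, targets)     # only the first plan is fully correct
macc = mean_accuracy(predictions, targets)  # 6 of 9 steps are correct
miou = mean_iou(predictions, targets)       # the third plan has the right set, wrong order
```

The three toy plans show how the metrics diverge: the third plan contains the right actions in the wrong order, so it counts fully towards mIoU, partially towards mAcc, and not at all towards Success Rate.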