ViterbiPlanNet
Differentiable Viterbi for Planning in Instructional Videos
CVPR 2026 Procedural Planning Differentiable Viterbi

ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos

A principled framework that explicitly integrates procedural knowledge into the learning process for procedural planning in instructional videos.

Luigi Seminara1,†, Davide Moltisanti2,*, Antonino Furnari1,*

(1) University of Catania  •  (2) University of Bath
* Equal advising  •  † Work done while visiting University of Bath

ViterbiPlanNet Overview

📢 News

February 2026
We release the ViterbiPlanNet codebase and features.
February 2026
ViterbiPlanNet is accepted at CVPR 2026 as a Highlight paper.
June 3, 2026
EgoVis Workshop
Oral + Poster
09:30 Oral, Room 704/706 - Luigi Seminara
10:00-10:45 Poster, ExHall A - Luigi Seminara, Davide Moltisanti, Antonino Furnari
15:30-16:15 Poster, ExHall A - Luigi Seminara, Davide Moltisanti, Antonino Furnari
June 4, 2026
SAUAFG Workshop
Oral + Poster
15:50-16:20 Oral, Hall 705/707 - Luigi Seminara
16:55-18:00 Poster, ExHall A - Luigi Seminara, Davide Moltisanti, Antonino Furnari

Overview

Procedural planning aims to predict a sequence of actions that transforms an initial visual state into a desired goal. ViterbiPlanNet introduces a Differentiable Viterbi Layer (DVL) that embeds a Procedural Knowledge Graph (PKG) directly within Viterbi decoding. This enables graph-based decoding during training, making the full system trainable end-to-end.

Key features:

  • Explicit Procedural Knowledge: integrates a PKG to guide planning.
  • Differentiable Viterbi Layer (DVL): smooth relaxations replace non-differentiable operations.
  • State-of-the-art performance: strong results on CrossTask, COIN, and NIV with fewer parameters than diffusion/LLM planners.
  • Sample efficiency: structure-aware training improves robustness and efficiency.
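
To give a feel for what "smooth relaxations" can mean in a Viterbi recursion, here is a minimal sketch, not the DVL implementation from this repository. It assumes one common relaxation: replacing the hard `max` with a temperature-scaled log-sum-exp. The function names, the toy 3-action transition graph, and the emission scores are all hypothetical, and the code is plain Python so it runs without any dependencies.

```python
import math

def smooth_max(values, tau):
    """Temperature-scaled log-sum-exp: a smooth stand-in for max()."""
    m = max(values)
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))

def hard_viterbi_score(log_emit, log_trans):
    """Score of the best path using the usual (non-smooth) max."""
    n = len(log_trans)
    delta = list(log_emit[0])
    for t in range(1, len(log_emit)):
        delta = [
            log_emit[t][j] + max(delta[i] + log_trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return max(delta)

def soft_viterbi_score(log_emit, log_trans, tau=0.1):
    """Same recursion with every max replaced by smooth_max."""
    n = len(log_trans)
    delta = list(log_emit[0])
    for t in range(1, len(log_emit)):
        delta = [
            log_emit[t][j]
            + smooth_max([delta[i] + log_trans[i][j] for i in range(n)], tau)
            for j in range(n)
        ]
    return smooth_max(delta, tau)

def to_log(rows):
    return [[math.log(p) for p in row] for row in rows]

# Toy graph over 3 actions (assumed): entry [i][j] weights the edge i -> j.
log_trans = to_log([[0.1, 0.8, 0.1],
                    [0.1, 0.1, 0.8],
                    [0.8, 0.1, 0.1]])
# Toy per-step action scores for a plan of length 3 (assumed model output).
log_emit = to_log([[0.7, 0.2, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.2, 0.2, 0.6]])

hard = hard_viterbi_score(log_emit, log_trans)
soft_sharp = soft_viterbi_score(log_emit, log_trans, tau=0.01)
soft_loose = soft_viterbi_score(log_emit, log_trans, tau=1.0)
print(hard, soft_sharp, soft_loose)
```

The relaxed score never falls below the hard score and approaches it as `tau` shrinks. In an autodiff framework, the same recursion would let gradients reach both the emission scores and the graph weights, which is the general idea behind training through the decoder.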

Poster

ViterbiPlanNet CVPR 2026 poster

Why this code

We establish and open-source a standardized evaluation benchmark that unifies data splits and metric implementations, enabling fair and rigorous comparisons and addressing key inconsistencies in prior work.

Metrics include Success Rate, mAcc, and mIoU.
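
As a rough illustration of the three metrics, the sketch below uses the definitions commonly found in the procedural planning literature: Success Rate counts plans that match the ground truth at every step, mAcc counts correctly predicted individual steps, and mIoU compares the sets of predicted and true actions regardless of order. These definitions and the function names are assumptions for illustration. The implementations in this repository are the reference and may differ in details such as how IoU is averaged.

```python
def success_rate(preds, gts):
    """Fraction of plans that match the ground truth at every step."""
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)

def mean_accuracy(preds, gts):
    """Fraction of individual steps predicted correctly, pooled over all plans."""
    correct = sum(a == b for p, g in zip(preds, gts) for a, b in zip(p, g))
    total = sum(len(g) for g in gts)
    return correct / total

def mean_iou(preds, gts):
    """Average per-plan IoU between the sets of predicted and true actions."""
    ious = [
        len(set(p) & set(g)) / len(set(p) | set(g))
        for p, g in zip(preds, gts)
    ]
    return sum(ious) / len(ious)

# Two hypothetical plans of length 3; action IDs are arbitrary integers.
preds = [[1, 2, 3], [4, 5, 6]]
gts = [[1, 2, 3], [4, 6, 5]]

sr = success_rate(preds, gts)
macc = mean_accuracy(preds, gts)
miou = mean_iou(preds, gts)
print(sr, macc, miou)
```

In this toy case the second plan contains the right actions in the wrong order, so it counts against Success Rate and mAcc but not against mIoU.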

Datasets

The project supports three public datasets for procedural planning:

  • CrossTask (see ./dataset/CrossTask)
  • COIN (see ./dataset/COIN)
  • NIV (see ./dataset/NIV)

In our experiments, we used pre-extracted S3D features.

Contact

This repository is created and maintained by Luigi. Technical questions and discussions are encouraged via GitHub issues.

You can also reach me via email: luigi.seminara@phd.unict.it

Authors

Authors of ViterbiPlanNet

Citation


@InProceedings{Seminara_2026_CVPR,
    author    = {Seminara, Luigi and Moltisanti, Davide and Furnari, Antonino},
    title     = {ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {31240-31249}
}