Procedural planning aims to predict a sequence of actions that transforms an initial visual state into a desired goal. ViterbiPlanNet introduces a Differentiable Viterbi Layer (DVL) that embeds a Procedural Knowledge Graph (PKG) directly within Viterbi decoding. This enables graph-based decoding during training, making the full system trainable end-to-end.
Key features:
- Explicit Procedural Knowledge: integrates a PKG to guide planning.
- Differentiable Viterbi Layer (DVL): smooth relaxations replace the non-differentiable operations of Viterbi decoding (such as max and argmax).
- State-of-the-art performance: strong results on CrossTask, COIN, and NIV with fewer parameters than diffusion-based and LLM-based planners.
- Sample efficiency: structure-aware training improves robustness and reduces the amount of training data needed.
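
To make the DVL idea concrete, here is a minimal pure-Python sketch of a Viterbi recursion whose `max` is replaced by a temperature-scaled log-sum-exp. It is an illustration under stated assumptions, not the ViterbiPlanNet implementation: the choice of log-sum-exp as the relaxation, the names `smooth_max` and `viterbi_score`, the temperature `tau`, and the toy three-action graph are all hypothetical. The PKG is modelled as a matrix of log edge weights, with `-inf` marking transitions the graph does not allow.

```python
import math

NEG_INF = float("-inf")


def smooth_max(values, tau):
    """Temperature-scaled log-sum-exp, a smooth stand-in for max().

    It is always >= max(values) and approaches max(values) as tau -> 0.
    """
    finite = [v for v in values if v != NEG_INF]
    if not finite:
        return NEG_INF
    m = max(finite)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in finite))


def viterbi_score(emissions, log_trans, tau=None):
    """Score of the best action sequence under graph-constrained transitions.

    emissions[t][j]: log-score of action j at step t (e.g. from a visual model)
    log_trans[i][j]: log-weight of graph edge i -> j, -inf if there is no edge
    tau=None uses the hard max (classic Viterbi); tau > 0 uses the relaxation.
    """
    reduce = max if tau is None else (lambda vals: smooth_max(vals, tau))
    n = len(emissions[0])
    score = list(emissions[0])
    for t in range(1, len(emissions)):
        score = [
            emissions[t][j]
            + reduce([score[i] + log_trans[i][j] for i in range(n)])
            for j in range(n)
        ]
    return reduce(score)


# Toy graph (assumed for illustration): edges 0->1, 0->2, 1->2, 2->2.
log_trans = [
    [NEG_INF, math.log(0.7), math.log(0.3)],
    [NEG_INF, NEG_INF, 0.0],
    [NEG_INF, NEG_INF, 0.0],
]
emissions = [
    [0.0, -2.0, -2.0],
    [-2.0, 0.0, -1.0],
    [-2.0, -1.0, 0.0],
]

hard = viterbi_score(emissions, log_trans)            # classic Viterbi
soft = viterbi_score(emissions, log_trans, tau=0.01)  # smooth relaxation
print(hard, soft)
```

With a small `tau` the relaxed score stays close to the hard Viterbi score, while every step remains a composition of smooth functions. In an autodiff framework the same recursion would let gradients flow through the decoder to the emission scores; this sketch only shows the forward computation.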