Learning Generalized Policy Automata for Relational Stochastic Shortest Path Problems


Several goal-oriented problems in the real-worldcan be naturally expressed as Stochastic ShortestPath Problems (SSPs). However, the computa-tional complexity of solving SSPs makes findingsolutions to even moderately sized problems in-tractable. Currently, existing state-of-the-art plan-ners and heuristics often fail to exploit knowledgelearned from solving other instances. This paperpresents an approach for learningGeneralized Pol-icy Automata(GPA): non-deterministic partial poli-cies that can be used to catalyze the solution pro-cess. GPAs are learned using relational, feature-based abstractions, which makes them applicableon broad classes of related problems with differentobject names and quantities. Theoretical analysisof this approach shows that it guarantees complete-ness and hierarchical optimality. Our experimentsshow that this approach effectively learns broadlyapplicable policy knowledge in a few-shot fashionand significantly outperforms state-of-the-art SSPsolvers on test problems whose object counts arefar greater than those used during training.

In The Conference on Neural Information Processing Systems, 2022