Differential Assessment of Black-Box AI Agents

Abstract

Much of the research on learning symbolic models of AI agents focuses on agents with stationary models. This assumption fails to hold in settings where the agent’s capabilities may change as a result of learning, adaptation, or other post-deployment modifications. Efficient assessment of agents in such settings is critical for learning the true capabilities of an AI system and for ensuring its safe usage. In this work, we propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models. As a starting point, we consider the fully observable and deterministic setting. We leverage sparse observations of the drifted agent’s current behavior and knowledge of its initial model to generate an active querying policy that selectively queries the agent and computes an updated model of its functionality. Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch. We also show that the cost of differential assessment using our method is proportional to the amount of drift in the agent’s functionality.

Publication
In The Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022