We are excited to share a new publication in Neural Computing and Applications:
Single-shot policy explanation to improve task performance via semantic reward coaching
by Aaquib Tabrez, Ryan Leonard, and Bradley Hayes.
In this work, we introduce Single-shot Policy Elicitation for Augmenting Rewards (SPEAR), a novel optimization algorithm that uses semantic natural language explanations to help humans and robots improve their decision-making. By modeling humans as reinforcement learning agents, SPEAR identifies misaligned reward functions and provides concise corrective feedback.
Highlights of the paper include:
- A formal characterization of the policy elicitation problem, framing behavior modification as semantic reward coaching.
- Introduction of SPEAR, which scales linearly with predicate count and substantially outperforms prior approaches.
- Human-subjects studies showing that SPEAR explanations improve policy comprehension, reduce cognitive load, and enhance decision-making.
- Demonstrations of robot-to-human and robot-to-robot collaboration, including evacuation scenarios and multi-agent tabletop cleaning tasks.
This work was supported by the Army Research Lab STRONG Program and highlights new opportunities for integrating explainable AI into human-robot teaming.
📄 Read the full paper here: DOI link | CAIRO link