What Do Large Language Models Think You Think? A False Belief Task Study in a Safety-Critical Domain

conference paper
A preliminary evaluation of ChatGPT-4o in modified False Belief Tasks for safety-critical contexts indicates weaknesses in Theory of Mind reasoning. We explore the implications for Large Language Model-enabled human-machine collaboration in such environments.
TNO Identifier
1011866
Source title
Workshop ‘Advancing Artificial Intelligence through Theory of Mind’ (ToM4AI@AAAI), part of the 39th Annual AAAI Conference on Artificial Intelligence, February 25 – March 4, 2025, Philadelphia, Pennsylvania, USA
Collation
5 p.
Files
To receive the publication files, please send an e-mail request to TNO Repository.