by THUDM
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)