$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
