TestART: Improving LLM-based Unit Test via Co-evolution of Automated Generation and Repair Iteration

Introduction

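Unit testing is crucial for detecting bugs in individual program units, but it costs time and effort. Existing automated unit test generation methods, which aim to relieve developers of this burden, are mainly based on search-based software testing (SBST) and language models. Recently, large language models (LLMs) have demonstrated remarkable reasoning and generation capabilities. However, several problems limit their ability to generate high-quality test cases: (1) LLMs may generate invalid test cases when given insufficient context, resulting in compilation errors; (2) a lack of test and coverage feedback may lead to runtime errors and low coverage; (3) the repetitive suppression problem causes LLMs to get stuck in a loop of repeated self-repair or regeneration attempts. In this paper, we propose TestART, a novel unit test generation method that leverages the strengths of LLMs while overcoming the limitations above. TestART improves LLM-based unit testing through the co-evolution of automated generation and repair iteration. It uses a template-based repair technique to fix bugs in LLM-generated test cases, and uses prompt injection to guide the next generation step and avoid repetitive suppression. In addition, TestART extracts coverage information from the passed test cases and uses it as testing feedback to enhance the sufficiency of the final test cases. This synergy between generation and repair significantly improves the quality, effectiveness, and readability of the generated test cases beyond previous methods. In comparative experiments, the test cases generated by TestART reach a pass rate of 78.55%, approximately 18% higher than both the ChatGPT-4.0 model and ChatUniTest, a method based on the same ChatGPT-3.5 model. TestART also achieves a line coverage of 90.96% on the focal methods that passed the test, exceeding EvoSuite by 3.4%.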
Problem addressed

The paper addresses automated unit test generation and repair with large language models, through a co-evolution method called TestART.
Key idea

TestART uses large language models (LLMs) to generate unit tests and a template-based repair technique to fix bugs in the generated test cases. It uses prompt injection to guide the next generation step and to avoid repetitive suppression. It also extracts coverage information from the passed test cases and feeds it back as testing feedback to enhance the sufficiency of the final test cases.
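The interplay of generation, repair, and feedback described here can be pictured as a loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: `RunResult`, `co_evolve`, the prompt wording, and the three stub functions are all hypothetical names introduced for illustration, and the stubs stand in for the LLM, the repair templates, and the compile-and-run step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunResult:
    """Outcome of compiling and running one candidate test (hypothetical shape)."""
    passed: bool
    error: Optional[str] = None                      # compile or runtime error, if any
    uncovered_lines: List[int] = field(default_factory=list)


def co_evolve(
    generate: Callable[[str], str],      # prompt -> test code (stands in for the LLM)
    repair: Callable[[str, str], str],   # (test code, error) -> patched test code
    run: Callable[[str], RunResult],     # test code -> compile/run/coverage result
    base_prompt: str,
    max_rounds: int = 4,
) -> Optional[str]:
    """Alternate generation and repair; return the best passing test, or None."""
    prompt = base_prompt
    best = None
    for _ in range(max_rounds):
        test = generate(prompt)
        result = run(test)
        if not result.passed:
            # Patch the test with fixed rules instead of asking the LLM to
            # retry, which is where repeated identical attempts tend to occur.
            test = repair(test, result.error or "")
            result = run(test)
        if not result.passed:
            # Inject the repaired test and remaining error into the next prompt.
            prompt = (f"{base_prompt}\n\nPrevious attempt (after repair):\n{test}"
                      f"\nRemaining error: {result.error}")
            continue
        best = test
        if not result.uncovered_lines:
            return best                   # passing test with nothing left to cover
        # Coverage feedback from the passing test guides the next generation.
        prompt = (f"{base_prompt}\n\nPassing test:\n{test}"
                  f"\nUncovered lines: {result.uncovered_lines}")
    return best


# Toy stubs so the loop can be exercised without an LLM or a compiler.
def fake_generate(prompt: str) -> str:
    if "Uncovered lines" in prompt:
        return "import org.junit.Test;\n// covers both branches"
    return "// covers one branch"        # first attempt lacks an import


def fake_repair(test: str, error: str) -> str:
    # One illustrative "template": add a missing import on a symbol error.
    if "cannot find symbol" in error and not test.startswith("import"):
        return "import org.junit.Test;\n" + test
    return test


def fake_run(test: str) -> RunResult:
    if not test.startswith("import"):
        return RunResult(False, error="cannot find symbol: Test")
    if "both branches" in test:
        return RunResult(True)
    return RunResult(True, uncovered_lines=[12, 13])
```

With these stubs, the first round fails to compile, is patched by the repair rule, and passes with two lines uncovered; the second round receives that coverage feedback in its prompt and produces a test that covers the rest. The round limit and the decision to stop at full coverage are choices of this sketch rather than details taken from the summary.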
Other highlights

TestART achieves a pass rate of 78.55% and a line coverage of 90.96% on the focal methods that passed the test, exceeding previous methods. It also overcomes the limitations of LLM-based unit test generation, namely invalid test cases, missing feedback information, and the repetitive suppression problem.
Related work

Related work includes search-based software testing (SBST) tools such as EvoSuite, and language-model approaches to automated unit test generation such as the ChatGPT-4.0 model and ChatUniTest.
