Generate High-Coverage Unit Test Data Using the LLM Tool
Ngoc Thi Bich Do1, Chi Quynh Nguyen2

1Ngoc Thi Bich Do, Faculty of Information Technology, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam.

2Chi Quynh Nguyen, Department of Computer Science, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam.

Manuscript received on 30 September 2024 | Revised Manuscript received on 19 October 2024 | Manuscript Accepted on 15 November 2024 | Manuscript published on 30 November 2024 | PP: 13-18 | Volume-13 Issue-12, November 2024 | Retrieval Number: 100.1/ijitee.L999613121124 | DOI: 10.35940/ijitee.L9996.13121124

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Unit testing is a critical phase in the software development lifecycle, essential for ensuring the quality and reliability of code. However, manually writing unit test scripts and preparing the corresponding test data is time-consuming and labor-intensive. To address these challenges, several automated approaches have been explored, including search-based, constraint-based, random-based, and symbolic-execution-based techniques for generating unit tests. In recent years, the rapid advancement of large language models (LLMs) has opened new avenues for automating such tasks, including the automatic generation of unit test scripts and test data. Despite their potential, applying LLMs in a straightforward manner may yield low test coverage: a significant portion of the source code, including particular statements or branches, may remain untested, reducing the effectiveness of the tests. To overcome this limitation, this paper presents a novel approach that automates the generation of unit test scripts and test data while also improving test coverage. The proposed solution first uses an LLM tool (such as ChatGPT) to generate initial unit test scripts and data from the source code. To enhance coverage, the specification document of the source code is also fed to the LLM to generate additional test data. A coverage checking tool then evaluates the achieved coverage and identifies untested statements or branches, and the LLM is applied again to generate new test data specifically targeting these gaps. Initial experimental results indicate that this method significantly improves test coverage, demonstrating its potential to enhance automated unit testing processes.
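To make the pipeline concrete, the sketch below illustrates the coverage-driven loop in Python (the language named in the keywords). It is a minimal sketch, not the authors' implementation: ask_llm() is a hypothetical placeholder for a call to an LLM tool such as ChatGPT, while coverage is measured with the public coverage package and the generated tests are executed in-process with pytest.main().

import pathlib

import coverage
import pytest


def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder: send the prompt to an LLM tool
    (e.g., ChatGPT) and return generated pytest source as a string."""
    raise NotImplementedError("connect this to your LLM provider")


def uncovered_lines(test_file: str, source_file: str) -> list[int]:
    """Run the generated tests and return the line numbers of
    source_file that were never executed (statement-coverage gaps)."""
    cov = coverage.Coverage(branch=True)
    cov.start()
    pytest.main(["-q", test_file])  # execute the generated tests in-process
    cov.stop()
    cov.save()
    # analysis2 -> (filename, statements, excluded, missing, missing_str)
    _, _, _, missing, _ = cov.analysis2(source_file)
    return missing


def generate_until_covered(source_file: str, spec: str, max_rounds: int = 3) -> None:
    source = pathlib.Path(source_file).read_text()
    # Steps 1-2: initial test scripts and data from the code and its specification.
    tests = ask_llm(
        "Write pytest unit tests with test data for this code:\n" + source
        + "\nSpecification:\n" + spec
    )
    test_file = "test_generated.py"
    pathlib.Path(test_file).write_text(tests)

    for _ in range(max_rounds):
        # Step 3: measure coverage and locate untested statements.
        missing = uncovered_lines(test_file, source_file)
        if not missing:  # full statement coverage reached
            break
        # Step 4: ask the LLM for test data that exercises the gaps.
        extra = ask_llm(
            "These lines of " + source_file + " are still uncovered: "
            + str(missing) + "\nWrite additional pytest tests that execute them:\n"
            + source
        )
        with open(test_file, "a") as f:
            f.write("\n\n" + extra)

Each round appends only tests that target the remaining gaps, so coverage grows monotonically until full statement coverage or the round limit is reached.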

Keywords: Branch Coverage, LLM, Python, Statement Coverage, Test Data Generation, Unit Test.