The benchmark is constructed by collecting code snippets from LeetCode, implanting bugs with GPT-4, and conducting a rigorous quality assessment. Liu et al. (2024e) address the challenge of automated Graphical User Interface (GUI) testing for mobile applications. They propose a novel approach ...