A group of researchers from the Chinese Academy of Sciences and Monash University has presented a new approach to generating text input for mobile app testing, based on a pre-trained Large Language Model (LLM). The approach, called QTypist, was evaluated on 106 Android apps in combination with automated testing tools and showed a significant improvement in testing performance.
A major obstacle to automating mobile app testing is the need to generate text input, the researchers say, which can be challenging even for human testers. This is because different categories of input may be required, including geolocation, addresses, and health measures, and because inputs required on consecutive input pages may be related to one another, which leads to validation constraints. Furthermore, as one of the authors of the paper explains on Twitter, what is typed in one app view determines which other views are shown.
Large language models (LLMs) such as BERT and GPT-3 have been shown to be capable of writing essays, answering questions, and generating source code. QTypist attempts to leverage the ability of LLMs to understand a mobile app’s prompts to generate meaningful output that can be used as text input to the app.
Starting from a GUI page with text input and its corresponding view hierarchy file, the researchers first extract the context information for the text input and design linguistic patterns to generate prompts for the LLM. To improve the LLM's performance in mobile input scenarios, they developed a prompt-based data construction and tuning method that automatically creates the prompts and responses used for model tuning.
As a first step, QTypist uses a GUI testing tool to extract contextual information for a GUI view, including metadata about input widgets (e.g., user hints), context information about nearby widgets, and global context such as the activity name.
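The extraction step can be illustrated with a minimal Python sketch. It is not QTypist's actual code: the XML layout, the `hint` and `activity` attributes, and the `extract_input_context` function are all assumptions made for illustration, loosely modeled on the kind of view hierarchy dump an Android GUI testing tool produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified view hierarchy. Attribute names such as "hint"
# and "activity" are assumptions; real dumps differ between tools.
SAMPLE_HIERARCHY = """
<hierarchy activity="FlightSearchActivity">
  <node class="android.widget.LinearLayout">
    <node class="android.widget.TextView" text="Departure city" />
    <node class="android.widget.EditText"
          resource-id="com.example:id/from_city" hint="From" text="" />
    <node class="android.widget.Button" text="Search flights" />
  </node>
</hierarchy>
"""


def extract_input_context(xml_text):
    """Collect widget, local, and global context for every text input."""
    root = ET.fromstring(xml_text)
    contexts = []
    # Visit every element as a potential container, so that the siblings
    # of an input widget can serve as its local context.
    for parent in root.iter():
        children = list(parent)
        for child in children:
            if not child.get("class", "").endswith("EditText"):
                continue
            nearby = [
                c.get("text")
                for c in children
                if c is not child and c.get("text")
            ]
            contexts.append({
                "widget": {
                    "resource_id": child.get("resource-id", ""),
                    "hint": child.get("hint", ""),
                },
                "local": nearby,
                "global": {"activity": root.get("activity", "")},
            })
    return contexts


if __name__ == "__main__":
    for ctx in extract_input_context(SAMPLE_HIERARCHY):
        print(ctx)
```

Treating sibling widgets as the "nearby" context is a deliberate simplification; a real tool could also use on-screen coordinates to decide which widgets are close to the input field.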
The prompt generation step relies on these three categories of extracted information to create a prompt from a set of linguistic patterns that the authors defined by working on a set of 500 reference apps.
According to the paper, this process yields 14 linguistic patterns, related to the input widget, local context, and global context […]. The input widget patterns explicitly state what to input into the widget, and the authors use keywords such as noun (widget[n]), verb (widget[v]), and preposition (widget[prep]) to design the patterns.
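To make the idea of pattern-based prompt generation more concrete, here is a small Python sketch. The three templates, the `build_prompt` function, and the shape of the context dictionary are hypothetical and are not the 14 patterns from the paper; they only show how widget, local, and global context might be combined into a single prompt that a language model is asked to complete.

```python
def build_prompt(context):
    """Assemble a completion-style prompt from extracted GUI context.

    The sentence templates below are illustrative assumptions, not the
    linguistic patterns defined by the QTypist authors.
    """
    parts = []

    # Global context: which screen of the app the input belongs to.
    activity = context.get("activity")
    if activity:
        parts.append(f"This is the {activity} page of the app.")

    # Local context: labels of widgets close to the input field.
    nearby = context.get("nearby") or []
    if nearby:
        parts.append("Nearby labels: " + ", ".join(nearby) + ".")

    # Input widget: a noun describing what should be typed, taken from
    # the hint text when available and from the resource id otherwise.
    widget = context.get("widget", {})
    noun = widget.get("hint") or widget.get("resource_id") or "text"
    parts.append(f"Please input the {noun}. The {noun} is")

    return " ".join(parts)


if __name__ == "__main__":
    example = {
        "activity": "FlightSearch",
        "nearby": ["Departure city", "Search flights"],
        "widget": {"hint": "departure city"},
    }
    print(build_prompt(example))
```

The prompt deliberately ends in an unfinished sentence, so that whatever text the model produces as a continuation can be taken as the candidate input value.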
Finally, the generated prompt is used as input to GPT-3, whose output is used as the text input content. The effectiveness of this approach was evaluated by comparing it against a number of baseline approaches, including DroidBot, Humanoid, and others, as well as through human assessment of the quality of the generated inputs. In addition, the researchers performed a usefulness evaluation on 106 Android apps available on Google Play by integrating QTypist with automated testing tools. In all cases, QTypist managed to improve the performance of existing approaches.
While the initial work by the research team behind QTypist is promising, more work is needed to extend it to cases where the app doesn’t provide enough contextual information and to apply it to cases beyond GUI testing.