Language models have become the talk of the town in recent years. These models process, produce, and use natural language text to power some groundbreaking AI applications. Large language models (LLMs) such as GPT-3, T5, and PaLM have delivered significant performance gains, learning to imitate humans by reading, completing code, summarizing, and generating text. GPT-3, developed by OpenAI, shows particularly impressive capabilities: built on the Transformer architecture, it can generate content and answer questions much like a human would.
Researchers have long studied how natural language can be used to communicate with computing devices. Recently, LLMs have shown that such interaction is possible without dedicated models or huge datasets. With this in mind, a group of researchers has published a paper examining the practicality and feasibility of using a single large language model to enable conversational interaction with a mobile graphical user interface (GUI). Previous studies supported conversational interaction with a mobile user interface (UI) for only a narrow set of components, and they required task-specific models, huge datasets, and extensive training; little progress had been made in applying LLMs to GUI interaction tasks. The researchers have now shown how LLMs can enable diverse interactions with mobile UIs, developing a set of prompting techniques to adapt an LLM to a mobile UI.
The team designed the prompting methods so that interaction designers can easily prototype and test novel voice interactions with users. LLMs can thus change how conversational interaction designs are developed, saving considerable time, effort, and money compared with building dedicated models and datasets. The researchers also developed an algorithm that converts Android view hierarchy data into HTML syntax. Because HTML is already present in the training data of LLMs, this representation allows the models to adapt to mobile UIs.
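To make the idea concrete, here is a minimal sketch of how a view hierarchy might be rendered as HTML-like text. The node structure, field names, and tag mapping below are illustrative assumptions for this article, not the paper's actual algorithm:

```python
# Hypothetical sketch: render a simplified Android view-hierarchy tree as
# HTML-like text, so an LLM that has seen HTML during pretraining can parse
# the screen. Field names and the tag mapping are invented for illustration.

# Assumed mapping from Android widget classes to HTML tags.
TAG_MAP = {
    "TextView": "p",
    "Button": "button",
    "EditText": "input",
    "ImageView": "img",
}

def node_to_html(node: dict) -> str:
    """Recursively render a view-hierarchy node as an HTML-like string."""
    tag = TAG_MAP.get(node.get("class", ""), "div")
    text = node.get("text", "")
    children = "".join(node_to_html(c) for c in node.get("children", []))
    attrs = f' id="{node["id"]}"' if "id" in node else ""
    return f"<{tag}{attrs}>{text}{children}</{tag}>"

# Toy screen: a label and a button inside a layout container.
screen = {
    "class": "LinearLayout",
    "children": [
        {"class": "TextView", "id": "title", "text": "Sign in"},
        {"class": "Button", "id": "submit", "text": "Continue"},
    ],
}
print(node_to_html(screen))
# → <div><p id="title">Sign in</p><button id="submit">Continue</button></div>
```

The resulting string can then be placed directly inside a text prompt, letting the LLM reason over the screen in a format it already knows from pretraining.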
The researchers experimented with four modeling tasks to validate the feasibility of their approach: screen question generation, screen summarization, screen question answering, and mapping instructions to UI actions. The results showed that their approach achieves competitive performance using only two data samples per task.
Screen Question Generation – LLMs surpassed previous approaches by leveraging the UI context of input fields to generate questions.
Screen Summarization – Compared with the benchmark model (Screen2Words, UIST '21), the study found that LLMs can efficiently summarize the important features of a mobile user interface and produce more accurate summaries.
Screen Question Answering – Compared with the baseline QA model, which correctly answers 36% of the questions, the 2-shot LLM provided exact-match answers for 66.7% of the questions.
Mapping Instructions to UI Actions – LLMs predict the UI object required to perform an instructed action. The model did not surpass the benchmark model, but showed strong results with only two shots.
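The "2-shot" results above rely on prompts that prepend two worked examples before the target screen. The sketch below shows one way such a few-shot prompt could be assembled for screen summarization; the exemplar screens, summaries, and prompt wording are invented for illustration and may differ from the paper's actual format:

```python
# Hypothetical sketch: assemble a 2-shot prompt for screen summarization.
# Exemplars and wording are illustrative assumptions, not the paper's prompts.

EXEMPLARS = [
    ("<div><p>Inbox</p><button>Compose</button></div>",
     "An email inbox screen with a compose button."),
    ("<div><input></input><p>Results</p></div>",
     "A search screen showing a list of results."),
]

def build_prompt(target_screen_html: str) -> str:
    """Concatenate the few-shot examples and the target screen into one prompt."""
    parts = []
    for screen_html, summary in EXEMPLARS:
        parts.append(f"Screen: {screen_html}\nSummary: {summary}")
    # The target screen ends with an open "Summary:" for the LLM to complete.
    parts.append(f"Screen: {target_screen_html}\nSummary:")
    return "\n\n".join(parts)

prompt = build_prompt("<div><p>Settings</p><button>Wi-Fi</button></div>")
print(prompt)
```

The same template generalizes to the other three tasks by swapping the exemplars and the trailing field (e.g. "Question:" or "Action:"), which is what makes the approach so cheap to prototype compared with training a task-specific model.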
Enabling natural-language interaction with computing devices has long been an aspiration of human-computer interaction research. Studies like this one bring that goal closer and may mark a breakthrough in artificial intelligence.
Check out the paper. All credit for this research goes to the researchers on this project.
Tanya Malhotra is a senior at University of Petroleum & Energy Studies, Dehradun pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical-thinking skills, and a passionate interest in learning new skills, leading groups, and managing work in an organized manner.