How to improve reading comprehension with LEIA-X

At the special session chaired by His Majesty the King, “The Unity and Diversity of Spanish. Tradition and the challenge of artificial intelligence”, Telefónica, together with Microsoft, Google, Amazon and Meta, presented the advances of the LEIA initiative, which aims to help machines speak correct Spanish and to ensure that the rules set by the Royal Spanish Academy (RAE, by its acronym in Spanish) are respected by the AI tools that support the generation and understanding of the language.

Committed to the Spanish language

At the event, Ángel Vilá, Chief Operating Officer at Telefónica, gave an overview of the progress Telefónica has made in promoting the correct use of Spanish in its products and services for the home, such as the RAE Living App on Movistar Plus+, for looking up definitions or learning more about the language, and the RAE game available on the Movistar Home device. As a novelty, he presented the LEIA-X prototype, an extension for the Chrome browser that uses artificial intelligence to improve understanding of Spanish. This tool highlights the most appropriate meaning of a selected word depending on the context. It uses an AI model trained on more than 70,000 examples from various RAE dictionaries.

This functionality is especially useful for the more than 100 million non-native speakers of Spanish. In addition, by using automatic translation APIs, it is able to provide an answer in any language, always aiming to improve the user’s understanding of Spanish.

LEIA-X responds to the need to improve reading comprehension in a web browser on a laptop, e-book or simply a mobile phone. Today, all readers have access to a “lookup” or “define” feature that allows them to select a word and automatically open a dictionary window with the appropriate entry. From there, we as readers must navigate through all of the meanings to find the one that fits best; a task that distracts from reading, especially on small screens or devices that are not particularly fast. LEIA-X uses AI to provide an accurate definition of a word according to its context, making it much easier to read.

How LEIA-X works

The extension is based on an AI model specially trained on Spanish text (namely the BETO model[1], trained by the University of Chile) to solve a problem that does not require very large language models (LLMs) such as GPT-3 or GPT-4: disambiguating the meaning of a word.

The original model (BETO) was trained by the University of Chile on a task known as “fill in the mask”, which consists of masking a word in a given phrase and asking the model to predict which word fits best. This method of machine learning is called “self-supervised”. If this is done enough times, the model learns to infer, for example, which words relate to the context of the phrase, what the mood of the phrase is, or when a verb or a noun is required. In short, the AI model learns to extract knowledge or correlations between the words that make up a phrase.
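As a rough illustration of why this task needs no manual labelling, the sketch below hides one word of a sentence at a time and keeps the hidden word as the target. It is only a toy: the function name `make_masked_examples` and the whitespace tokenization are assumptions made for this example, whereas BERT-style models such as BETO operate on subword tokens and mask a random subset of them.

```python
MASK = "[MASK]"

def make_masked_examples(sentence):
    """Return (masked_sentence, hidden_word) pairs, one per word.

    Hypothetical helper for illustration; it splits on whitespace only.
    """
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        # Replace the i-th word with the mask token and keep it as the label.
        masked = words[:i] + [MASK] + words[i + 1:]
        examples.append((" ".join(masked), word))
    return examples

examples = make_masked_examples("I went to the bank to make a deposit")
for masked, label in examples:
    print(f"{masked!r} -> {label!r}")
```

Every sentence yields its own labels, which is what makes the approach self-supervised: the text itself provides both the input and the expected answer.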

To disambiguate a word in Spanish, you must use the context in which the word occurs. For example, the Spanish word “banco” (in English, “bank” or “bench”) has different meanings depending on the context:

“I went to the bank to make a deposit”

Or if we say:

“I am sitting on a bench and reading a book”

While humans perform this process automatically and almost unconsciously, it is very complex for an algorithm to know which of the definitions of the word “banco” is being referred to. The only way to know this is to understand each of the words and how they relate to each other in a given context.

Based on the BETO model, LEIA-X was trained with a corpus of positive and negative examples of words with their meanings in the following way: given a word and a phrase, e.g. the word “banco” (“bank” or “bench” in English) and the sentence:

“I went to the bank to make a deposit”

During the machine learning process, the model takes as input the different definitions of the word “banco”, which according to the RAE dictionary include:

Seat, with or without backrest, on which two or more people can sit.

A company that engages in financing operations using funds from its shareholders and customer deposits.

To build the LEIA-X training corpus, each sentence and target word was automatically paired either with the correct meaning, giving a positive example, or with an incorrect meaning, giving a negative example.

The examples in the corpus end up taking the following form:

I went to the “banco” to make a deposit [SEP][2] where “banco” means: Seat, with or without backrest, on which two or more people can sit. [incorrect]

I went to the “banco” to make a deposit [SEP] where “banco” means: A company that engages in financing operations using funds from its shareholders and customer deposits. [correct]
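The construction of such examples could be sketched roughly as follows. This is not the actual LEIA-X pipeline: the function `build_examples`, the 1/0 label encoding and the exact text template are assumptions for illustration, and the English word “bank” stands in for the Spanish “banco”.

```python
SEP = "[SEP]"

def build_examples(sentence, word, definitions, correct_index):
    """Pair one sentence with every candidate definition of `word`.

    Hypothetical helper: returns (text, label) tuples, where label is 1
    for the correct definition and 0 for every other one.
    """
    words = sentence.split()
    position = words.index(word)       # raises ValueError if the word is absent
    words[position] = f'"{word}"'      # mark the target word with quotes
    marked = " ".join(words)
    rows = []
    for i, definition in enumerate(definitions):
        text = f'{marked} {SEP} where "{word}" means: {definition}'
        rows.append((text, 1 if i == correct_index else 0))
    return rows

definitions = [
    "Seat, with or without backrest, on which two or more people can sit.",
    "A company that engages in financing operations using funds from "
    "its shareholders and customer deposits.",
]
rows = build_examples("I went to the bank to make a deposit", "bank",
                      definitions, correct_index=1)
for text, label in rows:
    print(label, text)
```

One sentence with a known correct meaning therefore produces one positive example and as many negative examples as the word has other definitions.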

In this way, a corpus of more than 70,000 examples was built based on various dictionaries provided by the RAE. In the student dictionary, each meaning of an entry comes with a usage example, which serves as a positive example for that meaning. To complement this corpus, we also took advantage of the Dictionary of the Spanish Language (DLE, by its acronym in Spanish), in which about 15% of the meanings contain usage examples. Using the generated corpus, the BETO model was fine-tuned to perform disambiguation.

Once trained, the LEIA-X model is able to assign to each word-sentence pair the confidence, or likelihood, that a particular meaning is correct. In the “banco” example, the model would assign a confidence close to 0% to the first definition (the seat) and a confidence close to 100% to the second definition (the financial company), showing the latter as the most likely meaning. The word has thus been disambiguated.
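The selection step can be sketched as follows, assuming a scoring function that returns a confidence between 0 and 1 for each sentence-definition pair. Both `pick_meaning` and `fake_score` are hypothetical names; `fake_score` is a placeholder that returns made-up values in place of the fine-tuned model and does no real disambiguation.

```python
def pick_meaning(sentence, word, definitions, score_fn):
    """Return the (definition, confidence) pair with the highest confidence."""
    scored = [(d, score_fn(sentence, word, d)) for d in definitions]
    return max(scored, key=lambda pair: pair[1])

def fake_score(sentence, word, definition):
    # Placeholder standing in for the fine-tuned model. The values are
    # invented, chosen only to mirror the "close to 0%" and
    # "close to 100%" behaviour described in the text.
    return 0.98 if "company" in definition else 0.03

definitions = [
    "Seat, with or without backrest, on which two or more people can sit.",
    "A company that engages in financing operations using funds from "
    "its shareholders and customer deposits.",
]
best_definition, confidence = pick_meaning(
    "I went to the bank to make a deposit", "bank", definitions, fake_score
)
print(f"{confidence:.0%}: {best_definition}")
```

Passing the scorer in as a parameter keeps the selection logic independent of whichever model produces the confidences.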