ACL 2021: Do Context-Aware Translation Models Pay the Right Attention?

Kayo Yin

Hello, I'm Kayo, and today I will present our work that tries to answer the question "Do Context-Aware Translation Models Pay the Right Attention?" To begin, why do we need context during translation? Let's take a look at this example. What does the word "mole" in this sentence refer to? If the previous sentence was "Things could start to get dangerous if the ministers find out", then the mole probably refers to a spy. But if the previous sentence was "Could it be anything serious, Doctor?", the mole refers to a birthmark instead. So depending on context, the meaning of the word changes, and therefore its translation depends on the context as well.

Current Neural Machine Translation models are reasonably good at sentence-level translation for high-resource language pairs, such as English and French. For example, a popular provider can correctly translate the sentence here. However, when the context changes, the correct translation of "mole" should now be "ce grain de beauté", but the model does not pick up on the change in meaning. So current models often fail to produce adequate translations at the document level when there are ambiguous words that require context to resolve. Another example is the translation of the neutral English pronoun "they": here "they" refers to "implications", which is a feminine noun in French, so the pronoun should be the feminine "elle" instead of the masculine "il".
To address the difficulties of document-level translation and the importance of context, several methods have been proposed over the last four or five years to incorporate context into Neural Machine Translation. But even with the necessary context, these models perform poorly on translating relatively simple discourse phenomena, such as anaphoric pronouns. In this example, "it" refers to "report", which is a masculine noun in French, so the pronoun should be the masculine "il". To try to find out why the model made this error, let's take a look at which tokens the model paid the most attention to, which we highlight in yellow here. We can see that the model pays high attention to the word "infirmary", which is a feminine noun when translated into French, but it does not pay attention to "report" or "rapport", which would have helped it translate the pronoun accurately. This may explain why the model made this error.
In general, context-aware machine translation models have been found to often attend to uninformative tokens in context, or not to use the information contained in the context at all. We therefore ask ourselves the following research questions. First, in context-aware translation, what context is useful to disambiguate hard translations such as ambiguous pronouns or word senses? Second, are context-aware machine translation models paying attention to the relevant context or not? Third, if not, can we encourage them to do so? To answer the first question, we conducted a user study to collect the supporting context words that human translators use for disambiguation. We asked 20 professional English-French translators to select the correct translation and then highlight all the supporting context words they used to answer. We performed this study for two tasks: first, pronoun anaphora resolution, where translators choose the correct gendered French pronoun associated with a neutral English pronoun, and second, word-sense disambiguation, where the translator chooses the correct French translation of a polysemous English word.
We gave translators varying amounts of previous sentences on the English source side and/or the French target side as context, and we analyzed when translators were able to answer accurately and with high confidence depending on how much and what context was given. We also analyzed the supporting context words selected by translators, by looking at where these words are located (in the current sentence or up to three sentences before), whether they are English source or French target words, and their features such as part of speech and syntactic dependencies. You can look at our paper for the full analysis and results.
Our main findings are that for pronouns, the previous context sentences are the most useful, especially on the target side, and we find that humans especially rely on the pronoun antecedent, or otherwise on other mentions of the pronoun on the target side. The same coreference chain on the English side is not as useful, because the chain in French carries information about gender whereas in English it does not. For word-sense disambiguation, the current sentence in either language is often sufficient. For example, "charme" in French means the quality of being charming, while "porte-bonheur" is a good-luck charm. We find that humans often use words that indicate the role or meaning of the polysemous word. Moreover, the source and target sides often carry an equal share of the semantic load used for word-sense disambiguation, which is why either side seems equally useful. After our user study, we also annotated the supporting context for 14,000 examples of pronoun anaphora resolution in English-French, and we release this as the SCAT (Supporting Context for Ambiguous Translations) dataset.
Next, to evaluate whether models pay attention to the relevant context, we quantify how well model attention is aligned with SCAT. For our experiments, we use the standard Transformer translation model, but instead of taking the data sentence by sentence as in sentence-level translation, we incorporate the five previous source and target sentences as context by concatenating them to the current sentence before feeding it into the model. We train on 14 million parallel English-French sentences from the OpenSubtitles dataset. To quantify the alignment between human and model attention, we construct vectors that represent the SCAT annotations and the model attention while translating the ambiguous pronoun. Taking this SCAT vector and the model attention vector, we first sort the tokens by decreasing model attention weight, then look for the rank of the first supporting context token from SCAT in the sorted vector. In this example, the alignment score is 2, and the more attention the model assigns to the supporting context, the lower the alignment score. We also use two additional alignment metrics that you can find in our paper :) Using this metric, we compare the alignment score of a uniform distribution with the alignment score of our model attention. For the model attention, we measure alignment with SCAT for the encoder self-attention, the decoder cross-attention, and the decoder self-attention. We find that the alignment between the encoder self-attention and SCAT is slightly better than that of a uniform distribution, but the attentions in the decoder layers especially have very low alignment. In general, context-aware translation models do not seem to pay attention to the relevant context.

We therefore use SCAT to try to increase model-human alignment. We train a context-aware model on OpenSubtitles with the standard negative log-likelihood loss, and we additionally sample from SCAT during training and introduce an attention regularization loss to supervise the model attention.
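To make the setup concrete, here is a minimal sketch of the concatenation-based context-aware input described above, assuming a simple separator token between sentences. The `<sep>` marker, function name, and windowing are illustrative assumptions, not the paper's actual preprocessing code.

```python
def build_context_inputs(src_sents, tgt_sents, i, k=5, sep=" <sep> "):
    """Prepend up to k previous source/target sentences to the current pair,
    joined by a separator token, before feeding them to the Transformer.
    A sketch under assumed preprocessing conventions."""
    src_ctx = src_sents[max(0, i - k):i]
    tgt_ctx = tgt_sents[max(0, i - k):i]
    src_input = sep.join(src_ctx + [src_sents[i]])
    tgt_input = sep.join(tgt_ctx + [tgt_sents[i]])
    return src_input, tgt_input
```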
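The alignment scoring itself is easy to state in code. Below is a minimal sketch: sort the context tokens by decreasing model attention and return the rank of the first token that SCAT marks as supporting. Names are illustrative, not from the released code.

```python
def alignment_score(attention, is_supporting):
    """Rank (1-indexed) of the first human-annotated supporting token when
    tokens are sorted by decreasing model attention weight. Lower is better:
    1 means the model's most-attended token is a supporting word from SCAT."""
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    for rank, idx in enumerate(order, start=1):
        if is_supporting[idx]:
            return rank
    return len(order) + 1  # no annotated supporting token at all

# Toy example matching the talk: the second most-attended token is a
# supporting context word, so the alignment score is 2.
attn = [0.05, 0.40, 0.30, 0.25]
scat = [False, False, True, False]
print(alignment_score(attn, scat))  # -> 2
```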
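And here is a rough sketch of how such an attention regularization term can be combined with the negative log-likelihood objective. The exact form of the loss in the paper may differ; this version simply pushes attention mass towards the SCAT-annotated tokens via a cross-entropy term, and the weight `lam` is a hypothetical hyperparameter.

```python
import torch

def attention_regularization(attn, scat_mask, eps=1e-9):
    """Encourage the attention distribution `attn` (shape [ctx_len], sums to 1)
    to place its mass on tokens marked as supporting context in `scat_mask`
    (a 0/1 vector of the same shape). This cross-entropy form is an assumption
    for illustration; the paper's exact regularizer may differ."""
    target = scat_mask / (scat_mask.sum() + eps)   # uniform over supporting tokens
    return -(target * (attn + eps).log()).sum()

def total_loss(nll, attn, scat_mask, lam=1.0):
    """Combined objective: standard NLL plus the attention supervision term."""
    return nll + lam * attention_regularization(attn, scat_mask)

# Toy usage: attention over 4 context tokens, the 3rd is annotated as supporting.
attn = torch.tensor([0.05, 0.40, 0.30, 0.25])
scat = torch.tensor([0.0, 0.0, 1.0, 0.0])
print(total_loss(torch.tensor(2.3), attn, scat))
```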
We measure model performance using corpus-level BLEU and COMET. However, words such as ambiguous pronouns represent only a small portion of all the words in the data, so corpus-level metrics such as BLEU and COMET may not clearly capture improvements in translating discourse phenomena that are nonetheless very important for document-level translation. We therefore also compute the mean word F-measure of the translations of the ambiguous pronouns with respect to the reference pronouns, and we perform contrastive evaluation, where we measure how often the model assigns a higher probability to the correct translation than to a translation where the ambiguous pronoun is incorrect.
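As a rough illustration of the contrastive evaluation, the sketch below scores the reference translation and a contrastive variant that differs only in the ambiguous pronoun, and counts how often the model prefers the correct one. The `score` callable (returning the model's log-probability of a target given the source and its context) is a hypothetical interface, not the actual API of the paper's codebase.

```python
from typing import Callable, Dict, List

def contrastive_accuracy(
    score: Callable[[str, str, str], float],   # (source, context, target) -> log-probability
    examples: List[Dict[str, str]],
) -> float:
    """Fraction of examples where the model assigns a higher score to the
    reference translation than to a contrastive variant whose only difference
    is an incorrect ambiguous pronoun (e.g. "il" swapped for "elle")."""
    wins = 0
    for ex in examples:
        good = score(ex["source"], ex["context"], ex["reference"])
        bad = score(ex["source"], ex["context"], ex["contrastive"])
        wins += int(good > bad)
    return wins / max(len(examples), 1)
```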
We find that attention regularization improves translation across all metrics, and especially on the metrics targeted at pronouns. We can conclude that regularizing attention with SCAT can effectively improve ambiguous pronoun translation. We also find that models with attention regularization obtain better attention alignment with SCAT. We can also see that the model with attention regularization assigns higher attention to the words "report" and "rapport" in this example while translating the ambiguous pronoun, and is then able to translate the pronoun correctly. This suggests that attention regularization with SCAT can encourage models to pay the right attention and thus allow them to translate ambiguous words correctly.
Our paper contains more experiments that demonstrate that models regularized with SCAT rely more on the supporting context selected by humans, and that regularizing the encoder self-attention gives the largest improvements in translation performance compared to regularizing other types of model attention. Performance on word-sense disambiguation does not improve much when we supervise the model attention using human rationales for pronoun anaphora resolution. To summarize, we asked humans to tell us what context is useful to translate ambiguous words, and we collected a corpus of 14,000 supporting-context annotations. We then use the SCAT dataset to measure the alignment between human and model attention, and we find that previous context-aware models have very low alignment. We therefore use SCAT to regularize attention in context-aware translation models, and we thus obtain better model-human alignment, better context usage, and better translation quality. You can find more information on our work in our paper, as well as the code and data that are publicly available, and we thank you for your attention :)