CURE: Code-Aware Neural Machine Translation for Automatic Program Repair

ICSE2021 Conference

Good afternoon, everyone. I'm Nan Jiang, and I will give a presentation about our paper, CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. This work is done in collaboration with Thibaud Lutellier from the University of Waterloo and our advisor, Professor Lin Tan, from Purdue University.

Manually fixing bugs is time-consuming, as developers spend half of their time fixing them. Thus automatic program repair, or APR for short, is crucial to reduce manual software debugging efforts, and in recent years many neural machine translation (NMT) based APR tools have been developed. These NMT-based tools first use their patch training data and a tokenizer to construct the search space, and then use the training data to train NMT models. In practice, for a given bug, they use the trained NMT models to rank patches in the search space and use a search strategy to select candidate patches. Since finding the patch with the highest score is exponentially expensive, these tools choose to use the beam search strategy. Beam search is a greedy search algorithm that uses breadth-first search to build the search tree, and at every step it selects the most promising nodes to expand. By using beam search, APR tools can generate a list of the most promising candidate patches efficiently. Finally, the candidate patches are validated against the developer-written test suites until the tools find and output the correct patch that makes the patched program pass all the test suites.

However, NMT-based APR tools also have limitations. First, their search spaces miss the correct patches of some bugs. Second, existing NMT models and the beam search strategy generate a lot of uncompilable patches. This is a bug from one of the benchmarks, where the line with the yellow background is the buggy line and the line with the green background is the exact patch that we hope to generate. However, we found that an existing NMT-based tool generates many uncompilable patches that disobey the Java syntax. The first two call the method with the wrong number of
arguments, and the third one mismatches the parentheses. This shows that the NMT model doesn't learn the programming language well. Besides, it also generates uncompilable patches that contain invalid identifiers. The red identifiers in these two patches are not declared in the project. This shows that the beam search is not code-aware.

In order to address these limitations, we propose CURE. To learn the Java programming language, we propose a new APR architecture that combines a pre-trained programming language model with an NMT architecture, and to generate valid identifiers, we design a new code-aware search strategy. Besides, CURE also applies a subword tokenization technique to construct a better search space that contains more correct patches. As a result, CURE correctly fixes 83 bugs in two Java benchmarks, outperforming all the existing APR tools. Given the time, I will only focus on the first and second points to show how the programming language model and the code-aware search strategy help CURE generate more compilable and correct patches. The details about the third point can be found in our paper.

To learn the programming language, we borrow the pre-training and fine-tuning workflow from the natural language processing field. Pre-training and fine-tuning have brought significant improvements to many natural language processing tasks. One would first pre-train a general language model on a large natural language corpus to learn a natural language, for example English, and then fine-tune the language model for a specific task like question answering. The language model learns the language syntax, which improves the quality and accuracy of question answering. Similarly, we pre-train a GPT model, which stands for Generative Pre-trained Transformer, on a Java codebase to learn Java syntax. We choose GPT as the architecture of our programming language model because it's good at generation. The benefit is that existing NMT-based APR tools only use patches to train the NMT models, which only see partial code snippets, while our programming
language model is trained on millions of complete developer-written Java methods, which helps the programming language model learn Java syntax and how developers write code. Then, during fine-tuning, CURE combines the pre-trained GPT programming language model with a context-aware convolutional network as the entire APR model and fine-tunes it with our patch training data.

After training the APR models, in order to let the beam search select and generate valid identifiers, we design a code-aware beam search that uses a valid-identifier check strategy. That is, for a given buggy code, CURE uses a static analysis tool to extract all the valid identifiers for this bug from the buggy file, package classes, imported classes, and Java keywords. Then the valid-identifier check strategy forces the beam search to only select and generate valid identifiers.

I will use an example to help you understand how the valid-identifier check strategy helps to generate valid identifiers. Let's consider this tokenized correct patch that we want to generate. To simplify the process, I will use a beam size of two. Assume that the beam search has selected the correct tokens in the first six steps. In step seven, since the beam size is two, it selects the two nodes with the highest scores; here, the number in each node is its score. At step eight, for each selected node, the NMT model calculates the log probability and a score for all the tokens, and the beam search selects the best two to expand, which are colored gray. At step nine, after the NMT model calculates the scores, the vanilla beam search selects "ending" and "here" to expand, since they have higher scores. Now the blue path to the correct patch is discarded, since "max" is not selected, and we miss the correct patch forever. However, with the valid-identifier check strategy, it easily knows that this red path contains an invalid identifier, max_here, while the blue path contains Math.max, which is valid for this bug. Thus it will
set the score of "here" to negative infinity, as it's impossible to be correct. Then the beam search chooses "ending" and "max" to expand, which includes the path to the correct patch. In this example, the vanilla beam search misses the path to the correct patch, while the valid-identifier check strategy promotes it, and that's how the valid-identifier check strategy helps to generate valid identifiers.

We evaluate CURE on two Java benchmarks: Defects4J, which has 393 bugs, and QuixBugs, which has 40 bugs. We compare CURE with 25 existing APR tools and only list the best ones here. CURE correctly fixes the most bugs on both benchmarks, outperforming all the existing APR tools. Besides, CURE also fixes bugs that haven't been fixed before. This is a bug in the Defects4J benchmark that only CURE can fix. CURE fixes it by adding Math.max to ensure the non-negativeness of the argument, which is also a new fix pattern discovered by CURE. Existing NMT-based APR tools cannot fix it, since this fix is uncommon in the patch training data: in our 2.7 million patch training data, there are only two similar fixes. However, adding Math.max to ensure non-negativity is common in developer-written code, which is captured by our programming language model, and that's why CURE fixes it.

In conclusion, in our paper we propose a new APR architecture that combines a pre-trained programming language model and an NMT architecture to learn both code syntax and fix patterns, and we propose a new code-aware search strategy to find more correct patches. We also apply subword tokenization to the APR task to create a better search space. By combining all the novelties above, we develop CURE for automatic program repair, which fixes 83 bugs in two Java benchmarks. That's all for my presentation. Thank you for listening.

Hello again, everybody. We are starting with our last paper today, and we have Nan Jiang and Lin Tan with us. Nan will be in charge of answering any questions you may have about this great
work they have done. So far we have one question, so we have time for more questions, because there's only one so far. Okay, so Nan, Guru Bandari is asking... well, first of all, he's congratulating you for your nice work, and he's asking: did you also calculate other performance measures like accuracy and F1 measure, etc.? In your paper, you have used bug-related keywords to classify the bug or not. How did you handle the false positive issues?

Okay, yeah, it's a good question. First, we followed previous APR work in calculating how many correct fixes and how many plausible patches our tool could generate, and then we compared our work with previous APR work using those metrics. Our tool generates more correct fixes and also more plausible patches than previous work. Here, a correct fix means the fix is exactly what the developer expected, and a plausible fix means the fix makes the program pass the test suite but may still be incorrect. So in this case, you could calculate the false positives by taking the plausible patches minus the correct fixes. Yeah, I think those are the metrics we use.

Okay. And about the false positive issues, how did you handle the false positive issues in your work?

Yeah, actually, I think most APR tools could generate false positive patches, that is, plausible patches that are still incorrect. I think there are several reasons. The first one is that the model is still not good enough, so the model doesn't rank the correct one high enough, or the search strategy is not intelligent enough to find the correct one. The second thing is that we might need more test cases to help us filter out the incorrect but plausible ones. So this is something that...

Okay, thank you very much for your answers. For the moment there are no... yes, there's one more question that just popped up. Amir Mir is asking:
what are the next steps for improving the model?

Yeah, that's a good question. By analyzing the limitations of our current model, we find that, although we could fix the most bugs, we still have some limitations. First, the compilable rate of the candidate patches generated by our model is still not high enough. Based on our analysis, CURE generates only about 40 percent compilable patches, so that still wastes a lot of candidate patches. So we would like to introduce the identifier information during the training stage, since our current work only designs the code-aware search strategy to involve the identifier information during beam search, and for the model's training we use the GPT programming language model to let it learn the Java syntax by itself. So our future work might be to introduce the Java syntax or compiler information during training, to hard-code such knowledge into the model and make the model better aware of the characteristics of software code. Yeah, that's the future work.

Thank you very much. There's one more question, from Haipeng Cai. Well, he's first congratulating you for your great talk and your nice work, and the question is the following: have you considered testing the trained model against a totally different data set from the training data set?

Yeah, currently we evaluate our model on two benchmarks, Defects4J and QuixBugs. We could explore other data sets, such as Bears or Bugs.jar, which we are trying to explore in our future project. So in our future project we might introduce more test benchmarks, evaluate our APR tools, and compare them.

Okay, thank you very much. Haipeng Cai also wants to know if you will release the code for CURE, because he mentions that currently only the data set is available.

Yeah, sure. I think we will
try to release the code in the future. Currently we are still trying to manage the code, and after that we could release the code and also the data set on our GitHub repository.

Okay, thank you. Amir Mir has another question. He wonders whether you could use Transformers for this task.

Yeah, that's a good question. Our current model uses GPT for the pre-trained programming language model and uses a convolutional network for the neural machine translation model. First, the GPT programming language model is actually Transformer-based, so it already uses Transformers. Then, for the neural machine translation part, previous work states that a convolutional network might be better at capturing code information than a Transformer, because code is not written linearly, and a convolutional network could capture the code information at different granularities, while the Transformer, LSTM, or such recurrent neural networks encode the code sequentially, which might not be the best model to consider.

Okay, thank you very much. Haipeng Cai has one more question, but since he has asked a previous one, we are going to start first with Carl Chapman. Carl Chapman is congratulating you for your very interesting results, and his question is: out of all the things you could try, how did you decide on the code-aware search and subword tokenization? Did you try other approaches that did not work?

Okay, that's an interesting question. Actually, what we did is first analyze the limitations of the existing neural machine translation based APR tools, and we found issues including the lack of code awareness and the out-of-vocabulary token problem. Then we designed our own architecture. Subword tokenization is widely used in the natural language processing field and has been shown to be very powerful in addressing the out-of-vocabulary problem, so we just apply it, and it actually helps to address the limitation quite well. Then, for the code awareness, we apply the best-performing language model in the natural language processing field, which is the GPT model, and our code-aware search is fully designed by us, I think. The experiments show our novel combination is quite successful. In our future work, we will also try other approaches, such as leveraging the identifier information in the training stage, which we haven't tried.

Okay, so we have only 30 seconds left. There's one more question from Haipeng Cai, so maybe you can answer his last question in the private discussion room. Okay, yeah, there's not enough time. So thank you, everyone, for attending this session, and thank you very much to Lin and to Nan for being here and answering the questions.
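
Editor's illustration: the beam-size-two example from the talk, where vanilla beam search drops the path to Math.max and the valid-identifier check keeps it, can be sketched roughly as follows. This is a minimal toy under stated assumptions, not CURE's implementation. The log probabilities in `TOY_MODEL` are invented, the names `beam_search`, `trailing_identifier`, and `is_valid_prefix` are hypothetical, and the check is simplified to "the identifier built so far must be a prefix of some valid identifier".

```python
import math
import re

EOS = "<eos>"
# A token counts as an identifier piece if it is made of letters, digits, or "_".
IDENT_PIECE = re.compile(r"[A-Za-z0-9_]+")

# Invented next-token log probabilities, keyed by the tokens generated so far.
# They stand in for the NMT model's scores and are chosen only to mimic the talk.
TOY_MODEL = {
    (): {"Math": -0.9, "max": -0.6},
    ("Math",): {".": -0.1},
    ("max",): {"_": -0.1},
    ("Math", "."): {"max": -1.2},
    ("max", "_"): {"ending": -0.7, "here": -0.8},
    ("Math", ".", "max"): {EOS: -0.1},
    ("max", "_", "ending"): {EOS: -2.0},
    ("max", "_", "here"): {EOS: -0.1},
}


def next_log_probs(prefix):
    """Toy stand-in for the model: log probability of each possible next token."""
    return TOY_MODEL.get(prefix, {})


def trailing_identifier(tokens):
    """Join the identifier pieces at the end of the sequence, e.g. max + _ + here."""
    pieces = []
    for tok in reversed(tokens):
        if not IDENT_PIECE.fullmatch(tok):
            break
        pieces.append(tok)
    return "".join(reversed(pieces))


def is_valid_prefix(tokens, valid_identifiers):
    """Simplified check: the identifier being built must prefix a valid identifier."""
    ident = trailing_identifier(tokens)
    return ident == "" or any(v.startswith(ident) for v in valid_identifiers)


def beam_search(step_fn, beam_size, max_steps, valid_identifiers=None):
    """Beam search; if valid_identifiers is given, invalid candidates score -inf."""
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            for tok, log_prob in step_fn(tokens).items():
                new_tokens = tokens + (tok,)
                new_score = score + log_prob
                if valid_identifiers is not None and not is_valid_prefix(
                    new_tokens, valid_identifiers
                ):
                    new_score = -math.inf  # impossible to be correct
                candidates.append((new_tokens, new_score))
        # Drop impossible candidates, then keep the beam_size best ones.
        candidates = [c for c in candidates if c[1] != -math.inf]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == EOS:
                finished.append((tokens[:-1], score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return [tokens for tokens, _ in finished]


# Hypothetical identifiers that static analysis might extract for this bug.
VALID = {"Math", "max", "max_ending_here", "x"}

vanilla = beam_search(next_log_probs, beam_size=2, max_steps=6)
code_aware = beam_search(next_log_probs, beam_size=2, max_steps=6,
                         valid_identifiers=VALID)
print("vanilla:   ", ["".join(t) for t in vanilla])
print("code-aware:", ["".join(t) for t in code_aware])
```

With these invented scores, the vanilla search keeps `max_ending` and `max_here` at the third step and never reaches `Math.max`, while the code-aware search discards `max_here` and so has room in the beam for `Math.max`. A fuller version would presumably also verify that a completed identifier is valid, not just its prefix.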
