Can an LSTM neural network learn to rewrite C code as C++ if I train it on a dataset of equivalent C/C++ instructions?
7 Answers
I think this is beyond the ability of current LSTMs. The problem is that good-quality C++ generation requires inferring knowledge about the structure of the problem. LSTMs are not advanced enough (yet) to do this: you'd need to train them not just on the code, but on the problem the code is trying to solve.
It is not exactly the same, but Andrej Karpathy has tried training an LSTM just to generate C-style code.
The results look interesting, but they don't actually compile or do anything useful. Here's an example (from The Unreasonable Effectiveness of Recurrent Neural Networks; Karpathy has done some amazingly cool work):
    /*
     * Increment the size file of the new incorrect UI_FILTER group information
     * of the size generatively.
     */
    static int indicate_policy(void)
    {
      int error;
      if (fd == MARN_EPT) {
        /*
         * The kernel blank will coeld it to userspace.
         */
        if (ss->segment < mem_total)
          unblock_graph_and_set_blocked();
        else
          ret = 1;
        goto bail;
      }
      segaddr = in_SB(in.addr);
      selector = seg / 16;
      setup_works = true;
      for (i = 0; i < blocks; i++) {
        seq = buf[i++];
        bpf = bd->bd.next + i * search;
        if (fd) {
          current = blocked;
        }
      }
      rw->name = "Getjbbregs";
      bprm_self_clearl(&iv->version);
      regs->new = blocks[(BPF_STATS << info->historidac)] | PFMR_CLOBATHINC_SECONDS << 12;
      return segtable;
    }
It looks kind of like C, but from a coding perspective it's just nonsense: it uses variables it never declares, and what the hell is "Getjbbregs" or "historidac"? If you can't generate working code within a single language, the chances of a network learning to translate across the language barrier are small.
LSTMs are Turing-complete, at least in theory. So given a large enough network, enough training data, and a good enough training algorithm (are the present ones capable of doing this? I don’t know!), they intrinsically have the capability to do this.
However, I would be surprised if present-day networks could do this in practice.
You can nonetheless look at language translation research (human languages, that is). It could be an interesting project.
Translating between languages is probably easier than generating code from scratch.
The answer depends in large part on what you mean by "rewrite C code in C++". Most C code is valid C++ code, but you probably want something more "C++-like".
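To illustrate the difference, here is a minimal sketch. The function names are made up for this example. The first function is written the way a C programmer would write it and compiles unchanged as C++; the second is one plausible "C++-like" rewrite of the same logic. Not every C construct carries over this cleanly (for instance, C++ rejects the implicit conversion from `void *` that C allows for the result of `malloc`), which is why "most" and not "all" C code is valid C++.

```cpp
#include <numeric>
#include <vector>

// C-style version: the body is plain C and also compiles as C++.
int sum_c_style(const int *values, int count) {
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

// One possible "C++-like" rewrite: the container carries its own size,
// and the loop is replaced by a standard algorithm.
int sum_cpp_style(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}
```

Both functions compute the same result. A translator that only has to produce valid C++ could leave the first version alone, while one that aims for idiomatic C++ has to recognize the intent of the loop.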
If you have tons of training data and stick to fairly simple transformations, then the answer is somewhere between maybe and probably, as long as you are somewhat tolerant of bugs :)
For any sort of complex transformation: probably not.
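As a rough illustration of where the line might fall (my own example, with hypothetical function names): swapping `NULL` for `nullptr` is a mechanical substitution, but the rewrite below is not, because it changes who owns the memory. The C-style function returns a buffer the caller must `free`; the C++ version returns a `std::string` that cleans up after itself. Both assume `times` is non-negative.

```cpp
#include <stdlib.h>
#include <string.h>
#include <string>

// C-style: allocates a buffer that the caller must release with free().
// Returns NULL if the allocation fails.
char *repeat_c_style(const char *word, int times) {
    size_t len = strlen(word);
    char *out = (char *)malloc(len * times + 1);
    if (out == NULL) {
        return NULL;
    }
    out[0] = '\0';
    for (int i = 0; i < times; i++) {
        strcat(out, word);
    }
    return out;
}

// C++-style: std::string owns the memory, so there is nothing to free.
std::string repeat_cpp_style(const std::string& word, int times) {
    std::string out;
    out.reserve(word.size() * times);
    for (int i = 0; i < times; ++i) {
        out += word;
    }
    return out;
}
```

To get from the first function to the second, a translator has to change the return type, drop the error path, and update every call site that used to call `free`. That is a change to the program's structure, not to individual instructions.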
First of all, that is the wrong way to go.
C++ is not better as a computer language, but better in terms of dealing with human limitations. C is closer to the hardware, and therefore more efficient and faster. We created C++ not as an improvement, but as a language that lets humans more easily look at one aspect of a large programming problem at a time.
And the goal is not to limit computers to be more like humans, but to allow them to be greater than humans, without all our conceptual limitations.
So instead of trying to turn C into C++, what neural nets are good for is making better, faster, and more efficient C programs out of inefficient C++ programs that were written to be more easily understood by humans. In effect, what we should want is an optimizing compiler that generates the best machine language code from the most abstract of human perceptions. So going from C to C++ is backwards. What we want is for our C++ to be turned into the best C possible.