Inspired by GPT-3, Kartik Godawat and Deepak Rawat have created a Jupyter extension that converts queries written in natural English into relevant Python code. The extension, named Text2Code, is a supervised model built on a predefined training pipeline that breaks the problem into separate stages, including intent detection and named-entity recognition (NER).
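To illustrate the idea behind such a staged pipeline, here is a minimal sketch in Python. It is not Text2Code's actual implementation; the intents, patterns, and templates below are hypothetical, and they only show how detecting an intent and extracting entities (like variable and file names) can fill a code template:

```python
import re

# Hypothetical code templates, one per intent. The real Text2Code
# pipeline is far more elaborate; this only demonstrates the concept.
TEMPLATES = {
    "import_library": "import {library}",
    "read_csv": "{df} = pd.read_csv('{file}')",
}

# Hypothetical regex-based intent detection with named groups acting
# as a crude stand-in for NER (extracting file and variable names).
INTENT_PATTERNS = {
    "import_library": re.compile(r"import (?P<library>\w+)"),
    "read_csv": re.compile(r"load (?P<file>[\w./]+) into (?P<df>\w+)"),
}

def text_to_code(query: str) -> str:
    """Match the query against known intents and fill the template."""
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(query)
        if match:
            return TEMPLATES[intent].format(**match.groupdict())
    return "# no matching intent found"

print(text_to_code("load data.csv into df"))
# prints: df = pd.read_csv('data.csv')
```

A real system would replace the regexes with trained intent-classification and NER models, but the overall shape of the pipeline is the same.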
Here is a quick demo of the model's capabilities, prepared using the Chai Time Data Science dataset from Kaggle by Sanyam Bhutani:
Do programmers need to worry?
Although that is what the video makes it look like, not yet. The model has a lot of room for improvement. It can generate code only in Python, and it currently supports only Ubuntu and macOS; the authors are still working on adding Windows to the list. They also need to add support for more code, improve intent detection and NER, explore sentence paraphrasing to generate higher-quality training data, and gather real-world variable and library names instead of generating them randomly as is done now.
However, Kartik Godawat and Deepak Rawat believe that with enough data they will be able to train a language model to perform English-to-code generation directly, as GPT-3 does, instead of having separate stages in the pipeline. To that end, they plan to create a survey to collect linguistic data. The code is not production-ready, but it is good enough for anyone to modify and use on their own. You too can give it a try: here is the GitHub repository where you can find instructions on installing Text2Code locally, and here is our Facebook page where you can show us some social media love.