EDA with vanilla pandas can take hours, depending on factors such as the size of the dataset, its complexity, and the number of features. You need to write many lines of code to extract meaning from the dataset. But what if you could do all of that with just a single line of code?
What! How?
By using pandas-profiling.
pandas-profiling generates profile reports from a pandas DataFrame with a single line of code (which I will show you in just a minute). The pandas df.describe() function is useful, but it is a little primitive when it comes to serious EDA. Importing pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
Here are the details that are presented in the generated report (depending on how relevant they are for that respective data type):
Show me the code already!
Yes, coming back to it. Let's start with the basic step: installation. Installing pandas-profiling is pretty simple with pip: run pip install pandas-profiling from your terminal.
Or, you can install from source. Download the source code by cloning the repository or by pressing 'Download ZIP' on the project's GitHub page. Then navigate to the downloaded directory and run python setup.py install.
Once installed, implementing pandas-profiling is simple:
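Here is a minimal sketch of what that looks like, assuming a pandas-profiling release that provides df.profile_report() and to_file(). The sample DataFrame, the make_report helper, and the report.html file name are all made up for illustration; swap in your own data.

```python
import pandas as pd


def make_report(df, output_path="report.html"):
    """Hypothetical helper: build a profile report and save it as HTML."""
    # Importing pandas_profiling is what adds profile_report() to DataFrames.
    # It is imported here so the rest of the script runs without the package.
    import pandas_profiling  # noqa: F401

    # The "single line" that does the heavy lifting.
    profile = df.profile_report(title="Pandas Profiling Report")
    profile.to_file(output_path)
    return profile


# Made-up sample data; None marks a missing income value.
df = pd.DataFrame({
    "age": [23, 35, 31, 52, 46],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
    "income": [32000.0, 54000.0, None, 88000.0, 61000.0],
})

# For comparison: describe() summarises only the numeric columns by default.
summary = df.describe()
print(summary)
```

Calling make_report(df) should then write report.html to the working directory, which you can open in any browser. Note how df.describe() skips the city column entirely, which is exactly the kind of gap the profile report is meant to fill.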
You can check out the project's GitHub page for more. They also have a detailed documentation page. Isn't this something! Have fun with it, but don't forget to show us some Facebook love by giving us a Like and a Share.