Save yourself hours of Data Analysis with this single line of Python code

I am sure you have heard of "work smart, not hard." But do you know how to apply it in real life when you are doing Exploratory Data Analysis (EDA) on a dataset?

EDA using vanilla pandas can take hours, depending on factors such as the size of the dataset, its complexity, and the number of features. You need to write many lines of code to extract meaning from the dataset. But what if you could do all of that with just a single line of code?

What! How?
By using pandas-profiling.

pandas-profiling generates profile reports from a pandas DataFrame with a single line of code (which I will show you in just a minute). The pandas df.describe() function can be very useful, but it only reports a handful of summary statistics, which is a little primitive when it comes to some serious EDA. Once imported, pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
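For comparison, here is a minimal sketch of what df.describe() gives you on its own. The DataFrame and its column names (age, city) are made up purely for illustration.

```python
import pandas as pd

# Made-up sample data, only for illustration.
df = pd.DataFrame(
    {
        "age": [22, 35, 58, 41],
        "city": ["Pune", "Oslo", "Lima", "Oslo"],
    }
)

# By default, describe() summarizes only the numeric columns:
# count, mean, std, min, quartiles and max.
summary = df.describe()
print(summary)

# include="all" also covers non-numeric columns (count, unique, top, freq).
print(df.describe(include="all"))
```

This is handy for a first look, but it says nothing about correlations, missing-value patterns or distributions, which is where a profile report comes in.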

Here are the details that are presented in the generated report (depending on how relevant they are for that respective data type):

Type inference: detect the types of columns in a dataframe.

Essentials: type, unique values, missing values.

Quantile statistics: minimum value, Q1, median, Q3, maximum, range, interquartile range.

Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness.

Most frequent values.

Histogram.

Correlations: highlighting of highly correlated variables, plus Spearman, Pearson and Kendall matrices.

Missing values: matrix, count, heatmap and dendrogram of missing values.

Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.

File and Image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information.
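To get a feel for how much manual work the report replaces, here is a rough sketch that computes just a few of the items above by hand in vanilla pandas. The data and the column names x and y are hypothetical, and this is not how pandas-profiling computes things internally; it only illustrates the equivalent do-it-yourself code.

```python
import pandas as pd

# Made-up data: x has one missing value, y is exactly 2 * x where x is present.
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, None],
        "y": [2.0, 4.0, 6.0, 8.0, 10.0],
    }
)

col = df["x"]

# Essentials, quantile statistics and descriptive statistics for one column.
stats = {
    "dtype": str(col.dtype),
    "n_unique": col.nunique(),           # distinct non-missing values
    "n_missing": int(col.isna().sum()),  # count of missing values
    "q1": col.quantile(0.25),
    "median": col.median(),
    "q3": col.quantile(0.75),
    "mean": col.mean(),
    "std": col.std(),
    "skewness": col.skew(),
    "kurtosis": col.kurt(),
}
stats["iqr"] = stats["q3"] - stats["q1"]

# Most frequent values.
top_values = col.value_counts().head(3)

# Correlation matrix; method can also be "spearman" or "kendall".
corr = df.corr(method="pearson")

print(stats)
print(top_values)
print(corr)
```

And that covers only one column and a fraction of the list, with no plots at all.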

Show me the code already!
Yes, coming back to it. Let's start with the basic step: installation. Installing pandas-profiling is pretty simple; with pip, the usual command is pip install pandas-profiling.

Or, you can simply install from the source. Download the source code by cloning the repository or by pressing 'Download ZIP' on the project's GitHub page. Then navigate to the downloaded directory and run the install command given in the project's README.

Once installed, implementing pandas-profiling is simple:
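A minimal sketch of the usual usage follows, based on the package's documented ProfileReport class. The helper name write_profile, the sample data and the output file name report.html are my own choices for illustration. Newer releases of the project are published under the name ydata-profiling, so the import name may differ in your environment.

```python
import pandas as pd


def write_profile(df: pd.DataFrame, path: str = "report.html") -> None:
    """Profile df and save the report as HTML (hypothetical helper name)."""
    # Lazy import: the package is only needed when a report is actually built.
    from pandas_profiling import ProfileReport

    # The "single line": build the report from the DataFrame.
    profile = ProfileReport(df, title="Pandas Profiling Report")
    # Equivalent once pandas_profiling is imported: profile = df.profile_report()

    profile.to_file(path)  # write a standalone HTML file


# Made-up sample data, only for illustration.
df = pd.DataFrame(
    {
        "age": [22, 35, 58, None],
        "city": ["Pune", "Oslo", "Lima", "Oslo"],
    }
)

# write_profile(df)  # uncomment once pandas-profiling is installed
```

Inside a Jupyter notebook, you would typically display the report inline with profile.to_notebook_iframe() instead of writing it to a file.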

You can check out the project's GitHub page, and there is also a detailed documentation page. Isn't this something! Have fun with it, but don't forget to show us some Facebook love by giving us a Like and a Share.
