All You Need to Know about Data Mining Pipelines
Rajtilak Bhattacharjee
-
May 17, 2020
Data mining is the process of collecting, cleaning, processing, and analyzing data to gain useful insights from it.
Today, almost all automated processes, such as financial modelling, IoT, and recommendations in retail marketing, rely on data. Most of that data is available only in unstructured form, which means insights cannot be drawn from it directly. That is why data mining is needed: it extracts useful information by converting the data into a structured format that can be used more efficiently.
Pipeline for the data mining process:
1. Data collection: Data collection is highly domain- and application-specific, but it plays a critical role in the data mining process. When the volume of data is large, it is typically stored in databases.
2. Feature extraction and data cleaning: Once collected, data may be in any form, such as web-scraped data, free-form documents, or log files. Feature extraction pulls out the relevant features, and data cleaning removes or corrects problems such as missing values. The end result is data in a well-structured format that a computer program can process further. This whole step is known as data pre-processing.
3. Analytical processing: This is considered the final part of data mining, where analytical algorithms are designed around the processed data. The use case or problem statement determines how the data should be processed, for example whether and how it should be clustered.
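The three stages above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names (`user`, `age`, `spend`), the mean-fill rule for missing values, and the threshold-based grouping are all hypothetical and chosen for brevity, not taken from any particular system.

```python
from statistics import mean

# Stage 1 (data collection): hypothetical raw records, as they might
# arrive from a scraper or a log export. Values are strings or missing.
RAW_RECORDS = [
    {"user": "u1", "age": "34", "spend": "120.5"},
    {"user": "u2", "age": "", "spend": "80"},
    {"user": "u3", "age": "27", "spend": None},
    {"user": "u4", "age": "41", "spend": "200"},
]


def to_float(value):
    """Return value as a float, or None when it is missing or unparsable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean(records, fields):
    """Stage 2: parse numeric fields and fill missing values with the field mean."""
    parsed = [{**r, **{f: to_float(r.get(f)) for f in fields}} for r in records]
    for f in fields:
        known = [r[f] for r in parsed if r[f] is not None]
        fill = mean(known) if known else 0.0
        for r in parsed:
            if r[f] is None:
                r[f] = fill
    return parsed


def analyze(records, field, threshold):
    """Stage 3: a trivial analysis that splits users into two groups."""
    groups = {"high": [], "low": []}
    for r in records:
        groups["high" if r[field] >= threshold else "low"].append(r["user"])
    return groups


cleaned = clean(RAW_RECORDS, ["age", "spend"])
groups = analyze(cleaned, "spend", 100.0)
print(groups)
```

Filling with the mean is just one of several reasonable choices; dropping incomplete records or using a domain-specific default would fit the same structure.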
Case Study
Consider a case from the retail industry, where a company wants to recommend products to its customers according to their preferences. The company has each customer's log data (that is, which web pages the customer has visited most) and profile information. Based on buying behavior and demographic data, the company wants to recommend specific products to each customer. How should a solution to this problem be designed?
Solution architecture
Data collection: For this step, the analyst has to collect two types of data: first, log data from the company's website; second, users' profile information from the company's database.
Data cleaning and feature extraction: The log information will contain multiple data types, such as IP addresses, text, dates and times, and product information. The analyst has to sort through all the available information and extract the relevant parts in structured form. During data processing, the analyst records the data as attributes for each customer and integrates them with the customer's demographic information.
Analytical processing: After pre-processing, the analyst has to decide how to use the cleaned data in the recommendation engine, for example whether customers should be clustered by preference, by demographic information, or by any other pattern observed in the data.
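The solution architecture above could look roughly like the following sketch. Everything concrete here is an assumption made for illustration: the log line layout (`timestamp ip customer_id product_category`), the profile fields, and the age-based segmentation rule, which is a deliberately simple stand-in for a real clustering algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical web log lines: "timestamp ip customer_id product_category".
LOG_LINES = [
    "2020-05-01T10:00:00 10.0.0.1 c1 shoes",
    "2020-05-01T10:05:00 10.0.0.1 c1 shoes",
    "2020-05-01T10:07:00 10.0.0.2 c2 books",
    "2020-05-01T11:00:00 10.0.0.3 c3 shoes",
    "malformed line",
    "2020-05-02T09:00:00 10.0.0.2 c2 books",
    "2020-05-02T09:30:00 10.0.0.2 c2 shoes",
]

# Hypothetical profile table, standing in for the company database.
PROFILES = {"c1": {"age": 24}, "c2": {"age": 52}, "c3": {"age": 29}}


def extract_features(lines):
    """Turn raw log lines into per-customer visit counts by category."""
    visits = defaultdict(Counter)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:  # skip lines that do not match the assumed layout
            continue
        _timestamp, _ip, customer, category = parts
        visits[customer][category] += 1
    return visits


def integrate(visits, profiles):
    """Join log-derived attributes with demographic attributes per customer."""
    table = {}
    for customer, profile in profiles.items():
        counts = visits.get(customer, Counter())
        table[customer] = {"age": profile["age"], "visits": dict(counts)}
    return table


def segment(table, age_cutoff=40):
    """Group customers by a simple age rule (a stand-in for clustering)."""
    segments = defaultdict(list)
    for customer, row in table.items():
        key = "younger" if row["age"] < age_cutoff else "older"
        segments[key].append(customer)
    return dict(segments)


def recommend(table, segments):
    """Recommend each segment's most visited category to its members."""
    recs = {}
    for members in segments.values():
        totals = Counter()
        for customer in members:
            totals.update(table[customer]["visits"])
        best = totals.most_common(1)[0][0] if totals else None
        for customer in members:
            recs[customer] = best
    return recs


table = integrate(extract_features(LOG_LINES), PROFILES)
segments = segment(table)
recommendations = recommend(table, segments)
print(recommendations)
```

In practice the `segment` step is where the analyst's decision matters most: the same customer table could be clustered on visit counts, on demographics, or on both.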
This is a guest post by Arpita Gupta. Arpita Gupta works as a Data Scientist at Accenture. She has research and development experience in Deep Learning, Machine Learning, and Data Mining techniques. Arpita likes to share her knowledge of Machine Learning through her website, Let the Data Confess. She completed her post-graduation at BITS Pilani with an M.E. degree in Embedded Systems.