📘 Unsupervised Learning

🔹 Definition

Unsupervised Learning is a type of Machine Learning where the model is trained on unlabeled data, i.e., data without predefined outputs.

👉 Goal:
Discover hidden patterns, structures, or relationships in data.

This type of learning is particularly useful for discovering clusters in data, detecting anomalies, and finding associations. Techniques like clustering, association rule mining, and dimensionality reduction allow for deeper insights into complex datasets. It's widely used in customer segmentation, fraud detection, and market basket analysis.

🔹 Key Idea

Given only input data:

X = \{x_1, x_2, x_3, \ldots, x_n\}

👉 The model tries to:

Group similar data
Identify patterns
Reduce dimensionality

🔹 Characteristics

No labeled outputs
No “teacher”
Exploratory in nature
Used for pattern discovery


🔹 Types of Unsupervised Learning

🟦 A. Clustering

📌 Definition

Clustering is the process of grouping data points into clusters based on similarity. It helps uncover hidden patterns and groupings within the data, which can be useful for customer segmentation, anomaly detection, or identifying trends.

A good clustering groups data points such that:

Points in same cluster → similar
Points in different clusters → dissimilar

📊 Popular Algorithms

1. K-Means Clustering

Divides data into K clusters
Iteratively updates cluster centers
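
👉 The two bullets above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance and points given as tuples; the function name kmeans, its parameters, and the toy points are made up for this example and do not come from any library.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Alternate an assignment step and a centroid update step until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep its old center
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 or 1 depends on the random initialization, so only the grouping itself is meaningful. On harder data, different starting centers can lead to different final clusters.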

2. Hierarchical Clustering

Builds a tree (dendrogram)
Two types:
- Agglomerative (bottom-up)
- Divisive (top-down)
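
👉 A rough sketch of the agglomerative (bottom-up) variant, assuming single linkage (distance between the closest pair of points) and Euclidean distance. The function name agglomerative and the toy points are illustrative only, and this simple version stops at a requested number of clusters instead of building the full dendrogram.

```python
import math

def agglomerative(points, num_clusters):
    """Bottom-up clustering; returns clusters as lists of point indices.

    Assumes 1 <= num_clusters <= len(points).
    """
    clusters = [[i] for i in range(len(points))]  # start: every point is its own cluster

    def linkage(a, b):
        # Single linkage: distance between the closest pair across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the two clusters that are closest to each other.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical toy data: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
clusters = agglomerative(points, num_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Recording the order and distance of each merge would give the information a dendrogram is drawn from.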

3. DBSCAN

Density-based clustering
Detects arbitrarily shaped clusters and marks points in low-density regions as noise
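
👉 A compact sketch of the density-based idea, assuming Euclidean distance and a brute-force neighbor search. The names dbscan, eps, min_pts, and NOISE are chosen for this example, and the data is made up.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may still become a border point later
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-Means, the number of clusters is not given in advance; it follows from eps and min_pts.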

🟩 B. Association Rule Learning

📌 Definition

Finds relationships between variables in large datasets, usually expressed as rules of the form “if A, then B”

📊 Key Algorithms

1. Apriori Algorithm

Finds frequent itemsets
Generates rules
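
👉 The frequent-itemset step can be sketched level by level. This is a simplified illustration, assuming transactions are Python sets and support is the fraction of transactions containing an itemset. The function name apriori, the min_support parameter, and the toy transactions are assumptions for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical toy transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, sup in frequent.items():
    print(sorted(itemset), sup)
```

The prune step is the core of Apriori: an itemset can only be frequent if all of its subsets are frequent, so most candidates never need to be counted.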

2. FP-Growth

Usually faster alternative to Apriori
Stores transactions in a compact tree (FP-tree) and avoids generating candidate itemsets

📌 Example:

Market Basket Analysis:

“Customers who buy bread also buy butter”
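
👉 How strong is a rule like this? Two common measures are support and confidence. The sketch below assumes transactions are Python sets; the helper names and the four toy baskets are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical toy baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

print(support(transactions, {"bread", "butter"}))       # 0.5
print(confidence(transactions, {"bread"}, {"butter"}))  # 2 of the 3 bread baskets contain butter
```

Here bread appears in 3 of 4 baskets and bread with butter in 2 of 4, so the rule bread → butter has a confidence of about 0.67.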

🟨 C. Dimensionality Reduction

📌 Definition

Reduces the number of features while preserving as much of the important information as possible

📊 Techniques

1. PCA (Principal Component Analysis)

Projects data to lower dimensions
Chooses the projection directions (principal components) that retain the most variance
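
👉 A small sketch of the idea, assuming we only want the first principal component. It finds that direction by power iteration on the covariance matrix, which is one of several possible ways to compute it (libraries typically use an SVD or eigendecomposition instead). The function names and toy rows are illustrative, and the sketch assumes the data has non-zero variance.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (feature means, unit vector along the direction of largest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d  # assumption: this start vector is not orthogonal to the answer
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, component):
    """1-D coordinate of each row along the component."""
    return [
        sum((r[j] - means[j]) * component[j] for j in range(len(component)))
        for r in rows
    ]

# Hypothetical toy data lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
means, component = first_principal_component(rows)
projected = project(rows, means, component)
print(component)
print(projected)
```

Because the toy data lies on a line, one component captures all of its variance: two features are reduced to one without losing information. Real data is rarely this clean.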

2. t-SNE

Nonlinear technique used mainly for visualizing high-dimensional data in 2 or 3 dimensions

📌 Example:

Reducing 100 features → 2 features for visualization

🔹 How It Works

Input raw data (no labels)
Measure similarity/distance
Apply algorithm
Discover structure

🔹 Distance Measures (Important)

Used in clustering:

Euclidean Distance
Manhattan Distance
Cosine Similarity (a similarity, not a distance; 1 − cosine similarity is commonly used as the distance)
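
👉 The three measures written out in plain Python, as a minimal sketch for equal-length numeric vectors. The function names are chosen for this example, and cosine_similarity assumes neither vector is all zeros.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean(p, q))                            # 5.0
print(manhattan(p, q))                            # 7.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Euclidean and Manhattan distance depend on the scale of each feature, while cosine similarity looks only at direction, so the choice of measure can change which points count as "similar".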

🔹 Example 1: Customer Segmentation

Input:

Age, income, spending

Output:

Customer groups

👉 Helps businesses target customers

📌 Detailed Explanation: Customer Segmentation

Customer segmentation is a common example of unsupervised learning.

Problem:

A company wants to group customers based on similarities in their purchasing behavior, without any predefined labels or categories.

Data Collected:

Age
Purchase history
Spending habits

Techniques Used:

The company applies unsupervised learning algorithms such as:

K-means clustering → to group similar customers
Principal Component Analysis (PCA) → to reduce data complexity

Key Methods & Insights

1. Clustering

Groups customers with similar behaviors into segments
Example segments:
- High-spending customers
- Budget-conscious buyers

2. Association Rule Mining

Finds relationships between purchases
Example:
➡️ Customers who buy product X are likely to buy product Y

3. Dimensionality Reduction

Simplifies the dataset by focusing on key features
Retains important patterns while reducing complexity
Example: focusing on age and spending habits

4. Anomaly Detection

Identifies unusual or rare behaviors
Example:
➡️ Customers with irregular or unexpected purchasing patterns

Key Idea:

➡️ Unsupervised learning discovers hidden patterns and structures in data without labeled outputs.

🔹 Example 2: Market Basket Analysis

Input:

Transaction data

Output:

Association rules

👉 Used in retail

🔹 Example 3: Anomaly Detection

Identify unusual patterns

👉 Example:

Fraud detection
Network intrusion detection
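
👉 One very simple way to flag unusual values is a z-score rule: mark anything far from the mean in units of standard deviation. This is only a sketch of one basic approach, not how production fraud systems work; the function name, the threshold, and the transaction amounts are made up.

```python
import statistics

def zscore_outliers(values, threshold):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts with one unusually large payment.
amounts = [20, 22, 19, 21, 23, 20, 22, 500]
outliers = zscore_outliers(amounts, threshold=2.0)
print(outliers)  # [500]
```

The outlier itself inflates the mean and standard deviation, so with only a handful of values even an extreme point gets a modest z-score. That is why the threshold here is 2.0 instead of the often-quoted 3.0.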

🔹 Applications of Unsupervised Learning

🛒 Retail

Customer segmentation
Product recommendation

💰 Finance

Fraud detection
Risk analysis

🧬 Bioinformatics

Gene clustering
Protein analysis

📱 Social Media

Community detection
Trend analysis

🎥 Computer Vision

Image compression
Feature extraction

🔹 Advantages

No need for labeled data
Useful for exploratory analysis
Can discover unknown patterns

🔹 Disadvantages

Results may be hard to interpret
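No ground-truth labels to evaluate against; internal metrics (e.g., silhouette score) are only proxies
Sensitive to noise and to parameter choices (e.g., the value of K in K-Means)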

🔹 Comparison with Supervised Learning

Feature	Supervised Learning	Unsupervised Learning
Data	Labeled	Unlabeled
Goal	Prediction	Pattern discovery
Output	Known labels	Hidden structure

📝 Summary

Unsupervised learning uses unlabeled data
Main tasks:
- Clustering
- Association rules
- Dimensionality reduction
Used for:
- Pattern discovery
- Data exploration