One Haut Encoded

Choose a Style Profile

Each profile represents a real customer with a distinct fashion identity. Select one to see their purchase history and what our models recommend for them.

Curate Your Look

Pick items you love, and our KNN model will find pieces that match your taste. Select at least 3 items, then see what we’d recommend.

How It Works

Three fundamentally different approaches to “what should this person wear next?”

Baseline

Popularity

Recommends the most-purchased items across all customers. No personalization — everyone sees the same bestsellers. Captures trending products but ignores individual taste entirely.

Strength: highest raw hit rate (10.0%)
Weakness: only surfaces 0.4% of the catalog
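
As a rough illustration of the popularity baseline, the sketch below counts purchases and returns the same top-K list for everyone. The function name `popularity_recommend` and the toy `(customer_id, article_id)` pairs are assumptions made for this example, not the project's actual code.

```python
from collections import Counter

def popularity_recommend(transactions, k=12):
    """Return the k most-purchased article ids (identical for every user)."""
    counts = Counter(article_id for _customer_id, article_id in transactions)
    # most_common sorts by purchase count, highest first
    return [article_id for article_id, _count in counts.most_common(k)]

# Toy data invented for illustration: (customer_id, article_id)
toy_transactions = [
    ("c1", "a1"), ("c2", "a1"), ("c3", "a1"),
    ("c1", "a2"), ("c2", "a2"),
    ("c3", "a3"),
]
top2 = popularity_recommend(toy_transactions, k=2)
print(top2)
```

Because the list never depends on the customer, only K distinct articles can ever be shown, which is why catalog coverage stays so low.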

Classical ML

KNN Content-Based

Represents each article as a feature vector (product type, colour, department, section, garment group). Your profile is the average of everything you’ve purchased. Recommendations are the nearest neighbors by cosine similarity.

Strength: 99.2% catalog coverage, highest novelty
Weakness: can’t capture cross-category taste patterns
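
The profile-averaging and cosine-similarity steps can be sketched in a few lines of plain Python. The helper names, the five toy feature columns, and the choice to exclude already-purchased items are all assumptions for illustration; the real model uses the full one-hot metadata vector.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def user_profile(purchased_ids, item_vectors):
    """Average the feature vectors of everything the user bought."""
    vectors = [item_vectors[i] for i in purchased_ids]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def knn_recommend(purchased_ids, item_vectors, k=12):
    """Rank unpurchased items by cosine similarity to the user profile."""
    profile = user_profile(purchased_ids, item_vectors)
    owned = set(purchased_ids)  # assumption: do not re-recommend purchases
    candidates = [i for i in item_vectors if i not in owned]
    ranked = sorted(candidates,
                    key=lambda i: cosine(profile, item_vectors[i]),
                    reverse=True)
    return ranked[:k]

# Toy one-hot columns: [dress, top, trousers, black, white]
toy_items = {
    "black_dress":    [1, 0, 0, 1, 0],
    "black_top":      [0, 1, 0, 1, 0],
    "white_dress":    [1, 0, 0, 0, 1],
    "black_trousers": [0, 0, 1, 1, 0],
    "white_trousers": [0, 0, 1, 0, 1],
}
picks = knn_recommend(["black_dress", "black_top"], toy_items, k=2)
print(picks)
```

In this toy case the shopper's profile leans heavily toward black, so black trousers outrank a white dress even though trousers were never purchased.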

Deep Learning

Neural Collaborative Filtering

Learns latent embeddings for every user and item through a multi-layer perceptron trained on implicit feedback. The deployed variant concatenates one-hot item metadata with the learned embeddings, giving the network both collaborative signals and content features.

Strength: captures “users who bought A also bought C” patterns
Weakness: cold-start for new users (no embedding trained)
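
To make the architecture concrete, here is a minimal forward-pass sketch in plain Python. Everything in it is assumed for illustration: the tiny dimensions, the single hidden layer, the random untrained weights, and the names `user_emb`, `item_emb`, `item_meta`, and `score`. It shows only how the concatenated input flows through an MLP, not how the deployed model was built or trained.

```python
import math
import random

random.seed(0)
EMB_DIM, META_DIM, HIDDEN = 4, 3, 8  # hypothetical sizes

def rand_vec(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]

# Embedding tables: one vector per known user and item (untrained here)
user_emb = {"u1": rand_vec(EMB_DIM), "u2": rand_vec(EMB_DIM)}
item_emb = {"a1": rand_vec(EMB_DIM), "a2": rand_vec(EMB_DIM)}
# One-hot item metadata, concatenated with the embeddings
item_meta = {"a1": [1, 0, 0], "a2": [0, 1, 0]}

IN_DIM = 2 * EMB_DIM + META_DIM
W1 = [rand_vec(IN_DIM) for _ in range(HIDDEN)]  # hidden layer weights
b1 = [0.0] * HIDDEN
w2 = rand_vec(HIDDEN)                           # output layer weights
b2 = 0.0

def score(user_id, item_id):
    """Predicted interaction probability for a (user, item) pair."""
    x = user_emb[user_id] + item_emb[item_id] + item_meta[item_id]
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU
              for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))       # sigmoid

print(score("u1", "a1"))
```

The cold-start weakness is visible directly: calling `score` for a user id that is missing from `user_emb` raises a `KeyError`, because there is no trained embedding to look up.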

Data & Pipeline

Built on the H&M Personalized Fashion Recommendations dataset

1

Subsample

Filtered the 31M transactions down to ~373K transactions covering 25K customers and 3K articles by applying minimum-activity thresholds.
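
One plausible way to apply minimum-activity thresholds is sketched below. The threshold values, the function name `filter_min_activity`, and the repeat-until-stable loop are assumptions; the page does not state the exact procedure or cutoffs.

```python
from collections import Counter

def filter_min_activity(transactions, min_per_customer, min_per_article):
    """Keep rows whose customer and article both meet the thresholds.

    Dropping rare articles can push a customer below the threshold
    (and vice versa), so the filter repeats until nothing changes.
    """
    kept = list(transactions)
    while True:
        cust_counts = Counter(c for c, _a in kept)
        art_counts = Counter(a for _c, a in kept)
        filtered = [(c, a) for c, a in kept
                    if cust_counts[c] >= min_per_customer
                    and art_counts[a] >= min_per_article]
        if len(filtered) == len(kept):
            return filtered
        kept = filtered

# Toy data invented for illustration: (customer_id, article_id)
toy = [("c1", "a1"), ("c1", "a2"), ("c2", "a1"), ("c2", "a2"), ("c3", "a3")]
dense = filter_min_activity(toy, min_per_customer=2, min_per_article=2)
print(dense)
```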

2

Temporal Split

Last 14 days held out as the test set — no data leakage across the time boundary.
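
A time-based holdout of this kind can be sketched as follows. The tuple layout `(date, customer_id, article_id)`, the function name `temporal_split`, and the toy dates are assumptions for illustration.

```python
from datetime import date, timedelta

def temporal_split(transactions, test_days=14):
    """Hold out the final `test_days` days as the test set."""
    last_day = max(t[0] for t in transactions)
    cutoff = last_day - timedelta(days=test_days)
    train = [t for t in transactions if t[0] <= cutoff]
    test = [t for t in transactions if t[0] > cutoff]  # last 14 calendar days
    return train, test

# Toy rows invented for illustration: (purchase_date, customer_id, article_id)
toy = [
    (date(2020, 9, 1), "c1", "a1"),
    (date(2020, 9, 8), "c2", "a2"),
    (date(2020, 9, 9), "c1", "a3"),
    (date(2020, 9, 22), "c3", "a1"),
]
train, test = temporal_split(toy)
print(len(train), len(test))
```

Every training row is strictly earlier than every test row, so nothing from the evaluation window can leak into training.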

3

Feature Engineering

ResNet50 image embeddings (2048-dim) for 105K articles. One-hot metadata across 5 categorical columns (206 features total).
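
The one-hot metadata step can be illustrated with a small stdlib-only encoder. The helper names `fit_one_hot` and `encode`, the column names, and the toy articles are assumptions; the real pipeline encodes five categorical columns into 206 features.

```python
def fit_one_hot(rows, columns):
    """Assign one output index to every (column, value) pair seen."""
    vocab = {}
    for col in columns:
        for value in sorted({row[col] for row in rows}):
            vocab[(col, value)] = len(vocab)
    return vocab

def encode(row, columns, vocab):
    """Build a 0/1 vector with one active slot per categorical column."""
    vec = [0] * len(vocab)
    for col in columns:
        idx = vocab.get((col, row[col]))
        if idx is not None:  # unseen values stay all-zero for that column
            vec[idx] = 1
    return vec

# Toy articles invented for illustration (two columns instead of five)
toy_articles = [
    {"product_type": "Dress", "colour": "Black"},
    {"product_type": "Top", "colour": "White"},
    {"product_type": "Dress", "colour": "White"},
]
cols = ["product_type", "colour"]
vocab = fit_one_hot(toy_articles, cols)
first = encode(toy_articles[0], cols, vocab)
print(len(vocab), first)
```

The total feature count is simply the number of distinct values summed over the columns, which is how five columns can add up to 206 features.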

4

Ablation Study

8 model variants compared, including four NCF configurations: base IDs only, +metadata, +visual features, +both. Isolates the contribution of each feature source.

Evaluation Results

Offline metrics on held-out test set (last 14 days), K=12

Model	HR@12	NDCG@12	Coverage	Novelty
Popularity (global)	0.0998	0.0145	0.0040	8.91
Popularity (dept)	0.0898	0.0159	0.2365	10.02
KNN (metadata)	0.0921	0.0187	0.9920	11.85
KNN (+ images)	0.0920	0.0187	0.9923	11.85
NCF (base)	0.0724	0.0136	0.7215	10.33
NCF (+ metadata)	0.0887	0.0178	0.8769	10.68
NCF (+ visual)	0.0718	0.0134	0.6891	10.18
NCF (full)	0.0873	0.0171	0.7942	10.45

Key Findings

Metadata > Image Features

Adding categorical metadata to NCF improved HR@12 by about 22% relative to the base model (0.0724 to 0.0887), while adding ResNet50 visual embeddings alone showed no improvement (0.0718). In this setup, structured product attributes appear to encode more purchase-relevant information than pixel-level appearance.
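
The relative improvement can be checked directly from the HR@12 column of the results table. The helper `relative_change` is just a name chosen for this example.

```python
def relative_change(baseline, variant):
    """Relative change of `variant` over `baseline` (0.10 means +10%)."""
    return (variant - baseline) / baseline

# HR@12 values taken from the Evaluation Results table
ncf_base, ncf_meta, ncf_visual = 0.0724, 0.0887, 0.0718

meta_lift = relative_change(ncf_base, ncf_meta)
visual_lift = relative_change(ncf_base, ncf_visual)
print(f"metadata: {meta_lift:+.1%}, visual: {visual_lift:+.1%}")
```

Metadata lifts hit rate by roughly 22.5% over the base model, while the visual-only variant lands slightly below it.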

Popularity Is Hard to Beat

The global popularity baseline achieves the highest HR@12 (0.0998) despite zero personalization. Concentrated demand in fashion means a few bestsellers drive disproportionate volume.

KNN Excels at Discovery

99.2% catalog coverage and the highest novelty score. KNN surfaces the long tail — niche items that match your taste but you’d never find on a bestseller list.

The Accuracy–Diversity Tradeoff

No single model wins everywhere. A production system would ensemble popular anchors with personalized long-tail picks — accuracy and discovery aren’t mutually exclusive.
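
One simple way such a blend might look is sketched below: a few bestseller "anchors" first, then personalized picks to fill the remaining slots. The function name `blend`, the `n_anchors` parameter, and the toy item ids are hypothetical; this was not part of the evaluated models.

```python
def blend(popular, personalized, k=12, n_anchors=4):
    """Fill the first slots with bestsellers, the rest with personal picks."""
    out = list(popular[:n_anchors])
    # Personalized long-tail picks come next, skipping duplicates
    for item in personalized:
        if len(out) >= k:
            break
        if item not in out:
            out.append(item)
    # Fall back to more bestsellers if the list is still short
    for item in popular[n_anchors:]:
        if len(out) >= k:
            break
        if item not in out:
            out.append(item)
    return out

# Toy ranked lists invented for illustration
popular = ["p1", "p2", "p3"]
personalized = ["p2", "x1", "x2", "x3"]
mixed = blend(popular, personalized, k=4, n_anchors=2)
print(mixed)
```

The anchor count is the tuning knob: more anchors should push hit rate toward the popularity baseline, fewer should push coverage and novelty toward KNN.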

HR@12 — Hit Rate: fraction of test users whose actual next purchase appears in the top-12 recommendations
NDCG@12 — Normalized Discounted Cumulative Gain: measures ranking quality (higher = relevant items ranked first)
Coverage — fraction of the full catalog surfaced across all users’ recommendation lists
Novelty — average self-information of recommended items (higher = less predictable, more surprising picks)
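
The four metrics above can be sketched in plain Python as follows. The function names, the binary-relevance form of NDCG, and the use of training purchase share (with a base-2 logarithm) as the item probability in the novelty score are assumptions; the page does not spell out these details.

```python
import math

def hit_rate_at_k(recs, truth, k=12):
    """Fraction of test users with at least one held-out item in their top-k."""
    hits = sum(1 for u in truth if set(recs.get(u, [])[:k]) & truth[u])
    return hits / len(truth)

def ndcg_at_k(recs, truth, k=12):
    """Mean binary-relevance NDCG over test users."""
    total = 0.0
    for user, relevant in truth.items():
        ranked = recs.get(user, [])[:k]
        dcg = sum(1.0 / math.log2(pos + 2)
                  for pos, item in enumerate(ranked) if item in relevant)
        ideal = sum(1.0 / math.log2(pos + 2)
                    for pos in range(min(len(relevant), k)))
        total += dcg / ideal if ideal > 0 else 0.0
    return total / len(truth)

def coverage(recs, catalog, k=12):
    """Fraction of the catalog that appears in any user's top-k list."""
    shown = {item for items in recs.values() for item in items[:k]}
    return len(shown & set(catalog)) / len(catalog)

def novelty(recs, train_counts, k=12):
    """Mean self-information -log2(p) of recommended items.

    Assumes every recommended item has a nonzero training count.
    """
    total = sum(train_counts.values())
    infos = [-math.log2(train_counts[item] / total)
             for items in recs.values() for item in items[:k]]
    return sum(infos) / len(infos)

# Toy data invented for illustration
recs = {"u1": ["a", "b"], "u2": ["c", "d"]}
truth = {"u1": {"b"}, "u2": {"e"}}
catalog = ["a", "b", "c", "d", "e"]
train_counts = {"a": 4, "b": 2, "c": 1, "d": 1}

hr = hit_rate_at_k(recs, truth, k=2)
ndcg = ndcg_at_k(recs, truth, k=2)
cov = coverage(recs, catalog, k=2)
nov = novelty(recs, train_counts, k=2)
print(hr, ndcg, cov, nov)
```

In the toy data, one of two users gets a hit (HR 0.5), but the hit sits in second position, so NDCG is lower than HR. That gap is the ranking-quality signal NDCG is meant to capture.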