7-Day ML Leaderboard Booster Kit
By Mathurin Aché, Kaggle Grandmaster (public Kaggle profile: kaggle.com/mathurinache).
This kit is designed for data scientists who want a sharper optimization workflow without chasing noise.
Day 1: Freeze the Validation Contract
- Write the target, split logic, metric, and leakage risks before training.
- Keep a fold_id column in every experiment file.
- Create one baseline that you are willing to keep as the reference for a full week.
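Freezing the split can be as small as generating the fold_id column once, with a fixed seed, and reusing it all week. A minimal sketch (the function name and 5-fold default are illustrative, not part of the kit):

```python
import numpy as np

def assign_fold_ids(n_rows: int, n_folds: int = 5, seed: int = 42) -> np.ndarray:
    """Assign a fixed fold_id to every row once, before any training.

    The shuffle is seeded, so rerunning this reproduces the exact split.
    """
    fold_id = np.arange(n_rows) % n_folds        # balanced fold sizes
    rng = np.random.default_rng(seed)
    return rng.permutation(fold_id)              # shuffled, but deterministic

# Example: 10 rows, 5 folds -> every fold appears exactly twice.
fold_id = assign_fold_ids(10, n_folds=5)
```

Save this column with the training data; every later experiment reads it instead of re-splitting. For grouped or time-ordered data you would swap in a group-aware or chronological assignment, but the contract stays the same: the column is written once and never regenerated mid-week.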
Day 2: Build a Clean Baseline
- Start with simple preprocessing.
- Save out-of-fold predictions.
- Log seeds, folds, metric, and notes.
- Do not tune before the baseline is reproducible.
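The out-of-fold (OOF) habit works even for the dumbest possible model. A sketch of a mean-predictor baseline that saves OOF predictions and a reproducibility log (the function and log fields are illustrative assumptions):

```python
import json
import numpy as np

def oof_mean_baseline(y: np.ndarray, fold_id: np.ndarray) -> dict:
    """Predict each fold with the mean target of the *other* folds,
    then log everything needed to reproduce the run."""
    oof = np.empty_like(y, dtype=float)
    for f in np.unique(fold_id):
        train = fold_id != f
        oof[fold_id == f] = y[train].mean()      # fit only on out-of-fold rows
    rmse = float(np.sqrt(np.mean((y - oof) ** 2)))
    return {"oof": oof.tolist(), "metric": "rmse", "score": rmse,
            "n_folds": int(len(np.unique(fold_id))), "notes": "mean baseline"}

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
fold_id = np.array([0, 1, 2, 0, 1, 2])
log = oof_mean_baseline(y, fold_id)
print(json.dumps({k: v for k, v in log.items() if k != "oof"}))
```

Until two runs of this log match exactly, there is nothing worth tuning.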
Day 3: Leakage Sweep
- Check time leakage, grouped entities, duplicates, target-derived features, and preprocessing fitted on full data.
- If the validation score is suspiciously high, treat it as a bug until proven otherwise.
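Two of these sweeps are cheap enough to automate on every split: strict time ordering and entity overlap. A minimal sketch (function name and report keys are assumptions for illustration):

```python
import numpy as np

def leakage_checks(train_time, valid_time, train_ids, valid_ids) -> dict:
    """Cheap leakage sweeps: (1) every validation timestamp must come after
    the training window, (2) no grouped entity may sit on both sides."""
    time_ok = bool(np.max(train_time) < np.min(valid_time))  # strict order
    overlap = sorted(set(train_ids) & set(valid_ids))        # shared entities
    return {"time_ok": time_ok, "shared_entities": overlap}

train_time = np.array([1, 2, 3])
valid_time = np.array([4, 5])
report = leakage_checks(train_time, valid_time, ["a", "b"], ["b", "c"])
```

Here the split passes the time check but leaks entity "b" across the boundary; for target-derived features and full-data preprocessing, the fix is structural (fit transforms inside each fold), not a check like this.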
Day 4: Search Space Discipline
- Start broad, then shrink ranges from observed failures.
- Avoid adding ten knobs before learning from three.
- Repeat promising trials with another seed.
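"Start broad" usually means sampling a few knobs log-uniformly, then shrinking the exponent ranges after inspecting failures. A sketch under assumed names (the space dict stores base-10 exponent bounds; the parameter names are illustrative):

```python
import random

def sample_trials(space: dict, n_trials: int, seed: int) -> list:
    """Draw hyper-parameter trials from (low, high) log-uniform ranges.

    Shrink `space` only after observed failures, not preemptively.
    """
    rng = random.Random(seed)
    return [{name: 10 ** rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
            for _ in range(n_trials)]

broad = {"learning_rate": (-3, 0), "reg_lambda": (-2, 2)}  # exponents of 10
first_pass = sample_trials(broad, n_trials=3, seed=0)
# A promising trial is then rerun with a different *model* seed
# before its score is trusted.
```

Three knobs sampled this way teach you which ranges matter; ten knobs at once mostly teach you nothing.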
Day 5: Error Slicing
- Slice errors by feature bins, categorical levels, prediction confidence, and time.
- Promote changes that improve the painful slices without destroying global performance.
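Slicing by feature bins can be one short function: bucket a feature into quantiles and report the error inside each bucket. A minimal sketch (name and 4-bin default are illustrative):

```python
import numpy as np

def error_by_bins(feature, y_true, y_pred, n_bins: int = 4) -> dict:
    """Mean absolute error per quantile bin of one feature, so a change is
    judged on the painful slices, not only the global score."""
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, feature, side="right") - 1,
                   0, n_bins - 1)
    return {b: float(np.abs(y_true[bins == b] - y_pred[bins == b]).mean())
            for b in range(n_bins)}

feature = np.arange(8.0)
y_true = np.arange(8.0)
y_pred = y_true + np.array([0, 0, 1, 1, 0, 0, 2, 2.0])
slices = error_by_bins(feature, y_true, y_pred)
```

Here the global MAE hides that all the error lives in bins 1 and 3; the same idea extends to slicing by categorical level, prediction confidence, or time window.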
Day 6: Ensemble Only After OOF Hygiene
- Blend out-of-fold predictions, not leaderboard guesses.
- Keep a simple weighted average baseline before stacking.
- Reject ensembles that only work on one split.
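The simple weighted-average baseline can be fit directly on OOF predictions with a one-dimensional grid search. A sketch assuming two models and RMSE (names and grid size are illustrative):

```python
import numpy as np

def best_blend_weight(oof_a, oof_b, y, n_grid: int = 101):
    """Pick the weight w for w*oof_a + (1-w)*oof_b that minimizes OOF RMSE.

    Weights come from out-of-fold predictions only, never leaderboard probes.
    """
    weights = np.linspace(0.0, 1.0, n_grid)
    def rmse(p):
        return float(np.sqrt(np.mean((y - p) ** 2)))
    scores = [rmse(w * oof_a + (1 - w) * oof_b) for w in weights]
    best = int(np.argmin(scores))
    return float(weights[best]), scores[best]

y = np.array([1.0, 2.0, 3.0, 4.0])
oof_a = y.copy()          # toy example: model A is already perfect
oof_b = y + 1.0           # model B has a constant bias
w, score = best_blend_weight(oof_a, oof_b, y)
```

If the chosen weight only helps on one fold, reject the blend; a stacker is earned only after this baseline is stable across splits.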
Day 7: Decision Review
- Keep the changes that survived repeated evidence.
- Document what failed.
- Turn the best workflow into a reusable notebook.
Next step: package these habits into repeatable assets with the Kaggle GM ML Optimization Pack.