The Ultimate Cheat Sheet: Picking the Right Model, Optimizer & LR for Every Scenario

 In supervised, unsupervised, time-series, deep-learning, and reinforcement-learning tasks, every problem has a "sweet spot" of algorithms, solvers, and hyperparameter defaults. This guide eliminates the guesswork in choosing your model, optimizer, and learning rate (LR).


1. Regression Problems

Regression is about predicting continuous values. Your choice depends on data dimensionality and the need for interpretability.

| Scenario | Models | Solver / Optimizer | LR & Tips |
|---|---|---|---|
| Low-dimensional | Linear Regression | Closed-form (normal equation) | No LR; scale features |
| Multicollinear | Ridge, Lasso | Cholesky (Ridge), Coordinate Descent (Lasso) | $\alpha \approx$ 1e-3 to 1; use cross-validation |
| Sparse Features | Lasso | Coordinate Descent | Increase $\alpha$ to induce sparsity |
| Nonlinear / Interpretable | Decision Trees | Greedy Splitting | max_depth 3–10 |
| SOTA Boosting | XGBoost, LightGBM | Histogram-based Gradient Boosting | LR $\approx$ 0.01; use early stopping |
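As a minimal sketch of the multicollinear row above, `RidgeCV` can search $\alpha$ over the suggested 1e-3 to 1 range automatically (the dataset here is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (stand-in for a real multicollinear dataset)
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Scale features, then cross-validate alpha over 1e-3 .. 1
model = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", RidgeCV(alphas=np.logspace(-3, 0, 20))),
])
model.fit(X, y)
print("chosen alpha:", model.named_steps["ridge"].alpha_)
```

The same pattern works for `LassoCV`; only the estimator in the pipeline changes.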

2. Classification Problems

Classification handles discrete labels. The key here is boundary complexity and data scale.

| Scenario | Models | Solver / Optimizer | Tips |
|---|---|---|---|
| Binary, Linear | Logistic Regression | LBFGS / Liblinear | $C=1.0$; scale inputs |
| Small Data | KNN | Brute-force / KD-Tree search | $k \approx \sqrt{n\_samples}$; standardize |
| High-dimensional | SVM | SMO | $C=1$; kernel='rbf' |
| Probabilistic | GMM | EM | Use BIC or an elbow plot for components |
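The $k \approx \sqrt{n\_samples}$ heuristic from the table is easy to wire into a standardized KNN pipeline (synthetic data used for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Rule of thumb: k ~ sqrt(n_samples); here sqrt(400) = 20
k = int(np.sqrt(len(X)))

knn = Pipeline([("s", StandardScaler()), ("c", KNeighborsClassifier(n_neighbors=k))])
score = cross_val_score(knn, X, y, cv=5).mean()
print(f"k={k}, CV accuracy={score:.3f}")
```

Treat the heuristic as a starting point only; a small grid search around it usually pays off.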

3. Unsupervised Learning & Anomaly Detection

When labels are missing, you focus on structure and density.

  • Dimensionality Reduction: Use PCA (SVD solver). Aim for $n\_components \approx 0.95$ explained variance.

  • Density Clustering: Use DBSCAN. Set $eps \approx 0.5 \times avg\_dist$. Great for non-spherical shapes.

  • Anomaly Detection: Use Isolation Forest. Set contamination between 0.01–0.1.
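The PCA and Isolation Forest rules of thumb above fit into a few lines; the blob dataset is a stand-in, and the 0.05 contamination is one value from the suggested 0.01–0.1 range:

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

X, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=0)

# PCA: passing a float keeps just enough components for ~95% explained variance
pca = PCA(n_components=0.95, svd_solver="full")
X_red = pca.fit_transform(X)
print("components kept:", pca.n_components_)

# Isolation Forest: flag roughly 5% of points as anomalies
iso = IsolationForest(contamination=0.05, random_state=0)
labels = iso.fit_predict(X_red)  # -1 = anomaly, 1 = normal
print("anomalies flagged:", int((labels == -1).sum()))
```

For DBSCAN, the analogous starting point is estimating `eps` from a k-distance plot before committing to a value.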


4. Time-Series Modeling

Time-series requires accounting for seasonality and autocorrelation.

  • Stationary/Short History: ARIMA/SARIMA. Use Maximum Likelihood. Determine $p, d, q$ via ACF/PACF plots.

  • Deep Sequence Modeling: LSTM/GRU. Use Adam optimizer.

    • Tip: $lr=1e^{-3}$, clip gradients at $1.0$, and tune sequence length.

  • Similarity: Dynamic Time Warping (DTW). Use a window $\approx 10\%$ of series length.
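The DTW window rule above corresponds to a Sakoe-Chiba band; here is a minimal pure-NumPy sketch (the function name `dtw_distance` is ours, not from any library), with the band set to 10% of the series length:

```python
import numpy as np

def dtw_distance(a, b, window):
    """DTW with a Sakoe-Chiba band of half-width `window`."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # band must be wide enough to reach (n, m)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 100)
s1 = np.sin(t)
s2 = np.sin(t + 0.3)            # phase-shifted copy of s1
w = int(0.10 * len(s1))         # window ~ 10% of series length
print(f"DTW distance: {dtw_distance(s1, s2, w):.3f}")
```

The band both speeds up the computation and prevents pathological warping paths.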


5. Deep Learning Architectures

Deep learning performance is 90% optimizer and LR scheduling.

| Goal | Models | Optimizer | LR & Scheduling |
|---|---|---|---|
| Tabular Data | MLP | Adam / SGD | lr=1e-3 (Adam); weight decay |
| Image Tasks | CNN (ResNet) | AdamW / SGD | lr=0.1 (SGD w/ momentum); cosine annealing |
| NLP / Transformers | BERT / GPT | AdamW | lr=2e-5 to 5e-5; 10% warmup |
| Generative | GAN | Adam | lr_D=2e-4, lr_G=2e-4; train D more |
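The transformer row's "10% warmup + decay" recipe is just a function of the step count; a framework-free sketch (the helper `lr_at_step` is our own, assuming linear warmup followed by cosine decay to zero):

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_frac=0.10):
    """Linear warmup for the first 10% of steps, then cosine decay to 0."""
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000
print(lr_at_step(0, total))     # tiny LR at the start of warmup
print(lr_at_step(99, total))    # reaches base_lr at the end of warmup
print(lr_at_step(1000, total))  # decayed to ~0 at the end
```

PyTorch and Hugging Face ship equivalent schedulers; this version just makes the shape of the curve explicit.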

6. Reinforcement Learning

RL is notoriously unstable; choosing the right optimizer and LR is a safety requirement, not just a preference.

  • Discrete Actions: DQN. $lr=1e^{-4}$, target_update every 1000 steps.

  • Continuous Control: PPO / SAC.

    • PPO: $lr=2.5e^{-4}$, clip=$0.2$.

    • SAC: $lr=3e^{-4}$, $\tau=0.005$ (soft updates).
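The SAC $\tau=0.005$ setting drives Polyak averaging of the target network; a toy sketch with NumPy arrays standing in for network weights (the helper `soft_update` is ours):

```python
import numpy as np

TAU = 0.005  # soft-update coefficient from the SAC defaults above

def soft_update(target, online, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [(1 - tau) * t + tau * o for t, o in zip(target, online)]

target = [np.zeros(3)]  # stand-in for target-network parameters
online = [np.ones(3)]   # stand-in for online-network parameters
for _ in range(1000):
    target = soft_update(target, online)
print(target[0])  # drifts slowly toward the online weights
```

The small $\tau$ is exactly what keeps the target network stable: after 1000 updates the target has still not fully caught up to the online weights.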


7. Implementation: The Automated Benchmark

Don't marry a model before dating the data. Use a pipeline to find the "just works" baseline.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Example dataset so the benchmark runs end-to-end; swap in your own X, y
X, y = load_iris(return_X_y=True)

# Define candidates
models = {
    "LogReg": Pipeline([("s", StandardScaler()), ("c", LogisticRegression(max_iter=1000))]),
    "RF": RandomForestClassifier(n_estimators=100),
    "SVM": Pipeline([("s", StandardScaler()), ("c", SVC())]),
}

# Rapid cross-validation
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

Summary Checklist

  1. Start Simple: If Logistic Regression hits 90%, don't waste time on a Transformer.

  2. Match Complexity: High noise = simpler models (Ridge/Lasso). High signal + high data = Boosting/Deep Learning.

  3. Optimizer Standard: Use Adam for Deep Learning and LBFGS/Coordinate Descent for classical ML.

  4. Scaling is Non-negotiable: Unless you are using tree-based models, always apply StandardScaler.
