The Ultimate Cheat Sheet: Picking the Right Model, Optimizer & LR for Every Scenario
In supervised, unsupervised, time-series, deep-learning, and reinforcement-learning tasks, every problem has a "sweet spot" of algorithms, solvers, and hyperparameter defaults. This guide eliminates the guesswork in choosing your model, optimizer, and learning rate (LR).
1. Regression Problems
Regression is about predicting continuous values. Your choice depends on data dimensionality and the need for interpretability.
| Scenario | Models | Solver / Optimizer | LR & Tips |
|---|---|---|---|
| Low-dimensional | Linear Regression | Closed-form (OLS) | No LR; scale features |
| Multicollinear | Ridge, Lasso | Coordinate Descent | $\alpha \approx 10^{-3}$ to $1$; tune via cross-validation |
| Sparse features | Lasso | Coordinate Descent | Increase $\alpha$ to induce sparsity |
| Nonlinear, interpretable | Decision Trees | Greedy splitting | `max_depth` 3–10 |
| SOTA boosting | XGBoost, LightGBM | Histogram-based gradient boosting | LR 0.01; use early stopping |
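As a quick sketch of the multicollinear row above, `RidgeCV` sweeps an $\alpha$ grid from $10^{-3}$ to $1$ with built-in cross-validation (synthetic data here purely for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Scale features, then sweep alpha over a log grid via cross-validation
model = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", RidgeCV(alphas=np.logspace(-3, 0, 10))),
])
model.fit(X, y)
print("best alpha:", model.named_steps["ridge"].alpha_)
```

The chosen `alpha_` tells you how much shrinkage the data actually wants; if it sits at the edge of the grid, widen the grid.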
2. Classification Problems
Classification handles discrete labels. The key here is boundary complexity and data scale.
| Scenario | Models | Solver / Optimizer | Tips |
|---|---|---|---|
| Binary, linear boundary | Logistic Regression | LBFGS / liblinear | $C=1.0$; scale inputs |
| Small data | KNN | — | $k \approx \sqrt{n\_samples}$; standardize |
| High-dimensional | SVM | SMO | $C=1$; `kernel='rbf'` |
| Probabilistic | GMM | EM | Pick component count via BIC/AIC |
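The $k \approx \sqrt{n\_samples}$ rule of thumb from the table is easy to wire up directly (synthetic data for illustration; rounding to an odd $k$ avoids tie votes in binary problems):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Rule of thumb: k ~ sqrt(n_samples), rounded up to an odd number
k = int(np.sqrt(len(X)))
k = k + 1 if k % 2 == 0 else k

# Standardize first: KNN is distance-based and scale-sensitive
knn = Pipeline([("s", StandardScaler()), ("c", KNeighborsClassifier(n_neighbors=k))])
score = cross_val_score(knn, X, y, cv=5).mean()
print(f"k={k}, CV accuracy: {score:.3f}")
```

Treat the heuristic as a starting point; a small grid search around $\sqrt{n}$ usually settles the matter.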
3. Unsupervised Learning & Anomaly Detection
When labels are missing, you focus on structure and density.
Dimensionality Reduction: Use PCA (SVD solver). Aim for $n\_components \approx 0.95$ explained variance.
Density Clustering: Use DBSCAN. Set $eps \approx 0.5 \times avg\_dist$. Great for non-spherical shapes.
Anomaly Detection: Use Isolation Forest. Set `contamination` between 0.01 and 0.1.
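A minimal Isolation Forest sketch, using synthetic 2-D data so the anomaly fraction is known in advance; `contamination` is your prior estimate of that fraction:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 1, (200, 2)),     # dense inlier cluster
    rng.uniform(-6, 6, (10, 2)),    # scattered anomalies (~5% of the data)
])

# contamination = expected anomaly share; 0.01-0.1 is the typical range
iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = iso.predict(X)             # -1 = anomaly, 1 = normal
print("flagged as anomalous:", int((labels == -1).sum()))
```

If you have no prior at all, start near 0.05 and inspect the flagged points rather than trusting the threshold blindly.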
4. Time-Series Modeling
Time-series requires accounting for seasonality and autocorrelation.
Stationary/Short History: ARIMA/SARIMA. Use Maximum Likelihood. Determine $p, d, q$ via ACF/PACF plots.
Deep Sequence Modeling: LSTM/GRU. Use Adam optimizer.
Tip: $lr=10^{-3}$, clip gradients at $1.0$, and tune the sequence length.
Similarity: Dynamic Time Warping (DTW). Use a window $\approx 10\%$ of series length.
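The DTW-with-window idea above fits in a few lines of NumPy; this is a plain dynamic-programming sketch with a Sakoe–Chiba band of $\pm$`window` indices (the function name and signature are illustrative, not a library API):

```python
import numpy as np

def dtw_distance(a, b, window):
    """DTW distance with a Sakoe-Chiba band of +/- `window` indices."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))            # band must at least cover the diagonal
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 50)
a, b = np.sin(t), np.sin(t + 0.3)          # same shape, slight phase shift
w = max(1, int(0.1 * len(a)))              # window ~ 10% of series length
print(f"DTW distance: {dtw_distance(a, b, w):.3f}")
```

Because DTW can warp the time axis, the phase-shifted pair scores closer than a rigid point-by-point comparison would suggest.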
5. Deep Learning Architectures
In deep learning, the optimizer and LR schedule often matter as much as the architecture itself.
| Goal | Models | Optimizer | LR & Scheduling |
|---|---|---|---|
| Tabular data | MLP | Adam / SGD | $lr=10^{-3}$ (Adam); weight decay |
| Image tasks | CNN (ResNet) | AdamW / SGD | $lr=0.1$ (SGD w/ momentum); cosine annealing |
| NLP / Transformers | BERT / GPT | AdamW | $lr=2\times10^{-5}$ to $5\times10^{-5}$; 10% warmup |
| Generative | GAN | Adam | $lr_D = lr_G = 2\times10^{-4}$; train D more often |
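The warmup-plus-cosine-annealing pattern from the table reduces to a small formula; this standalone sketch (function name and defaults are illustrative, not a framework API) shows the shape of the schedule:

```python
import numpy as np

def cosine_annealing(step, total_steps, lr_max=0.1, lr_min=0.0, warmup=0):
    """Linear warmup for `warmup` steps, then cosine decay to lr_min."""
    if step < warmup:
        return lr_max * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * progress))

total = 100
lrs = [cosine_annealing(s, total, lr_max=0.1, warmup=10) for s in range(total)]
print(f"start: {lrs[0]:.4f}, peak: {max(lrs):.3f}, final: {lrs[-1]:.5f}")
```

Frameworks ship this built in (e.g. PyTorch's `CosineAnnealingLR`); the point here is just how few moving parts the schedule has.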
6. Reinforcement Learning
RL is notoriously unstable; choosing the right optimizer and LR is a safety requirement, not just a preference.
Discrete Actions: DQN. $lr=10^{-4}$; update the target network every 1000 steps.
Continuous Control: PPO / SAC.
PPO: $lr=2.5\times10^{-4}$, clip ratio $=0.2$.
SAC: $lr=3\times10^{-4}$, $\tau=0.005$ (soft target updates).
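The $\tau=0.005$ soft update is just Polyak averaging of target weights toward the online weights; this toy NumPy sketch (arrays standing in for real network parameters) shows how slowly the target tracks:

```python
import numpy as np

def soft_update(target, online, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1 - tau) * t for t, o in zip(target, online)]

online = [np.ones((4, 4)), np.ones(4)]      # stand-in "network weights"
target = [np.zeros((4, 4)), np.zeros(4)]
for _ in range(1000):
    target = soft_update(target, online, tau=0.005)
print(f"target weight after 1000 updates: {target[0][0, 0]:.3f}")
```

Even after 1000 steps the target has not fully caught up; that lag is exactly the stabilizing effect soft updates are meant to provide.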
7. Implementation: The Automated Benchmark
Don't marry a model before dating the data. Use a pipeline to find the "just works" baseline.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Replace with your own data; a synthetic set keeps the example runnable
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Define candidates (scale inputs for the scale-sensitive models)
models = {
    "LogReg": Pipeline([("s", StandardScaler()), ("c", LogisticRegression())]),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": Pipeline([("s", StandardScaler()), ("c", SVC())]),
}

# Rapid 5-fold cross-validation over all candidates
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```
Summary Checklist
Start Simple: If Logistic Regression hits 90%, don't waste time on a Transformer.
Match Complexity: High noise = simpler models (Ridge/Lasso). High signal + high data = Boosting/Deep Learning.
Optimizer Standard: Use Adam for Deep Learning and LBFGS/Coordinate Descent for classical ML.
Scaling is Non-negotiable: Unless using trees, always apply `StandardScaler`.