Machine Learning Guidelines

Essential Reading

There are no real shortcuts, only tradeoffs. Read these foundational documents for a comprehensive machine learning perspective:

ML-Specific Principles

Specify clear hypotheses before running experiments. Machine learning research often suffers from vaguely stated hypotheses, which invites p-hacking: trying variations until something looks significant and then reporting a result that is really noise. Define exactly what you want to test and how you will measure it before you start.
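One way to make this concrete is to write the hypothesis down as data before any run starts. The sketch below is only an illustration: the `Hypothesis` class, its fields, and the example numbers are hypothetical and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the criteria cannot be edited after the fact
class Hypothesis:
    claim: str              # what you expect to be true
    metric: str             # the single metric that decides it
    baseline: float         # metric value of the comparison system
    min_improvement: float  # smallest gain you would count as a real effect

    def is_supported(self, observed: float) -> bool:
        """True only if the observed metric beats the baseline by the agreed margin."""
        return observed - self.baseline >= self.min_improvement


# Hypothetical example values, fixed before running the experiment.
h = Hypothesis(
    claim="Adding dropout improves validation accuracy",
    metric="val_accuracy",
    baseline=0.80,
    min_improvement=0.02,
)
```

Because the success criterion is fixed up front, a run that lands at 0.81 cannot quietly be reinterpreted as a win.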

Always establish baselines. Specify a null hypothesis to compare against: how do you know your approach is good? What defines "good" for your specific problem?
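For a classification task, one of the simplest baselines is to always predict the most common training label. The following is a minimal sketch using only the standard library; the function names and toy labels are made up for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return accuracy(y_test, [most_common_label] * len(y_test))


# Toy labels, made up for illustration.
y_train = [0, 0, 0, 1, 0, 1]
y_test = [0, 1, 0, 0]
model_preds = [0, 1, 0, 1]

baseline_acc = majority_class_baseline(y_train, y_test)  # always predicts 0
model_acc = accuracy(y_test, model_preds)
print(f"baseline={baseline_acc:.2f} model={model_acc:.2f}")
```

In this toy case the model only matches the baseline, so its accuracy alone says nothing about whether it learned anything useful.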

Optimize for experimental velocity: Design experiments to answer one specific question, ignoring everything else. Fast iteration beats perfect experiments.

ML Data and Model Practices

Visualize ML data extensively:

- Plot data distributions, far more often than you would by default
- Visualize model outputs, loss curves, and activation patterns
- Look for data patterns, outliers, and distribution shifts
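Even before reaching for a plotting library, a quick look at a feature can be done in the terminal. The sketch below is a dependency-free stand-in for a real plot: it prints a text histogram and flags outliers with the common 1.5 × IQR rule. The helper names and sample values are hypothetical.

```python
import statistics


def text_histogram(values, bins=5, width=30):
    """Return one line per bin: the bin's lower edge, a bar, and the count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid zero-width bins when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    peak = max(counts)
    return [
        f"{lo + i * step:8.2f} | {'#' * round(width * c / peak)} {c}"
        for i, c in enumerate(counts)
    ]


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


# Made-up feature values with one obvious outlier.
feature = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 9.5]
print("\n".join(text_histogram(feature)))
print("outliers:", iqr_outliers(feature))
```

The single far-away value stretches the histogram so that almost everything lands in the first bin, which is exactly the kind of pattern that is easy to miss without looking.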

Build model demos: Create a simple demo app for every model. A demo helps you understand what the model actually learned and where it fails.

Verify ML assumptions:

- Check data preprocessing steps
- Validate train/test splits and check for data leakage
- Examine model inputs and outputs at each stage
- Test edge cases and failure modes
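One leakage check that is cheap to automate is looking for test examples that also appear in the training set. The sketch below assumes text examples and treats two strings as duplicates when they match after lowercasing and whitespace normalization; the helper names and data are hypothetical, and other data types would need their own normalization.

```python
import hashlib


def fingerprint(example: str) -> str:
    """Hash a case- and whitespace-normalized example so trivial variants collide."""
    normalized = " ".join(example.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_leaked_examples(train, test):
    """Return the test examples whose fingerprint also appears in the training set."""
    train_hashes = {fingerprint(x) for x in train}
    return [x for x in test if fingerprint(x) in train_hashes]


# Made-up text examples; the second test item duplicates a training item.
train = ["The cat sat.", "Dogs bark loudly."]
test = ["Birds can fly.", "the  cat sat."]

leaked = find_leaked_examples(train, test)
print(f"{len(leaked)} of {len(test)} test examples also appear in train: {leaked}")
```

An exact-match check like this only catches the most obvious leaks; near-duplicates and leakage through features or time ordering need separate checks.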

Experiment Tracking

Weights & Biases: Use it consistently with descriptive tags. Track hyperparameters, metrics, and model artifacts for reproducibility.
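A minimal sketch of what consistent tracking could look like with the `wandb` Python package. The project name, hyperparameters, and helper functions here are hypothetical; `wandb` is imported lazily so the tag helper works without it installed, and `mode="offline"` keeps the example from needing a login.

```python
def describe_tags(hyperparams):
    """Turn hyperparameters into readable tags such as 'lr=0.001'."""
    return [f"{name}={value}" for name, value in sorted(hyperparams.items())]


def start_tracked_run(project, hyperparams, tags, mode="online"):
    """Start a W&B run that records hyperparameters and tags up front."""
    import wandb  # imported lazily so this module loads without wandb installed

    return wandb.init(project=project, config=hyperparams, tags=tags, mode=mode)


def example_run():
    """Log a few made-up loss values to an offline run (requires wandb)."""
    hyperparams = {"lr": 1e-3, "batch_size": 32}
    run = start_tracked_run(
        "demo-project", hyperparams, describe_tags(hyperparams), mode="offline"
    )
    for step in range(3):
        run.log({"train/loss": 1.0 / (step + 1)}, step=step)
    run.finish()
```

Calling `example_run()` writes the run locally. Deriving tags from the config, as `describe_tags` does, is one way to keep them descriptive and consistent across runs.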

Tricks

To make runs more reproducible (though not fully deterministic), seed every random number generator with a snippet like:

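import random

import numpy as np
import torch


def make_deterministic(seed: int = 0):
    """Seed Python, NumPy, and PyTorch RNGs; pass seed=-1 to skip seeding."""
    seed = int(seed)
    if seed == -1:
        return
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    # Trade cuDNN speed for repeatable kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False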