Machine Learning Fundamentals
A sourced reference on Machine Learning Fundamentals.
What is machine learning?
Machine learning is a subfield of artificial intelligence in which computer systems learn from data to improve performance on tasks without being explicitly programmed. Algorithms identify patterns, make decisions, and refine predictions through experience rather than hard-coded rules. [Source: MIT CSAIL]
What is deep learning and how does it differ from machine learning?
Deep learning is a subset of machine learning that uses artificial neural networks with many layers to learn hierarchical representations of data. All deep learning is machine learning, but not the reverse: classical ML methods often rely on hand-crafted features, whereas deep learning learns features directly from raw data. [Source: Stanford HAI]
What is the difference between supervised and unsupervised learning?
Supervised learning trains models on labeled input-output pairs so predictions can be made on new data, while unsupervised learning finds hidden structure in unlabeled data without predefined targets. A third paradigm, reinforcement learning, learns via reward signals rather than labeled examples. [Source: NIST]
What is an artificial neural network?
An artificial neural network is a computational model loosely inspired by biological brains, composed of interconnected nodes (neurons) organized in layers. Each connection carries a weight adjusted during training; the network learns by propagating errors backward through layers to minimize prediction loss. [Source: IEEE]
How does backpropagation work in training neural networks?
Backpropagation computes the gradient of a loss function with respect to each weight by applying the chain rule of calculus layer by layer from output to input. An optimizer such as stochastic gradient descent then updates weights in the direction that reduces loss, iterating until convergence. [Source: MIT OpenCourseWare]
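As a minimal sketch (plain Python, no ML library assumed), the chain-rule computation can be written out for a hypothetical network with one sigmoid hidden unit and a linear output under squared-error loss. The function names and the finite-difference check are illustrative assumptions; comparing analytic gradients with numerical ones is a common way to sanity-check a hand-written backward pass.

```python
import math

# Hypothetical tiny network: h = sigmoid(w1 * x), y_hat = w2 * h,
# loss L = 0.5 * (y_hat - y) ** 2.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    z = w1 * x
    h = sigmoid(z)
    y_hat = w2 * h
    return z, h, y_hat

def loss(w1, w2, x, y):
    _, _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backprop(w1, w2, x, y):
    """Return (dL/dw1, dL/dw2) by applying the chain rule from output to input."""
    _, h, y_hat = forward(w1, w2, x)
    dL_dyhat = y_hat - y            # derivative of 0.5 * (y_hat - y)^2
    dL_dw2 = dL_dyhat * h           # y_hat = w2 * h
    dL_dh = dL_dyhat * w2           # pass the error back to the hidden unit
    dL_dz = dL_dh * h * (1.0 - h)   # sigmoid'(z) = h * (1 - h)
    dL_dw1 = dL_dz * x              # z = w1 * x
    return dL_dw1, dL_dw2

def numerical_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2

def sgd_step(w1, w2, x, y, lr=0.5):
    """One gradient descent update on a single example."""
    g1, g2 = backprop(w1, w2, x, y)
    return w1 - lr * g1, w2 - lr * g2
```

Repeating `sgd_step` is the optimizer loop the paragraph describes; with many examples and layers the same pattern applies, just with vectors and matrices.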
What is gradient descent in machine learning?
Gradient descent is an iterative optimization algorithm that minimizes a loss function by repeatedly adjusting model parameters in the direction opposite to the gradient. Variants include batch (all training examples per update), stochastic (one example per update), and mini-batch (a small subset per update) gradient descent, which trade per-update computation cost against the noise and stability of convergence. [Source: Stanford University]
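A minimal mini-batch sketch, assuming a one-feature linear model y ≈ w·x + b trained with mean squared error; `minibatch_gd` and its default hyperparameters are hypothetical choices for illustration. Setting `batch_size` to the dataset size recovers batch gradient descent, and setting it to 1 gives stochastic gradient descent.

```python
import random

def minibatch_gd(xs, ys, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w * x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new mini-batch composition each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            n = len(batch)
            # Gradient of (1/n) * sum((w*x + b - y)^2) over the mini-batch
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / n
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / n
            # Step opposite to the gradient
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```

On noise-free data such as `xs = [i / 10 for i in range(20)]` and `ys = [2 * x + 1 for x in xs]`, the returned `(w, b)` should be very close to `(2, 1)`.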
What is the learning rate and why does it matter?
The learning rate is a hyperparameter that controls how far the weights move along the negative gradient in each update. If it is too large, training can oscillate or diverge; if it is too small, convergence is slow and training may stall in a poor local minimum or plateau. Adaptive optimizers such as Adam scale the step size separately for each parameter using running statistics of its gradients, although a base learning rate still has to be chosen. [Source: Stanford University]
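The effect of the learning rate is easy to see on the toy loss f(w) = w², whose gradient is 2w. Each update maps w to (1 - 2·lr)·w, so plain gradient descent on this particular function converges only for 0 < lr < 1. The helper names and the three learning rates below are illustrative assumptions.

```python
def gradient_descent_1d(grad, w0, lr, steps):
    """Run plain gradient descent on one scalar parameter; return every iterate."""
    w = w0
    path = [w]
    for _ in range(steps):
        w = w - lr * grad(w)
        path.append(w)
    return path

def grad_square(w):
    """Gradient of the toy loss f(w) = w**2, which has its minimum at w = 0."""
    return 2.0 * w

# Each step multiplies w by (1 - 2*lr), so |w| shrinks only when 0 < lr < 1.
too_small = gradient_descent_1d(grad_square, w0=5.0, lr=0.01, steps=50)  # creeps toward 0
moderate = gradient_descent_1d(grad_square, w0=5.0, lr=0.3, steps=50)    # converges quickly
too_large = gradient_descent_1d(grad_square, w0=5.0, lr=1.1, steps=50)   # overshoots and diverges
```

After 50 steps, `too_small` is still far from 0, `moderate` is essentially at 0, and `too_large` has grown in magnitude on every step.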
What is overfitting in machine learning and how is it prevented?
Overfitting occurs when a model learns noise in the training data rather than generalizable patterns, so it performs well on training examples but poorly on unseen data. Prevention techniques include regularization (L1/L2), dropout, early stopping, cross-validation, and increasing the volume or diversity of training data. [Source: NIST]
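Early stopping, one of the techniques listed, can be sketched as a patience rule over per-epoch validation losses. The function name, the `patience` and `min_delta` parameters, and the sample losses in the usage note are hypothetical; training frameworks implement this in their own ways.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) under a patience-based early stopping rule.

    Training halts once validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs; the weights saved at
    best_epoch would then typically be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1
```

For example, `early_stopping_epoch([1.0, 0.8, 0.6, 0.55, 0.56, 0.58, 0.61, 0.7])` returns `(3, 6)`: validation loss is lowest at epoch 3, and training stops after three epochs without improvement, before the model drifts further toward overfitting.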
What is regularization in machine learning?
Regularization adds a penalty term to a model's loss function to discourage overly complex solutions and improve generalization. L1 regularization (as in Lasso) promotes sparsity by driving the weights of weak features to exactly zero; L2 regularization (as in Ridge) shrinks all weights toward zero without eliminating them. Both reduce overfitting without requiring additional training data. [Source: MIT OpenCourseWare]
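To make the L1/L2 contrast concrete, here is a sketch for the simplest case, assuming a single feature, no intercept, and the penalized squared-error losses stated in the docstrings, where both problems have closed-form solutions. The function names are hypothetical; practical Lasso and Ridge implementations handle many features, and L1 problems are often solved iteratively (for example by coordinate descent).

```python
import math

def ols_weight(xs, ys):
    """Unregularized least-squares weight for y ~ w * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

def ridge_weight(xs, ys, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + 0.5 * lam * w^2 (L2 penalty)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)  # OLS weight scaled by sxx / (sxx + lam): shrunk, never zeroed

def lasso_weight(xs, ys, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + lam * |w| (L1 penalty).

    Soft-thresholding: the weight is exactly 0 whenever |sum(x*y)| <= lam.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)
    return math.copysign(shrunk, sxy) / sxx
```

With `xs = [1, 2, 3]` and a weak signal `ys = [0.1, 0.0, 0.1]`, `lasso_weight(xs, ys, 1.0)` returns exactly zero, while `ridge_weight(xs, ys, 1.0)` returns a small nonzero value (0.4 / 15, about 0.027).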
What is the bias-variance tradeoff?
The bias-variance tradeoff describes a fundamental tension: high-bias models underfit by making simplistic assumptions, while high-variance models overfit by being too sensitive to training data. Optimal model complexity balances these two sources of error to minimize total generalization error on unseen examples. [Source: Stanford HAI]
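For squared-error loss the tradeoff can be stated exactly. Assuming y = f(x) + ε with noise of mean zero and variance σ², and taking expectations over training sets and noise, the expected error of a learned predictor \hat{f} at a point x decomposes as:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model; making a model more flexible usually lowers the bias term and raises the variance term, and the noise term sets a floor no model can beat.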
What is cross-validation and when should it be used?
Cross-validation is a model evaluation technique. In its common k-fold form, the data are partitioned into k subsets (folds); the model is trained on k-1 folds and tested on the remaining one, rotating until every fold has served once as the test set. Averaging the k scores gives a more reliable estimate of generalization when data are limited, and every score is computed on data the model was not trained on. [Source: NIST]
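A minimal k-fold sketch in plain Python: `k_fold_indices` splits sample indices into contiguous folds, and `cross_val_score` averages a held-out score across folds. Both names and the `fit`/`score` callables are hypothetical stand-ins for whatever model and metric are being evaluated; the data should be shuffled first if their order carries information.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k-fold cross-validation over n samples.

    Folds are contiguous and differ in size by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

def cross_val_score(xs, ys, k, fit, score):
    """Average held-out score over k folds; fit and score are supplied by the caller."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / k
```

With `n = 10` and `k = 3`, the test folds have sizes 4, 3, and 3, and every index appears in exactly one test fold.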
What is an activation function in a neural network?
An activation function introduces non-linearity into a neural network, enabling it to model complex relationships. Common choices include ReLU (Rectified Linear Unit), sigmoid, and softmax. Without non-linear activations, stacking layers would be mathematically equivalent to a single linear transformation, severely limiting model capacity. [Source: MIT OpenCourseWare]
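The three activations named above can be sketched in plain Python, together with a scalar check of the claim about stacked linear layers: composing two affine maps yields another affine map. The max-subtraction in `softmax` and the branch in `sigmoid` are common numerical-stability tricks, not part of the mathematical definitions; `two_linear_layers` is a hypothetical helper for illustration.

```python
import math

def relu(z):
    """ReLU(z) = max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def two_linear_layers(x, w1, b1, w2, b2):
    """Two stacked scalar linear layers with no activation in between.

    w2 * (w1 * x + b1) + b2 == (w2 * w1) * x + (w2 * b1 + b2), which is just
    a single linear layer with weight w2 * w1 and bias w2 * b1 + b2.
    """
    return w2 * (w1 * x + b1) + b2
```

Inserting `relu` or `sigmoid` between the two layers breaks this collapse, which is what lets deeper networks represent non-linear functions.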
What is reinforcement learning?
Reinforcement learning trains an agent to maximize cumulative reward by interacting with an environment, learning which actions lead to favorable outcomes through trial and error. Unlike supervised learning, no labeled dataset is required; the reward signal itself guides learning via policies updated over many episodes. [Source: DeepMind / Nature]
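A tabular Q-learning sketch illustrates the reward-driven trial-and-error loop. The environment is a hypothetical five-state corridor that pays a reward of 1 only on reaching the rightmost state, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions. Q-learning is one of many reinforcement learning algorithms, chosen here because its update fits in one line.

```python
import random

N_STATES = 5          # states 0..4 along a corridor; state 4 is the terminal goal
ACTIONS = (-1, +1)    # action index 0 = move left, index 1 = move right

def step(state, action_idx):
    """Hypothetical deterministic environment: reward 1 only on reaching the goal."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def greedy(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: usually exploit current estimates, sometimes explore
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q[state], rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q
```

After training, the greedy action in every non-terminal state should be "move right", learned purely from the reward signal with no labeled examples.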
What are transformer models in machine learning?
Transformers are neural network architectures introduced in the 2017 paper 'Attention Is All You Need' that rely on self-attention mechanisms rather than recurrence to model relationships across sequences. They underpin large language models like GPT and BERT and have been extended to vision, audio, and multimodal tasks. [Source: Google Research / arXiv]
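The core operation from "Attention Is All You Need", scaled dot-product attention softmax(QKᵀ/√d_k)·V, can be sketched with plain lists. This simplified version omits the learned query/key/value projections, multiple heads, and masking that real transformer layers use; passing the same sequence as queries, keys, and values gives a bare form of self-attention.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V with lists of row vectors.

    Each output row is a weighted average of the value rows, weighted by how
    strongly that query matches each key. Every position attends to every
    other position directly, with no recurrence.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

When all keys are identical the attention weights are uniform, so each output is simply the mean of the value rows.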
What is transfer learning and why is it useful?
Transfer learning reuses a model pre-trained on a large dataset as the starting point for a related task, significantly reducing the labeled data and compute needed to reach good performance. Fine-tuning, either of only the final layers or of the whole network, adapts the general representations to domain-specific tasks such as medical imaging or legal text classification. [Source: Stanford HAI]
What is feature engineering in machine learning?
Feature engineering is the process of selecting, transforming, or creating input variables from raw data to improve model performance. Effective features encode domain knowledge, reduce dimensionality, and help algorithms identify meaningful patterns. It remains critical in classical ML, though deep learning automates much of this process from raw inputs. [Source: NIST]
What is dimensionality reduction and what techniques are commonly used?
Dimensionality reduction compresses high-dimensional data into fewer features while preserving important structure, reducing computational cost and mitigating the curse of dimensionality. Principal Component Analysis (PCA) and t-SNE are widely used: PCA finds linear projections that capture maximum variance, while t-SNE is a nonlinear method used mainly to visualize local cluster structure in two or three dimensions. [Source: MIT OpenCourseWare]
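A sketch of PCA's first component for 2-D points, using power iteration on the sample covariance matrix rather than a library eigen-solver. The function names, starting vector, and iteration count are assumptions; projecting each centered point onto the returned direction reduces the data from two dimensions to one. t-SNE is considerably more involved and is not sketched here.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (unit_direction, variance) of the top principal component of 2-D points.

    The first principal component is the top eigenvector of the covariance
    matrix; power iteration finds it by repeatedly multiplying by the matrix.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[cxx, cxy], [cxy, cyy]] with denominator n - 1
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    # Arbitrary starting vector (assumption): it must not be orthogonal to the answer
    v = (0.6, 0.8)
    for _ in range(iterations):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    # Rayleigh quotient v^T C v: the variance of the data along v
    variance = v[0] * (cxx * v[0] + cxy * v[1]) + v[1] * (cxy * v[0] + cyy * v[1])
    return v, variance

def project_to_1d(points, direction):
    """Reduce 2-D points to 1-D coordinates along `direction` after centering."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return [(x - mean_x) * direction[0] + (y - mean_y) * direction[1] for x, y in points]
```

For points lying on the line y = x, the returned direction is (1, 1)/√2 up to sign, and the single projected coordinate preserves all of the data's variance.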
How do you evaluate the performance of a machine learning model?
Model evaluation depends on the task: classification uses accuracy, precision, recall, F1-score, and ROC-AUC; regression uses MAE, RMSE, and R². All metrics should be computed on a held-out test set or via cross-validation to reflect true generalization rather than memorization of training examples. [Source: NIST]
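A plain-Python sketch of several of the metrics listed, assuming binary labels for classification. Returning 0.0 when a precision or recall denominator is zero is one common convention, not a universal rule, and ROC-AUC is omitted for brevity; the function names are hypothetical.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Convention (assumption): report 0.0 when a denominator is zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and R^2 (R^2 is undefined when all targets are equal)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

These functions should be called on held-out predictions, as the paragraph notes; computed on training data they mostly measure memorization.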
What is fairness in machine learning and why does it matter?
ML fairness refers to ensuring that model predictions do not systematically disadvantage individuals based on protected attributes such as race, gender, or age. NIST's AI Risk Management Framework identifies fairness as a core trustworthy-AI property and recommends bias testing throughout the model development lifecycle. [Source: NIST]
What is model interpretability and why is it important?
Model interpretability describes the degree to which humans can understand why a model produces a given prediction. It is critical for debugging, regulatory compliance, and building user trust. Techniques include SHAP values, LIME, saliency maps, and attention visualization, each suited to different model types and explanation goals. [Source: NIST AI RMF]