Stanford CS329H: Machine Learning from Human Preferences | Autumn 2024 | Voting
Key Summary
- This lecture kicks off an Introduction to Machine Learning course by explaining how the class will run and what you will learn. The instructor, Byron Wallace, introduces the TAs (Max and Zohair), office hours, and where to find all materials (Piazza/Canvas). The course has weekly graded homework, a midterm, and a group final project with a proposal, report, and presentation. Lectures are mostly theory and recorded; hands-on coding happens in homework.
- There are no strict prerequisites, but you should be comfortable with Python, and have basic linear algebra and calculus. Review resources are posted for anyone who needs a refresher. Collaboration on homework is allowed in fixed groups of up to three, but each student must submit their own write-up. Academic integrity expectations are stressed: learn by doing your own work and do not post solutions publicly.
- The curriculum is organized into modules: supervised learning (starting with linear models), then non-linear models (decision trees, random forests, neural networks), then unsupervised learning (clustering and dimensionality reduction), and finally probabilistic machine learning (Bayesian inference, graphical models, and Markov chain Monte Carlo). Each module includes readings, lectures, and a matching homework. Supervised learning is introduced with the core idea of learning a mapping from inputs (features) to outputs (targets). Realistic examples include cat/dog image classification, spam detection, house price prediction, and medical risk prediction.
- Two main supervised learning task types are explained: classification (predict categories like 'spam' or 'not spam') and regression (predict continuous values like house price). Linear models are presented as simple, interpretable baselines for both tasks. You learn that model parameters (coefficients) describe how each input feature contributes to the prediction. Training means choosing parameters that minimize the difference between predictions and actual targets.
- Feature engineering is emphasized as a crucial step to improve model performance. Instead of using raw inputs like latitude and longitude, better features like 'distance to city center' or 'neighborhood average income' can be created. Good features help even simple models work well and can reduce the need for very complex models. The lecture provides concrete examples of useful engineered features.
Why This Lecture Matters
This lecture matters because it gives you a practical roadmap for learning machine learning the right way: start with solid foundations, use interpretable baselines, and build up to more complex ideas only when needed. For students, analysts, and aspiring data scientists, the structure—weekly practice, clear grading, collaborative but accountable homework, and a deep final project—ensures steady progress. In real work, you constantly face questions like “Is this problem classification or regression?” and “Are my features telling the right story?” The lecture’s emphasis on supervised learning, linear models, and feature engineering directly answers these, showing how to design clean, reliable solutions. Professionals in domains like healthcare, public policy, and business benefit from the focus on uncertainty and interpretability. Predicting heart attack risk or evaluating interventions requires both accurate models and honest confidence about predictions, which the course eventually addresses with probabilistic methods. The approach also strengthens your career development: employers value people who start with clear baselines, engineer meaningful features, evaluate properly, and communicate results lucidly. In today’s industry—where ML is everywhere from recommendation systems to risk assessment—knowing when to use a simple linear model and when to scale up to non-linear or probabilistic tools is a critical, differentiating skill. This lecture sets you up to make those calls confidently and responsibly.
Lecture Summary
Overview
This first lecture launches an Introduction to Machine Learning course and sets expectations for both the logistics and the learning journey. The instructor, Byron Wallace (Khoury College of Computer Sciences), begins by introducing himself and the teaching assistants, Max and Zohair, who will host office hours and help with assignments. Students are directed to Piazza (linked from Canvas) as the single source of truth for announcements, lecture notes, assignments, and recordings. The class will meet in person, and recordings will be posted shortly after each lecture. Graded components include weekly homework (50%), an in-person midterm (20%), and a final project (30%), which is done in the same groups as the homework and includes a mid-semester proposal, a final report, and a presentation.
There are no rigid prerequisites, but students are expected to have basic programming skills, especially in Python, and to be comfortable with linear algebra and calculus. Review materials are provided for those who feel rusty. Collaboration is allowed in groups of up to three for homework, but each student must write up and submit their own solutions, encouraging real learning rather than copying. Academic integrity is highlighted: you may look online for ideas, but you must implement your own solutions and never post course solutions publicly.
Pedagogically, the course is organized into modules, each focusing on a family of algorithms and paired with readings, lectures, and an assignment. The first module covers supervised learning, starting with linear models. Here, you learn the core supervised learning setup: input features (also known as predictors, independent variables, or covariates) and targets (also known as outcomes, dependent variables, or responses). The goal is to learn a function that maps inputs to outputs. Two key supervised problem types are defined: classification (predicting categories, like spam vs. not spam or animal types) and regression (predicting continuous values, like house price or tomorrow’s temperature).
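The classification/regression distinction above can be made concrete with a tiny sketch. All of the data here is made up for illustration; the key point is that the target's type tells you which task you are framing:

```python
# Toy sketch (made-up data) of the supervised learning setup:
# each example pairs input features with a known target.

# Regression: features map to a continuous target (price in $1000s)
regression_data = [
    {"features": {"sqft": 1200, "bedrooms": 3}, "target": 250.0},
    {"features": {"sqft": 2000, "bedrooms": 4}, "target": 410.0},
]

# Classification: features map to a category
classification_data = [
    {"features": {"contains_free": 1, "num_links": 7}, "target": "spam"},
    {"features": {"contains_free": 0, "num_links": 1}, "target": "not spam"},
]

def task_type(example):
    """Name the task by the target's type: numeric -> regression."""
    if isinstance(example["target"], (int, float)):
        return "regression"
    return "classification"

print(task_type(regression_data[0]))      # regression
print(task_type(classification_data[0]))  # classification
```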
Within this framework, linear models are positioned as simple, interpretable, and surprisingly powerful. A linear model assumes the target is a weighted sum of the input features plus an intercept. The house price example illustrates this with terms like square footage, location, and number of bedrooms. Learning means finding parameter values (weights/coefficients) that minimize the difference between predicted and actual targets on training data. Linear models can be used for both classification and regression and are foundational building blocks for more complex methods.
Feature engineering is introduced as a crucial practice for improving model performance. Instead of feeding raw inputs like latitude and longitude directly into a model, you might create more informative features such as distance to a city center or neighborhood average income. These engineered features can make patterns more obvious to the model, increasing accuracy and reducing the need for highly complex architectures. The importance of thoughtful feature creation and transformation is emphasized because it often has a larger impact on performance than changing algorithms.
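The latitude/longitude example above can be sketched concretely. The coordinates and city center below are invented for illustration; the haversine formula (a standard great-circle distance, not something specified in the lecture) turns two raw columns into one informative feature:

```python
import math

# Hedged sketch: turning raw (latitude, longitude) into a more
# informative engineered feature, "distance to city center".
# The city-center coordinates and listings are made up.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

CENTER = (40.7128, -74.0060)  # hypothetical city center

# Raw inputs: (latitude, longitude) for each listing
listings = [(40.7306, -73.9352), (40.6501, -73.9496)]

# Engineered feature: one distance value per listing
dist_feature = [haversine_km(lat, lon, *CENTER) for lat, lon in listings]
print([round(d, 1) for d in dist_feature])
```

A linear model can use this single distance column far more easily than raw coordinates, since price typically varies smoothly with distance but not linearly with latitude or longitude.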
After linear models, the course broadens to non-linear models such as decision trees, random forests, and neural networks. While these still operate within the supervised learning paradigm (mapping inputs to outputs using labeled data), they use non-linear functions, which allows them to represent more complex relationships in the data. Next, the course turns to unsupervised learning, which finds structure in unlabeled data through techniques like clustering (grouping similar points) and dimensionality reduction (compressing data for easier visualization and preprocessing). Finally, the course covers probabilistic machine learning, including Bayesian inference, graphical models, and Markov chain Monte Carlo (MCMC). These methods explicitly represent uncertainty and are especially helpful when data is scarce or when you want to incorporate prior knowledge.
Throughout, the instructor references real-world applications, particularly in medicine and public health, such as predicting heart attack risk or understanding the effects of clinical interventions. This grounds the theory in meaningful problems and motivates careful modeling choices. The lecture closes by previewing the next steps: delving into specific training algorithms for linear models in the following session. By the end of the lecture, students know how the class runs, what topics they will learn, and the key concepts of supervised learning, classification vs. regression, linear models, and feature engineering.
Key Takeaways
- ✓Always start by framing the problem: decide if it’s supervised, and whether it’s classification or regression. This determines the loss, metrics, and model types you should use. Clear framing prevents wasted effort on the wrong tools. Write the problem definition down before coding.
- ✓Use a linear model as a baseline for any new dataset. It’s fast, interpretable, and surprisingly strong with good features. If a complex model doesn’t beat the linear baseline, rethink your features and setup. Baselines keep you honest about progress.
- ✓Invest early in feature engineering. Transform raw inputs into meaningful signals like distance to city center or neighborhood stats. Better features usually help more than changing algorithms. Document how each feature is created.
- ✓Split your data properly and avoid leakage. Keep test data untouched until final evaluation. Fit preprocessors only on training data, then apply to validation/test. Leakage makes results look good in development and fail in production.
- ✓Pick metrics that match the task and data balance. Don’t rely on accuracy for imbalanced classes; use precision, recall, F1, and AUC. For regression, track RMSE/MAE and inspect residuals. The right metric changes the model you choose and tune.
- ✓Keep models as simple as possible for the problem. Start with linear/logistic regression, then add complexity only if needed. Simpler models are easier to debug and explain. Complexity without benefit is tech debt.
- ✓Use cross-validation when data is limited. It stabilizes performance estimates across splits. This helps with model selection and hyperparameter tuning. It reduces the chance of overfitting to a lucky split.
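Several of the takeaways above (split first, fit preprocessing on training data only, evaluate a linear baseline with a task-appropriate metric) can be combined into one short workflow. This is a sketch on synthetic data, not code from the course:

```python
import numpy as np

# Sketch of the leakage-safe workflow from the takeaways, on
# synthetic regression data: split first, fit the scaler on the
# training rows ONLY, then score a linear baseline by RMSE.

rng = np.random.default_rng(42)
X = rng.normal(0, 1, (100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 100)

# 1) Split first; the held-out rows stay untouched until evaluation.
idx = rng.permutation(100)
train, test = idx[:80], idx[80:]

# 2) Fit preprocessing on the training split only (avoids leakage).
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
Xtr = (X[train] - mu) / sigma
Xte = (X[test] - mu) / sigma   # reuse the training statistics

# 3) Linear baseline via least squares, scored on held-out data.
A = np.hstack([Xtr, np.ones((len(train), 1))])
w, *_ = np.linalg.lstsq(A, y[train], rcond=None)
pred = np.hstack([Xte, np.ones((len(test), 1))]) @ w
rmse = float(np.sqrt(np.mean((pred - y[test]) ** 2)))
print(f"held-out RMSE: {rmse:.3f}")
```

Computing `mu` and `sigma` from all 100 rows instead would leak test-set information into preprocessing, which is exactly the failure mode the takeaways warn about.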
Glossary
Supervised learning
A way to teach a computer using examples that come with the correct answers. The computer sees inputs and the right outputs and learns how to connect them. Over time, it gets better at guessing the right output for new inputs. This is like studying from a workbook with answer keys. It is used for tasks like classifying emails as spam or predicting house prices.
Unsupervised learning
A way to find patterns in data that does not have labels. The computer tries to group similar things or simplify the data without being told the correct answers. This helps us explore and understand the data’s structure. It’s useful when labeling is expensive or impossible. It often prepares data for later supervised tasks.
Feature (predictor, covariate)
A piece of information about each example that helps make a prediction. Think of it as a characteristic or measurement, like age or house size. Features are the inputs the model uses to learn. Good features make learning easier and more accurate. Poor features can hide important patterns.
Target (label, outcome, response)
The value you want the model to predict. It could be a category (like dog or cat) or a number (like price). During training, the model sees the true target so it can learn. Later, it predicts targets for new cases. Clear targets are essential for correct learning.
