Mathematical Foundations for Data Analysis

📖 Overview

I read this book after learning the basics of machine learning, and it greatly strengthened my understanding of the mathematical concepts that underpin data analysis techniques. The book covers a wide range of topics, including probability, linear algebra, distance metrics, regression, clustering, and classification, and each chapter builds on the previous ones, providing a comprehensive foundation for anyone interested in data science. It also answered many questions I had not been able to figure out before, such as what Support Vector Machines, Principal Component Analysis, and Singular Value Decomposition really mean. I highly recommend this book to anyone looking to deepen their understanding of the mathematical principles behind data analysis or machine learning.

📅 Learning Journey

Chapter 1 Probability Review

  • 2025.10.22 1.1 Sample Space
  • 2025.10.22 1.2 Conditional Probability and Independence
  • 2025.10.25 1.3 Density Functions
  • 2025.10.25 1.4 Expected Value
  • 2025.10.25 1.5 Variance
  • 2025.10.25 1.6 Joint, Marginal, and Conditional Distributions
  • 2025.10.25 1.7 Bayes’ Rule
  • 2025.10.25 1.8 Bayesian Inference

Chapter 2 Convergence and Sampling

  • 2025.10.28 2.1 Sampling and Estimation
  • 2025.10.28 2.2 Probably Approximately Correct (PAC)
  • 2025.10.28 2.3 Concentration of Measure
  • 2025.10.28 2.4 Importance Sampling

Chapter 3 Linear Algebra Review

  • 2025.10.31 3.1 Vectors and Matrices
  • 2025.10.31 3.2 Addition and Multiplication
  • 2025.10.31 3.3 Norms
  • 2025.10.31 3.4 Linear Independence
  • 2025.10.31 3.5 Rank
  • 2025.10.31 3.6 Square Matrices and Properties
  • 2025.10.31 3.7 Orthogonality

Chapter 4 Distance and Nearest Neighbors

  • 2025.11.6 4.1 Metrics
  • 2025.11.6 4.2 Lp Distances and their Relatives
  • 2025.11.6 4.3 Distances for Sets and Strings
  • 2025.11.6 4.4 Modeling Text with Distances
  • 2025.11.7 4.5 Similarities
  • 2025.11.7 4.6 Locality Sensitive Hashing

Chapter 5 Linear Regression

  • 2025.11.19 5.1 Simple Linear Regression
  • 2025.11.19 5.2 Linear Regression with Multiple Explanatory Variables
  • 2025.11.19 5.3 Polynomial Regression
  • 2025.11.19 5.4 Cross-Validation
  • 2025.11.20 5.5 Regularized Regression
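A quick sketch I find helpful for 5.1: ordinary least squares for a line has a closed form, slope = cov(x, y) / var(x) and intercept = mean(y) − slope · mean(x). The toy data below is my own, not from the book.

```python
# Ordinary least squares for y = a + b*x:
#   b = cov(x, y) / var(x),  a = mean(y) - b * mean(x).
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 1 + 2x
a, b = fit_line(xs, ys)
print(a, b)  # recovers intercept 1 and slope 2
```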

Chapter 6 Gradient Descent

  • 2025.11.24 6.1 Functions
  • 2025.11.24 6.2 Gradients
  • 2025.11.24 6.3 Gradient Descent
  • 2025.11.24 6.4 Fitting a Model to Data
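The core loop of Chapter 6 fits in a few lines: repeatedly step against the gradient. A minimal sketch on my own toy objective f(x, y) = (x − 3)² + (y + 1)²; the step size `alpha` and iteration count are illustrative choices, not from the book.

```python
# Minimal gradient descent on f(x, y) = (x - 3)^2 + (y + 1)^2,
# whose unique minimum is at (3, -1).
def grad(x, y):
    # Analytic gradient of f.
    return 2 * (x - 3), 2 * (y + 1)

def gradient_descent(x0, y0, alpha=0.1, steps=200):
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= alpha * gx   # move against the gradient
        y -= alpha * gy
    return x, y

x, y = gradient_descent(0.0, 0.0)
print(round(x, 4), round(y, 4))  # converges near (3, -1)
```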

Chapter 7 Dimensionality Reduction

  • 2025.12.1 7.1 Data Matrices
  • 2025.12.1 7.2 Singular Value Decomposition
  • 2025.12.1 7.3 Eigenvalues and Eigenvectors
  • 2025.12.1 7.4 The Power Method
  • 2025.12.1 7.5 Principal Component Analysis
  • 2025.12.9 7.6 Multidimensional Scaling
  • 2025.12.9 7.7 Linear Discriminant Analysis
  • 2025.12.9 7.8 Distance Metric Learning
  • 2025.12.9 7.9 Matrix Completion
  • 2025.12.9 7.10 Random Projections
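Section 7.4's power method is simple enough to sketch in pure Python: repeatedly multiply a vector by A and renormalize, and it converges to the dominant eigenvector. The 2×2 symmetric matrix below is my own toy example (eigenvalues 3 and 1), not from the book.

```python
import math

# Power method: repeated multiplication by A pulls any starting vector
# (with a nonzero component in the right direction) toward the
# eigenvector of the largest-magnitude eigenvalue.
A = [[2.0, 1.0],
     [1.0, 2.0]]   # symmetric; eigenvalues 3 and 1

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def power_method(M, v, iters=50):
    for _ in range(iters):
        v = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

v = power_method(A, [1.0, 0.0])
# Rayleigh quotient estimates the dominant eigenvalue.
lam = sum(vi * wi for vi, wi in zip(v, matvec(A, v)))
print([round(x, 4) for x in v], round(lam, 4))  # eigenvalue near 3
```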

Chapter 8 Clustering

  • 2025.12.23 8.1 Voronoi Diagrams
  • 2025.12.23 8.2 Gonzalez’s Algorithm for k-Center Clustering
  • 2025.12.23 8.3 Lloyd’s Algorithm for k-Means Clustering
  • 2025.12.23 8.4 Mixture of Gaussians
  • 2025.12.23 8.5 Hierarchical Clustering
  • 2025.12.23 8.6 Density-Based Clustering and Outliers
  • 2025.12.23 8.7 Mean Shift Clustering
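Lloyd's algorithm from 8.3 alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. A minimal sketch on 1-D points of my own choosing; the initial centers are illustrative.

```python
# Lloyd's algorithm for k-means: alternate (1) assignment and (2) update.
def lloyd(points, centers, iters=20):
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to its cluster mean
        # (keep a center in place if its cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

pts = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers = sorted(lloyd(pts, [0.0, 5.0]))
print(centers)  # centers settle near 1.0 and 9.0
```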

Chapter 9 Classification

  • 2026.1.17 9.1 Linear Classifiers
  • 2026.1.17 9.2 Perceptron Algorithm
  • 2026.1.17 9.3 Support Vector Machines and Kernels
  • 2026.1.17 9.4 Learnability and VC dimension
  • 2026.1.17 9.5 kNN Classifiers
  • 2026.1.17 9.6 Decision Trees
  • 2026.1.17 9.7 Neural Networks

Chapter 10 Graph Structured Data

  • 2026.1.18 10.1 Markov Chains
  • 2026.1.18 10.2 PageRank
  • 2026.1.18 10.3 Spectral Clustering on Graphs
  • 2026.1.18 10.4 Communities in Graphs
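PageRank (10.2) ties Markov chains to link analysis: a random surfer follows an out-link with probability d, or teleports uniformly with probability 1 − d, and the stationary distribution is the ranking. A minimal power-iteration sketch on a 3-node graph of my own; d = 0.85 is the usual damping choice.

```python
# PageRank by power iteration on a small directed graph.
def pagerank(links, d=0.85, iters=100):
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Teleportation mass, spread uniformly.
        new = [(1.0 - d) / n] * n
        # Each node splits its rank evenly among its out-links.
        for u, outs in enumerate(links):
            share = d * rank[u] / len(outs)
            for v in outs:
                new[v] += share
        rank = new
    return rank

links = [[1, 2], [2], [0]]   # adjacency: node -> out-links
rank = pagerank(links)
# Node 2 is linked by both 0 and 1, so it ends up with the top rank.
print([round(r, 4) for r in rank])
```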

Chapter 11 Big Data and Sketching

  • 2026.1.19 11.1 The Streaming Model
  • 2026.1.19 11.2 Frequent Items
  • 2026.1.19 11.3 Matrix Sketching
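For the frequent-items problem of 11.2, the classic counter-based summary (Misra-Gries) keeps only k − 1 counters yet guarantees that any item occurring more than n/k times in a stream of length n survives. A minimal sketch on a made-up stream; the particular implementation details here are my own.

```python
# Misra-Gries summary with k - 1 counters: any item with frequency
# above n / k is guaranteed to remain in the summary at the end.
def misra_gries(stream, k):
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter; drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = ['a', 'b', 'a', 'c', 'a', 'b', 'a', 'd', 'a']
summary = misra_gries(stream, k=3)
print(summary)  # 'a' (5 of 9 occurrences > 9/3) must appear
```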