PSTAT100 Data Science and Analysis
Lecture Notes
Introduction
These are my online lecture notes for the PSTAT100 Data Science and Analysis course taught at UC Santa Barbara. They cover fundamental topics in data science and the tools used for data retrieval, analysis, visualization, and reproducible research, in preparation for more advanced data science courses.
Throughout these notes we will conduct our data analysis in the Python programming language. The course assumes you have some experience with Python or a similar programming language such as R. You may use the Integrated Development Environment (IDE) of your choice, but I recommend VS Code or one of its forks, such as Positron or Cursor. If you are unfamiliar with Python and need help with setup, I have included some guidance in the preliminary materials.
Contents
- Preliminary Material
- Getting Started with Python
- Linear Algebra
- Probability Theory Fundamentals
- Introduction to Data Science
- Data Lifecycle
- Data Science Terminology
- Variable Classification
- Data Tidying
- Data Manipulation
- Data Cleaning
- Databases
- Variable Classification
- Exploratory Data Analysis
- Summary Statistics
- Data Visualizations
- Experiment Design
- Study Design and Experimental Techniques
- Statistics
- Inferential Statistics
- Estimators and Bias
- Sampling Distributions
- Confidence Intervals and Hypothesis Testing
- Statistical Modelling
- Simple Linear Regression
- Generalized Linear Regression
- Principal Component Analysis
- Logistic Regression and Classification
- Clustering