I took Dr. Ernest Fokoue’s course Data Mining (STAT 747) during my Master’s study at RIT and gained a tremendous fascination with modern Statistical Machine Learning techniques. I want to share the essence of this course along with my own self-learning and reflections on it.
This course covers topics such as clustering, classification and regression trees, multiple linear regression under various conditions, logistic regression, PCA and kernel PCA, model-based clustering via mixtures of Gaussians, spectral clustering, text mining, neural networks, support vector machines, multidimensional scaling, variable selection, model selection, k-means clustering, k-nearest neighbors classifiers, statistical tools for modern machine learning and data mining, naïve Bayes classifiers, variance reduction methods (bagging), and ensemble methods for predictive optimality.
In this post I will lay out the roadmap of this note series and follow that order. Each post covers one essential data mining technique, and later posts will include related examples and exercises based on these methods.
- Supervised Learning
  - Classification
  - Regression
- Unsupervised Learning
  - Clustering Analysis
  - Factor Analysis
  - Topic Modeling
  - Recommender Systems
- Applications in Statistical Machine Learning
  - Handwritten Digit Recognition (MNIST)
  - Text Mining
  - Credit Scoring
  - Disease Diagnostics
  - Audio Processing
  - Speaker Recognition & Speaker Identification
Computing Tools in R
```r
library(ctv)   # CRAN Task Views: curated lists of R packages by topic
```
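As a quick, hedged illustration (assuming the ctv package is installed from CRAN), the snippet below installs the packages curated under the CRAN "MachineLearning" task view, which covers most of the methods listed in the roadmap above.

```r
library(ctv)                        # tools for working with CRAN task views
install.views("MachineLearning")    # install the packages in the "MachineLearning" task view
```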
Important Aspects of Machine Learning
Machines inherently designed to handle p larger than n problems (see the sketch after this list)
- Classification and Regression Trees
- Support Vector Machines
- Relevance Vector Machines (n < 500)
- Gaussian Process Learning Machines (n < 500)
- k-Nearest Neighbors Learning Machines (Watch for the curse of dimensionality)
- Kernel Machines in general
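A minimal sketch, assuming the e1071 and class packages are installed and using simulated data (the variable names and settings here are my own, not the course's): both an SVM and a k-NN classifier can be fit directly when p = 200 exceeds n = 50.

```r
set.seed(747)
n <- 50; p <- 200                              # more predictors than observations
X <- matrix(rnorm(n * p), nrow = n)
y <- factor(ifelse(X[, 1] + X[, 2] + rnorm(n, sd = 0.5) > 0, "A", "B"))
dat <- data.frame(y = y, X)

library(e1071)                                 # support vector machines
svm.fit <- svm(y ~ ., data = dat, kernel = "radial")
mean(fitted(svm.fit) == y)                     # training accuracy of the SVM

library(class)                                 # k-nearest neighbors
knn.pred <- knn(train = X, test = X, cl = y, k = 5)
mean(knn.pred == y)                            # training accuracy of k-NN
```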
Machines that can handle p larger than n problems when regularized with suitable constraints (see the sketch after this list)
- Multiple Linear Regression Models
- Generalized Linear Models
- Discriminant Analysis
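A minimal sketch, assuming the glmnet package is installed (glmnet is one common choice for penalized regression, used here for illustration): with a ridge or lasso penalty, a linear model remains estimable even when p = 200 > n = 50.

```r
set.seed(747)
n <- 50; p <- 200
X <- matrix(rnorm(n * p), nrow = n)
beta <- c(rep(2, 5), rep(0, p - 5))            # only 5 predictors are truly active
y <- drop(X %*% beta) + rnorm(n)

library(glmnet)
ridge.cv <- cv.glmnet(X, y, alpha = 0)         # alpha = 0 gives the ridge penalty
lasso.cv <- cv.glmnet(X, y, alpha = 1)         # alpha = 1 gives the lasso penalty
c(ridge = ridge.cv$lambda.min, lasso = lasso.cv$lambda.min)
```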
Ensemble Learning Machines (see the sketch after this list)
- Random Subspace Learning Ensembles (Random Forest)
- Boosting and its extensions
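A minimal sketch, assuming the randomForest and gbm packages are installed (these are illustrative implementations, not necessarily the ones used in the course): a random forest as a random subspace ensemble, and a small boosted model on a two-class subset of iris.

```r
library(randomForest)
set.seed(747)
rf.fit <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
rf.fit$err.rate[500, "OOB"]                    # out-of-bag misclassification rate

library(gbm)
iris2 <- subset(iris, Species != "setosa")     # gbm's Bernoulli loss needs a 0/1 response
iris2$is.virginica <- as.numeric(iris2$Species == "virginica")
gbm.fit <- gbm(is.virginica ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data = iris2, distribution = "bernoulli",
               n.trees = 200, interaction.depth = 2, shrinkage = 0.05)
summary(gbm.fit, plotit = FALSE)               # relative influence of each predictor
```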
Note: The parts marked in red in this note series remain questionable; I will update them and add explanations as soon as I figure them out.