I took Dr. Ernest Fokoue’s Data Mining course (STAT 747) during my Master’s studies at RIT and learned a wide range of modern statistical machine learning techniques. In this series I want to share the essence of the course, along with my own self-study notes and reflections on it.

The course covers clustering, classification and regression trees, multiple linear regression under various conditions, logistic regression, PCA and kernel PCA, model-based clustering via mixtures of Gaussians, spectral clustering, text mining, neural networks, support vector machines, multidimensional scaling, variable selection, model selection, k-means clustering, k-nearest neighbors classifiers, statistical tools for modern machine learning and data mining, naïve Bayes classifiers, variance reduction methods (bagging), and ensemble methods for predictive optimality.

This post lays out the roadmap for the note series, and the later posts follow that order. Each post covers one essential data mining technique; afterwards I will work through some related examples and exercises based on these methods.

- Supervised Learning
  - Classification
  - Regression
- Unsupervised Learning
  - Clustering Analysis
  - Factor Analysis
  - Topic Modeling
  - Recommender System

### Applications of Statistical Machine Learning

- Handwritten Digit Recognition (MNIST)
- Text Mining
- Credit Scoring
- Disease Diagnostics
- Audio Processing
- Speaker Recognition & Speaker Identification

### Computing Tools in R

```r
library(ctv)  # CRAN Task Views helper package
```

# Important Aspects of Machine Learning

##### Machines inherently designed to handle p larger than n problems

- Classification and Regression Trees
- Support Vector Machines
- Relevance Vector Machines (n < 500)
- Gaussian Process Learning Machines (n < 500)
- k-Nearest Neighbors Learning Machines (Watch for the curse of dimensionality)
- Kernel Machines in general
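
To see why k-nearest neighbors needs care as p grows, here is a minimal sketch in base R. It is my own illustration, not code from the course. It assumes points drawn uniformly from the unit cube and measures how much farther the farthest point is than the nearest one from a random query point. The helper `relative_contrast`, the sample size, and the dimensions tried are all hypothetical choices.

```r
# Curse of dimensionality sketch: nearest vs. farthest neighbor distance.
# Assumption: data are uniform on [0, 1]^p; real data will behave differently.
set.seed(1)

relative_contrast <- function(p, n = 200) {
  X <- matrix(runif(n * p), nrow = n)    # n random points in p dimensions
  q <- runif(p)                          # one random query point
  d <- sqrt(rowSums(sweep(X, 2, q)^2))   # Euclidean distance from each point to q
  (max(d) - min(d)) / min(d)             # gap between farthest and nearest, relative to nearest
}

dims <- c(2, 10, 100, 1000)
contrast <- sapply(dims, relative_contrast)
print(data.frame(p = dims, contrast = round(contrast, 3)))
```

When the contrast is small, the "nearest" neighbor is barely nearer than any other point, which is the situation to watch for before trusting a k-NN fit in high dimensions.
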
##### Machines that can handle p larger than n problems if regularized with suitable constraints

- Multiple Linear Regression Models
- Generalized Linear Models
- Discriminant Analysis
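
As a rough illustration of what "regularized with suitable constraints" buys for multiple linear regression, the sketch below fits a ridge regression in base R on simulated data with p > n. Everything here is an assumption made for the example: the simulated design, the sparse `beta_true`, the untuned `lambda = 1`, and the choice of a ridge (L2) penalty with no intercept and no standardization.

```r
# Ridge regression sketch for p > n, using only base R.
set.seed(42)
n <- 20
p <- 50                                    # more predictors than observations
X <- matrix(rnorm(n * p), nrow = n)
beta_true <- c(rep(2, 5), rep(0, p - 5))   # hypothetical sparse signal
y <- drop(X %*% beta_true + rnorm(n, sd = 0.5))

XtX <- crossprod(X)                        # p x p matrix t(X) %*% X
rank_XtX <- qr(XtX)$rank                   # rank is at most n, so XtX is singular
cat("rank of X'X:", rank_XtX, "out of", p, "\n")

lambda <- 1                                # illustrative penalty, not tuned
# Adding lambda * I makes the system positive definite, hence solvable
beta_ridge <- solve(XtX + lambda * diag(p), crossprod(X, y))
print(round(head(drop(beta_ridge), 8), 2))
```

Without the penalty, the normal equations have no unique solution here because `XtX` is rank deficient. In practice the penalty would be chosen by cross-validation rather than fixed by hand.
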
##### Ensemble Learning Machines

- Random Subspace Learning Ensembles (Random Forest)
- Boosting and its extensions
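
The idea behind a random subspace ensemble can be sketched in a few lines of base R. This is only a toy version under stated assumptions: the data are simulated, the base learner is a nearest-centroid classifier rather than the decision trees a random forest uses, and the names `fit_centroid`, `predict_centroid`, `B`, and `m` are hypothetical. Each base learner sees a bootstrap sample of the rows and a random subset of the features, and the ensemble predicts by majority vote.

```r
# Random subspace + bagging sketch with nearest-centroid base learners.
set.seed(123)
n <- 100
p <- 20
y <- rep(c(0, 1), each = n / 2)
X <- matrix(rnorm(n * p), nrow = n)
X[y == 1, 1:5] <- X[y == 1, 1:5] + 1.5     # only the first 5 features carry signal

# Fit: store the class centroids on a chosen feature subset
fit_centroid <- function(X, y, feats) {
  list(feats = feats,
       c0 = colMeans(X[y == 0, feats, drop = FALSE]),
       c1 = colMeans(X[y == 1, feats, drop = FALSE]))
}

# Predict: assign each row to the class with the closer centroid
predict_centroid <- function(model, Xnew) {
  Z  <- Xnew[, model$feats, drop = FALSE]
  d0 <- rowSums(sweep(Z, 2, model$c0)^2)
  d1 <- rowSums(sweep(Z, 2, model$c1)^2)
  as.integer(d1 < d0)
}

B <- 50   # number of base learners
m <- 5    # features per base learner
models <- lapply(seq_len(B), function(b) {
  rows  <- sample(n, replace = TRUE)       # bootstrap sample of observations
  feats <- sample(p, m)                    # random subset of features
  fit_centroid(X[rows, , drop = FALSE], y[rows], feats)
})

votes <- sapply(models, predict_centroid, Xnew = X)   # n x B matrix of 0/1 votes
y_hat <- as.integer(rowMeans(votes) > 0.5)            # majority vote
train_acc <- mean(y_hat == y)
cat("training accuracy of the ensemble:", train_acc, "\n")
```

The accuracy printed here is measured on the training data, so it is optimistic; a held-out set or out-of-bag rows would give a fairer estimate.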

**Note:** *Parts marked in red in this note series are still open questions for me. I will update them and add explanations as soon as I figure them out.*