End-to-end ML modeling process for classification problems
  1. Data reading and preprocessing: missing data, visualization, scaling
  2. Algorithm selection
  3. Model evaluation: train-test split, k-fold cross-validation, metrics (accuracy, recall, precision, F1-score, confusion matrix)
  4. Hyperparameter tuning: grid search
  5. Final model: saving to disk and loading
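
As a rough illustration of how the five steps above fit together, here is a minimal sketch using only the Python standard library. It is not the code from this repository. The synthetic two-class data, the choice of KNN as the classifier, the candidate values of k, and every function name are assumptions made for the example. For brevity the scaler is fitted once on the training split rather than refitted inside each cross-validation fold.

```python
import os
import pickle
import random
import tempfile
from collections import Counter

def make_toy_data(n_per_class=40, seed=0):
    """Synthetic two-class data (assumption); feature 2 is on a much larger scale."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 0.5), 100.0 * rng.gauss(center, 0.5)])
            y.append(label)
    return X, y

def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the row indices and hold out a fraction of rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def fit_min_max(rows):
    """Learn per-feature minimum and maximum from the given rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(rows, lo, hi):
    """Rescale features with previously learned bounds."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=dist)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

def k_fold_indices(n, n_folds, seed=0):
    """Split shuffled indices 0..n-1 into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def cv_accuracy(X, y, k, n_folds=5):
    """Mean held-out accuracy of KNN with the given k over the folds."""
    scores = []
    for fold in k_fold_indices(len(X), n_folds):
        held = set(fold)
        tX = [X[i] for i in range(len(X)) if i not in held]
        ty = [y[i] for i in range(len(X)) if i not in held]
        hits = sum(knn_predict(tX, ty, X[i], k) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / len(scores)

def grid_search(X, y, k_values, n_folds=5):
    """Return the candidate k with the best cross-validated accuracy."""
    return max(k_values, key=lambda k: cv_accuracy(X, y, k, n_folds))

# Steps 1-3: read data, scale it, split it, and evaluate on the held-out rows.
X, y = make_toy_data()
train_X, train_y, test_X, test_y = train_test_split(X, y)
lo, hi = fit_min_max(train_X)
train_X, test_X = apply_min_max(train_X, lo, hi), apply_min_max(test_X, lo, hi)

# Step 4: choose k by grid search with 5-fold cross-validation on the training rows.
best_k = grid_search(train_X, train_y, k_values=[1, 3, 5, 7])
test_pred = [knn_predict(train_X, train_y, x, best_k) for x in test_X]
report = classification_metrics(test_y, test_pred)
print(best_k, report)

# Step 5: save the fitted "model" to disk and load it back.
model = {"k": best_k, "X": train_X, "y": train_y, "scaler": (lo, hi)}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)
```

The scaler bounds are stored with the model so that new rows can be rescaled the same way before prediction. In practice a library implementation of each step would normally replace these hand-written helpers.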

Algorithms
  1. Logistic Regression
  2. Lasso
  3. Ridge
  4. Elastic Net
  5. Linear Discriminant Analysis (LDA)
  6. K-Nearest Neighbors (KNN)
  7. Naive Bayes (NB)
  8. Support Vector Machine (SVM)
  9. Decision Tree (Classification & Regression Trees; CART)
  10. C5.0
  11. Bagged CART
  12. Random Forest
  13. Stochastic Gradient Boosting
  14. Learning Vector Quantization (LVQ)
 
Examples
  1. UCI Pima Indians Diabetes data (binary class) https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes
  2. UCI breast cancer data (binary class) https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
  3. Kaggle Titanic data (binary class) https://www.kaggle.com/c/titanic
  4. UCI iris data (multi-class) https://archive.ics.uci.edu/ml/datasets/iris
  5. UCI wine data (multi-class) https://archive.ics.uci.edu/ml/datasets/wine
  6. CrowdFlower Twitter Global Warming Sentiment data (binary class) https://www.crowdflower.com/data-for-everyone/
  7. CrowdFlower Twitter Airline Sentiment data (multi-class) https://www.crowdflower.com/data-for-everyone/