End-to-end ML modeling process for classification problems
  1. Data reading and preprocessing: missing data, visualization, scaling, text preprocessing
  2. Algorithm selection
  3. Model evaluation: train-test split, k-fold cross-validation, stratified cross-validation, metrics (accuracy, recall, precision, F1-score, confusion matrix)
  4. Hyperparameter tuning: grid search
  5. Saving the final model to disk and loading it back
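
The five steps above can be sketched in one short script. This is a minimal illustration, not the code from the notebooks in this folder: it assumes scikit-learn and joblib are installed, uses scikit-learn's bundled copy of the breast cancer (Wisconsin Diagnostic) data, and picks logistic regression, the `C` grid, and the file name `final_model.joblib` purely as examples.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data reading and preprocessing: load features/labels, hold out a test set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2. Algorithm selection: scaling and the classifier live in one pipeline,
#    so the scaler is fit only on the training folds during cross-validation.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 3. Model evaluation: stratified 5-fold cross-validation on the training set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))

# 4. Hyperparameter tuning: grid search over the regularization strength C
#    (the grid values are illustrative assumptions).
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=cv, scoring="f1")
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
print("Best parameters:", grid.best_params_)

# Metrics on the held-out test set.
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# 5. Save the final model to disk and load it back.
model_path = os.path.join(tempfile.mkdtemp(), "final_model.joblib")
joblib.dump(best_model, model_path)
loaded_model = joblib.load(model_path)
reloaded_pred = loaded_model.predict(X_test)
```

Keeping the scaler inside the pipeline means the saved file contains both preprocessing and classifier, so the loaded model can be applied to raw features directly.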

Algorithms
  1. Logistic Regression
  2. Linear Discriminant Analysis (LDA)
  3. K-Nearest Neighbors (KNN)
  4. Naive Bayes (NB)
  5. Decision Tree
  6. Support Vector Machine (SVM)
  7. Random Forest
  8. Bagged Decision Trees
  9. Extra Trees
  10. AdaBoost
  11. Gradient Boosting
  12. XGBoost
  13. Neural Network
  14. Voting Ensemble
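
One common way to choose among the algorithms listed above is to score each with the same cross-validation splits. The sketch below is an assumption about how such a comparison might look, not the repository's own code: it uses scikit-learn's bundled iris data, default hyperparameters, and a hard-voting ensemble of three arbitrarily chosen members. XGBoost is left out because it lives in the separate `xgboost` package.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)


def scaled(estimator):
    """Wrap a scale-sensitive estimator with standardization."""
    return make_pipeline(StandardScaler(), estimator)


# Candidate models, keyed by the names used in the list above.
models = {
    "Logistic Regression": scaled(LogisticRegression(max_iter=1000)),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": scaled(KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": scaled(SVC()),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Bagged Decision Trees": BaggingClassifier(random_state=0),  # default base: decision tree
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Neural Network": scaled(MLPClassifier(max_iter=2000, random_state=0)),
}

# Voting ensemble: majority vote over three of the models above
# (the choice of members is an illustrative assumption).
models["Voting Ensemble"] = VotingClassifier(
    estimators=[
        ("lr", scaled(LogisticRegression(max_iter=1000))),
        ("knn", scaled(KNeighborsClassifier())),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)

# Same stratified folds for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print("%-22s %.3f" % (name, score))
```

On a dataset this small the differences between models are usually within noise, so the ranking is best read as a shortlist for tuning rather than a final verdict.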
 
Examples
  1. ISLR college data (binary class) https://www.kaggle.com/ishaanv/ISLR-Auto/data
  2. UCI Pima Indians Diabetes data (binary class) https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes
  3. UCI Breast Cancer data (binary class) https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
  4. UCI Iris data (multi-class) https://archive.ics.uci.edu/ml/datasets/iris
  5. UCI Wine data (multi-class) https://archive.ics.uci.edu/ml/datasets/wine
  6. Kaggle HR data (binary class) https://www.kaggle.com/giripujar/hr-analytics
  7. Kaggle Titanic data (binary class) https://www.kaggle.com/c/titanic
  8. Kaggle Otto data (multi-class) https://www.kaggle.com/c/otto-group-product-classification-challenge
  9. Kaggle Toxic Comment Classification Challenge (binary class) https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
  10. CrowdFlower Twitter Global Warming Sentiment data (binary class) https://www.crowdflower.com/data-for-everyone/
  11. CrowdFlower Twitter Airline Sentiment data (multi-class) https://www.crowdflower.com/data-for-everyone/
  12. CrowdFlower Corporate Messaging data (multi-class) https://www.crowdflower.com/data-for-everyone/
  13. CrowdFlower Coachella 2015 Twitter sentiment data (multi-class) https://www.crowdflower.com/data-for-everyone/