End-to-end ML modeling process for classification problems
1. Data reading and preprocessing: missing data, visualization, scaling, text preprocessing
2. Algorithm selection
3. Model evaluation: train-test split, k-fold cross-validation, stratified CV, metrics (accuracy, recall, precision, F1 score, confusion matrix)
4. Hyperparameter tuning: grid search
5. Final model: saving to disk and loading
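
The five steps above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the code used in this repository: it assumes scikit-learn and joblib are installed, uses the copy of the Wisconsin Diagnostic breast cancer data bundled with scikit-learn, and picks logistic regression, the `C` grid, and the file name `final_model.joblib` purely for demonstration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data reading and preprocessing (scaling happens inside the pipeline below).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2. Algorithm selection: logistic regression is an arbitrary choice here.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 3 and 4. Stratified 5-fold cross-validation drives a small grid search.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
    cv=cv,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("best params:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
print(cm)
print(classification_report(y_test, y_pred))

# 5. Save the final model to disk and load it back.
model_path = os.path.join(tempfile.mkdtemp(), "final_model.joblib")
joblib.dump(best_model, model_path)
loaded_model = joblib.load(model_path)
loaded_pred = loaded_model.predict(X_test)
```

Keeping the scaler inside the pipeline means it is refit on each training fold, so no test-fold statistics leak into cross-validation.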
Algorithms
1. Logistic Regression
2. Linear Discriminant Analysis (LDA)
3. K-Nearest Neighbors (KNN)
4. Naive Bayes (NB)
5. Decision Tree
6. Support Vector Machine (SVM)
7. Random Forest
8. Bagged Decision Trees
9. Extra Trees
10. AdaBoost
11. Gradient Boosting
12. XGBoost
13. Neural Network
14. Voting Ensemble
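
Several of the algorithms listed above can be compared under the same stratified cross-validation. The sketch below is an assumption-laden illustration rather than this repository's code: it uses the iris data bundled with scikit-learn, default hyperparameters, a subset of the listed algorithms (XGBoost and the neural network are omitted to avoid extra dependencies), and a hard-voting ensemble built from three arbitrarily chosen members.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Scale-sensitive models are wrapped in a pipeline with a scaler.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "NB": GaussianNB(),
    "Tree": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Hard voting: each member casts one vote and the majority class wins.
# The choice of members is illustrative only.
models["Voting"] = VotingClassifier(
    estimators=[(name, models[name]) for name in ("LR", "LDA", "RF")],
    voting="hard",
)

# Same folds for every model so the mean accuracies are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} {score:.3f}")
```

Mean accuracy on a small, balanced data set like iris is only a rough ranking; on imbalanced data, recall, precision, or F1 score would be the more informative `scoring` choice.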
Examples
1. ISLR College data (binary class) https://www.kaggle.com/ishaanv/ISLR-Auto/data
2. UCI Pima Indians diabetes data (binary class) https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes
3. UCI breast cancer data (binary class) https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
4. UCI Iris data (multi-class) https://archive.ics.uci.edu/ml/datasets/iris
5. UCI Wine data (multi-class) https://archive.ics.uci.edu/ml/datasets/wine
6. Kaggle HR data (binary class) https://www.kaggle.com/giripujar/hr-analytics
7. Kaggle Titanic data (binary class) https://www.kaggle.com/c/titanic
8. Kaggle Otto data (multi-class) https://www.kaggle.com/c/otto-group-product-classification-challenge
9. Kaggle Toxic Comment Classification Challenge (binary class) https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
10. CrowdFlower Twitter Global Warming Sentiment data (binary class) https://www.crowdflower.com/data-for-everyone/
11. CrowdFlower Twitter Airline Sentiment data (multi-class) https://www.crowdflower.com/data-for-everyone/
12. CrowdFlower Corporate Messaging data (multi-class) https://www.crowdflower.com/data-for-everyone/
13. CrowdFlower Coachella 2015 Twitter sentiment data (multi-class) https://www.crowdflower.com/data-for-everyone/