End-to-end ML modeling process for classification problems
1. Data reading and preprocessing: missing data, visualization, scaling
2. Algorithm selection
3. Model evaluation: train-test split, k-fold cross-validation, metrics (accuracy, recall, precision, F1-score, confusion matrix)
4. Hyperparameter tuning: grid search
5. Saving the final model to disk and loading it back
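
Steps 3 to 5 can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the code used in this repository: the toy one-dimensional KNN model, the made-up data, the helper names (k_fold_indices, binary_metrics, knn_predict, cross_val_accuracy, grid_search) and the file name final_model.pkl are all hypothetical.

```python
import os
import pickle
import random
import tempfile


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics (positive class is 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual 0, actual 1
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (1-D toy model)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)


def cross_val_accuracy(xs, ys, k_neighbors, n_folds=5):
    """Mean held-out accuracy of the toy KNN model over n_folds folds."""
    scores = []
    for fold in k_fold_indices(len(xs), n_folds):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        preds = [knn_predict(train_x, train_y, xs[i], k_neighbors) for i in fold]
        scores.append(binary_metrics([ys[i] for i in fold], preds)["accuracy"])
    return sum(scores) / len(scores)


def grid_search(xs, ys, candidates):
    """Return the candidate k with the best cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(xs, ys, k))


# Made-up 1-D data: class 0 lies in 0..9, class 1 lies in 20..29.
xs = [float(v) for v in list(range(10)) + list(range(20, 30))]
ys = [0] * 10 + [1] * 10

# Step 4: pick the number of neighbors by grid search over a small grid.
best_k = grid_search(xs, ys, candidates=[1, 3, 5])
final_model = {"k": best_k, "train_x": xs, "train_y": ys}

# Step 5: save the final model to disk, load it back, and predict with it.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "final_model.pkl")  # hypothetical name
    with open(model_path, "wb") as f:
        pickle.dump(final_model, f)
    with open(model_path, "rb") as f:
        loaded = pickle.load(f)

prediction = knn_predict(loaded["train_x"], loaded["train_y"], 25.0, loaded["k"])
```

In practice a library would normally handle each of these steps. The sketch only shows how they fit together: cross-validation scores one hyperparameter value, grid search repeats that over a grid, and the winning configuration is what gets saved.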
Algorithms
1. Logistic Regression
2. Lasso
3. Ridge
4. Elastic Net
5. Linear Discriminant Analysis (LDA)
6. K-Nearest Neighbors (KNN)
7. Naive Bayes (NB)
8. Support Vector Machine (SVM)
9. Decision Tree (Classification & Regression Trees; CART)
10. C5.0
11. Bagged CART
12. Random Forest
13. Stochastic Gradient Boosting
14. Learning Vector Quantization (LVQ)
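
As a rough illustration of the first entry in this list, the sketch below fits a logistic regression by batch gradient descent in plain Python. It is a minimal example under assumptions, not this repository's implementation: the function names (sigmoid, fit_logistic, predict), the learning rate, the epoch count and the toy data are all made up for the example.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            # Prediction error for this sample: p(y=1 | row) - target.
            error = sigmoid(sum(w * v for w, v in zip(weights, row)) + bias) - target
            for j, v in enumerate(row):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


def predict(X, weights, bias):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [
        1 if sigmoid(sum(w * v for w, v in zip(weights, row)) + bias) >= 0.5 else 0
        for row in X
    ]


# Made-up, linearly separable toy data with two features.
X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [1.0, 0.9], [0.8, 1.1], [1.2, 0.7]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(X, y)
predictions = predict(X, weights, bias)
```

Adding an L1 or L2 penalty term to the same loss would give lasso-style or ridge-style penalized variants. Whether the Lasso, Ridge, and Elastic Net entries in this list are meant that way is an assumption.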
Examples
1. UCI Pima Indians Diabetes data (binary class) https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes
2. UCI Breast Cancer Wisconsin (Diagnostic) data (binary class) https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
3. Kaggle Titanic data (binary class) https://www.kaggle.com/c/titanic
4. UCI Iris data (multi-class) https://archive.ics.uci.edu/ml/datasets/iris
5. UCI Wine data (multi-class) https://archive.ics.uci.edu/ml/datasets/wine
6. CrowdFlower Twitter Global Warming Sentiment data (binary class) https://www.crowdflower.com/data-for-everyone/
7. CrowdFlower Twitter Airline Sentiment data (multi-class) https://www.crowdflower.com/data-for-everyone/