Natural-Language-Processing
End-to-end Natural Language Processing (NLP)

1. Text cleaning: removing punctuation, numbers, stopwords, HTML tags and URLs; stemming
2. Text tokenizing and creating a bag-of-words model
3. Word scoring: binary, count, frequency, term frequency–inverse document frequency (TF-IDF)

Examples:

1. UCI Spam Collection data https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
2. UCI Yelp Restaurant Review data https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#
3. UCI Amazon Product Review data https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#
4. Kaggle IMDB Sentiment data https://www.kaggle.com/c/word2vec-nlp-tutorial
5. Kaggle Yelp Business Rating data https://www.kaggle.com/c/yelp-recsys-2013
6. Kaggle Toxic Comment Classification Challenge https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
7. CrowdFlower Twitter Airline Sentiment data https://www.crowdflower.com/data-for-everyone/
8. CrowdFlower Twitter Global Warming Sentiment data https://www.crowdflower.com/data-for-everyone/
9. CrowdFlower Corporate Messaging data https://www.crowdflower.com/data-for-everyone/
10. CrowdFlower Coachella 2015 Twitter sentiment data https://www.crowdflower.com/data-for-everyone/
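
Sketch of the pipeline:

The three steps named at the top (text cleaning, bag-of-words, word scoring) can be illustrated with a minimal standard-library sketch. This is not the code used in this repository. The stopword list, the `naive_stem` helper, the function names, and the two sample sentences are all hypothetical, and the TF-IDF variant shown (`tf = count / document length`, `idf = log(N / df)`) is only one of several common definitions. A real project would more likely use a proper stemmer and a library vectorizer.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "for", "at"}


def naive_stem(word):
    """Crude suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(text):
    """Step 1: drop HTML tags, URLs, punctuation, numbers and stopwords, then stem."""
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)            # punctuation and digits
    tokens = text.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def build_vocabulary(tokenized_docs):
    """Step 2: map each distinct token to a column index (bag-of-words)."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def score(tokenized_docs, vocab, mode="count"):
    """Step 3: build one vector per document using the chosen scoring mode."""
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    matrix = []
    for doc in tokenized_docs:
        row = [0.0] * len(vocab)
        for tok, c in Counter(doc).items():
            if mode == "binary":
                value = 1.0
            elif mode == "count":
                value = float(c)
            elif mode == "freq":
                value = c / len(doc)
            elif mode == "tfidf":
                # Assumed variant: unsmoothed idf, so a token that appears
                # in every document scores 0.
                value = (c / len(doc)) * math.log(n_docs / doc_freq[tok])
            else:
                raise ValueError(f"unknown mode: {mode}")
            row[vocab[tok]] = value
        matrix.append(row)
    return matrix


# Made-up sample messages, not taken from any of the datasets listed above.
docs = [
    "<p>Free prizes!!! Claim at http://example.com now</p>",
    "Are we meeting for lunch at 12?",
]
tokens = [clean_text(d) for d in docs]
vocab = build_vocabulary(tokens)
counts = score(tokens, vocab, mode="count")
tfidf = score(tokens, vocab, mode="tfidf")
```

Each row of `counts` or `tfidf` is a fixed-length feature vector, one column per vocabulary entry, that can be passed to a classifier. Switching `mode` changes only the cell values, not the shape of the matrix.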