End-to-end Natural Language Processing (NLP)
1. text cleaning: removing punctuation, numbers, stopwords, HTML tags and URLs; stemming
2. text tokenizing and creating a bag-of-words model
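
The two steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the names `STOPWORDS`, `naive_stem`, `clean_text` and `bag_of_words` are hypothetical, the stopword list is a tiny sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import html
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "for", "at", "on", "now"}


def naive_stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(text):
    """Step 1: drop HTML tags, URLs, punctuation, numbers and stopwords, then stem."""
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = html.unescape(text)                          # entities such as &amp;
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)            # punctuation and digits
    tokens = text.lower().split()                       # whitespace tokenizing
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def bag_of_words(documents):
    """Step 2: build a sorted vocabulary and one count vector per document."""
    tokenized = [clean_text(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    vectors = []
    for doc in tokenized:
        row = [0] * len(vocabulary)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        vectors.append(row)
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(["Free prizes, free prizes!", "Call for a prize"])
print(vocabulary)  # ['call', 'free', 'prize']
print(vectors)     # [[0, 2, 2], [1, 0, 1]]
```

Each row of `vectors` counts how often each vocabulary word occurs in one document; these rows are the features a classifier would be trained on.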
Examples:
1. UCI Spam Collection data https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
2. UCI Yelp Restaurant Review data https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#
3. UCI Amazon Product Review data https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#
4. CrowdFlower Twitter Global Warming Sentiment data https://www.crowdflower.com/data-for-everyone/
5. CrowdFlower Twitter Airline Sentiment data https://www.crowdflower.com/data-for-everyone/
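
To feed one of these datasets into the steps above, the labelled lines first have to be read into (label, text) pairs. The sketch below assumes a tab-separated layout with the label in the first column and the message in the second, which is how the SMS Spam Collection file is commonly distributed; other datasets in the list may order or delimit their columns differently. The function name `load_labelled_lines` and the sample rows are hypothetical.

```python
import csv
import io


def load_labelled_lines(handle):
    """Read (label, text) pairs from a tab-separated file-like object."""
    # QUOTE_NONE keeps quotation marks inside messages as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


# Two made-up rows standing in for a real data file.
sample = io.StringIO("ham\tSee you at lunch?\nspam\tWINNER!! Claim your prize now\n")
pairs = load_labelled_lines(sample)
print(pairs)
```

With a real file, the same function can be called on an opened handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded data.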