Machine-Deep-Learning-in-Python-and-R/Machine-Learning-in-Python/Natural-Language-Processing at master · dundee2002/Machine-Deep-Learning-in-Python-and-R

End-to-end Natural Language Processing (NLP)
  1. text cleaning: removing punctuation, numbers, stopwords, HTML tags, and URLs; stemming
  2. text tokenizing and creating a bag-of-words model
  3. word scoring: binary, count, frequency, and term frequency–inverse document frequency (TF-IDF)
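
The three steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the code from the notebooks: the function names (naive_stem, clean_text, build_vocabulary, score), the short stopword list, and the sample sentences are all assumptions made for this example. The suffix stripper is a crude stand-in for a real stemmer, and the TF-IDF formula is the plain textbook form tf * log(N / df); libraries typically use smoothed variants, so their numbers will differ.

```python
import html
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "this", "at", "was"}


def naive_stem(word):
    """Strip a few common suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(text):
    """Step 1: drop HTML tags, URLs, punctuation, numbers, stopwords; then stem."""
    text = html.unescape(text)                          # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)            # punctuation and digits
    tokens = text.lower().split()                       # whitespace tokenizing
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def build_vocabulary(docs):
    """Step 2: map each distinct token to a column index (sorted for stability)."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}


def score(docs, vocab, mode="count"):
    """Step 3: turn tokenized docs into vectors using one of four scoring modes."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(t for d in docs for t in set(d))
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok, c in Counter(doc).items():
            if mode == "binary":
                value = 1.0                              # present or absent
            elif mode == "count":
                value = float(c)                         # raw occurrences
            elif mode == "freq":
                value = c / len(doc)                     # share of the document
            elif mode == "tfidf":
                value = (c / len(doc)) * math.log(n_docs / doc_freq[tok])
            else:
                raise ValueError(f"unknown mode: {mode}")
            row[vocab[tok]] = value
        rows.append(row)
    return rows


# Made-up sample messages, used only to exercise the functions.
raw_docs = [
    "<p>Free entry!!! Visit http://example.com to win 1000 prizes</p>",
    "Are we meeting at 5 today?",
    "Win free tickets, winning is easy",
]
docs = [clean_text(d) for d in raw_docs]
vocab = build_vocabulary(docs)
binary = score(docs, vocab, "binary")
counts = score(docs, vocab, "count")
freq = score(docs, vocab, "freq")
tfidf = score(docs, vocab, "tfidf")
```

Each row of the resulting matrices corresponds to one document and each column to one vocabulary entry. Under the TF-IDF mode, a token that appears in every document scores 0, which is why rarer tokens end up weighted more heavily than common ones.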

Examples:
  1. UCI Spam Collection data  https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
  2. UCI Yelp Restaurant Review data  https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#
  3. UCI Amazon Product Review data  https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences#
  4. Kaggle IMDB Sentiment data  https://www.kaggle.com/c/word2vec-nlp-tutorial
  5. Kaggle Yelp Business Rating data  https://www.kaggle.com/c/yelp-recsys-2013
  6. Kaggle Toxic Comment Classification Challenge https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
  7. CrowdFlower Twitter Airline Sentiment data  https://www.crowdflower.com/data-for-everyone/
  8. CrowdFlower Twitter Global Warming Sentiment data  https://www.crowdflower.com/data-for-everyone/
  9. CrowdFlower Corporate Messaging data  https://www.crowdflower.com/data-for-everyone/
  10. CrowdFlower Coachella 2015 Twitter sentiment data https://www.crowdflower.com/data-for-everyone/

Files:
  1_text_cleaning.ipynb
  2_text_tokenizing_creating_bow_model.ipynb
  3_word_scoring.ipynb
  NLP_BinClass_sklearn_kaggle_toxic_comments_auc882.ipynb
  NLP_sklearn_crowdflower_airline_twitter_sentiment_a820.ipynb
  NLP_sklearn_crowdflower_coachella_sentiment_a717.ipynb
  NLP_sklearn_crowdflower_corporate_messaging_a912.ipynb
  NLP_sklearn_crowdflower_twitter_global_warming_sentiment_a850.ipynb
  NLP_sklearn_kaggle_IMDB sentiment analysis.ipynb
  NLP_sklearn_kaggle_yelp_business_rating.ipynb
  NLP_sklearn_uci_amazon_product_reviews.ipynb
  NLP_sklearn_uci_spam_collection_data_wordcloud_a970.ipynb
  NLP_sklearn_yelp_restaurant_reviews.ipynb
  readme
