Commit 79c7328b authored by Sathvick Reddy N
parent 1c9ca457
## Project title
Sentiment analysis on Twitter
Creating a model for sentiment analysis, i.e. the process of identifying whether a piece of text is positive, negative, or neutral.
## Motivation
The aim of the project was to collect tweets, predict their positive and negative sentiment and determine the best model for the tweets with unknown sentiment.
## Features
This helps end users visualize the number of tweets on a particular hashtag or keyword. Sentiment analysis is then run on those tweets to show the impact of tweets belonging to that hashtag.
See what people are saying about the business’s brand on Twitter.
Do market research on how people feel about competitors, market trends, product offerings etc.
Analyze the impact of marketing campaigns on Twitter users.
## API Reference
Access to the Twitter API is obtained through a Twitter developer account.
## How to use?
While we haven’t built empathetic robots yet, we have begun using machine learning to identify human emotions expressed in social media data, a technology known as sentiment analysis.
Simply enter a keyword, and the Tweet Visualizer automatically pulls recent tweets (from the past week, though the time range is shorter for popular subjects).
You can then explore the many visualization options that the tool offers for tweets.
## Dataset Information
We use and compare several methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a CSV file of type tweet_id,sentiment,tweet where the tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet enclosed in "". Similarly, the test dataset is a CSV file of type tweet_id,tweet. Please note that CSV headers are not expected and should be removed from the training and test datasets.
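The file layout described above can be read with Python's standard `csv` module. This is only a minimal sketch: the function names `load_training_rows` and `load_test_rows` are hypothetical and are not part of the project's scripts.

```python
import csv


def load_training_rows(path):
    """Read header-less rows of the form tweet_id,sentiment,tweet."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        # csv.reader strips the double quotes that enclose the tweet text.
        for tweet_id, sentiment, tweet in csv.reader(f):
            rows.append((int(tweet_id), int(sentiment), tweet))
    return rows


def load_test_rows(path):
    """Read header-less rows of the form tweet_id,tweet."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for tweet_id, tweet in csv.reader(f):
            rows.append((int(tweet_id), tweet))
    return rows
```

Because the tweet is enclosed in double quotes, commas inside the tweet text do not split the row into extra columns.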
## Library requirements
There are some general library requirements for the project and some which are specific to individual methods. The general requirements are as follows.
The library requirements specific to some methods are:
keras with TensorFlow backend for Logistic Regression, MLP, RNN (LSTM), and CNN.
## Preprocessing
Run <raw-csv-path> on both train and test data. This will generate a preprocessed version of the dataset.
Run <preprocessed-csv-path> where <preprocessed-csv-path> is the path of the csv generated in the previous step. This gives general statistical information about the dataset and will generate two pickle files, which are the frequency distributions of unigrams and bigrams in the training dataset.
After the above steps, you should have four files in total: <preprocessed-train-csv>, <preprocessed-test-csv>, <freqdist>, and <freqdist-bi> which are preprocessed train dataset, preprocessed test dataset, frequency distribution of unigrams and frequency distribution of bigrams respectively.
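To illustrate what the two frequency-distribution pickle files might contain, here is a rough sketch using `collections.Counter` and `pickle`. It assumes that preprocessed tweets are whitespace-separated words. The helper names `build_freq_dists` and `save_freq_dist` are hypothetical, and the project's own script may store the distributions differently.

```python
import pickle
from collections import Counter


def build_freq_dists(tweets):
    """Count unigrams and bigrams over an iterable of preprocessed tweets."""
    unigrams = Counter()
    bigrams = Counter()
    for tweet in tweets:
        words = tweet.split()
        unigrams.update(words)
        # Adjacent word pairs form the bigrams of a tweet.
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def save_freq_dist(freq_dist, path):
    """Pickle a frequency distribution to the given path."""
    with open(path, "wb") as f:
        pickle.dump(freq_dist, f)
```

A call such as `build_freq_dists(["good day", "good good day"])` counts `good` three times and the bigram `("good", "day")` twice.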
For all the methods that follow, change the values of TRAIN_PROCESSED_FILE, TEST_PROCESSED_FILE, FREQ_DIST_FILE, and BI_FREQ_DIST_FILE to your own paths in the respective files. Wherever applicable, the values of USE_BIGRAMS and FEAT_TYPE can be changed to obtain results using different types of features, as described in the report.
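As a rough illustration of how flags like USE_BIGRAMS and FEAT_TYPE could change the features, the sketch below builds a sparse feature dictionary for a single tweet. The accepted values `"presence"` and `"frequency"` are assumptions made for this example, and `extract_features` is a hypothetical helper. The report remains the reference for the feature types the project actually uses.

```python
def extract_features(tweet, use_bigrams=False, feat_type="frequency"):
    """Map a preprocessed tweet to a {token: value} feature dictionary.

    feat_type is assumed to be "presence" (1 if the token occurs)
    or "frequency" (number of occurrences).
    """
    words = tweet.split()
    tokens = list(words)
    if use_bigrams:
        # Add adjacent word pairs alongside the single words.
        tokens += list(zip(words, words[1:]))
    features = {}
    for token in tokens:
        if feat_type == "presence":
            features[token] = 1
        else:
            features[token] = features.get(token, 0) + 1
    return features
```

With the defaults, `extract_features("good good day")` gives `{"good": 2, "day": 1}`. With `feat_type="presence"` every value becomes 1.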
Run With TRAIN = True it will show the accuracy results on training dataset.
## Naive Bayes
Run With TRAIN = True it will show the accuracy results on 10% validation dataset.
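The 10% validation figure can be pictured as holding out one tenth of the shuffled training rows and measuring accuracy on them. The sketch below is an assumption about how such a split might be done, not the project's actual code. `split_train_validation` and `accuracy` are hypothetical names.

```python
import random


def split_train_validation(rows, validation_fraction=0.1, seed=0):
    """Shuffle rows and hold out a fraction of them for validation."""
    rows = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)
    n_validation = int(len(rows) * validation_fraction)
    return rows[n_validation:], rows[:n_validation]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For 100 rows this yields 90 training rows and 10 validation rows. `accuracy([1, 0, 1, 1], [1, 0, 0, 1])` evaluates to 0.75.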
## SVM
Run With TRAIN = True it will show the accuracy results on 10% validation dataset.
## Multi-Layer Perceptron
Run Will validate using 10% data and save the best model to best_mlp_model.h5.
## Information about other files
dataset/positive-words.txt: List of positive words.
dataset/negative-words.txt: List of negative words.
dataset/glove-seeds.txt: GloVe word vectors from StanfordNLP which match our dataset for seeding word embeddings.
Plots.ipynb: IPython notebook used to generate the plots present in the report.
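One way word lists such as positive-words.txt and negative-words.txt can be used is a simple counting classifier. The sketch below assumes one word per line in each file and breaks ties in favour of the positive class. Both choices are assumptions for this example, and the function names are hypothetical.

```python
def load_word_list(path):
    """Load a word list file into a set, assuming one word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def classify_by_word_lists(tweet, positive_words, negative_words):
    """Return 1 (positive) or 0 (negative) by comparing word counts."""
    words = tweet.split()
    positive_count = sum(word in positive_words for word in words)
    negative_count = sum(word in negative_words for word in words)
    # Ties, including tweets with no listed words, default to positive here.
    return 1 if positive_count >= negative_count else 0
```

The labels follow the dataset convention above: 1 for positive and 0 for negative.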
## Contribute
To contribute to our project, see the [contributing guideline](
Twitter is one of the most widely used social media platforms for expressing what an individual feels about various things. Whenever we read a tweet on a particular issue, we only see a small part of the picture.
Most of the time, a search on Twitter returns a series of tweets that speak positively, negatively, or neutrally about the topic, so it is difficult to understand the overall sentiment on the issue.
We are creating a model to perform sentiment analysis on a particular word by reading all the tweets that contain that word. Sentiment analysis is the process of identifying whether a piece of text is positive, negative, or neutral.
This model will help you see whether most people have a positive, negative, or neutral opinion by providing eye-catching visualizations. It also generates a word cloud, which helps in understanding the most frequently used words. We even provide the counts of total positive, negative, and neutral tweets.
All the user has to do is enter the keyword they want to search, and they will be provided with visualizations and analysis of the sentiment on that keyword on Twitter.