Movie Review Sentiment Using Machine Learning

This project, ‘Movie Review Sentiment’, predicts whether a movie review is positive or negative. It is a binary sentiment classification task: given a piece of text, the model decides whether the sentiment it expresses is positive or negative.

The dataset contains positive and negative movie reviews. The reviews are stored not as raw text but as feature representations. The data is divided into three splits: train, test and eval. The raw text was pre-processed into three representations: tfidf, spacy-embeddings (produced with spaCy) and roberta (produced with RoBERTa). tfidf is short for ‘term frequency–inverse document frequency’, a popular document representation that reflects how important a word is to a document in a corpus. The spacy-embeddings are dense 300-dimensional vectors in which each document is represented by the average of its word embeddings. The roberta embeddings are obtained from a transformer-based model called RoBERTa.
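To make the first two representations concrete, here is a minimal sketch in plain Python. It is not the pipeline that produced the dataset: the tfidf function uses the textbook weighting (term frequency times log of inverse document frequency), whereas libraries usually apply smoothed variants, and the two-dimensional `toy_vectors` are made-up stand-ins for the 300-dimensional spaCy word vectors. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook variant (an assumption, not the dataset's exact recipe):
    weight = (count / doc_length) * log(n_docs / document_frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def average_embedding(tokens, vectors, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Two toy "reviews", already tokenized.
docs = [["great", "movie"], ["terrible", "movie"]]
weights = tfidf(docs)

# Made-up 2-dimensional word vectors standing in for real embeddings.
toy_vectors = {"great": [1.0, 0.0], "movie": [0.0, 1.0]}
doc_vector = average_embedding(docs[0], toy_vectors, dim=2)
```

In this toy corpus, "movie" appears in every document and so receives a weight of zero, while "great" appears in only one and receives a positive weight. That is the sense in which tfidf measures how important a word is to a document.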

For the project, I tried different combinations of the three provided feature sets, using them to train and evaluate several models: averaged perceptron, SVM, logistic regression, boosted perceptron and an ensemble.
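One way to combine feature sets and models is sketched below in plain Python. It rests on assumptions: the text does not say how the representations were combined or how the ensemble was built, so this sketch concatenates feature vectors and uses a simple majority vote. The toy features `feats_a` and `feats_b` are invented for illustration and all function names are hypothetical.

```python
def train_averaged_perceptron(X, y, epochs=10):
    """Train on feature lists X with labels y in {-1, +1}.

    Returns the weights and bias averaged over every training step.
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    w_sum, b_sum = [0.0] * dim, 0.0
    steps = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:
                # Mistake: move the decision boundary toward this example.
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
            # Accumulate after every example so that weights which
            # survive many steps contribute more to the average.
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps


def predict(model, x):
    """Return +1 (positive review) or -1 (negative review)."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def majority_vote(votes):
    """Combine +1/-1 votes from an odd number of models."""
    return 1 if sum(votes) > 0 else -1


# Invented toy features: two representations of the same four reviews.
feats_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
feats_b = [[0.5], [0.4], [-0.5], [-0.4]]
labels = [1, 1, -1, -1]

# Combine representations by concatenating their feature vectors.
combined = [a + b for a, b in zip(feats_a, feats_b)]

model_a = train_averaged_perceptron(feats_a, labels)
model_b = train_averaged_perceptron(feats_b, labels)
model_ab = train_averaged_perceptron(combined, labels)

preds = [predict(model_ab, x) for x in combined]

# Ensemble: one vote per model, each seeing its own representation.
ensemble_preds = [
    majority_vote([predict(model_a, a), predict(model_b, b), predict(model_ab, ab)])
    for a, b, ab in zip(feats_a, feats_b, combined)
]
```

In a real run the models would be trained on the train split and compared on the eval split, with the test split held back for the final measurement. The toy data here is evaluated on its own training examples only to keep the sketch short.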