
Sentiment Analysis of Business Reviews using Natural Language Processing

A comprehensive walkthrough of a machine learning project analyzing the sentiment of business reviews

Introduction

Sentiment analysis is the process of determining the sentiment or emotion expressed in a piece of text, such as a review or a tweet. This can be particularly useful for businesses to gauge customer opinions and feedback, or for researchers analyzing public sentiment on various topics.

Problem Statement

The goal of this project is to develop a machine learning model that can accurately classify business reviews into star ratings from 1 to 5 based on the sentiment expressed in the review text.

Dataset

The dataset used for this project was not explicitly identified, but it appears to have been scraped from Yelp. It consists of 18,000 labeled business reviews (train set) and 4,000 unlabeled business reviews (test set). The dataset is imbalanced; details can be found in the sections below. You can find the dataset here.

![Dataset Visualization]
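
As a quick sanity check, the class imbalance can be inspected once the reviews are loaded into a DataFrame. The file and column names below are hypothetical, since the dataset's schema was not published:

```python
import pandas as pd

# Hypothetical file/column names; the actual dataset schema was not published.
train = pd.read_csv('train.csv')         # 18,000 labeled reviews
print(train['stars'].value_counts())     # per-rating counts reveal the imbalance
```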

Preprocessing

Before training the model, the dataset was preprocessed using the following steps:

  1. Lowercasing: The review texts were converted to lowercase to ensure uniformity.
  2. Punctuation removal: Punctuation marks were stripped, as they carry little sentiment information on their own.
  3. Tokenization: The review texts were tokenized into individual words.
  4. Stopword removal: Common stopwords like “a”, “an”, and “the” were removed as they do not contribute much to the sentiment.
  5. Lemmatization: Words were lemmatized to their base forms, which helps reduce dimensionality and improve model performance.
  6. Truncation and padding: The tokenized sequences were truncated or padded to a fixed length for input to the BERT model.

![Preprocessing Visualization]
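
As a rough illustration, steps 1-5 might look like the NLTK-based sketch below (NLTK is an assumption on my part; the project's actual implementation may differ), with step 6 left to the BERT tokenizer:

```python
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess(review: str) -> list[str]:
    review = review.lower()                                                # 1. lowercasing
    review = review.translate(str.maketrans('', '', string.punctuation))   # 2. punctuation removal
    tokens = word_tokenize(review)                                         # 3. tokenization
    tokens = [t for t in tokens if t not in stop_words]                    # 4. stopword removal
    return [lemmatizer.lemmatize(t) for t in tokens]                       # 5. lemmatization

# 6. Truncation/padding for the BERT branch is handled by the tokenizer itself, e.g.:
#    BertTokenizer.from_pretrained('bert-base-uncased')(text, truncation=True,
#                                                       padding='max_length', max_length=128)
```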

Feature Extraction

After preprocessing the text data, two methods were used to convert the text into numerical features:

  1. Bag-of-Words (BoW): BoW is a simple method of converting text into numerical features by counting how many times each word appears in the text. This was done using CountVectorizer and TfidfVectorizer from the scikit-learn library. Unlike CountVectorizer, TF-IDF quantifies the importance of words in the review texts and reduces the noise from less significant words.

  2. word2vec: word2vec is a method of converting text into numerical features by representing each word as a vector of real numbers. This was done using the gensim library. The pretrained Google News word2vec model was used to convert the review texts into vectors. Advantages of using word2vec include: (1) it can capture the semantic meaning of words, (2) it can capture the context of words, and (3) it can capture the relationship between words.
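
A minimal sketch of both feature extraction approaches, using scikit-learn and gensim; averaging the word2vec vectors per review is an assumption here, not necessarily the exact scheme used in the project:

```python
import numpy as np
import gensim.downloader as api
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["great food and friendly staff", "terrible service never again"]

# Bag-of-Words / TF-IDF features
bow_features = CountVectorizer().fit_transform(corpus)    # raw term counts
tfidf_features = TfidfVectorizer().fit_transform(corpus)  # counts reweighted by inverse document frequency

# word2vec features using the pretrained Google News model (large download);
# averaging the word vectors of each review is one common choice, assumed here.
w2v = api.load('word2vec-google-news-300')

def review_vector(text: str) -> np.ndarray:
    vectors = [w2v[word] for word in text.split() if word in w2v]
    return np.mean(vectors, axis=0) if vectors else np.zeros(w2v.vector_size)
```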

Model Selection and Training

I experimented with various machine learning models, including Logistic Regression, Naive Bayes, and Support Vector Machines (SVM), as well as deep learning models such as Long Short-Term Memory (LSTM) and Bidirectional LSTM networks. To leverage the power of transfer learning, I also explored the pretrained BERT model, which has shown state-of-the-art performance on several NLP tasks.

The final model architecture included the pretrained BERT model followed by a Bidirectional GRU (BiGRU) layer and a Dense output layer. The specific configuration of the model is as follows:

import torch.nn as nn
from transformers import BertModel

# Snippet from the model's __init__: BERT encoder -> 2-layer BiGRU -> linear classifier
self.bert = BertModel.from_pretrained('bert-base-uncased')
self.gru = nn.GRU(input_size=self.bert.config.hidden_size,  # 768 for bert-base
                  hidden_size=256,
                  num_layers=2,
                  batch_first=True,
                  bidirectional=True,
                  dropout=0.2,
                  )
self.out = nn.Linear(512, n_classes)  # 2 x 256 (bidirectional) -> star-rating logits
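
For completeness, here is a sketch of how a forward pass might wire these layers together, assuming the snippet above sits inside a `torch.nn.Module`; this is my reconstruction rather than the exact code used:

```python
import torch

def forward(self, input_ids, attention_mask):
    # BERT returns one 768-dim hidden state per token
    sequence_output = self.bert(input_ids=input_ids,
                                attention_mask=attention_mask).last_hidden_state
    _, hidden = self.gru(sequence_output)               # hidden: (num_layers * 2, batch, 256)
    # concatenate the last layer's forward and backward hidden states -> (batch, 512)
    final_hidden = torch.cat((hidden[-2], hidden[-1]), dim=1)
    return self.out(final_hidden)                        # (batch, n_classes) logits
```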

Challenges and Struggles

  1. Imbalanced dataset: The dataset was imbalanced, with far more data available for ratings 1 and 5 than for ratings 2, 3, and 4. This made it easier for the model to predict ratings 1 and 5, resulting in significantly higher F1 scores for those ratings. I tried multiple approaches to address this issue (sketched after this list), such as:

    a. Oversampling: I experimented with random oversampling to add more samples for the underrepresented classes, but this did not lead to significant improvements in model performance.

    b. Class weighting: I experimented with assigning different weights to the classes during training, giving higher importance to the underrepresented classes. However, this did not result in substantial improvements either.

  2. Computational complexity: Training deep learning models, especially on large datasets, is computationally intensive and time-consuming. With the BiGRU layer alone pushing RAM usage to around 12 GB, I was unable to train the model on my local machine. To address this, I used Google Colab's paid GPU and RAM resources to speed up the training process.

  3. Model selection: Selecting the most appropriate model was a challenge, as different models had their own strengths and weaknesses.
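
A sketch of the two imbalance remedies from item 1, using imbalanced-learn and scikit-learn (assumed libraries; the project's exact code may differ):

```python
import numpy as np
import torch
import torch.nn as nn
from imblearn.over_sampling import RandomOverSampler
from sklearn.utils.class_weight import compute_class_weight

# Toy labels standing in for the real training set (ratings shifted to 0-4).
y_train = np.array([0, 4, 4, 0, 2, 4, 0, 1, 3, 4])
X_train = np.arange(len(y_train)).reshape(-1, 1)   # placeholder features

# (a) Random oversampling: duplicate minority-class samples until classes are balanced.
X_res, y_res = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)

# (b) Class weighting: give minority classes a larger weight in the loss function.
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y_train), y=y_train)
criterion = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float))
```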

Results

The BERT model with the BiGRU layer achieved an F1 score of 0.60 on the validation set, outperforming all the other models, including the strong baseline (F1 score of 0.54), and indicating strong performance in classifying business reviews by sentiment. Some key insights from the project include:

  1. The model was able to capture complex language patterns and nuances in the review text, thanks to the pretrained BERT model.
  2. The inclusion of a BiGRU layer allowed the model to learn from both past and future context, resulting in more accurate sentiment classification.
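
For reference, the validation F1 score can be computed with scikit-learn's `f1_score`; macro averaging is assumed here, since the averaging scheme was not stated:

```python
from sklearn.metrics import f1_score

# Toy values standing in for the real validation labels and model predictions.
y_val  = [1, 5, 5, 3, 2, 4, 1, 5]
y_pred = [1, 5, 4, 3, 1, 4, 1, 5]
print(f1_score(y_val, y_pred, average='macro'))
```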

Conclusion

This project demonstrates the effectiveness of using NLP and transfer learning techniques, such as a pretrained BERT model combined with a BiGRU layer, to perform sentiment analysis on business reviews. The model provided a robust solution, achieving a strong F1 score and showcasing the potential of deep learning in NLP tasks. Future work could include exploring other pretrained models or oversampling techniques to further improve model performance.

Project Repository

You can find the complete source code, along with detailed documentation and instructions on how to run the project, in my GitHub repository here. Feel free to explore, provide feedback, or contribute to the project!

Written Report

For a more in-depth analysis of the project, you can read my full written report here. The report includes a detailed explanation of the project, the dataset, the preprocessing steps, the model architecture, the results, and the conclusions.

