As a data scientist, JUMO’s Pamela Afful often reads interesting articles about the latest techniques in machine learning. She recently used natural language processing (NLP) to put together some machine learning models that could predict fake news on Twitter, and shared the fun experience and process, step by step.
‘The project itself was not new. There have been many machine learning analyses run on Twitter data, but I wanted to take it a step further by looking at model interpretability, i.e. why a model predicts the way it does. I did this using Local Interpretable Model-Agnostic Explanations, or LIME, which analyses the keywords models use to predict outcomes.
The data I used was gathered from Twitter around the time of the 2016 US General Election. As such, most of the tweet data was political, however, the analysis itself was not political. The analysis did not aim to make any sweeping political conclusions or generalisations, so nothing of the sort should be inferred from it. It was simply a fun exercise in machine learning.
The data contained three main columns:
1. Title: the title of the news tweet.
2. Text: the actual text of the news tweet.
3. Label: the label of the news tweet, either ‘fake’ or ‘real’.
For this analysis, I focussed on the actual text of the tweet.
Before taking a closer look at the data, I applied a pre-processing function to clean it which did the following:
- Converted all text to lowercase.
- Removed all URL links, non-alphanumeric text, and Twitter handles.
- Tokenised the text by splitting it on the white space between words, and put the resulting tokens in a word list.
- Removed all stop words from the word list. Stop words are common words that appear very frequently in text, such as “and”, “so”, and “the”. They are necessary for sentence construction (they form the parts of speech of the text) but don’t add meaning or value to it. Removing them pays off in data storage, model training, and model fine-tuning.
- Converted the “purified” word list into a data column that was appended to the original data frame as “cleaned_text”.
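The cleaning steps above can be sketched as a single function. This is an illustrative reconstruction, not the author’s code: the stop-word set is a tiny stand-in (a real run would use a full list such as NLTK’s English stop words), and the regexes are assumptions about how URLs and handles were matched.

```python
import re

# Tiny illustrative stop-word set; a real pipeline would use a full list.
STOP_WORDS = {"and", "so", "the", "a", "an", "of", "to", "in", "is", "it"}

def clean_text(text):
    """Apply the cleaning steps described above to a single tweet."""
    text = text.lower()                          # convert to lowercase
    text = re.sub(r"https?://\S+", "", text)     # remove URL links
    text = re.sub(r"@\w+", "", text)             # remove Twitter handles
    text = re.sub(r"[^a-z0-9\s]", " ", text)     # remove non-alphanumerics
    tokens = text.split()                        # tokenise on whitespace
    return [t for t in tokens if t not in STOP_WORDS]  # drop stop words
```

Applying this to each row and assigning the result to a new `cleaned_text` column reproduces the final step.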
Data frame with cleaned text
It’s always good to derive some early insights from data before modeling it. Since this is text data, most of the analysis was around word frequency and sentiment analysis.
For sentiment analysis, I took a look at the average polarity and subjectivity of each label in the data. Polarity defines the orientation of the expressed sentiment, that is, it measures if the text expresses a positive, negative or neutral sentiment.
To measure the polarity and subjectivity of the text I defined two functions and applied them throughout the entire data frame.
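The article doesn’t show the two functions, so here is a minimal pure-Python stand-in illustrating what the two scores measure; in practice these values usually come from a library such as TextBlob, whose `sentiment` property returns exactly this polarity/subjectivity pair. The tiny lexicons below are invented for illustration only.

```python
# Invented mini-lexicons; a real implementation relies on a full sentiment lexicon.
POSITIVE = {"good", "great", "win", "success"}
NEGATIVE = {"bad", "terrible", "lose", "failure"}
SUBJECTIVE = {"think", "believe", "feel", "amazing", "terrible", "great"}

def polarity(text):
    """+1 = fully positive, -1 = fully negative, 0 = neutral."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def subjectivity(text):
    """0 = purely objective, 1 = purely subjective."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in SUBJECTIVE for w in words) / len(words)
```

Both functions can then be applied across the data frame, e.g. with `df["cleaned_text"].apply(polarity)`.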
The average polarity of the tweet data was slightly positive (although closer to neutral). This makes sense given the context of the data: we expect news text to be fairly neutral in its delivery. Accordingly, the average subjectivity of the tweets was on the lower end of the spectrum, indicating that news tweets are closer to being “factual”.
I also visualised the polarity and subjectivity spread of the data using histograms. If you’re interested in a deeper understanding of this you can view the graphs on my Medium post.
Fun fact: initial insights revealed the words ‘said’, ‘Trump’, and ‘Clinton’ were the most frequently used across the entire tweet data.
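Frequency findings like this can be produced with a simple counter over the cleaned token lists. The tweets below are hypothetical stand-ins for the real data:

```python
from collections import Counter

# Hypothetical cleaned tweets; the real counts came from the full dataset.
cleaned = [["trump", "said", "votes"],
           ["clinton", "said", "emails"],
           ["trump", "rally", "said"]]

# Count every token across every tweet.
freq = Counter(token for tweet in cleaned for token in tweet)
print(freq.most_common(3))  # most frequent words first
```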
The data modeling method
To model the data, I first trained various models or classifiers to see which would be the fastest and yield the best metrics. The classifiers I used were Passive-Aggressive, Logistic Regression, Random Forest, SVC, and Multinomial Naive Bayes.
The Passive-Aggressive classifier yielded the best accuracy and scores, followed by SVC (although the latter took the longest to train). On the other hand, the Naive Bayes classifier took the shortest time to train.
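A comparison like this can be sketched with scikit-learn. The corpus below is a hypothetical stand-in for the cleaned tweet text, and only training time is measured:

```python
import time
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier, LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in corpus; the real analysis used the cleaned tweet text.
texts = ["official report said turnout rose", "senate said the bill passed",
         "shocking secret cure they hide", "you won't believe this hoax",
         "court said the ruling stands", "miracle trick exposed by insider"] * 5
labels = ["real", "real", "fake", "fake", "real", "fake"] * 5

X = TfidfVectorizer().fit_transform(texts)

results = {}
for clf in [PassiveAggressiveClassifier(max_iter=1000),
            LogisticRegression(max_iter=1000),
            RandomForestClassifier(n_estimators=50),
            SVC(),
            MultinomialNB()]:
    start = time.perf_counter()
    clf.fit(X, labels)                      # train each classifier
    results[type(clf).__name__] = (clf.score(X, labels),
                                   time.perf_counter() - start)

for name, (acc, secs) in results.items():
    print(f"{name}: accuracy={acc:.2f}, train_time={secs:.4f}s")
```

In a real comparison the scores would of course be computed on a held-out test split rather than the training data.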
As the best-performing model, the Passive-Aggressive classifier was the one I continued with. (For the more mathematically inclined, you can read how the model works here.) I wanted to take the analysis further to see why this was the case. Looking at the number of non-zero weighted features used by the model gives a high-level view.
To take a look at why certain features were weighted the way they were, I used Local Interpretable Model-Agnostic Explanations or LIME. LIME is an interpretability surrogate model which can be used on any black-box model (hence the word model-agnostic) and provides interpretability for a single observation or prediction (hence the word local).
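To make the idea concrete, here is a heavily simplified sketch of LIME’s core loop for text: perturb the instance by masking words, query the black-box model on the perturbations, and fit a weighted linear surrogate whose coefficients act as word importances. Real LIME adds an exponential distance kernel and sparse feature selection; `lime_text_sketch` and `fake_model` are invented names for illustration.

```python
import numpy as np

def lime_text_sketch(tokens, predict_fn, n_samples=500, seed=0):
    """Minimal sketch of LIME's core idea for one text instance."""
    rng = np.random.default_rng(seed)
    d = len(tokens)
    # Binary masks: which words of the original text are kept in each sample.
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # include the unperturbed instance itself
    texts = [" ".join(t for t, keep in zip(tokens, m) if keep) for m in masks]
    y = predict_fn(texts)          # black-box probability for one class
    # Weight samples by similarity to the original (fraction of words kept).
    w = masks.mean(axis=1)
    # Weighted least squares: per-word coefficients are the explanations.
    A = masks * np.sqrt(w)[:, None]
    b = y * np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(tokens, coef))

# Demo: a fake black box that predicts 1.0 iff the text contains "said".
def fake_model(texts):
    return np.array([1.0 if "said" in t.split() else 0.0 for t in texts])

weights = lime_text_sketch(["senate", "said", "yes"], fake_model)
print(weights)  # "said" should dominate the explanation
```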
LIME requires a model that produces probability scores for each prediction. The Passive-Aggressive classifier doesn’t have a probability method but it does have a decision function method that provides confidence values for the different predictions.
I created a modified Passive-Aggressive classifier class that inherited from the original classifier but also included a predict_proba method. The method simply wrapped the values from the decision function in a sigmoid function, outputting the probabilities required.
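A minimal sketch of such a wrapper, assuming scikit-learn’s `PassiveAggressiveClassifier` (which exposes `decision_function` but not `predict_proba`); the subclass name and toy data are invented:

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

class ProbaPassiveAggressive(PassiveAggressiveClassifier):
    """Passive-Aggressive classifier with an added predict_proba method,
    built by squashing decision_function confidences through a sigmoid."""

    def predict_proba(self, X):
        scores = self.decision_function(X)      # signed confidence values
        p = 1.0 / (1.0 + np.exp(-scores))       # sigmoid -> P(positive class)
        return np.column_stack([1.0 - p, p])    # [P(class 0), P(class 1)]

# Hypothetical usage on toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
texts = ["the senate said yes", "shocking hoax exposed"] * 10
labels = ["real", "fake"] * 10
X = TfidfVectorizer().fit_transform(texts)
clf = ProbaPassiveAggressive(max_iter=1000).fit(X, labels)
probs = clf.predict_proba(X)
```

Each row of `probs` sums to one, which is all LIME needs to treat the model as a probabilistic black box.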
LIME could then be applied to the model. LIME took a pipeline as an input, so I first fit the pipeline on the training data and saved this to the variable “model”. Since LIME only provides local interpretability, I applied it to a list of random indices from the x_test vector.
LIME Text Visualisations
For the above indices, the visualisations showed the following with regards to real versus fake:
- In the first and third examples, the word “said” (which, remember, was the most frequent word for the ‘real’ news label in our initial findings) was also heavily weighted by the model, contributing to the overall prediction (or probability) that the tweet was ‘real’ news.
- Interestingly in the second and third examples, the year 2016 was weighted in favour of the ‘fake’ news label prediction. On the other hand, the year 2012 was weighted in favour of the ‘real’ news label in the fourth example.
- The different shadings of the highlighted words also gave a better sense of how the model weighted the keywords relative to each other within a specific example. A darker shade indicated the word was weighted more heavily in favour of a class label within that text. So although 2016 was a much darker shade in the second example, its shading was much lighter in the third example relative to the weights of the other keywords.
- In the fourth example, we had an instance where the model incorrectly classified a text as ‘fake’ when it was ‘real’. This was mostly due to the high weighting of the word ‘debt’ within the text in favour of the ‘fake’ news label.
- The results from the fifth example were also quite surprising. The word “cancer” was weighted in favour of the ‘real’ news label but the word “chemotherapy” was weighted in favour of the ‘fake’ news label! I had expected these words to move together in favour of a particular label, since they are related, but that wasn’t the case for the Passive-Aggressive classifier. For the more nerdy readers out there, analysing the cosine similarity of the tweets that contain some of the keywords could shed some light on why this was the case, but I’ll leave that rabbit hole to you the reader to explore!
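As a starting point for that rabbit hole, cosine similarity over TF-IDF vectors can be computed like this; the tweets below are invented stand-ins for ones containing the keywords of interest:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tweets containing the keywords of interest.
tweets = ["new cancer study said treatment works",
          "chemotherapy hoax they hide from patients",
          "cancer research said survival rates improve"]

X = TfidfVectorizer().fit_transform(tweets)
sim = cosine_similarity(X)   # sim[i, j] = cosine similarity of tweets i and j
print(sim.round(2))
```

High similarity between tweets sharing a keyword, under different labels, would hint at why related words can pull in opposite directions.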
More insights can be derived from the data using different indices from the ones I used. The code and accompanying files can be found in my GitHub repo here. I hope you enjoy it as much as I did.
Model interpretability has become increasingly important in machine learning, particularly within NLP. Being able to apply data science for interesting uses like this is what makes this growing scientific field incredibly exciting and beneficial.’