9 December, 2016


AI Mining have achieved state-of-the-art results in sentiment analysis! We call it Sentwatch, and it’s available through our API.

Test accuracy (%):
Model | IMDB | Yelp | Amazon | Yelp 5⭐ | Amazon 5⭐
char-CNN (Zhang and LeCun, 2015) | n/a | 94.7 | 94.5 | 62.0 | 59.5
VDCNN (Conneau et al, 2016) | n/a | 95.7 | 95.7 | 64.7 | 63.0
fastText (Joulin et al, 2016) | 88.3 | 95.7 | 94.6 | 63.9 | 60.2
Sentwatch | 91.6 | 96.6 | 96.0 | 66.0 | 63.3

What does this table actually mean? I’ll need to explain a few other concepts first.

What is Sentiment Analysis?
First of all, sentiment analysis (SA for short) is the use of a computer program to judge whether reviews like “These guys are the best plumbers in the world!” or “This film stinks.” are positive or negative. If you are a big company trying to figure out what people are saying about you on Twitter without actually reading all of those tweets, automated SA is very useful.

What is Machine Learning?
Secondly, there are two main approaches to sentiment analysis: lexical analysis (counting up the nice words and the nasty words) and machine learning (the latest craze which we use).
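To make the lexical approach concrete, here is a minimal sketch that counts nice and nasty words. The word lists `POSITIVE_WORDS` and `NEGATIVE_WORDS` are tiny hypothetical examples; real lexicons are far larger and often give each word a weight. One weakness of plain counting is that it ignores context such as negation.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE_WORDS = {"best", "great", "awesome", "love", "excellent"}
NEGATIVE_WORDS = {"worst", "stinks", "awful", "hate", "terrible"}


def lexicon_sentiment(review: str) -> str:
    """Label a review by counting positive words minus negative words."""
    tokens = re.findall(r"[a-z']+", review.lower())
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no lexicon words found, or they cancel out


print(lexicon_sentiment("These guys are the best plumbers in the world!"))  # positive
print(lexicon_sentiment("This film stinks."))  # negative
```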

Machine learning (ML for short) uses an algorithm to analyse data and create a model. The algorithm is like the recipe, the data is the ingredients, and the model is the cake. In cooking, we bake a cake in the oven. In ML, we train a model on a really, really fast computer. The trained model is ultimately a giant mathematical equation which knows how to find patterns in text. In our case, it can turn a review like “Wow this is awesome!” into a sentiment like 99% positive. There’s not really a baking analogy for that, unfortunately.
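This post does not describe Sentwatch’s algorithm, so the sketch below substitutes multinomial naive Bayes, a classic textbook method, purely to illustrate the recipe/cake split: `train` turns labelled reviews into a model, and `predict_positive` uses that model to turn a new review into a probability of being positive. The six training reviews are made up for illustration; a real model is trained on hundreds of thousands of reviews and would be far more confident than this toy.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def train(reviews: list[tuple[str, str]]):
    """'Bake' a multinomial naive Bayes model from (text, label) pairs.

    Labels are assumed to be exactly "positive" or "negative".
    """
    word_counts = {"positive": Counter(), "negative": Counter()}
    doc_counts = Counter()
    for text, label in reviews:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    return word_counts, doc_counts, vocab


def predict_positive(model, text: str) -> float:
    """Return the model's probability that the text is positive."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids log(0) for unseen word/label pairs.
                score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # P(positive) = 1 / (1 + exp(log P(negative) - log P(positive)))
    return 1.0 / (1.0 + math.exp(log_scores["negative"] - log_scores["positive"]))


# Hypothetical training data (the "ingredients").
training_data = [
    ("What an awesome film, I loved it", "positive"),
    ("Wow, great service and awesome food", "positive"),
    ("The best plumbers in town", "positive"),
    ("The film stinks", "negative"),
    ("Awful service, terrible food", "negative"),
    ("The worst plumbers ever", "negative"),
]
model = train(training_data)
print(predict_positive(model, "Wow this is awesome!"))  # above 0.5, i.e. leaning positive
```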

State of the Art
Once it has been trained/baked, the model/cake is then tested. The table above essentially says that the fastText recipe, using the Yelp ingredients, produced a cake which tasted good to 95.7% of Yelp employees. In other words, the fastText model trained on Yelp reviews labelled 95.7% of the Yelp test reviews correctly. Similarly our very own AI Mining recipe called Sentwatch, baked using Yelp ingredients, satisfied 96.67% of Yelp employees.

The term “state of the art” is used in academic literature to describe the latest development of a particular technology. Our 96.67% puts us right up there with Facebook’s new fastText algorithm, Facebook’s Very Deep Convolutional Networks for Natural Language Processing (VDCNN) and Xiang Zhang and Yann LeCun’s char-CNN. The char-CNN, VDCNN and fastText scores above come from the fastText paper, except for the fastText IMDB result, which we measured ourselves. In all cases we tested fastText both with and without bigrams and used the better score.
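A bigram is a pair of adjacent words. Adding bigrams to the single words a model sees gives it a little word-order information, so “not good” can be treated differently from “good”. The sketch below shows the idea with a hypothetical `ngram_features` helper; it is not fastText’s own code (fastText turns on word bigrams with its `-wordNgrams 2` option and hashes the n-grams internally).

```python
import re


def ngram_features(text: str, use_bigrams: bool = True) -> list[str]:
    """Turn a review into bag-of-words features, optionally adding word bigrams."""
    words = re.findall(r"[a-z']+", text.lower())
    features = list(words)
    if use_bigrams:
        # Each bigram joins a word with the word that follows it.
        features += [f"{a} {b}" for a, b in zip(words, words[1:])]
    return features


print(ngram_features("This film is not good", use_bigrams=False))
# ['this', 'film', 'is', 'not', 'good']
print(ngram_features("This film is not good"))
# ['this', 'film', 'is', 'not', 'good', 'this film', 'film is', 'is not', 'not good']
```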

In the table above, the IMDB, Yelp and Amazon data sets only ask whether a review is positive or negative. The Yelp 5⭐ and Amazon 5⭐ data sets instead use a rating of 1 to 5 stars. Guessing the correct number of stars is much harder because there are 5 categories instead of just 2, so the percentages are lower.
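A useful reference point is random guessing. Assuming each class is equally common in the test set, a random guess is right 1 time in 2 with two classes but only 1 time in 5 with five:

```python
def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy (%) of uniform random guessing, assuming balanced classes."""
    return 100.0 / num_classes


for name, k in [("positive/negative", 2), ("1 to 5 stars", 5)]:
    print(f"{name}: {chance_accuracy(k):.0f}% by random guessing")
# positive/negative: 50% by random guessing
# 1 to 5 stars: 20% by random guessing
```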

In the table above, cakes baked with Yelp ingredients were only tested by Yelp employees, and cakes baked with Amazon ingredients only by Amazon employees. This is common practice in academic papers, because it is the algorithm/recipe which is being tested, and there need to be consistent guidelines for comparison. Specifically, the algorithms above were trained on 560,000 Yelp reviews and then tested on a separate set of 38,000 Yelp reviews. Similarly, separate cakes were baked using 3.6 million Amazon reviews and tested on 400,000 Amazon reviews.
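The percentages in these tables are accuracies: the share of held-out test reviews whose predicted label matches the true label. A minimal sketch, assuming a hypothetical `toy_predict` keyword rule as a stand-in for a trained model (it is not Sentwatch or fastText):

```python
def accuracy(predict, test_set) -> float:
    """Percentage of (text, label) pairs where predict(text) equals the label."""
    correct = sum(predict(text) == label for text, label in test_set)
    return 100.0 * correct / len(test_set)


def toy_predict(text: str) -> str:
    # Hypothetical stand-in for a trained model: a single keyword rule.
    return "positive" if "good" in text.lower() else "negative"


# In a real evaluation these would be reviews the model never saw in training.
test_set = [
    ("Good food, good prices", "positive"),
    ("Not good at all", "negative"),  # the keyword rule gets this one wrong
    ("Terrible service", "negative"),
    ("Really good value", "positive"),
]

print(f"{accuracy(toy_predict, test_set):.1f}%")  # 75.0%
```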

But we want our algorithm to generalise. We want a recipe so robust that a cake baked with Yelp ingredients will appeal to Amazon testers too. In machine learning, this is known as domain adaptation.

The table below shows what happens when we train on one data set, like Yelp, and test on another, like Amazon. We also include a third data set, IMDB (the Internet Movie Database), which contains 25,000 reviews for training and another 25,000 for testing. We compare our Sentwatch with Facebook’s fastText (which we trained ourselves):

Test accuracy (%):
Trained on | Model | Tested on IMDB | Tested on Yelp | Tested on Amazon
IMDB | fastText | 88.3 | 79.5 | 80.2
IMDB | Sentwatch | 91.6 | 87.0 | 87.8
Yelp | fastText | 85.1 | 95.7 | 89.4
Yelp | Sentwatch | 88.6 | 96.6 | 91.9
Amazon | fastText | 90.9 | 89.0 | 94.2
Amazon | Sentwatch | 93.0 | 94.6 | 96.0

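We are very pleased that Sentwatch generalises well. Whenever the training and test data sets differ, it scores between 2.1 and 7.6 percentage points higher than fastText.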
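The cross-domain comparison can be checked directly. This short script copies the accuracies from the cross-domain table above and computes Sentwatch’s lead over fastText for each pair where the training and test data sets differ; `RESULTS` and `cross_domain_margins` are names chosen for this illustration.

```python
# Accuracy (%) from the cross-domain table: RESULTS[trained_on][tested_on] = (fastText, Sentwatch)
RESULTS = {
    "IMDB": {"IMDB": (88.3, 91.6), "Yelp": (79.5, 87.0), "Amazon": (80.2, 87.8)},
    "Yelp": {"IMDB": (85.1, 88.6), "Yelp": (95.7, 96.6), "Amazon": (89.4, 91.9)},
    "Amazon": {"IMDB": (90.9, 93.0), "Yelp": (89.0, 94.6), "Amazon": (94.2, 96.0)},
}


def cross_domain_margins(results):
    """Return Sentwatch's lead over fastText (points) for every train != test pair."""
    return {
        (train, test): round(sentwatch - fasttext, 1)
        for train, row in results.items()
        for test, (fasttext, sentwatch) in row.items()
        if train != test  # skip in-domain pairs
    }


for (train, test), margin in cross_domain_margins(RESULTS).items():
    print(f"trained on {train}, tested on {test}: Sentwatch +{margin} points")
```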

Writer’s Point of View
Sentwatch analyses the text from the point of view of the writer, unlike Emowatch, which takes the reader’s point of view. In academic literature, the writer is also known as the holder or source of the sentiment. Every other sentiment analysis tool we have researched adopts the writer’s point of view, either implicitly or explicitly, and Sentwatch is no different.