3. February 2016
The Wisdom of Crowds
A scientific look at sentiment analysis and named-entities
Sentiment analysis encompasses various techniques to detect words that express a positive or negative feeling or emotion. These words are commonly known as sentiment words or opinion words. Beyond single words, n-grams (contiguous sequences of n words) and idiomatic expressions are also commonly used as sentiment words, for example ‘terrible’ (a single word), ‘quite wonderful’ (an n-gram) and ‘break a leg’ (an idiom). Sentiment words can be used to predict sentiment classes for users’ opinions, and they have proved useful in sentiment analysis tasks. An important task is to identify relevant mentions of named entities that are accompanied by related sentiment words. From the algorithmic perspective, the challenge is to analyse how these sentiment words affect the public image of the named entities. Previous work (e.g. by Minqing Hu and Bing Liu, by Yohan Jo and Alice Oh, or by Wayne Xin Zhao et al.) has made significant advances in detecting product aspects or features, and it is reasonable to apply these methods to analyse how sentiment words affect the reputation of named entities. However, unlike opinions about products, opinions about named entities are not structured around a fixed set of aspects or features, which makes the task more challenging. For these reasons, a lot of research in the last decade has proposed numerous algorithms to learn sentiment lexicons (see works by Stefano Baccianella et al., by Xiaowen Ding et al., by Andrea Esuli and Fabrizio Sebastiani, by Delip Rao and Deepak Ravichandran, by Hiroya Takamura et al. or by Leonid Velikovich et al.).
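The idea of using sentiment words, including n-grams and idioms, to score opinions about named entities can be illustrated with a small lexicon-based scorer. This is a minimal sketch under stated assumptions, not the method of any of the works cited above: the lexicon and its scores are hypothetical toy values, entities are assumed to be single lowercase tokens supplied by hand (a real system would use a named-entity recognizer), and a sentiment phrase is attributed to an entity simply when it lies within a fixed token window of the mention. The helper names `find_sentiment_phrases` and `entity_sentiment` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: word, n-gram or idiom (as a token tuple) -> polarity score.
# A learned sentiment lexicon would be far larger; these values are illustrative only.
LEXICON = {
    ("terrible",): -1.0,
    ("wonderful",): 1.0,
    ("quite", "wonderful"): 1.5,
    ("break", "a", "leg"): 1.0,
}
MAX_N = max(len(phrase) for phrase in LEXICON)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_sentiment_phrases(tokens):
    """Greedy longest-match lookup of lexicon entries.

    Returns a list of (start, end, score) tuples, where tokens[start:end]
    is the matched phrase. Longer n-grams win, so 'quite wonderful' is
    matched instead of just 'wonderful'.
    """
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_N, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if gram in LEXICON:
                matches.append((i, i + n, LEXICON[gram]))
                i += n
                break
        else:
            # No lexicon entry starts at this token; move on.
            i += 1
    return matches


def entity_sentiment(text, entities, window=3):
    """Sum the scores of sentiment phrases within `window` tokens of each entity mention.

    Entities without any nearby sentiment phrase do not appear in the result.
    """
    tokens = tokenize(text)
    phrases = find_sentiment_phrases(tokens)
    scores = defaultdict(float)
    for pos, token in enumerate(tokens):
        if token in entities:
            for start, end, score in phrases:
                # The phrase covers tokens[start:end]; accept it if the
                # entity lies at most `window` tokens outside that span.
                if start - window <= pos < end + window:
                    scores[token] += score
    return dict(scores)


if __name__ == "__main__":
    review = "Queen is quite wonderful but the sequel was terrible"
    print(entity_sentiment(review, {"queen", "sequel"}, window=2))
    # {'queen': 1.5, 'sequel': -1.0}
```

The window heuristic is deliberately crude: with a larger window, 'sequel' would also pick up the positive 'quite wonderful', which is one reason why real systems rely on syntactic structure or supervised classifiers rather than token proximity alone.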
Intuitively, sentiment words are words or phrases that express a sentiment; for example, good, wonderful, poor and terrible are sentiment words. However, beyond these words there are numerous other words that are used to express a sentiment. For example, ‘Bollywood’ carries a sentiment value in “Queen is not another Bollywood movie.”, although in this example it is not obvious which sentiment the word ‘Bollywood’ expresses. Characteristics of specific products, organizations or named entities are used in subjective sentences to express a sentiment that may only exist in specific domains (see articles by Xiaowen Ding et al. and by Bing Liu). In some cases, these named entities are so prominent that they become a synonym for high quality (or low quality). It is not uncommon to find reviews in which several actors or movies are mentioned. It therefore becomes essential that an IR system identifies these named entities and infers their reputation. Reputation analysis for entities has been a topic of recent research. Go et al. used various machine learning algorithms (Naïve Bayes, Maximum Entropy and SVM) to classify the overall sentiment of Twitter messages towards specific keywords, representing various distinct entities such as movies, famous people, locations and companies. Why, one may ask, is reputation, allied with sentiment analysis, an important object of study? Because it can improve predictive sentiment analysis, which helps to identify trends that aim to undermine named entities or that point out newly esteemed ones. This type of knowledge is highly valuable, which explains the strong interest in improving algorithms for this topic.
Photo Credit:
- All images are taken from Pixabay.com and have been published under CC0 Public Domain