[return to overview page]

The feature I am now going to extract is some proxy for each statement’s sentiment. There are plausible reasons, and indeed some empirical evidence, to believe that sentiment (the affective, emotional content of a statement) might vary between when people are lying and when they are telling the truth – and thus might serve as a rough proxy for whether a statement is truthful or a lie. It is plausible that when people are lying they are often not in the same emotional state as when they are telling the truth. Most obviously, they may be more anxious. Other negative affect states (e.g. guilt) may also accompany lying, leading perhaps to an overall more negative affective state. If these emotions leak out in the way people speak and the words they use, then by analyzing the sentiment in people’s statements, we may have some predictive signal about whether they are lying. Indeed, Pennebaker, Mehl, & Niederhoffer (2003, p.564), reviewing the evidence connecting sentiment and lying, state “several labs have found slight but consistent elevations in the use of negative emotion words during deception compared with telling the truth (e.g., Knapp & Comadena 1979, Knapp et al. 1974, Newman et al. 2002, Vrij 2000).” Likewise, in their review of the behavioral cues of lying, DePaulo, Lindsay, Malone, Muhlenbruck, Charlton, & Cooper (2003) find evidence that liars are less positive and more tense – for example, making more negative statements and complaints (see Table 5, from p. 93, below), and exhibiting more tense and fidgety behavior (see Table 6, from p. 93, below).

Obviously such a signal might be very weak, as the emotional content of people’s speech varies for all sorts of reasons (and in the particular context in which this data was collected, the stakes were extremely low, so there may be even less reason for people to feel, for example, anxious when lying). (And see, for example, Vrij, Fisher, Mann, & Leal (2006) for critiques of fear and other emotion-based accounts of deception detection.) Nevertheless, it is worth extracting this feature and exploring this avenue.


Again, I will start by loading relevant packages.

library(tidyverse) # cleaning and visualization
library(quanteda) # text analysis
library(tidytext) # has function for sentiment extraction

Load Data

I will load the most recent version of the cleaned statements, which comes from Feature Extraction. (Note, we created a more recent object, recording the frequency of the occurrence of various parts of speech. However, we will not be using that object right now.)

# this loads: stats_clean (a data-frame of our cleaned statements)


Here, the goal is to extract the sentiment from each statement. This is done on a word-by-word basis. Each word is mapped onto a sentiment (e.g. positive, negative). Then, for each sentence, we can count up the number of words with each type of sentiment (e.g. total number of positive words, total number of negative words).
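Before applying this to the real data, the word-by-word logic can be sketched in a few lines of base R. (This uses a tiny hand-made toy dictionary purely for illustration – the actual analysis below uses the bing lexicon.)

```r
# toy dictionary: a hand-made word-to-sentiment mapping (illustrative only)
toy_dict <- data.frame(
  word      = c("sad", "depressed", "happy", "excited", "overwhelmed"),
  sentiment = c("negative", "negative", "positive", "positive", "negative"),
  stringsAsFactors = FALSE
)

sentence <- "The party made me happy and excited, but also a little overwhelmed."

# tokenize: lowercase, strip punctuation, split on whitespace
words <- strsplit(tolower(gsub("[[:punct:]]", "", sentence)), "\\s+")[[1]]

# map: keep only the words that appear in the dictionary
matched <- toy_dict[toy_dict$word %in% words, ]

# count: number of words carrying each sentiment
table(matched$sentiment)
# negative positive
#        1        2
```

Words not in the dictionary (e.g. “the”, “party”) simply drop out – a limitation of all dictionary-based methods.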

Sentiment dictionaries

Some of the most popular methods for sentiment extraction involve essentially the same approach. The authors take a big list of words, and then map each word to a sentiment (or multiple sentiments, in some cases). The best known of these is LIWC (Pennebaker, Francis, & Booth, 2001; Tausczik, & Pennebaker, 2010). A notable downside of the LIWC dictionary is that it is proprietary, i.e. you have to pay for it.

Nevertheless, there are many freely available dictionaries that map from words to sentiment. Some of these are summarized by Silge & Robinson (2016) in their text analysis textbook (which is specifically geared toward text analysis in R, and even more specifically toward the tidy approach to R data). They note (and make available) three popular sentiment mapping dictionaries:

  • bing: from Hu & Liu (2004)
  • AFINN: from Nielsen (2011)
  • nrc: from Mohammad & Turney (2013)

These are all created through some sort of analysis of large scale web data. (See citations at the end for further references on the creation and use of each of these lexicons.)

Some of these lexicons map words to a large set of possible sentiments. For example, the nrc lexicon maps words to the following sentiments: negative, positive, fear, anger, trust, sadness, disgust, surprise, anticipation, joy. Other lexicons, like bing, simply map each word to either positive or negative sentiment, and that’s it. Others, like AFINN, map each word to a numeric value (in their case indicating the degree of negativity or positivity, from -5 to +5).
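The practical difference between the binary and numeric styles is in how a statement gets summarized. A small sketch (the words and scores here are made up for illustration, not taken from the actual lexicons):

```r
# binary style (bing-like): each word gets a label
binary_style  <- data.frame(word      = c("terrible", "good", "superb"),
                            sentiment = c("negative", "positive", "positive"),
                            stringsAsFactors = FALSE)

# numeric style (AFINN-like): each word gets a signed score
numeric_style <- data.frame(word  = c("terrible", "good", "superb"),
                            value = c(-3, 3, 5),
                            stringsAsFactors = FALSE)

# with a numeric lexicon, a statement can be summarized as a single sum;
# with a binary lexicon, you instead count the labels
words <- c("good", "terrible", "superb")
score <- sum(numeric_style$value[match(words, numeric_style$word)])
score
# [1] 5
```

A numeric lexicon lets strongly valenced words count for more, at the cost of having to trust the specific scores assigned.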

To begin, we are just going to map to positive and negative sentiment. And we are going to use the bing set of words, because it appears to have the largest number of words with a positive or negative mapping.

Sentiment Extraction

Here, I will be relying on the tidytext package created by Silge & Robinson (2016), and explained in their aforementioned textbook on the subject.

Again, I will proceed by way of example.

Sentiment Extraction (Example)

Let’s look at two example sentences:

  • “The murder made me sad and depressed.”
  • “The party made me happy and excited, but also a little overwhelmed.”

First, let’s save these sentences to a data object and print them.

# create sentences
example <-
  data.frame(sentence = c("The murder made me sad and depressed.",
                          "The party made me happy and excited, but also a little overwhelmed.")) %>%
  mutate(sentence = as.character(sentence),
         sent_num = row_number())

# print sentences
example

Load Bing Sentiment Dictionary

What we want to do next is take each of the words in these two sentences and find all the words in them that have been mapped to a sentiment in the bing dictionary. To do that, we need to first load in the dictionary of word mappings from bing. I do that below. We can see that 6,788 words have been mapped to either a positive or negative sentiment.

# load bing sents and save to object
(bing_sents <- get_sentiments(lexicon = "bing"))
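With the dictionary loaded, the eventual goal is a per-statement count of positive and negative words. As a self-contained preview of that step (using a toy dictionary in place of bing_sents so it runs without tidytext; the words and labels are invented for illustration):

```r
# toy stand-in for bing_sents (word-to-sentiment mapping, illustrative only)
toy_dict <- data.frame(
  word      = c("murder", "sad", "depressed", "happy", "excited", "overwhelmed"),
  sentiment = c("negative", "negative", "negative", "positive", "positive", "negative"),
  stringsAsFactors = FALSE
)

sentences <- c("The murder made me sad and depressed.",
               "The party made me happy and excited, but also a little overwhelmed.")

# for each sentence: tokenize, look words up in the dictionary,
# and count the negative and positive matches
counts <- t(sapply(sentences, function(s) {
  words <- strsplit(tolower(gsub("[[:punct:]]", "", s)), "\\s+")[[1]]
  sents <- toy_dict$sentiment[match(words, toy_dict$word)]
  c(negative = sum(sents == "negative", na.rm = TRUE),
    positive = sum(sents == "positive", na.rm = TRUE))
}))
counts
```

The first sentence yields three negative words and no positive ones; the second yields three positive words and one negative one. The tidytext workflow (unnest_tokens, then an inner join against bing_sents, then count) produces the same kind of per-statement tally.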