[return to overview page]

In this section, I examine how accurately humans detect truths and lies. To do this, I first needed people to judge the statements in our corpus. This was done with the help of three research assistants (Alexis Levine, Emem-Esther Ikpot, and Catherine Seita). Below, I describe the procedure by which they rendered their judgments and then analyze their performance.

Procedure

To begin, I randomly sorted the full set of 5,004 statements. I then divided this randomly sorted list into three non-overlapping sets and assigned one research assistant to each set. I asked them to go through the statements within their set, one statement at a time. For each statement, they were asked to make two judgments. First, they made a binary judgment, a guess, about whether the statement was a truth or a lie. Second, they assessed how confident they were in that guess by responding to the question “How confident are you in your guess?”, to which they could pick one of five responses: “0 = Not at all confident; 1 = Slightly confident; 2 = Somewhat confident; 3 = Fairly confident; 4 = Very confident”.
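For reference, this scale can be represented as a small lookup table. A minimal sketch (the object name conf_scale is mine, and I am assuming confidence is recorded as the numeric codes 0 through 4):

# hypothetical lookup table for the 5-point confidence scale
conf_scale <-
  tibble::tibble(conf = 0:4,
                 label = c("Not at all confident",
                           "Slightly confident",
                           "Somewhat confident",
                           "Fairly confident",
                           "Very confident"))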

They were given the following general instructions about how they should orient their guessing.

“Each of these statements represents a person’s response to a question that was asked of them. Sometimes those people responded to the question truthfully (i.e. by telling the truth) and sometimes they responded to the question untruthfully (i.e. by telling a lie).

We would like for you to go through each of these statements, one at a time. First, read the statement thoroughly. And then, give us your best guess as to whether that statement is true (i.e. a case where the person responded to the question by telling the truth) or that statement is a lie (i.e. a case where the person responded to the question by telling a lie). Then, move on to the next statement and do the same.

For each statement, you may make this guess on whatever basis you choose (i.e. on intuition and “gut” feeling, careful deliberation, or any other basis of deciding). What is simply most important is that you give us your best guess as to what you think is more likely - that the person’s statement is a truth or that the person’s statement is a lie.”

Research assistants recorded their responses in an Excel sheet, pictured below.

Note that the research assistants were not given any information about the questions to which each statement was a response. They simply read the statements and rendered their guesses. This was done so that any eventual comparison between human and computer performance would be on more equal footing. The computer models I have built, and the primary additional ones I plan to build, do not take into account any information about the question to which a statement is a response. That is, the models include nothing like an indicator variable for each question, question-by-feature interaction terms among the predictors, a hierarchical model structure that factors in question, or anything else by which the model would account for the different questions to which statements are a response. As far as the computer knows, there are just statements. (In later analyses, I do plan to build models that account for each question.) Because the models do not account for questions, it seemed only fair that humans should not get any information about questions either.
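To make that contrast concrete, the kinds of question-aware specifications being ruled out could be written as R model formulas along the following lines. This is a sketch only: word_count stands in for an arbitrary hypothetical predictor, q_id for a question identifier, and the last line uses lme4’s random-effects syntax.

# hypothetical question-aware specifications (none are used in the current models)
grd_truth ~ word_count + q_id        # indicator variable for each question
grd_truth ~ word_count * q_id        # question-by-feature interactions
grd_truth ~ word_count + (1 | q_id)  # hierarchical: question-level intercepts (lme4)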

As of this writing, the research assistants have not rendered a judgment for every one of the 5,004 statements. However, over 3,000 statements have been evaluated, providing a solid basis on which to begin the analysis, to which I now proceed.

Packages

Again, I will start by loading relevant packages.

# before knitting: message = FALSE, warning = FALSE
library(tidyverse) # cleaning and visualization
library(ggthemes) # visualization
library(xlsx) # reading in excel file
library(caret) # for confusionMatrix() function
library(skimr) # for detailed summary stats

Load Data

First, I will load in the Excel sheets on which each of the research assistants recorded their responses. (I will also load another file containing some other useful information, namely the actual ground truth for each statement, which we’ll need to assess performance.)

# load in guesses from RAs
stats_emem <- 
  read.xlsx("guesses_EMEM.xlsx",
            sheetIndex = 1)


stats_catherine <- 
  read.xlsx("guesses_CATHERINE.xlsx",
            sheetIndex = 1)

stats_lexi <-
  read.xlsx("guesses_LEXI.xls",
            sheetIndex = 1)

# load data frame that has stat_id connected to grd_truth
load("stats_clean.Rda")
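Before cleaning, a quick sanity check that each file loaded with a plausible number of rows can’t hurt. A minimal sketch (output omitted; not part of the cleaning pipeline):

# sanity check: row counts for each loaded object
nrow(stats_emem)
nrow(stats_catherine)
nrow(stats_lexi)
nrow(stats_clean)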

Clean Data

In this section, I clean up and format the research assistants’ responses. I then combine these responses into one data object. The entries in this data object are printed below.

(A note: there was actually a set of 100 statements for which two research assistants, Emem and Catherine, both registered responses. This was not an accident; rather, the purpose was to allow a later examination of consistency in guessing across different guessers. For the main analyses, these 100 statements are excluded.)

# clean emem's file: select and rename columns, drop blank guesses, tag guesser and order
stats_emem_clean <-
  stats_emem %>%
  select(stat_id, 5, 6) %>%
  rename_at(2, ~ "predict") %>%
  rename_at(3, ~ "conf") %>%
  mutate(predict = tolower(trimws(predict))) %>%
  filter(!is.na(predict)) %>%
  mutate(person = "emem") %>%
  dplyr::mutate(order = row_number())

# clean catherine's file: select and rename columns, drop blank guesses, tag guesser and order
stats_catherine_clean <-
  stats_catherine %>%
  select(stat_id, 5, 6) %>%
  rename_at(2, ~ "predict") %>%
  rename_at(3, ~ "conf") %>%
  mutate(predict = tolower(trimws(predict))) %>%
  filter(!is.na(predict)) %>%
  mutate(person = "catherine") %>%
  dplyr::mutate(order = row_number())

# clean lexi's file: select and rename columns, drop blank guesses, keep only lexi's own guesses
stats_lexi_clean <-
  stats_lexi %>%
  select(stat_id, 4, 5, Participant) %>%
  rename_at(2, ~ "predict") %>%
  rename_at(3, ~ "conf") %>%
  mutate(predict = tolower(trimws(predict))) %>%
  filter(!is.na(predict)) %>%
  dplyr::rename(person = Participant) %>%
  mutate(person = trimws(as.character(person))) %>%
  mutate(person = case_when(person == "1" ~ "lexi",
                            person != "1" ~ person)) %>%
  filter(person == "lexi") %>% # only take the guesses from lexi (not from participants she ran)
  dplyr::mutate(order = row_number())

# combine files
stats_guess <-
  bind_rows(stats_emem_clean,
            stats_catherine_clean,
            stats_lexi_clean)

# find statements for which multiple people might have registered guesses
overlap_stat_id <-
  c(intersect(stats_emem_clean$stat_id, stats_catherine_clean$stat_id), # has overlap
    intersect(stats_emem_clean$stat_id, stats_lexi_clean$stat_id),
    intersect(stats_catherine_clean$stat_id, stats_lexi_clean$stat_id))

# remove any rows which have been answered by multiple people
stats_guess <-
  stats_guess %>%
  filter(!(stat_id %in% overlap_stat_id))
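
To verify the removal worked, we can confirm that no statement now appears more than once in the combined data:

# check: no statement should retain more than one guess
stats_guess %>%
  count(stat_id) %>%
  filter(n > 1) # expect zero rows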

# join files with ground truth data
stats_guess <- 
  stats_guess %>%
  left_join(y = (stats_clean %>% select(stat_id, grd_truth)),
            by = "stat_id") %>%
  select(stat_id,
         grd_truth,
         everything()) %>%
  mutate(predict = as.factor(predict))

# print resulting data frame
stats_guess
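
With the data in this shape, performance can be assessed directly. As a preview of the analysis to come, here is a minimal sketch of an overall accuracy check using caret’s confusionMatrix() (this assumes predict and grd_truth share the same two factor levels, e.g. “lie” and “truth”):

# sketch: overall human accuracy across all remaining guesses
confusionMatrix(data = stats_guess$predict,
                reference = stats_guess$grd_truth,
                positive = "truth")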