[return to overview page]

In this section, we will generate our second hybrid human-computer model: a support vector machine. The general training-testing and model-building procedure will be exactly the same as for our earlier support vector machine model, except that our hybrid model will also take human predictions into account as a feature. The performance of this model will be compared to the performance of a non-hybrid model (which uses only textual features) trained on the same statements.

Packages

Again, I will start by loading relevant packages.

# before knitting: message = FALSE, warning = FALSE
library(tidyverse) # cleaning and visualization
library(caret) # modeling
library(kernlab) # has sigest, for selecting tuning parameter sigma

Load Data

Next, I will load the data file we created earlier, which contains both the cleaned and processed textual features and the human predictions. Note, again, that this data file consists of a total of 3,663 statements.

# load df of combined human predictions, processed textual features, and ground truth
load("stats_combo.Rda")

# For rendering, I'm going to cheat here and load results created when this model was first run
# For some reason, chunks that were supposed to be cached when originally run are rerunning
load("results_HYB_svm.Rda")
# change the specific name (renamed at the end) back to the generic name
results_HYB_svm -> results
# convert predict column from factor (truth, lie) to numeric (1, 0); for use by sigest
stats_combo <-
  stats_combo %>%
  mutate(predict = case_when(predict == "truth" ~ 1,
                             predict == "lie" ~ 0))

# print
stats_combo

EXAMPLE (Single Support Vector Machine Model, with Radial Basis Function)

As usual, let’s begin with a singular example. Here we will simply train and test two support vector machine models: one hybrid model that takes into account both human predictions and textual features, and one that takes into account only textual features. As with our original support vector machine models, we will use a radial basis kernel function, to take advantage of SVMs’ ability to produce non-linear classification boundaries (see the final introductory paragraphs of the earlier SVM page, right above the “Packages” section, for some more discussion).
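(As a quick illustration of what the radial basis kernel actually computes, the short sketch below uses kernlab’s rbfdot function on two made-up feature vectors; the sigma value here is arbitrary and chosen purely for illustration, not the tuned value we will select later. The kernel scores identical vectors as 1 and decays toward 0 as vectors move apart, which is what lets the SVM bend its classification boundary.)

# illustration only: radial basis kernel k(x, y) = exp(-sigma * ||x - y||^2)
# sigma = 0.01 is an arbitrary value for this sketch, not the tuned value used below
rbf <- kernlab::rbfdot(sigma = 0.01)

x <- c(1, 2, 3)        # hypothetical feature vector for one statement
y_close <- c(1, 2, 4)  # a nearby vector
y_far <- c(10, 20, 30) # a distant vector

rbf(x, x)       # identical vectors -> 1
rbf(x, y_close) # close vectors -> just under 1
rbf(x, y_far)   # distant vectors -> near 0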

Split Sample Into Training and Testing Sets

As before, we are going to conduct an exact 50-50 split, randomly allocating one half of the statements to the training set and the other half to the testing set (using the createDataPartition function in the caret package; Kuhn, 2008; Kuhn, 2019). The exact same training set will be used to build the hybrid and non-hybrid models, both of which will then also be tested on the exact same testing set.

# set seed, so that statistics don't keep changing for every analysis
set.seed(2019)

# partition data in 50-50 lgocv split (create index for test set)
index_train_ex <- 
  createDataPartition(y = stats_combo$stat_id,
                      p = 0.50,
                      times = 1,
                      list = FALSE)

# actually create data frame with training set (predictors and outcome together)
train_set_ex <- stats_combo[index_train_ex, ]

# actually create data frame with test set (predictors and outcome together)
test_set_ex <- stats_combo[-index_train_ex, ]

Build Model (on Training Set)

Now that the data are split, we can fit support vector machine models to the training data. Again, we will fit two support vector machine models: one, a hybrid model that uses both textual features and human guesses as predictors; the other, a non-hybrid model that uses only textual features as predictors.

As with our earlier support vector machine models, and unlike with our logistic regression models, this type of model has two tuning parameters that we need to set: our cost penalty and our sigma parameter in the radial basis function. As before, I will set the sigma parameter using the slightly modified recommendations from the sigest function in the kernlab package. And as before, I will select a cost penalty by cycling through the candidate values of 0.25, 0.5, 1, 2, and 4, and selecting the value that leads to the highest level of overall accuracy across 3 repeated 50-50 training-testing splits within the training set (see the “Build Model (on Training Set)” section of the earlier SVM page for more detailed discussion).
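To make the sigma step a bit more concrete, here is a minimal sketch (on simulated data, not our statements) of what sigest returns: three candidate sigma values, corresponding to quantiles of its distance-based estimate; taking the median simply keeps the middle suggestion.

# sketch only: sigest on simulated data (100 observations, 5 hypothetical features)
set.seed(2019)
toy_features <- matrix(rnorm(100 * 5), ncol = 5)

# sigest returns three candidate sigma values (quantiles of a distance-based estimate)
kernlab::sigest(toy_features, scaled = TRUE)

# the median keeps the middle suggestion, as we do for the real models below
median(kernlab::sigest(toy_features, scaled = TRUE))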

When we go through this process, we end up with two trained models with the following (highly similar) tuning parameter values:

Model Type    Sigma    Cost Penalty
Non-Hybrid    0.0057   4
Hybrid        0.0055   4
# note: these setting chunks are separated for reuse later

# set seed, so that statistics don't keep changing for every analysis
set.seed(2019)

# -----------------------------------------------------------------------------
# STEP 1: SELECT TUNING PARAMETERS

# part a: select cost penalty values
costs_svm <- c(0.25, 0.5, 1, 2, 4)

# part b: get suggested sigma value, for radial basis function, for this data set
# i take the median of the kernlab suggestions 
# (default in caret is to take mean, excluding second suggestion)
# see: https://github.com/topepo/caret/blob/master/models/files/svmRadial.R
# and: https://stats.stackexchange.com/questions/408159/what-is-the-basis-for-the-default-sigma-value-used-by-svmradial-in-caret
sigma_hyb_svm <- median(
              kernlab::sigest(
                as.matrix(train_set_ex 
                            %>% select(
                              -stat_id,
                              -grd_truth)
                          ),
                scaled = TRUE
                )
              )

sigma_non_hyb_svm <- median(
              kernlab::sigest(
                as.matrix(train_set_ex 
                            %>% select(
                              -stat_id,
                              -grd_truth,
                              -predict)
                          ),
                scaled = TRUE
                )
              )

# part c: save parameters in tune grid object
tune_grid_hyb_svm <- expand.grid(sigma = sigma_hyb_svm,
                                 C = costs_svm)

tune_grid_non_hyb_svm <- expand.grid(sigma = sigma_non_hyb_svm,
                                     C = costs_svm)

# -----------------------------------------------------------------------------
# STEP 2: SELECT TUNING METHOD
# set up train control object, which specifies training/testing technique
train_control_svm <- trainControl(method = "LGOCV",
                                  number = 3,
                                  p = 0.50)
# set seed, so that statistics don't keep changing for every analysis
# (applies for models which might have random parameters)
set.seed(2019)

# start timer
start_time <- Sys.time()

# -----------------------------------------------------------------------------
# STEP 3: TRAIN MODEL

# use caret "train" function to train svm
model_hybrid <- 
  train(form = grd_truth ~ . -stat_id,
        data = train_set_ex,
        method = "svmRadial",
        tuneGrid = tune_grid_hyb_svm,
        trControl = train_control_svm,
        metric = "Accuracy") # how to select among models

model_non_hybrid <- 
  train(form = grd_truth ~ . -stat_id -predict,
        data = train_set_ex,
        method = "svmRadial",
        tuneGrid = tune_grid_non_hyb_svm,
        trControl = train_control_svm,
        metric = "Accuracy") # how to select among models

# end timer
total_time <- Sys.time() - start_time
# examine tuning parameters selected for non-hybrid model
model_non_hybrid
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 1832 samples
##   92 predictor
##    2 classes: 'lie', 'truth' 
## 
## No pre-processing
## Resampling: Repeated Train/Test Splits Estimated (3 reps, 50%) 
## Summary of sample sizes: 916, 916, 916 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.5716885  0.1433770
##   0.50  0.5775109  0.1550218
##   1.00  0.5796943  0.1593886
##   2.00  0.5895197  0.1790393
##   4.00  0.6008006  0.2016012
## 
## Tuning parameter 'sigma' was held constant at a value of 0.005691903
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.005691903 and C = 4.
# examine tuning parameters select for hybrid model
model_hybrid
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 1832 samples
##   92 predictor
##    2 classes: 'lie', 'truth' 
## 
## No pre-processing
## Resampling: Repeated Train/Test Splits Estimated (3 reps, 50%) 
## Summary of sample sizes: 916, 916, 916 
## Resampling results across tuning parameters:
## 
##   C     Accuracy   Kappa    
##   0.25  0.5709607  0.1419214
##   0.50  0.5836972  0.1673945
##   1.00  0.5855167  0.1710335
##   2.00  0.5917031  0.1834061
##   4.00  0.5986172  0.1972344
## 
## Tuning parameter 'sigma' was held constant at a value of 0.005521202
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.005521202 and C = 4.

Evaluate Model (on Testing Set)

Now, let’s examine and compare the performance of the two models. As before, we will do this by using the models to generate predictions on the remaining ~50% of the data (1831 statements) in the testing set. Performance statistics are derived using the confusionMatrix function from the caret package (Kuhn, 2008).

When we do this, we see that both models have an overall accuracy level that is again above chance:

  • hybrid model: 64.1% [95% CI: 61.9, 66.3%]
  • non-hybrid model: 63.5% [95% CI: 61.2, 65.7%]

Further, as with our analysis of the hybrid and non-hybrid logistic regression models, the overall accuracy of the hybrid model is better than the overall accuracy of the non-hybrid model. But again, disappointingly, this difference is not significant, chi-squared = 0.14, p = 0.705. (Again, see the “Evaluate Model (on Testing Set)” section of the hybrid logistic regression analyses for some discussion and speculation about possible reasons for this outcome.)

As before, with regard to other performance metrics, I have not yet conducted formal statistical tests for each of them, but sensitivity, specificity, precision, and negative predictive value look roughly the same between the hybrid and non-hybrid models, hovering around 63% for all.
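As an illustration of what one such test could look like, the sketch below compares the two models’ sensitivities with a two-sample proportion test, plugging in the cell counts from the confusion matrices printed below (579 of 915 truths correctly classified by the hybrid model vs. 581 of 915 by the non-hybrid model). This is just a hedged example of a possible follow-up, not part of the original analysis.

# sketch: formal comparison of sensitivity between hybrid and non-hybrid models
# counts taken from the confusion matrices printed below:
# hybrid:     579 true positives out of 336 + 579 = 915 true statements
# non-hybrid: 581 true positives out of 334 + 581 = 915 true statements
prop.test(x = c(579, 581),
          n = c(915, 915),
          alternative = "two.sided",
          conf.level = 0.95)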

# generate predictions for hybrid model 
preds_hybrid <-
  predict(object = model_hybrid,
                 newdata = test_set_ex,
                 type = "raw")

# generate predictions for non-hybrid model
preds_non_hybrid <-
  predict(object = model_non_hybrid,
                 newdata = test_set_ex,
                 type = "raw")
# examine hybrid model performance
conf_hybrid <-
  confusionMatrix(data = preds_hybrid,
                  reference = test_set_ex$grd_truth,
                  positive = "truth")

# print
conf_hybrid
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction lie truth
##      lie   595   336
##      truth 321   579
##                                           
##                Accuracy : 0.6412          
##                  95% CI : (0.6187, 0.6632)
##     No Information Rate : 0.5003          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.2824          
##  Mcnemar's Test P-Value : 0.5849          
##                                           
##             Sensitivity : 0.6328          
##             Specificity : 0.6496          
##          Pos Pred Value : 0.6433          
##          Neg Pred Value : 0.6391          
##              Prevalence : 0.4997          
##          Detection Rate : 0.3162          
##    Detection Prevalence : 0.4915          
##       Balanced Accuracy : 0.6412          
##                                           
##        'Positive' Class : truth           
## 
# examine non-hybrid model performance
conf_non_hybrid <-
  confusionMatrix(data = preds_non_hybrid,
                  reference = test_set_ex$grd_truth,
                  positive = "truth")

# print
conf_non_hybrid
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction lie truth
##      lie   581   334
##      truth 335   581
##                                           
##                Accuracy : 0.6346          
##                  95% CI : (0.6121, 0.6567)
##     No Information Rate : 0.5003          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.2693          
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.6350          
##             Specificity : 0.6343          
##          Pos Pred Value : 0.6343          
##          Neg Pred Value : 0.6350          
##              Prevalence : 0.4997          
##          Detection Rate : 0.3173          
##    Detection Prevalence : 0.5003          
##       Balanced Accuracy : 0.6346          
##                                           
##        'Positive' Class : truth           
## 
# calculate values relevant for test (and some extras, in case i want to compare sensitivity, etc)
hybrid_truth_correct = conf_hybrid$table[4]
hybrid_truth_total = conf_hybrid$table[3] + conf_hybrid$table[4]
hybrid_lie_correct = conf_hybrid$table[1]
hybrid_lie_total = conf_hybrid$table[1] + conf_hybrid$table[2]
hybrid_total_correct = hybrid_truth_correct + hybrid_lie_correct
hybrid_total_truth_guesses = conf_hybrid$table[2] + conf_hybrid$table[4]
hybrid_total_lie_guesses = conf_hybrid$table[1] + conf_hybrid$table[3]
hybrid_total = sum(conf_hybrid$table)

non_hybrid_truth_correct = conf_non_hybrid$table[4]
non_hybrid_truth_total = conf_non_hybrid$table[3] + conf_non_hybrid$table[4]
non_hybrid_lie_correct = conf_non_hybrid$table[1]
non_hybrid_lie_total = conf_non_hybrid$table[1] + conf_non_hybrid$table[2]
non_hybrid_total_correct = non_hybrid_truth_correct + non_hybrid_lie_correct
non_hybrid_total_truth_guesses = conf_non_hybrid$table[2] + conf_non_hybrid$table[4]
non_hybrid_total_lie_guesses = conf_non_hybrid$table[1] + conf_non_hybrid$table[3]
non_hybrid_total = sum(conf_non_hybrid$table)
# conduct actual test
prop.test(x = c(hybrid_total_correct, non_hybrid_total_correct),
          n = c(hybrid_total, non_hybrid_total),
          alternative = "two.sided",
          conf.level = 0.95)
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(hybrid_total_correct, non_hybrid_total_correct) out of c(hybrid_total, non_hybrid_total)
## X-squared = 0.14305, df = 1, p-value = 0.7053
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.02512375  0.03823135
## sample estimates:
##    prop 1    prop 2 
## 0.6411797 0.6346259
# examine z value, using two proportion z-test just for comparison

# function to compute z-value
# from: https://www.r-bloggers.com/comparison-of-two-proportions-parametric-z-test-and-non-parametric-chi-squared-methods/
z.prop = function(x1,x2,n1,n2){
  # compute z-value 
  numerator = (x1/n1) - (x2/n2)
  p.common = (x1+x2) / (n1+n2)
  denominator = sqrt(p.common * (1-p.common) * (1/n1 + 1/n2))
  z.prop.ris = numerator / denominator
  
  # compute p-value
  p_value <- 2*pnorm(-abs(z.prop.ris))
  
  # combine results in string vector
  test_result <- c(paste("z=", round(z.prop.ris, 3),
                         ", p=", round(p_value, 3),
                         sep = ""))
  
  # return z and p-value
  return(test_result)
}

# get actual two proportion z-test results
z.prop(x1 = hybrid_total_correct,
         x2 = non_hybrid_total_correct,
         n1 = hybrid_total, 
         n2 = non_hybrid_total)
## [1] "z=0.413, p=0.68"

Change in predictions

As we did before, let’s try to get some insight into how incorporating human predictions changed the predictions made by the hybrid model (as compared to the predictions made by the non-hybrid model).

To do this, I have tabulated the total number of times each of the eight possible changes in predictions occurred when the non-hybrid model’s predictions are compared to the hybrid model’s predictions. As a reminder, these eight possible changes are:

  • nothing changes
    • 1: non-hybrid prediction of true positive stays true positive prediction in hybrid model
    • 2: true negative prediction in non-hybrid model stays true negative prediction in hybrid model
    • 3: false positive prediction in non-hybrid model stays false positive prediction in hybrid model
    • 4: false negative prediction in non-hybrid model stays false negative prediction in hybrid model
  • correct prediction changed to incorrect prediction
    • 5: true positive prediction in non-hybrid model changes to false positive prediction in hybrid model
    • 6: true negative prediction in non-hybrid model changes to false negative prediction in hybrid model
  • incorrect prediction changed to correct prediction
    • 7: false positive prediction in non-hybrid model changes to true negative prediction in hybrid model
    • 8: false negative prediction in non-hybrid model changes to true positive prediction in hybrid model

(Again, see the “Change in predictions” section of the earlier hybrid logistic regression analysis for more detailed discussion.)

When we do this, what we see is that again most of the predictions stayed the same between the non-hybrid and hybrid models (1697 of 1831 predictions, 92.7%, were the same). And only a slightly higher number of predictions went from incorrect to correct (73 cases, 4.0%, were either false negatives that became true positives or false positives that became true negatives) than went from correct to incorrect (61 cases, 3.3%, were either true positives that became false negatives or true negatives that became false positives).

More detailed analyses await with regard to the relationship between human predictions and the other (textual) features used for prediction.

# combine: grd_truth, non_hybrid predictions, hybrd predictions
preds_comp <-
  cbind(as.data.frame(test_set_ex$grd_truth),
        as.data.frame(preds_non_hybrid),
        as.data.frame(preds_hybrid))

# label columns
colnames(preds_comp) <- c("grd_truth", "non_hyb", "hyb")

# print
# preds_comp

# generate additional information
preds_tally <-
  preds_comp %>%
  group_by(grd_truth, non_hyb, hyb) %>%
  summarize(n = n()) %>%
  arrange(desc(grd_truth), desc(non_hyb), desc(hyb)) %>%
  mutate(change_present = case_when(non_hyb == hyb ~ "no",
                            non_hyb != hyb ~ "yes")) %>%
  mutate(change_type = case_when(grd_truth == "truth" & non_hyb == "truth" & hyb == "truth" ~ "TP -> TP",
                            grd_truth == "truth" & non_hyb == "truth" & hyb == "lie" ~ "TP -> FN",
                            grd_truth == "truth" & non_hyb == "lie" & hyb == "truth" ~ "FN -> TP",
                            grd_truth == "truth" & non_hyb == "lie" & hyb == "lie" ~ "FN -> FN",
                            grd_truth == "lie" & non_hyb == "truth" & hyb == "truth" ~ "FP -> FP",
                            grd_truth == "lie" & non_hyb == "truth" & hyb == "lie" ~ "FP -> TN",
                            grd_truth == "lie" & non_hyb == "lie" & hyb == "truth" ~ "TN -> FP",
                            grd_truth == "lie" & non_hyb == "lie" & hyb == "lie" ~ "TN -> TN")) %>%
  mutate(change_effect = case_when(change_type == "TP -> TP" ~ "nothing",
                                   change_type == "TN -> TN" ~ "nothing",
                                   change_type == "FP -> FP" ~ "nothing",
                                   change_type == "FN -> FN" ~ "nothing",
                                   change_type == "FN -> TP" ~ "gain",
                                   change_type == "FP -> TN" ~ "gain",
                                   change_type == "TP -> FN" ~ "loss",
                                   change_type == "TN -> FP" ~ "loss")) %>%
  arrange(change_effect)

# print
preds_tally
# tally the total number of times each type of "change_effect" occurred
preds_tally %>%
  group_by(change_effect) %>%
  summarize(n = sum(n))

FULL (Hybrid and Non-Hybrid Support Vector Machine Models)

To get more reliable estimates of the performance of non-hybrid and hybrid support vector machine models, let’s repeat the exact model training and testing procedure from above 10 times, storing and aggregating the results.

Run 10 models

Below is the code that runs through this modeling process 10 different times and saves the result from each round.

# -----------------------------------------------------------------------------
# STEP 0: set seed, so that statistics don't keep changing for every analysis
set.seed(2019)

# -----------------------------------------------------------------------------
# STEP 1: decide how many times to run the model
rounds <- 10

# -----------------------------------------------------------------------------
# STEP 2: set up object to store results
# part a: create names of results to store
result_cols <- c("model_type", "hyb_type", "round", "accuracy", "accuracy_LL", "accuracy_UL",
                 "sensitivity", "specificity", "precision", "npv", "n",
                 "total_truths", "total_lies", "total_truth_guesses", "total_lie_guesses")

# part b: create matrix
results <-
  matrix(nrow = rounds,
         ncol = length(result_cols))

# part c: actually name columns in results matrix
colnames(results) <- result_cols

# part d: convert to df (so multiple variables of different types can be stored)
results <- data.frame(results)

# -----------------------------------------------------------------------------
# STEP 3: start timer
start_time <- Sys.time()

# -----------------------------------------------------------------------------
# STEP 4: create rounds number of models, and store results each time

for (i in 1:rounds){
  
  # part a: partition data in 50-50 lgocv split (create index for test set)
  index_train <- 
    createDataPartition(y = stats_combo$stat_id,
                        p = 0.50,
                        times = 1,
                        list = FALSE)
  
  # part b: create testing and training data sets
  train_set <- stats_combo[index_train, ]
  test_set <- stats_combo[-index_train, ]
  
  
  # part c: use caret "train" function to train hybrid and non-hybrid support vector machine models
  model_hybrid_i <- 
    train(form = grd_truth ~ . -stat_id,
          data = train_set,
          method = "svmRadial",
          tuneGrid = tune_grid_hyb_svm,
          trControl = train_control_svm,
          metric = "Accuracy") # how to select among models

  model_non_hybrid_i <-
    train(form = grd_truth ~ . -stat_id -predict,
          data = train_set,
          method = "svmRadial",
          tuneGrid = tune_grid_non_hyb_svm,
          trControl = train_control_svm,
          metric = "Accuracy") # how to select among models
    
  # part d: make predictions, using both hybrid and non-hybrid models
  preds_hyb_i <-
    predict(object = model_hybrid_i,
            newdata = test_set,
            type = "raw")
  
  preds_non_hyb_i <-
    predict(object = model_non_hybrid_i,
            newdata = test_set,
            type = "raw")
  
  # part e: store model performance of both hybrid and non-hybrid model
  conf_hyb_i <-
    confusionMatrix(data = preds_hyb_i,
                    reference = test_set$grd_truth,
                    positive = "truth")
  
  conf_non_hyb_i <-
    confusionMatrix(data = preds_non_hyb_i,
                    reference = test_set$grd_truth,
                    positive = "truth")
  
  # part f: store hybrid and non-hybrid model results
  # model type
  results[2*i-1, 1] <- "svm"
  results[2*i, 1] <- "svm"
  # hybrid or non-hybrid
  results[2*i-1, 2] <- "hybrid"
  results[2*i, 2] <- "non_hybrid"
  # round
  results[2*i-1, 3] <- i
  results[2*i, 3] <- i
  # accuracy
  results[2*i-1, 4] <- conf_hyb_i$overall[1]
  results[2*i, 4] <- conf_non_hyb_i$overall[1]
  # accuracy LL
  results[2*i-1, 5] <- conf_hyb_i$overall[3]
  results[2*i, 5] <- conf_non_hyb_i$overall[3]
  # accuracy UL
  results[2*i-1, 6] <- conf_hyb_i$overall[4]
  results[2*i, 6] <- conf_non_hyb_i$overall[4]
  # sensitivity
  results[2*i-1, 7] <- conf_hyb_i$byClass[1]
  results[2*i, 7] <- conf_non_hyb_i$byClass[1]
  # specificity
  results[2*i-1, 8] <- conf_hyb_i$byClass[2]
  results[2*i, 8] <- conf_non_hyb_i$byClass[2]
  # precision
  results[2*i-1, 9] <- conf_hyb_i$byClass[3]
  results[2*i, 9] <- conf_non_hyb_i$byClass[3]
  # negative predictive value
  results[2*i-1, 10] <- conf_hyb_i$byClass[4]
  results[2*i, 10] <- conf_non_hyb_i$byClass[4]
  # sample size (total)
  results[2*i-1, 11] <- sum(conf_hyb_i$table)
  results[2*i, 11] <- sum(conf_non_hyb_i$table)
  # total truths
  results[2*i-1, 12] <- conf_hyb_i$table[3] + conf_hyb_i$table[4]
  results[2*i, 12] <- conf_non_hyb_i$table[3] + conf_non_hyb_i$table[4]
  # total lies
  results[2*i-1, 13] <- conf_hyb_i$table[1] + conf_hyb_i$table[2]
  results[2*i, 13] <- conf_non_hyb_i$table[1] + conf_non_hyb_i$table[2]
  # total truth guesses
  results[2*i-1, 14] <- conf_hyb_i$table[2] + conf_hyb_i$table[4]
  results[2*i, 14] <- conf_non_hyb_i$table[2] + conf_non_hyb_i$table[4]
  # total lie guesses
  results[2*i-1, 15] <- conf_hyb_i$table[1] + conf_hyb_i$table[3]
  results[2*i, 15] <- conf_non_hyb_i$table[1] + conf_non_hyb_i$table[3]
  
  # part g: print round and total elapsed time so far
  cumul_time <- difftime(Sys.time(), start_time, units = "mins")
  print(paste("round #", i, ": cumulative time ", round(cumul_time, 2), " mins",
              sep = ""))
  print("--------------------------------------")

}

View Results (Tabular)

Directly below, I’ve displayed a raw tabular summary of the results. We have 20 rows, which result from the 2 models (hybrid and non-hybrid) created in each of 10 rounds. One pattern that is again immediately apparent is how the overall accuracy of the hybrid model compares with the overall accuracy of the non-hybrid model in each of the 10 rounds. We see that in 8 of the 10 rounds, the hybrid model has a higher overall accuracy score than the non-hybrid model. However, a simple sign test, which uses a binomial distribution to examine how likely it is that one model would have a higher accuracy score than the other in 8 out of 10 rounds under the null assumption that each model should come out ahead an equal number of times, indicates that this pattern does not diverge significantly from that null hypothesis, p = 0.109. Similarly, a Wilcoxon signed-rank test (a non-parametric test of whether there is sufficient evidence to reject the null hypothesis that two dependent samples are drawn from the same population) does not indicate a significant difference, V = 39, p = 0.261. Although these results are not statistically significant, if this hybrid model dominance rate (80%) were to hold up over a higher number of rounds (e.g. 100 rounds), this would provide some supporting evidence that hybrid human-computer models perform better than non-hybrid computer models. This pattern of results is also consistent with the pattern we saw in the “View Results (Tabular)” section of the hybrid logistic regression analyses, where the hybrid model outperformed the non-hybrid model in each of the 10 rounds.
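To put the 80% dominance rate in perspective, here is a quick hypothetical check (not part of the analysis above): if the hybrid model kept winning at that same rate over 100 rounds rather than 10, the same sign test would comfortably reject the null hypothesis of equal win rates.

# hypothetical: same 80% win rate for the hybrid model, but over 100 rounds
# (with 80 wins out of 100, the sign test p-value falls far below 0.05)
binom.test(x = 80,
           n = 100,
           p = 0.5)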

# print results (long form)
results
# get data in wide format (with accuracy for hybrid and non_hybrid models next to each other)
results_acc_wide <-
  results %>%
    select(hyb_type, round, accuracy) %>%
    spread(key = hyb_type,
           value = accuracy)

# print
results_acc_wide %>%
  mutate(winner = case_when(hybrid > non_hybrid ~ "hybrid",
                            hybrid < non_hybrid ~ "non-hybrid",
                            hybrid == non_hybrid ~ "tie"))
# sign test, via binomial test
binom.test(x = 8,
           n = 10,
           p = 0.5)
## 
##  Exact binomial test
## 
## data:  8 and 10
## number of successes = 8, number of trials = 10, p-value = 0.1094
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4439045 0.9747893
## sample estimates:
## probability of success 
##                    0.8
# conduct Wilcoxon signed-rank test 
wilcox.test(x = results_acc_wide$hybrid,
            y = results_acc_wide$non_hybrid,
            paired = TRUE,
            conf.int = 0.95)
## Warning in wilcox.test.default(x = results_acc_wide$hybrid, y =
## results_acc_wide$non_hybrid, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = results_acc_wide$hybrid, y =
## results_acc_wide$non_hybrid, : cannot compute exact confidence interval
## with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  results_acc_wide$hybrid and results_acc_wide$non_hybrid
## V = 39, p-value = 0.261
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -0.007387152  0.015220451
## sample estimates:
## (pseudo)median 
##    0.006554365

View Results (Graphically; Overall Accuracy)

When we quickly examine the average overall accuracy rate of the non-hybrid and hybrid models across the 10 rounds, we see that the hybrid model has a slightly higher level of overall accuracy than the non-hybrid model (although, as we can see, the confidence intervals for the two models’ overall accuracy rates overlap).

# average n in test set for hybrid and non-hybrid models
total_hyb <- mean((results %>% filter(hyb_type == "hybrid"))$n)
total_non_hyb <- mean((results %>% filter(hyb_type == "non_hybrid"))$n)

# vector of denominators to use, when creating error bars
denoms_overall <- c(total_hyb,
                    total_non_hyb)

# visualize overall accuracy
results %>%
  group_by(hyb_type) %>%
  summarize(accuracy = mean(accuracy)) %>%
  ggplot(aes(x = hyb_type,
             y = accuracy)) +
  geom_point(size = 2,
             color = "#545EDF") +
  geom_errorbar(aes(ymin = accuracy - 1.96*sqrt(accuracy*(1-accuracy)/denoms_overall),
                     ymax = accuracy + 1.96*sqrt(accuracy*(1-accuracy)/denoms_overall)),
                color = "#545EDF",
                width = 0.05,
                size = 1) +
  geom_hline(yintercept = 0.5,
             linetype = "dashed",
             size = 0.5,
             color = "red") +
  scale_y_continuous(breaks = seq(from = 0.49, to = 0.70, by = 0.01),
                     limits = c(0.49, 0.70)) +
  scale_x_discrete(limits = c("non_hybrid", "hybrid")) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(color = "grey",
                                          size = 0.25),
        panel.background = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5),
        axis.title.y = element_text(margin = 
                                      margin(t = 0, r = 10, b = 0, l = 0)),
        axis.title.x = element_text(margin = 
                                      margin(t = 10, r = 00, b = 0, l = 0))) +
  labs(title = "Accuracy by Model Type",
       x = "Model Type",
       y = "Overall Accuracy")

View Results (Graphically; Sensitivity, Specificity, Precision, Negative Predictive Value)

We see a similar pattern of results when we examine the other four major performance statistics averaged across the 10 rounds. The hybrid model has a higher average level of sensitivity, specificity, precision, and negative predictive value (although, again, the confidence intervals overlap).

# average n in test set for hybrid and non-hybrid models
total_truths_hyb <- mean((results %>% filter(hyb_type == "hybrid"))$total_truths)
total_truths_non_hyb <- mean((results %>% filter(hyb_type == "non_hybrid"))$total_truths)
total_lies_hyb <- mean((results %>% filter(hyb_type == "hybrid"))$total_lies)
total_lies_non_hyb <- mean((results %>% filter(hyb_type == "non_hybrid"))$total_lies)
total_truth_guesses_hyb <- mean((results %>% filter(hyb_type == "hybrid"))$total_truth_guesses)
total_truth_guesses_non_hyb <- mean((results %>% filter(hyb_type == "non_hybrid"))$total_truth_guesses)
total_lie_guesses_hyb <- mean((results %>% filter(hyb_type == "hybrid"))$total_lie_guesses)
total_lie_guesses_non_hyb <- mean((results %>% filter(hyb_type == "non_hybrid"))$total_lie_guesses)

# vector of denominators to use, when creating error bars
denoms_perf <- c(total_truths_non_hyb,
                 total_truths_hyb,
                 total_lies_non_hyb,
                 total_lies_hyb,
                 total_truth_guesses_non_hyb,
                 total_truth_guesses_hyb,
                 total_lie_guesses_non_hyb,
                 total_lie_guesses_hyb)

# visualize results
results %>%
  select(hyb_type, round, sensitivity, specificity, precision, npv) %>%
  gather(key = "metric",
         value = "value",
         sensitivity, specificity, precision, npv) %>%
  group_by(hyb_type, metric) %>%
  summarize(value = mean(value)) %>%
  ungroup() %>%
  mutate(metric = factor(metric,
                            levels = c("sensitivity", "specificity", "precision", "npv"))) %>%
  ggplot(aes(x = hyb_type,
             y = value)) +
  geom_point(size = 2,
             color = "#545EDF") +
  geom_errorbar(aes(ymin = value - 1.96*sqrt(value*(1-value)/denoms_perf),
                     ymax = value + 1.96*sqrt(value*(1-value)/denoms_perf)),
                color = "#545EDF",
                width = 0.05,
                size = 1) +
  geom_hline(yintercept = 0.5,
             linetype = "dashed",
             size = 0.5,
             color = "red") +
  scale_y_continuous(breaks = seq(from = 0.50, to = 0.70, by = 0.05),
                     limits = c(0.49, 0.70)) +
  scale_x_discrete(limits = c("non_hybrid", "hybrid")) +
  facet_grid(metric ~ .) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(color = "grey",
                                          size = 0.25),
        plot.background = element_blank(),
        panel.background = element_blank(),
        panel.border = element_rect(colour = "black", fill=NA, size=1),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5),
        axis.title.y = element_text(margin = 
                                      margin(t = 0, r = 10, b = 0, l = 0)),
        axis.title.x = element_text(margin = 
                                      margin(t = 10, r = 00, b = 0, l = 0))) +
  labs(title = "Metrics by Model Type",
       x = "Model Type",
       y = "Proportion")

Save Results

For now, this concludes our analysis. As our last step, let’s save our results.

# rename results df, to be particular to this model type (for disambiguation later)
results_HYB_svm <- results

# clear results variable
rm(results)

# save results in Rda file
save(results_HYB_svm,
     file = "results_HYB_svm.Rda")

Citations

END