
In this section, we will generate our first hybrid human-computer model, a logistic regression model. The training-testing and model-building procedure is exactly the same as for our earlier logistic regression model, except that the hybrid model also uses human predictions as a feature, in addition to the textual features used earlier. We will compare its performance to that of a non-hybrid model (which uses only textual features) trained on the same statements.
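To make the difference between the two models concrete, here is a minimal sketch on simulated toy data. It uses base R's `glm()` rather than the caret training procedure used in this analysis, and the column names (`text_feat1`, `text_feat2`, `human_pred`, `truth`) are hypothetical placeholders, not the actual columns of `stats_combo`. The only change from the non-hybrid to the hybrid model is one extra term in the model formula.

```r
# Hypothetical toy data: 'truth' is the ground-truth label, 'text_feat1' and
# 'text_feat2' stand in for processed textual features, and 'human_pred'
# stands in for the human prediction. These names are placeholders only.
set.seed(1)
n <- 200
toy <- data.frame(
  text_feat1 = rnorm(n),
  text_feat2 = rnorm(n),
  human_pred = factor(sample(c("lie", "truth"), n, replace = TRUE))
)

# Simulate an outcome that depends on both a textual feature and the human prediction
lin <- 0.8 * toy$text_feat1 + 1.0 * (toy$human_pred == "truth") - 0.5
toy$truth <- factor(ifelse(runif(n) < plogis(lin), "truth", "lie"),
                    levels = c("lie", "truth"))

# Non-hybrid model: textual features only
model_non_hybrid <- glm(truth ~ text_feat1 + text_feat2,
                        data = toy, family = binomial)

# Hybrid model: the same textual features plus the human prediction
model_hybrid <- glm(truth ~ text_feat1 + text_feat2 + human_pred,
                    data = toy, family = binomial)

# The hybrid model has exactly one extra coefficient, for the human prediction
setdiff(names(coef(model_hybrid)), names(coef(model_non_hybrid)))
# [1] "human_predtruth"
```

Because both models are fit to the same rows, any difference in their held-out performance can be attributed to the added human-prediction feature.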

Packages

Again, I will start by loading relevant packages.

# before knitting: message = FALSE, warning = FALSE
library(tidyverse) # cleaning and visualization
library(caret) # modeling

Load Data

Next, I will load the data file we created earlier, which contains both the cleaned and processed textual features and the human predictions. Note, again, that this data file consists of a total of 3,663 statements.

# load data frame combining human predictions, processed textual features, and ground truth
load("stats_combo.Rda")

# For rendering, I'm going to cheat here and load results created when this model was first run
# For some reason, chunks that were supposed to be cached when originally run are rerunning
load("results_HYB_log.Rda")
# restore the generic name (the object was given a specific name at the end of the original run)
results <- results_HYB_log
# model_hybrid <- model_hybrid_log
# model_non_hybrid <- model_non_hybrid_log

# print combined file
stats_combo