[return to overview page]

In this section, I will build a support vector machine (SVM) model to make truth and lie predictions on our statements. I turn to this technique because it is one of the most popular modeling methods in “machine learning”. Kuhn & Johnson (2013, p.343) state that SVMs have become “one of the most flexible and effective machine learning tools available” since they were first developed by Vladimir Vapnik in the 1960s. Indeed, SVMs are used in countless research papers in the realm of computational social science and text analysis (e.g. Bakshy, Messing, & Adamic (2015) use an SVM for news article classification in their highly cited paper on selective news exposure on Facebook, and Wu et al. (2008) list it in their article “Top 10 algorithms in data mining”, a paper which itself has over 3,900 citations).

I’m not going to pretend to have a deep understanding of the mathematics that underlie this model (obviously I don’t). However, I will try to recapitulate the main intuitions from those who, like Kuhn & Johnson (2013), have tried to bring machine learning methods to a larger audience.

In the figure below, Kuhn & Johnson (2013, p.344) have us imagine a case where we are using two predictor variables (i.e. Predictor A and Predictor B, along the x and y axes) to predict binary outcomes (i.e. classify/separate the red circles and blue squares). When the two classes are perfectly separable, an infinite number of lines can be drawn that successfully separate them (left panel). For each of these candidate lines, we can imagine the boundary widening outwards on both sides; the distance it can widen before bumping into a data point is called the “margin”. In each case, the bounds of this margin are entirely determined by the nearest abutting data points, which are called “support vectors” (because they sort of “support” the margin; hence the name “support vector machines”). In the simplest case, it is my understanding that support vector machines essentially construct a linear model with the largest possible margin given the data points.
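To make this concrete, here is a minimal sketch (not the code from this analysis) of fitting a linear SVM to a perfectly separable, two-predictor toy dataset with scikit-learn; the toy data and variable names are invented for illustration. The fitted line is determined entirely by the handful of support vectors.

```python
# Minimal sketch of the maximum-margin idea, with invented toy data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two perfectly separable classes in two predictor dimensions
# (think "Predictor A" and "Predictor B" in the figure above).
class_a = rng.normal(loc=[-2, -2], scale=0.5, size=(20, 2))
class_b = rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

# A linear SVM with a very large cost approximates the hard-margin case:
# it finds the separating line with the widest possible margin.
svm = SVC(kernel="linear", C=1e6).fit(X, y)

# Only the points sitting on the edge of the margin (the "support vectors")
# determine where the boundary goes; the other points could be removed
# without changing the fitted line.
print(svm.support_vectors_)           # the abutting data points themselves
print(svm.coef_, svm.intercept_)      # the fitted line: coef . x + intercept = 0
print(2 / np.linalg.norm(svm.coef_))  # width of the margin
```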

This strategy can then be generalized to higher dimensions (e.g. defining a 2-d plane (instead of a 1-d line) to separate binary outcomes when we have 3 predictors (instead of 2) and are thus working in three dimensions, etc.). And of course, in most cases the two classes are not perfectly separable. However, this problem can be dealt with by applying a penalty for each misclassified data point when generating the maximum margin classifier. This introduces a new parameter that we must set for our model: the cost penalty for misclassification. (There is no “correct” cost penalty. The actual cost penalty used is usually determined through another process of training and testing that is “nested” within the training data set, which cycles through the performance of models with different cost penalties. A single cost value penalizes both kinds of misclassification (e.g. false alarms and misses) equally. And larger cost penalties tend to result in more overfitting (Kuhn & Johnson, 2013, p.346-347).)
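As an illustration of what this nested tuning might look like, here is a minimal sketch using scikit-learn’s GridSearchCV; the candidate cost values and the stand-in dataset are placeholders rather than the ones used in this analysis. Cross-validation within the training data picks the cost penalty, and only then is the chosen model evaluated on the held-out test set.

```python
# Sketch of tuning the cost penalty by cross-validation "nested" within the
# training data; the data and candidate cost values are placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data: in the real analysis this would be the statement features and labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Candidate cost penalties; there is no "correct" value, so we let
# cross-validation on the training set choose among them.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)           # cost value chosen on the training data
print(search.score(X_test, y_test))  # performance on the held-out test set
```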