Logistic Regression with SGD in 2 dimensions

Here are some labeled training data points, and a weight vector w initialized to all zeros.

In [1]:
import numpy as np

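# load data: each row holds the features followed by a -1/+1 label
Xy = np.loadtxt('synthetic.txt')
X = Xy[:, :-1]
y = Xy[:, -1]
# split into train and test: first 80% of rows for training, the rest for testing (no shuffling)
ttsplit = int(y.size*.8)
trainX = X[:ttsplit, :]
trainy = y[:ttsplit]
testX = X[ttsplit:, :]
testy = y[ttsplit:]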

import matplotlib.pyplot as plt
%matplotlib inline
from demofuncs import *

plot_labeled_data(trainX, trainy)
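The file synthetic.txt isn't included here. If you want to run the notebook without it, the following hypothetical generator, `make_synthetic`, produces data in the layout the loading code expects (two feature columns followed by a -1/+1 label column). The "true" direction `w_true`, the margin and the sampling range are arbitrary assumptions, chosen so the data is linearly separable by a line through the origin, which matters because the perceptron used below has no bias term.

```python
import numpy as np

def make_synthetic(n=100, seed=0, margin=0.1):
    """Hypothetical stand-in for synthetic.txt: n rows of [x1, x2, label],
    labels in {-1, +1}, separable by a line through the origin."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, 1.0])              # assumed "true" direction, chosen arbitrarily
    X = rng.uniform(-1.0, 1.0, size=(4 * n, 2))  # oversample, then drop points near the boundary
    dist = X.dot(w_true) / np.linalg.norm(w_true)  # signed distance to the line w_true.x = 0
    keep = np.abs(dist) > margin
    if keep.sum() < n:
        raise ValueError("not enough points outside the margin; lower margin or n")
    X, dist = X[keep][:n], dist[keep][:n]
    y = np.where(dist > 0, 1.0, -1.0)
    return np.column_stack((X, y))             # rows are in random order
```

To create the file: `np.savetxt('synthetic.txt', make_synthetic())`. Because the rows come out in random order, taking the first 80% as the training set, as the loading code does, gives a representative split.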

Run a perceptron learner (the basic version, with no bias term or other extensions) on this training data, and measure its accuracy on the test set.
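For reference, here is the rule that perceptron_train below applies, written out for a training point $(x^{(i)}, y^{(i)})$ with label $y^{(i)} \in \{-1, +1\}$ and no bias term; the prediction for a point $x$ is the sign of $w^\top x$:

$$\hat{y} = \operatorname{sign}(w^\top x), \qquad \text{if } y^{(i)}\, w^\top x^{(i)} \le 0 \text{ then } w \leftarrow w + y^{(i)}\, x^{(i)}$$

A point exactly on the hyperplane ($w^\top x^{(i)} = 0$) counts as a mistake, which is what gives the all-zeros starting vector its first update.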

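Accuracy is measured in two ways: raw (the fraction of test points classified correctly) and mean squared, which is based on

$$\dfrac{1}{p}\sum_{i=1}^p \left(y^{(i)}-h_\theta(x^{(i)})\right)^2$$

where $p$ is the number of test points and $h_\theta(x^{(i)})$ is the model's output on the $i$-th point for weights $\theta$ (the vector w in the code).
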
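The two measures can be sketched directly from their definitions. These are hypothetical helpers, not the `score` function the notebook calls, whose source isn't shown. As written, the squared formula is an error measure (0 means perfect), while the notebook reports a mean square *accuracy* where higher is better, so `score` presumably transforms it; the numbers from this sketch need not match the printed ones. Using $h_\theta(x) = \tanh(w^\top x)$ is also an assumption, made so the outputs lie in $(-1, 1)$ like the labels.

```python
import numpy as np

def raw_accuracy(X, y, w):
    """Fraction of points whose predicted label sign(w.x) equals y (labels in {-1, +1}).
    A point with w.x == 0 gets sign 0 and therefore counts as wrong."""
    return np.mean(np.sign(X.dot(w)) == y)

def mean_squared(X, y, w, h=np.tanh):
    """(1/p) * sum_i (y_i - h(w.x_i))^2, the formula above.
    h defaults to tanh, an assumed output function mapping w.x into (-1, 1)."""
    return np.mean((y - h(X.dot(w))) ** 2)
```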
In [4]:
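def perceptron_train(X, y, maxiter):
    """Basic perceptron (no bias term); labels in y must be -1 or +1."""
    num_points, num_dims = X.shape
    w = np.zeros(num_dims)              # start from the all-zeros weight vector
    for epoch in range(maxiter):
        num_errors = 0
        for i in range(num_points):
            # mistake (or point exactly on the hyperplane): move w toward y[i]*X[i]
            if y[i]*w.dot(X[i]) <= 0:
                w += y[i]*X[i]
                num_errors += 1
        print 'Epoch', epoch, ':', num_errors, 'errors', w

        if epoch % 1 == 0:              # plot after every epoch; use % k to plot every k-th
            plot_labeled_data(X, y, w)

        if num_errors == 0:             # a full pass with no mistakes: converged
            break
    return w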

w = perceptron_train(trainX, trainy, 20)
meansq_acc, acc = score(testX, testy, w)
print 'test accuracy: mean square=', meansq_acc, 'raw=', acc 
Epoch 0 : 8 errors [ 1.41675788 -0.65539514]
Epoch 1 : 1 errors [ 1.3890965  -0.71399474]
Epoch 2 : 0 errors [ 1.3890965  -0.71399474]
test accuracy: mean square= 0.869028557654 raw= 1.0

Now learn the hyperplane with logistic regression, trained by stochastic gradient descent with learning rate eta = 0.8:
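logreg_train itself is not shown in this notebook. The following is a hypothetical sketch of what an SGD logistic-regression trainer with these settings could look like, assuming labels in {-1, +1}, no bias term, the per-point logistic loss $\log(1 + e^{-y\, w^\top x})$, and the same error count as perceptron_train. The names `logreg_sgd` and `predict_proba` are illustrative; this is not the notebook's implementation, and its log need not match the output below.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), computed via logaddexp to avoid overflow."""
    return np.exp(-np.logaddexp(0.0, -z))

def logreg_sgd(X, y, maxiter=20, eta=0.8, verbose=False):
    """Hypothetical SGD trainer for logistic regression (labels in {-1, +1}, no bias).

    Each step follows the negative gradient of the per-point loss
    log(1 + exp(-y_i * w.x_i)), whose gradient is -y_i * x_i * sigmoid(-y_i * w.x_i).
    """
    num_points, num_dims = X.shape
    w = np.zeros(num_dims)  # start from the all-zeros weight vector
    for epoch in range(maxiter):
        num_errors = 0
        for i in range(num_points):
            margin = y[i] * w.dot(X[i])
            if margin <= 0:
                num_errors += 1  # misclassified (or on the boundary) before this step
            # unlike the perceptron, every point moves w, weighted by how badly it fits
            w += eta * y[i] * X[i] * sigmoid(-margin)
        if verbose:
            print('Epoch', epoch, ':', num_errors, 'errors', w)
    return w

def predict_proba(X, w):
    """Estimated probability that each row of X has label +1."""
    return sigmoid(X.dot(w))
```

This sketch never stops early: on linearly separable data the logistic loss has no finite minimizer and keeps decreasing as w grows along a separating direction. The log below, where w grows every epoch, is consistent with that behavior.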

In [5]:
w = logreg_train(trainX, trainy, 20)
meansq_acc, acc = score(testX, testy, w)
print 'test accuracy: mean square=', meansq_acc, 'raw=', acc 
Epoch 0 : 4 errors [ 3.20080905 -1.43296741]
Epoch 1 : 2 errors [ 4.52455114 -2.06947624]
Epoch 2 : 2 errors [ 5.44081693 -2.51873752]
Epoch 3 : 2 errors [ 6.16232983 -2.87569646]
Epoch 4 : 2 errors [ 6.7650757  -3.17627077]
Epoch 5 : 2 errors [ 7.2862502  -3.43811726]
Epoch 6 : 2 errors [ 7.74727984 -3.67137002]
Epoch 7 : 2 errors [ 8.16181289 -3.88246521]
Epoch 8 : 2 errors [ 8.53916045 -4.07578737]
Epoch 9 : 2 errors [ 8.88599954 -4.25447964]
Epoch 10 : 2 errors [ 9.20730335 -4.42088398]
Epoch 11 : 2 errors [ 9.50688805 -4.57679907]
Epoch 12 : 2 errors [ 9.78775333 -4.72364042]
Epoch 13 : 2 errors [ 10.05230431  -4.86254453]
Epoch 14 : 2 errors [ 10.30250181  -4.99443943]
Epoch 15 : 2 errors [ 10.53996725  -5.12009383]
Epoch 16 : 2 errors [ 10.76605788  -5.24015235]
Epoch 17 : 2 errors [ 10.98192206  -5.35516147]
Epoch 18 : 2 errors [ 11.18854051  -5.46558884]
Epoch 19 : 2 errors [ 11.38675787  -5.57183803]
test accuracy: mean square= 0.98553495913 raw= 1.0

In this experiment both learners classify every test point correctly (raw accuracy 1.0), but logistic regression reaches a higher mean square accuracy (0.986 versus 0.869), so by that measure it finds a hyperplane that separates the test points better.
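Since both learners get every test point right, it can help to compare the final weight vectors themselves. This sketch uses the values copied from the two training logs above (nothing is retrained) and computes each vector's direction and the ratio of their lengths; `direction_deg` is an illustrative helper name.

```python
import numpy as np

# Final weight vectors copied from the printed training logs above
w_perceptron = np.array([1.3890965, -0.71399474])
w_logreg = np.array([11.38675787, -5.57183803])

def direction_deg(w):
    """Angle of w in degrees; the hyperplane w.x = 0 is perpendicular to w."""
    return np.degrees(np.arctan2(w[1], w[0]))

angle_gap = abs(direction_deg(w_perceptron) - direction_deg(w_logreg))
length_ratio = np.linalg.norm(w_logreg) / np.linalg.norm(w_perceptron)
print(f"direction gap: {angle_gap:.2f} degrees, length ratio: {length_ratio:.1f}")
```

The two directions, and hence the two separating lines, differ by only about one degree, while the logistic-regression vector is roughly eight times longer. The line $w^\top x = 0$ depends only on the direction of w, but $h_\theta(x)$ depends on the size of $w^\top x$, so the mean squared measure can also reflect this length difference.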

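What happens when the data is not linearly separable? To find out, append one training point, (1, -0.5) with label -1, which lies on the positive side of the hyperplane the perceptron learned above.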

In [6]:
nonlinX = np.vstack((trainX, [1, -0.5]))
nonliny = np.append(trainy, -1)

plot_labeled_data(nonlinX, nonliny, [0, 0])
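One way to check that the extra point breaks linear separability in 2-D is to sweep candidate weight directions and count training errors for each. In the sketch below, the function name and the 3600-angle grid are assumptions, not part of the notebook; the error test is the same one perceptron_train uses. Because it only samples angles, a result of 0 proves the data is separable, but a positive result is strong evidence rather than proof: a very narrow range of separating directions could fall between grid points.

```python
import numpy as np

def min_errors_through_origin(X, y, num_angles=3600):
    """Approximate the fewest training errors any line through the origin can make
    on 2-D data X (shape (n, 2)) with labels y in {-1, +1}.

    Tries num_angles evenly spaced unit weight vectors w = (cos t, sin t) and counts
    an error whenever y * w.x <= 0, the same test perceptron_train uses.
    """
    angles = np.linspace(0.0, 2 * np.pi, num_angles, endpoint=False)
    W = np.column_stack((np.cos(angles), np.sin(angles)))  # shape (num_angles, 2)
    margins = y[np.newaxis, :] * W.dot(X.T)                # shape (num_angles, n)
    errors = np.sum(margins <= 0, axis=1)                  # error count per candidate w
    return int(errors.min())
```

For example, compare `min_errors_through_origin(trainX, trainy)` with `min_errors_through_origin(nonlinX, nonliny)`. The first should be 0 if the grid is fine enough, since perceptron_train found a separating vector for trainX.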
In [7]:
w = perceptron_train(nonlinX, nonliny, 20)
meansq_acc, acc = score(testX, testy, w)
print 'test accuracy: mean square=', meansq_acc, 'raw=', acc 
Epoch 0 : 9 errors [ 0.41675788 -0.15539514]
Epoch 1 : 2 errors [-0.6109035   0.28600526]
Epoch 2 : 5 errors [-0.1209306   0.04633054]
Epoch 3 : 9 errors [ 0.68191618 -0.63870174]
Epoch 4 : 8 errors [ 1.00171985  0.22253424]
Epoch 5 : 7 errors [ 0.66498973 -0.06809072]
Epoch 6 : 8 errors [ 0.28041934 -0.1464825 ]
Epoch 7 : 1 errors [-0.71958066  0.3535175 ]
Epoch 8 : 5 errors [-0.22960775  0.11384278]
Epoch 9 : 9 errors [ 0.57323903 -0.5711895 ]
Epoch 10 : 8 errors [ 0.89304269  0.29004648]
Epoch 11 : 8 errors [ 0.48402368 -0.47056766]
Epoch 12 : 9 errors [ 0.75172424 -0.2582918 ]
Epoch 13 : 6 errors [ 0.17749152 -0.10189397]
Epoch 14 : 1 errors [-0.82250848  0.39810603]
Epoch 15 : 5 errors [-0.33253557  0.15843131]
Epoch 16 : 9 errors [ 0.4703112  -0.52660097]
Epoch 17 : 8 errors [ 0.5764883 -0.0275209]
Epoch 18 : 8 errors [ 0.19191792 -0.10591268]
Epoch 19 : 1 errors [-0.80808208  0.39408732]
test accuracy: mean square= 0.646383395811 raw= 0.0
In [8]:
w = logreg_train(nonlinX, nonliny, 20)
meansq_acc, acc = score(testX, testy, w)
print 'test accuracy: mean square=', meansq_acc, 'raw=', acc 
Epoch 0 : 5 errors [ 2.41641448 -1.04077012]
Epoch 1 : 3 errors [ 3.27860938 -1.45031627]
Epoch 2 : 3 errors [ 3.77223826 -1.69079293]
Epoch 3 : 3 errors [ 4.09692967 -1.84988036]
Epoch 4 : 3 errors [ 4.32470561 -1.96176911]
Epoch 5 : 3 errors [ 4.49047721 -2.04334696]
Epoch 6 : 3 errors [ 4.61397604 -2.10420861]
Epoch 7 : 3 errors [ 4.70745113 -2.15032678]
Epoch 8 : 3 errors [ 4.77899812 -2.18565835]
Epoch 9 : 3 errors [ 4.83420882 -2.21294241]
Epoch 10 : 3 errors [ 4.87707165 -2.23413657]
Epoch 11 : 3 errors [ 4.91050022 -2.25067335]
Epoch 12 : 3 errors [ 4.93666177 -2.26361987]
Epoch 13 : 3 errors [ 4.95719084 -2.27378196]
Epoch 14 : 3 errors [ 4.97333342 -2.2817745 ]
Epoch 15 : 3 errors [ 4.98604723 -2.2880705 ]