Logistic Regression with SGD in 2 dimensions

Here are some labeled training data points. The weight vector w is initialized to all zeros.

import numpy as np

# load data
Xy = np.loadtxt('synthetic.txt')
X = Xy[:, :-1]
y = Xy[:, -1]
# split into train and test
ttsplit = int(y.size*.8)
trainX = X[:ttsplit, :]
trainy = y[:ttsplit]
testX = X[ttsplit:, :]
testy = y[ttsplit:]

import matplotlib.pyplot as plt
%matplotlib inline
from demofuncs import *

plot_labeled_data(trainX, trainy)

Run a perceptron learner (the basic version, without a bias term, etc.) on this training data, and measure its accuracy on the test set.

Accuracy is measured in two ways: raw, and mean squared, which over the $p$ scored points is

$$\dfrac{1}{p}\sum_{i=1}^p (y^{(i)}-h_\theta(x^{(i)}))^2$$
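
The two measures can be sketched in a few lines of NumPy. The notebook's own `score` comes from `demofuncs` and is not shown, so this is only an illustration: the helper names `raw_accuracy` and `mean_squared` are hypothetical, labels are assumed to be -1/+1, and the predictions $h_\theta(x^{(i)})$ are passed in as a plain vector because the notebook does not say how they are computed.

```python
import numpy as np

def raw_accuracy(X, y, w):
    # Assumption: labels are -1/+1 and the predicted label is sign(w . x).
    return np.mean(np.sign(X.dot(w)) == y)

def mean_squared(y, h):
    # (1/p) * sum_i (y_i - h_i)^2, where h holds one prediction per point.
    y = np.asarray(y, dtype=float)
    h = np.asarray(h, dtype=float)
    return np.mean((y - h) ** 2)

# Tiny made-up example, not the notebook's data.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y_demo = np.array([1.0, 1.0, -1.0])
w_demo = np.array([1.0, 1.0])
acc_demo = raw_accuracy(X_demo, y_demo, w_demo)
ms_demo = mean_squared([1.0, -1.0], [0.5, -1.0])
```

Raw accuracy only looks at which side of the hyperplane a point falls on, while the mean squared measure also depends on how far each prediction is from its label.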

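def perceptron_train(X, y, maxiter):
    num_points, num_dims = X.shape
    w = np.zeros(num_dims)  # start from the all-zeros weight vector
    for epoch in range(maxiter):
        num_errors = 0
        for i in range(num_points):
            # a point is misclassified (or on the boundary) when y*(w.x) <= 0
            if y[i]*w.dot(X[i]) <= 0:
                w += y[i]*X[i]  # perceptron update: move w toward y[i]*X[i]
                num_errors += 1
        print 'Epoch', epoch, ':', num_errors, 'errors', w

        # plot the current hyperplane after every epoch
        plot_labeled_data(X, y, w)

        # stop early once a full pass makes no mistakes
        if num_errors == 0:
            break
    return w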

w = perceptron_train(trainX, trainy, 20)
meansq_acc, acc = score(testX, testy, w)
print 'test accuracy: mean square=', meansq_acc, 'raw=', acc

Epoch 0 : 8 errors [ 1.41675788 -0.65539514]

Epoch 1 : 1 errors [ 1.3890965  -0.71399474]

Epoch 2 : 0 errors [ 1.3890965  -0.71399474]

test accuracy: mean square= 0.869028557654 raw= 1.0

Now learn the hyperplane with logistic regression (stochastic gradient descent, eta=0.8):

w = logreg_train(trainX, trainy, 20)
meansq_acc, acc = score(testX, testy, w)
print 'test accuracy: mean square=', meansq_acc, 'raw=', acc

Epoch 0 : 4 errors [ 3.20080905 -1.43296741]

Epoch 1 : 2 errors [ 4.52455114 -2.06947624]

Epoch 2 : 2 errors [ 5.44081693 -2.51873752]

Epoch 3 : 2 errors [ 6.16232983 -2.87569646]

Epoch 4 : 2 errors [ 6.7650757  -3.17627077]

Epoch 5 : 2 errors [ 7.2862502  -3.43811726]

Epoch 6 : 2 errors [ 7.74727984 -3.67137002]

Epoch 7 : 2 errors [ 8.16181289 -3.88246521]

Epoch 8 : 2 errors [ 8.53916045 -4.07578737]

Epoch 9 : 2 errors [ 8.88599954 -4.25447964]
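
`logreg_train` is imported from `demofuncs` and its source is not shown. As a rough idea of what such a trainer might look like, here is a minimal sketch of logistic regression trained by stochastic gradient descent. It assumes labels in {-1, +1}, the per-point loss $\log(1+e^{-y\,w\cdot x})$, and a fixed step size eta=0.8. The name `logreg_train_sketch` and the toy data are hypothetical, and the real function may differ in its details.

```python
import numpy as np

def logreg_train_sketch(X, y, maxiter, eta=0.8):
    """Hypothetical stand-in for logreg_train; assumes labels in {-1, +1}."""
    num_points, num_dims = X.shape
    w = np.zeros(num_dims)
    errors_per_epoch = []
    for epoch in range(maxiter):
        num_errors = 0
        for i in range(num_points):
            margin = y[i] * w.dot(X[i])
            if margin <= 0:
                num_errors += 1
            # sigmoid(-margin), computed stably as exp(-log(1 + exp(margin)))
            scale = np.exp(-np.logaddexp(0.0, margin))
            # gradient step on log(1 + exp(-margin)) for this one point
            w += eta * scale * y[i] * X[i]
        errors_per_epoch.append(num_errors)
    return w, errors_per_epoch

# Toy separable data, made up for illustration.
X_toy = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y_toy = np.array([1.0, 1.0, -1.0, -1.0])
w_toy, errs_toy = logreg_train_sketch(X_toy, y_toy, 5)
```

Unlike the perceptron, this sketch updates w on every point, not only on mistakes; the size of each update shrinks as the point's margin grows. Under these assumptions the weights can keep growing from epoch to epoch even when the error count stays the same.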

The experiment above suggests that logistic regression is better than the perceptron at finding a hyperplane that separates the test points well, where "well" means achieving a good mean square accuracy.

What happens when the data is not linearly separable?

nonlinX = np.vstack((trainX, [1, -0.5]))
nonliny = np.append(trainy, -1)

plot_labeled_data(nonlinX, nonliny, [0, 0])

w = perceptron_train(nonlinX, nonliny, 20)
meansq_acc, acc = score(testX, testy, w)
print 'test accuracy: mean square=', meansq_acc, 'raw=', acc

Epoch 0 : 9 errors [ 0.41675788 -0.15539514]

Epoch 1 : 2 errors [-0.6109035   0.28600526]

Epoch 2 : 5 errors [-0.1209306   0.04633054]

Epoch 3 : 9 errors [ 0.68191618 -0.63870174]

Epoch 4 : 8 errors [ 1.00171985  0.22253424]

Epoch 5 : 7 errors [ 0.66498973 -0.06809072]

Epoch 6 : 8 errors [ 0.28041934 -0.1464825 ]

Epoch 7 : 1 errors [-0.71958066  0.3535175 ]

Epoch 8 : 5 errors [-0.22960775  0.11384278]

Epoch 9 : 9 errors [ 0.57323903 -0.5711895 ]

w = logreg_train(nonlinX, nonliny, 20)
meansq_acc, acc = score(testX, testy, w)
print 'test accuracy: mean square=', meansq_acc, 'raw=', acc

Epoch 0 : 5 errors [ 2.41641448 -1.04077012]

Epoch 1 : 3 errors [ 3.27860938 -1.45031627]

Epoch 2 : 3 errors [ 3.77223826 -1.69079293]

Epoch 3 : 3 errors [ 4.09692967 -1.84988036]

Epoch 4 : 3 errors [ 4.32470561 -1.96176911]

Epoch 5 : 3 errors [ 4.49047721 -2.04334696]

Epoch 6 : 3 errors [ 4.61397604 -2.10420861]

Epoch 7 : 3 errors [ 4.70745113 -2.15032678]

Epoch 8 : 3 errors [ 4.77899812 -2.18565835]

Epoch 9 : 3 errors [ 4.83420882 -2.21294241]
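
One rough way to compare the two non-separable runs is to look at how far w moves between consecutive epochs. The sketch below copies the weight vectors printed for epochs 7 to 9 of each run and computes the step lengths. The helper name `step_lengths` is hypothetical, and this is only a reading of the printed logs, not a convergence proof.

```python
import numpy as np

# Weight vectors copied from epochs 7-9 of the two non-separable runs above.
perceptron_w = np.array([[-0.71958066, 0.3535175],
                         [-0.22960775, 0.11384278],
                         [0.57323903, -0.5711895]])
logreg_w = np.array([[4.70745113, -2.15032678],
                     [4.77899812, -2.18565835],
                     [4.83420882, -2.21294241]])

def step_lengths(ws):
    # Euclidean distance between consecutive rows (one row per epoch).
    return np.linalg.norm(np.diff(ws, axis=0), axis=1)

perceptron_steps = step_lengths(perceptron_w)
logreg_steps = step_lengths(logreg_w)
```

In these logs the logistic regression steps are small and getting smaller, while the perceptron's weight vector is still jumping around and even changes sign between epochs.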