(Part 1 of a series on logic gates)
Theano is a powerful Python library that provides some useful tools for machine learning, such as GPU training and symbolic differentiation of the cost function during gradient descent.

It can be a bit challenging to understand how Theano works, so before jumping into more complex non-linear models, we can get to grips with Theano by implementing something simple like an OR gate.

An OR gate receives 2 inputs and outputs true if at least one of them is true. So, there are 3 cases where an OR gate will output a true value:

| Input 1 | Input 2 | Output |
|---------|---------|--------|
| 0       | 0       | 0      |
| 0       | 1       | 1      |
| 1       | 0       | 1      |
| 1       | 1       | 1      |

We can also represent the problem visually:

The goal is to create a model that receives 2 inputs and outputs 1 value. The model will learn a linear separator that can split the 2 output categories. Looking at the plot above, the red line is an ideal separator as it maximizes the margins between the categories. However, the blue line can also work as a separator, with one of the classes falling directly on it.

I won’t go into installing/setting up Theano here as there are many good guides on the topic (see here and here). However, to use a GPU during training we need to set the device flag and set the default float type to 32-bit in the .theanorc file (typically found in your home directory):
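A minimal .theanorc along those lines might look like the sketch below. The `device` value is an assumption about your setup: older Theano releases used `gpu`, while releases built on the newer libgpuarray backend expect `cuda`.

```
[global]
device = gpu
floatX = float32
```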

We’ll begin by importing some modules:

Next, we provide the training examples and the correct output labels. There are only 4 possible input combinations:
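As a sketch, the training set can be written as two NumPy arrays. The names `inputs` and `outputs` are assumptions chosen for illustration, and float32 is used to match the config setting mentioned above:

```python
import numpy as np

# All four input combinations for a 2-input gate, one row per example.
inputs = np.array([[0, 0],
                   [0, 1],
                   [1, 0],
                   [1, 1]], dtype=np.float32)

# Expected OR output for each row, in the same order.
outputs = np.array([0, 1, 1, 1], dtype=np.float32)
```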

Set the learning rate and number of training iterations for batch gradient descent:

In Theano, we first have to define symbols that represent each variable (x and y) and their type (matrix and vector). b is a shared variable used by multiple functions and contains the model bias value:

Next, we need to randomly initialize the weights. We do this by creating a numpy array with dimensions (2,1) containing random values sampled from a uniform distribution. The data type is set to float32 as defined in the .theanorc config file.

We can (optionally) set a random seed for reproducible results:
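One way to do this in NumPy is sketched below. The seed value, the sampling range of [-1, 1) and the name `w_values` are all assumptions for illustration. In Theano, an array like this would then typically be wrapped in a shared variable.

```python
import numpy as np

# Seeded generator so the "random" starting weights are reproducible.
# The seed and the [-1, 1) range are illustrative assumptions.
rng = np.random.RandomState(1234)

# One weight per input, stored as a (2, 1) float32 array.
w_values = rng.uniform(low=-1.0, high=1.0, size=(2, 1)).astype(np.float32)
```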

Next, we have to define expressions that tell Theano how to evaluate things like the hypothesis and cost values using the symbols/tensors we defined earlier.

For example, the first line below calculates the dot product of the variables (x) and weights (w), adds the bias (b) term, and wraps the result in a sigmoid activation function. This is basically a logistic regression model.

It is important to note that no values are actually calculated at this stage. We are simply telling Theano how these values are calculated.
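To see what the hypothesis expression computes, here is a plain NumPy sketch of the same formula. Unlike the symbolic Theano version, NumPy evaluates it immediately. The helper names and the zero-valued placeholder parameters are assumptions, not values from the model.

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(x, w, b):
    """Logistic-regression output: sigmoid(x . w + b), one value per row of x."""
    return sigmoid(np.dot(x, w) + b).ravel()

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
w = np.zeros((2, 1))  # placeholder weights, not trained values
b = 0.0               # placeholder bias

# With all parameters at zero, x . w + b is 0 for every row,
# so every entry of h is sigmoid(0) = 0.5.
h = hypothesis(x, w, b)
```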

We’ll use binary cross entropy as the cost function. One advantage of Theano is that it can differentiate the function for us automatically.
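For reference, binary cross entropy can be sketched in NumPy as follows. Averaging over the examples (rather than summing) is an assumption here, and the prediction values are made up purely to show how the cost behaves.

```python
import numpy as np

def binary_cross_entropy(h, y):
    """Mean binary cross entropy between predictions h (in (0, 1)) and labels y."""
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

y = np.array([0.0, 1.0, 1.0, 1.0])

# A model that predicts 0.5 everywhere costs log(2) per example.
uninformed = binary_cross_entropy(np.full(4, 0.5), y)

# Predictions close to the labels give a much smaller cost.
confident = binary_cross_entropy(np.array([0.1, 0.9, 0.9, 0.9]), y)
```

The cost shrinks as the predictions move towards the correct labels, which is what gradient descent exploits.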

The update_rules are used during gradient descent and tell Theano how to adjust the w and b values after each pass. It is at this stage that we ask for the gradient (T.grad()) of the cost function with respect to each parameter:
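To make the update step concrete, the sketch below writes out by hand what the gradients look like for logistic regression with a mean cross-entropy cost, then applies one gradient descent step. This is a NumPy illustration, not the Theano code: the function names, starting values and learning rate are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(x, y, w, b):
    """Mean binary cross entropy of the logistic model on (x, y)."""
    h = sigmoid(x.dot(w) + b).ravel()
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def gradients(x, y, w, b):
    """Analytic gradients of the mean cross entropy with respect to w and b."""
    error = sigmoid(x.dot(w) + b).ravel() - y          # shape (n,)
    grad_w = x.T.dot(error).reshape(-1, 1) / len(y)    # shape (2, 1)
    grad_b = error.mean()
    return grad_w, grad_b

def apply_updates(x, y, w, b, learning_rate):
    """One gradient descent step: move each parameter against its gradient."""
    grad_w, grad_b = gradients(x, y, w, b)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
y = np.array([0.0, 1.0, 1.0, 1.0])
w0, b0 = np.array([[0.3], [-0.2]]), 0.1  # arbitrary starting point
w1, b1 = apply_updates(x, y, w0, b0, learning_rate=0.1)
```

With Theano we never have to derive `gradients` ourselves; the symbolic differentiation produces the equivalent expressions from the cost.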

Now that we’ve defined expressions and Theano knows how to calculate various values, we need to create some functions that can make use of those expressions.

During training, we need to evaluate the hypothesis and cost expressions, so we set those as the outputs for the train function. The inputs are the non-shared symbols/tensors required by those expressions (x and y). We also tell the function how parameters should be updated by passing in our update_rules:

Once our expressions and functions are in place, training is pretty straightforward. We loop over a number of training_iterations, and within each loop we call the train() function and pass in the inputs and outputs we defined earlier.

This step is where the bulk of the work happens. The model parameters/weights are adjusted after each iteration, converging on values that provide the best linear separator for our 2 classes.

We can optionally append the cost of each iteration to a list so we can plot a training curve later:
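The whole loop can be sketched in plain NumPy as below, with the forward pass and parameter updates written inline in place of a compiled train() function. The learning rate, iteration count, seed, initial bias and the name `cost_history` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
outputs = np.array([0.0, 1.0, 1.0, 1.0])

learning_rate = 0.5         # assumed value
training_iterations = 5000  # assumed value

rng = np.random.RandomState(1234)
w = rng.uniform(-1.0, 1.0, size=(2, 1))
b = 0.0

cost_history = []
for _ in range(training_iterations):
    # Forward pass: hypothesis and cost for the whole batch.
    h = sigmoid(inputs.dot(w) + b).ravel()
    cost = -np.mean(outputs * np.log(h) + (1.0 - outputs) * np.log(1.0 - h))
    cost_history.append(cost)

    # Parameter update: step against the gradient of the cost.
    error = h - outputs
    w -= learning_rate * inputs.T.dot(error).reshape(-1, 1) / len(outputs)
    b -= learning_rate * error.mean()

# Threshold the final hypothesis at 0.5 to get class labels.
predictions = (sigmoid(inputs.dot(w) + b).ravel() > 0.5).astype(int)
```

Because the cost is recorded before each update, `cost_history` has one entry per iteration and can be plotted directly.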

Plotting a training curve is as simple as plotting the cost value after each iteration of training. The plot shows that gradient descent is converging correctly:

Finally, we can use the predict() function we defined earlier to test the accuracy of our trained model. For the following test data, an OR gate should return values of [1, 1, 1, 1]:
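A NumPy sketch of this kind of check is shown below. The `predict` helper here is a hypothetical stand-in for the compiled Theano function, the parameter values are illustrative rather than the result of an actual training run, and the test rows are just one set of inputs for which an OR gate returns all ones.

```python
import numpy as np

def predict(x, w, b):
    """Return 1 where sigmoid(x . w + b) exceeds 0.5, else 0."""
    probabilities = 1.0 / (1.0 + np.exp(-(np.dot(x, w) + b)))
    return (probabilities.ravel() > 0.5).astype(int)

# Illustrative parameters only (not taken from a real training run).
w = np.array([[6.0], [6.0]])
b = -3.0

# Every row contains at least one 1, so an OR gate should output 1 for each.
test_data = np.array([[0, 1], [1, 1], [1, 1], [1, 0]], dtype=np.float64)
predictions = predict(test_data, w, b)
```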

The full code can be found in my GitHub repo here
