Credit card fraud detection 2 – using Restricted Boltzmann Machine in TensorFlow

In my previous post, I demonstrated how to use an autoencoder for credit card fraud detection and achieved an AUC score of 0.94. This time, I will explore another model – the Restricted Boltzmann Machine (RBM) – along with its detailed implementation and results in TensorFlow.

The RBM was one of the earliest models introduced in deep learning. There have been many successful use cases of RBMs in areas such as dimensionality reduction, classification, collaborative filtering, feature learning, and anomaly detection.

In this tutorial, we will again train in an unsupervised way – without feeding labels to the model – and will achieve a slightly better result than the autoencoder!

The whole post is divided into three parts below:

  1. Introduction to RBM
  2. Implementation in TensorFlow 
  3. Results and interpretation 

All code can be found on GitHub.

Overview of the data set

I will be using the exact same credit card data set here, so please see my previous post if you would like to learn more about it. You can also download the data from Kaggle here if you want.

RBM – a quick introduction

There are many good online resources that offer brief or in-depth explanations of RBMs:

  1. https://www.youtube.com/watch?v=FJ0z3Ubagt4 & https://www.youtube.com/watch?v=p4Vh_zMw-HQ – the best YouTube explanations of RBMs, in my opinion.
  2. https://deeplearning4j.org/restrictedboltzmannmachine – gives a very intuitive and easy-to-understand introduction to RBMs.
  3. http://deeplearning.net/tutorial/rbm.html – theory plus a Theano implementation.

Basically, an RBM is a network consisting of two layers – the visible layer and the hidden layer. There are symmetric connections between any pair of nodes from the visible and hidden layers, and no connections within each layer.

In the majority of cases, both the hidden and visible layers are binary-valued. There are also extensions with a Gaussian visible layer and a Bernoulli hidden layer; the latter is what we will use for fraud detection, since our input data will be normalized to mean 0 and standard deviation 1.

[Figure: an RBM – a visible layer and a hidden layer with symmetric connections between them and no connections within a layer]

When signals propagate from the visible to the hidden layer, the input (i.e. the data sample) is multiplied by the matrix W, added to the hidden bias vector b, and finally passed through the sigmoid function, which squashes each value to be between 0 and 1; these are the probabilities of each hidden neuron being on. However, it is very important to keep the hidden states binary (0 or 1 for each unit), rather than using the probabilities themselves. Only during the last update of the Gibbs sampling should we use the probabilities for the hidden layer, which we will talk about later on.
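To make this concrete, here is a minimal NumPy sketch of this visible-to-hidden pass (my own helper names, following the notation above where W has shape num_hidden × num_visible; the repo's code stores W transposed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(x, W, b, rng=np.random):
    # x: one sample of shape (num_visible,), W: (num_hidden, num_visible), b: hidden bias
    h_probs = sigmoid(W @ x + b)                                     # p(h_j = 1 | x)
    h_states = (rng.rand(*h_probs.shape) < h_probs).astype(float)    # binary 0/1 states
    return h_probs, h_states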

During the backward pass, or reconstruction, the hidden layer activation becomes the input: it is multiplied by the same matrix W (transposed), added to the visible bias, and then either passed through the sigmoid function (for a Bernoulli visible layer) or sampled from a multivariate Gaussian distribution (for a Gaussian visible layer), as below:

\hat{x} = \mathrm{sigm}(c + W^{T}h) \quad \text{(Bernoulli visible)}, \qquad x \sim \mathcal{N}(c + W^{T}h,\; I) \quad \text{(Gaussian visible)}

Intuitively, we can understand that during training the model adjusts its weights so that it can best approximate the training data distribution p with its reconstruction distribution q, as below:

[Figure: the reconstruction distribution q being adjusted to approximate the training data distribution p]

We first define the so-called Energy Function E(x, h) as well as the joint probability p(x, h) for any pair of visible and hidden layers, as below:

E(x, h) = -h^{T}Wx - c^{T}x - b^{T}h, \qquad p(x, h) = \frac{e^{-E(x, h)}}{Z}

where Z is the normalizing constant (partition function).

In the above equations, x is the visible layer, h is the hidden layer activations, and W, b and c are the weight matrix, hidden bias and visible bias, respectively.

So, for any pair of x and h, we can calculate E(x, h), which is a scalar. The higher the energy, the lower the joint probability p(x, h).
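As a quick sanity check, E(x, h) for a single pair can be computed with a few dot products (a sketch in the same notation, with W of shape num_hidden × num_visible):

def energy(x, h, W, b, c):
    # E(x, h) = -h^T W x - c^T x - b^T h, a single scalar
    return -(h @ (W @ x)) - (c @ x) - (b @ h)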

1) How to detect fraud with RBM?

From the energy function, we can derive the equation below for the so-called Free Energy of x:

F(x) = -c^{T}x - \sum_{j}\log\left(1 + e^{\,b_{j} + W_{j}x}\right)

where W_{j} denotes row j of W.

This Free Energy, F(x), is also a scalar, and it is exactly what we will compute for the test data: we will use its distribution to detect anomalies. The higher the free energy, the higher the chance of x being a fraud.
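Here is a small NumPy sketch of this free energy for a batch of samples, in the binary-visible form derived above (for a Gaussian visible layer the visible term is usually replaced by a quadratic term in (x − c), while the hidden term stays the same):

def free_energy(X, W, b, c):
    # X: batch of samples, shape (n_samples, num_visible)
    wx_b = X @ W.T + b                                       # hidden pre-activations
    hidden_term = np.sum(np.logaddexp(0.0, wx_b), axis=1)    # sum_j log(1 + exp(b_j + W_j x))
    visible_term = X @ c                                     # c^T x for each sample
    return -visible_term - hidden_term                       # one scalar per sample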

2) How to update model parameters?

To update our parameters W, b and c, we use the equations below combined with SGD (alpha is the learning rate):

W \leftarrow W + \alpha\,\big(h(x^{(t)})\,x^{(t)T} - h(\widetilde{x})\,\widetilde{x}^{T}\big)
b \leftarrow b + \alpha\,\big(h(x^{(t)}) - h(\widetilde{x})\big)
c \leftarrow c + \alpha\,\big(x^{(t)} - \widetilde{x}\big)

The left part, h(x^{(t)})x^{(t)^T}, is easy to calculate: x^{(t)} is just the training sample, and h(x^{(t)}) is simply:

h(x) = \mathrm{sigm}(b + Wx)

So the outcome is simply an outer product of two vectors, which gives a matrix of the same shape as W.

Okay, that’s easy! But how do we calculate h(\widetilde{x})\widetilde{x}^{T}? The answer is to use Gibbs sampling:

[Figure: the Gibbs sampling chain x^{(t)} → h → x → h → … → \widetilde{x}]

We start with a training sample x^{(t)}, and sample each h_j by first calculating:

p(h_j = 1 \mid x) = \mathrm{sigm}(b_j + W_{j}x)

Then we draw a value from a uniform distribution over [0, 1]; if the drawn value is smaller than the probability calculated above, we assign 1 to h_j, otherwise we assign 0. We do this for each h_j.

Next, we sample the visible layer x_k, using the previously sampled hidden layer as input, with a similar equation:

\hat{x}_k = \mathrm{sigm}(c_k + h^{T}W_{\cdot k})

Here, we use the sigmoid outputs directly, without sampling them into states of 0 and 1, to get the visible layer.

However, this step is done slightly differently if the input data, i.e. the visible layer x, is Gaussian. In that case, we sample the x vector from a Gaussian with mean μ = c + W^{T}h and identity covariance matrix. This part is fully implemented in the code, which you can check for verification (see the function sample_visible_from_hidden).
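A rough NumPy equivalent of that Gaussian step (the repo's sample_visible_from_hidden draws from a truncated normal in TensorFlow; here I use a plain normal draw with unit variance for simplicity):

def sample_visible_gaussian(h, W, c, rng=np.random):
    # mean of the Gaussian visible units: mu = c + W^T h, identity covariance
    mean = W.T @ h + c
    return mean + rng.randn(*mean.shape)            # one independent draw per visible unit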

We repeat this for k steps, which is referred to as Contrastive Divergence, or CD-k.

After the last step k, we use the sampled visible layer as \widetilde{x}, together with the last hidden probabilities as h(\widetilde{x}). Please note that here we use the probabilities, not the sampled 0/1 states, for h(\widetilde{x}).

In summary, the whole process looks like this:

  • Start with a training sample x – x^{(t)}
  • Sample h from the input x – h(x^{(t)})
  • Sample x from h
  • Sample h from x
  • Sample x from h – \widetilde{x}
  • Sample h from x – h(\widetilde{x})

In practice, using k = 1 can give a good result.
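Putting the pieces together, a minimal CD-1 update for a single Gaussian-visible sample might look like the sketch below, reusing the helpers from earlier (following the repo's code, the positive phase uses hidden probabilities when the visible layer is Gaussian):

def cd1_update(x, W, b, c, lr=0.001, rng=np.random):
    # Positive phase: hidden probabilities and binary states from the data sample
    h0_probs, h0_states = sample_hidden(x, W, b, rng)
    # Negative phase: reconstruct the visible layer, then recompute hidden probabilities
    x_tilde = sample_visible_gaussian(h0_states, W, c, rng)
    h1_probs = sigmoid(W @ x_tilde + b)              # probabilities, not states (last step)
    # Outer products have the same shape as W (num_hidden x num_visible)
    W += lr * (np.outer(h0_probs, x) - np.outer(h1_probs, x_tilde))
    b += lr * (h0_probs - h1_probs)
    c += lr * (x - x_tilde)
    return W, b, c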

At this point, we have covered the whole model-update process.

3) How to tune hyper-parameters?

We will stick to a data-driven approach.

Split the data into training and validation sets, train the model on the training set, and evaluate its performance on the validation set.

Start with a hidden layer of smaller dimension than the input layer (e.g. 5 or 10), set the learning rate to a small value (such as 0.001), and monitor the reconstruction error on the validation set (not the actual error against the labels).

The reconstruction error is basically the mean squared difference between the reconstruction \widetilde{x} and the actual data x, averaged over the entire mini-batch.

If the reconstruction error stops decreasing, that is a sign to apply early stopping.
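For example, early stopping on the validation reconstruction error could be organised roughly like this (a sketch only; model, num_epochs, train_one_epoch and reconstruct are placeholders, not names from the repo):

best_err, patience, bad_epochs = np.inf, 5, 0
for epoch in range(num_epochs):
    train_one_epoch(model, train_x)                 # one pass of CD updates (placeholder)
    x_tilde = reconstruct(model, val_x)             # one up-down pass (placeholder)
    recon_err = np.mean((x_tilde - val_x) ** 2)     # mean squared difference
    if recon_err < best_err:
        best_err, bad_epochs = recon_err, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                  # stopped improving -> early stop
            break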

A comprehensive guide to tuning RBMs is given in Geoffrey Hinton's notes, which you are encouraged to read.

Coding the RBM

The code was modified from here, which is an excellent TensorFlow implementation. I made only a few changes:

  1. Implemented momentum for faster convergence.
  2. Added L2 regularisation (a sketch of changes 1 and 2 follows this list).
  3. Added methods for retrieving the Free Energy as well as the Reconstruction Error on validation data.
  4. Simplified the code a bit by removing some tf summaries (originally not compatible with TF versions 1.1 and above).
  5. Added small utilities such as plotting the training loss.
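A rough sketch of how changes 1 and 2 combine in the weight update (velocity, momentum and weight_decay are my names here, not necessarily the ones used in the code):

def momentum_l2_step(W, velocity, dW, lr=0.001, momentum=0.9, weight_decay=1e-4):
    # dW is the CD gradient for W; the weight_decay term is the L2 penalty
    velocity = momentum * velocity + lr * (dW - weight_decay * W)
    return W + velocity, velocity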

Basically, the code is an sklearn-style RBM class that you can use directly to train and predict.

Training and Results

1) Training and validation

We split our data 50/50 by transaction time into training and validation sets, and train our model on the training set.

TEST_RATIO = 0.50

# Sort by transaction time so that the split is chronological
df.sort_values('Time', inplace=True)
TRA_INDEX = int((1 - TEST_RATIO) * df.shape[0])

# Features: the PCA components V1..V28 (drop Time and Amount); label: the Class column
train_x = df.iloc[:TRA_INDEX, 1:-2].values
train_y = df.iloc[:TRA_INDEX, -1].values

test_x = df.iloc[TRA_INDEX:, 1:-2].values
test_y = df.iloc[TRA_INDEX:, -1].values

After we train the model, we will calculate the Free Energy of the validation set and visualize the distributions for both fraud and non-fraud samples.
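That step boils down to something like the snippet below (assuming the trained RBM instance is called model; getFreeEnergy is the free-energy method added to the class):

import matplotlib.pyplot as plt

test_cost = model.getFreeEnergy(test_x).reshape(-1)   # one free-energy score per sample

plt.hist(test_cost[test_y == 0], bins=100)            # non-fraud distribution
plt.hist(test_cost[test_y == 1], bins=100)            # fraud distribution
plt.xlabel('free energy')
plt.show()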

2) Data pre-processing 

Since the data have already been PCA-transformed, we only need to standardize them with a z-score to get mean 0 and standard deviation 1, as below:

cols_mean = []
cols_std = []
for c in range(train_x.shape[1]):
    # Compute the statistics on the training set only
    cols_mean.append(train_x[:, c].mean())
    cols_std.append(train_x[:, c].std())
    # Apply the same training statistics to both splits
    train_x[:, c] = (train_x[:, c] - cols_mean[-1]) / cols_std[-1]
    test_x[:, c] = (test_x[:, c] - cols_mean[-1]) / cols_std[-1]

Please note that you should calculate the statistics on the training set only, as I did above, rather than on the full data set (training and validation combined).

After that, we fit the model with a Gaussian visible layer to the data. (This is the Gaussian–Bernoulli RBM, since the hidden layer is still binary-valued.)

3) Visualization of results

It can clearly be seen that the free energy of the fraud samples is distributed much more uniformly than that of the non-fraud samples.

[Figure: histogram of free energy for non-fraud samples]

[Figure: histogram of free energy for fraud samples]

If you calculate the AUC score (Area Under the ROC Curve) on the validation set, you will get a score of around 0.96!
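The AUC itself is a one-liner with scikit-learn, reusing the free-energy scores from the snippet above:

from sklearn.metrics import roc_auc_score

# Higher free energy = more suspicious, so the raw scores can be used directly
print(roc_auc_score(test_y, test_cost))               # roughly 0.96 on this split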

4) Real time application

To use it as a real-time fraud detector, we need to choose a threshold based on the validation data set. This can be done by trading off the precision and recall curves below (e.g. a free energy threshold of 100 might give a relatively good balance):

[Figure: precision and recall on the validation set as functions of the free energy threshold]
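One way to produce that trade-off curve with scikit-learn, again reusing test_cost (and plt) from the earlier snippet:

from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(test_y, test_cost)
plt.plot(thresholds, precision[:-1], label='precision')   # last point has no threshold, drop it
plt.plot(thresholds, recall[:-1], label='recall')
plt.xlabel('free energy threshold')
plt.legend()
plt.show()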

5) A further interpretation of the val AUC score of 0.96

There is another, more intuitive way to look at the AUC score: the validation data's fraud percentage is around 0.16%. If we take, say, the top 500 transactions ranked by the model's free energy, the fraction of frauds among them is around 11.82%. So precision increases from 0.16% to 11.82% within the top 500.
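That back-of-the-envelope check is easy to reproduce from the scores:

top_500 = np.argsort(test_cost)[::-1][:500]   # indices of the 500 highest-energy transactions
print(test_y[top_500].mean())                 # fraction of frauds among them, ~0.118 here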

Again, please see GitHub for the full code and notebook.

 


 

Exercises: 

Try using the Reconstruction Error in place of the Free Energy as the fraud score, and compare the result with using the Free Energy. This is similar to what I did previously in the autoencoder tutorial.
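As a starting point, a per-sample reconstruction-error score can be computed from the learned parameters. The sketch below assumes the get_model_parameters method from the original implementation is still available (it returns W with shape num_visible × num_hidden) and reuses sigmoid and roc_auc_score from earlier snippets:

params = model.get_model_parameters()         # {'W': ..., 'bh_': ..., 'bv_': ...}
W, b, c = params['W'], params['bh_'], params['bv_']
h = sigmoid(test_x @ W + b)                   # hidden probabilities for each sample
x_tilde = h @ W.T + c                         # Gaussian visible mean as the reconstruction
recon_score = np.mean((test_x - x_tilde) ** 2, axis=1)
print(roc_auc_score(test_y, recon_score))     # compare with the free-energy AUC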

 

 

 

 

10 thoughts on “Credit card fraud detection 2 – using Restricted Boltzmann Machine in TensorFlow”

  1. The visualization shows that normal energy and fraud energy are different. However, it does not make sense in the unsupervised-learning case, because you make the energy distribution graph using the label column (test_y == 1). So, if we don't use test_y, how can we decide a cut-off point between fraud and normal energy?


    1. You are supposed to collect validation data for threshold selection. In reality, you can also work with your fraud ops team to create a feedback loop for the task.


  2. Thanks for the article and sample code. Can I re-use your code? I understand that your code was derived from Gabriele Angeletti's RBM implementation (https://github.com/blackecho/Deep-Learning-TensorFlow). Can you confirm that it is also released under the MIT license?
  3. Why do you compute positive = tf.matmul(tf.transpose(visible), hidden_states) if visible_unit is binary? I think the right formula is positive = tf.matmul(tf.transpose(visible), hidden_probs).


  4. Hey, do you know how you would code Persistent Contrastive Divergence instead of Contrastive Divergence?


  5. Just wondering – is the implementation of CD actually PCD?


  6. How do you calculate ROC or AUC when test_cost only provides the free energy and not a 0 or 1 classification?

    This is the exact code line:
    “fpr, tpr, _ = roc_curve(test_y, test_cost)”

    where
    test_cost = model.getFreeEnergy(test_x).reshape(-1)

    Can you explain how, in this case, you calculate the FPR and such?


    1. To calculate AUC you only need scores / predicted probabilities (test_cost) so that you can rank the data, plus the ground-truth binary labels (0 or 1), which are provided in test_y. You can't convert test_cost into 0 or 1 to calculate AUC; you need the raw scores. Please refer to online resources for the details of the AUC calculation.


  7. Is the Free Energy used as the reconstruction error when using an RBM for anomaly detection?
    Can I use the MSE (predicted values – true values) as the reconstruction error when using an RBM for anomaly detection?
    Thank you

