
Fair Transformer GAN class

A class that defines the Fair Transformer GAN model. Processed data from the Dataset object is passed to the Model object, which trains the model and generates less-biased data. The Metrics object then uses this generated data to calculate model performance.

The train folder contains the script to train the model via the command line.
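For orientation, here is a minimal Python sketch of that workflow using only the methods documented below. The import path, file paths, and the p_z / p_y distributions are placeholders, not values shipped with the project.

```python
from fair_transformer_gan import FairTransformerGAN  # hypothetical import path

# Instantiate with the documented defaults for binary data with 58 feature columns.
model = FairTransformerGAN(dataType='binary', inputDim=58, embeddingDim=32)

# Train on processed data produced by the Dataset class, then generate
# less-biased samples that the Metrics object can score.
model.train(dataPath='data/processed.npy', modelPath='', outPath='out',
            pretrainEpochs=100, nEpochs=300, batchSize=1000,
            p_z=[0.5, 0.5], p_y=[0.5, 0.5])
model.generateData(nSamples=1000, modelFile='out/model', batchSize=100,
                   outFile='out/generated', p_z=[0.5, 0.5], p_y=[0.5, 0.5])
```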

| Function | Description |
| --- | --- |
| init | Initializes instance of FairTransformerGAN class |
| loadData | Load processed data created from Dataset class |
| buildAutoencoder | Build autoencoder that encodes the input data |
| buildGenerator | Build the generator for training |
| buildGeneratorTest | Build the generator when generating new data |
| getDiscriminatorResults | Calculate the discriminator predictions |
| buildDiscriminator | Build the discriminator |
| print2file | Print the training metrics to the log file |
| generateData | Generate new data using the trained model |
| calculateDiscAuc | Calculate discriminator AUC |
| calculateDiscAccuracy | Calculate discriminator accuracy |
| calculateGenAccuracy | Calculate generator accuracy |
| pair_rd | Calculate total pairwise risk difference |
| calculateRD | Calculate risk difference score across all protected attribute classes |
| calculateClassifierAccuracy | Calculate classifier accuracy |
| calculateClassifierRD | Calculate classifier risk difference score across all protected attribute classes |
| create_z_masks | Calculate mask for each protected attribute class |
| train | Train model |

FairTransformerGAN

CLASS

FairTransformerGAN()

init

Method

__init__(dataType='binary', inputDim=58, embeddingDim=32, randomDim=32, generatorDims=(32, 32), discriminatorDims=(32, 16, 1), compressDims=(), decompressDims=(), bnDecay=0.99, l2scale=0.001, lambda_fair=1)

Initializes FairTransformerGAN model with given parameters. Based on MedGAN architecture.

Parameters

  • dataType [str]: specifies if the input data contains only binary (0, 1) values or continuous values
  • inputDim [int]: number of columns in the input data, not including the protected attribute and outcome columns
  • embeddingDim [int]: dimension size of the embedding, which will be generated by the generator
  • randomDim [int]: dimension size of the random noise, on which the generator is conditioned
  • generatorDims [tuple]: dimensions of the generator. Note that another layer of size “embeddingDim” is always added.
  • discriminatorDims [tuple]: dimensions of the discriminator
  • compressDims [tuple]: dimensions of the encoder part of the autoencoder. Note that another layer of size “embeddingDim” is always added. Therefore this can be a blank tuple.
  • decompressDims [tuple]: dimensions of the decoder part of the autoencoder. Note that another layer, whose size is equal to the dimension of the input data, is always added. Therefore this can be a blank tuple.
  • bnDecay [float]: decay value for the moving average used in Batch Normalization
  • l2scale [float]: L2 regularization coefficient for all weights
  • lambda_fair [float]: coefficient of the fair regularization term

Return type

None
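As an illustration of the notes on generatorDims, compressDims, and decompressDims above, here is a plain-Python sketch (not project code) of how the effective layer sizes resolve from the constructor arguments.

```python
# An extra layer of size embeddingDim is appended to the generator and encoder
# dims, and an extra layer of size inputDim is appended to the decoder dims.
inputDim, embeddingDim = 58, 32
generatorDims, compressDims, decompressDims = (32, 32), (), ()

generator_layers = list(generatorDims) + [embeddingDim]    # [32, 32, 32]
encoder_layers = list(compressDims) + [embeddingDim]       # [32]
decoder_layers = list(decompressDims) + [inputDim]         # [58]
print(generator_layers, encoder_layers, decoder_layers)
```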

loadData

Method

loadData(dataPath='')

Loads data from given path and splits it into train and validation sets.

Parameters

  • dataPath [str]: absolute path to processed numpy data file

Return type

  • trainX, validX, trainz, validz, trainy, validy [np.ndarray]: arrays of the split train and validation data
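A minimal usage sketch, continuing the example above; the data path is a placeholder.

```python
trainX, validX, trainz, validz, trainy, validy = model.loadData(
    dataPath='/absolute/path/to/processed.npy')
print(trainX.shape, trainz.shape, trainy.shape)  # features, protected attribute, outcome
```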

buildAutoencoder

Method

buildAutoencoder(x_input)

Builds the autoencoder that encodes, compresses and then decompresses the input. Calculates the loss between the decompressed input and the original input.

Parameters

  • x_input [tf.Tensor]: tensor of x input data

Return type

  • loss [tf.Tensor]: float tensor loss between the decompressed input and the original x input
  • decodeVariables [dict]: variable that stores weights and biases of decompressed x input
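The following is a simplified, standalone TensorFlow 2 sketch of the kind of reconstruction loss described above. Using cross-entropy for binary data and squared error for continuous data is an assumption for illustration, not necessarily the class's exact formulation.

```python
import tensorflow as tf

def reconstruction_loss(x_input, x_reconstructed, dataType='binary'):
    """Simplified loss between the decompressed input and the original input."""
    if dataType == 'binary':
        eps = 1e-12  # avoid log(0)
        return -tf.reduce_mean(
            x_input * tf.math.log(x_reconstructed + eps)
            + (1.0 - x_input) * tf.math.log(1.0 - x_reconstructed + eps))
    # continuous inputs: mean squared error
    return tf.reduce_mean(tf.square(x_input - x_reconstructed))
```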

buildGenerator

Method

buildGenerator(x_input, y_input, z_input, bn_train)

Builds the generator. Generates the x data given the y outcome and z protected attribute during training. Applies multi-head self-attention to x data using MultiHeadSelfAttention class.

Parameters

  • x_input [tf.Tensor]: tensor of x input data
  • y_input [tf.Tensor]: tensor of y outcome data
  • z_input [tf.Tensor]: tensor of z protected attribute
  • bn_train [tf.Tensor]: boolean tensor specifying whether the generator is in the training phase

Return type

  • output [tf.Tensor]: generated x data
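Below is a standalone TF2 sketch of the conditioning idea (not the class's graph code): random noise is combined with the outcome y and protected attribute z and mapped to an embedding of size embeddingDim. The concatenation order, shapes, and layer choices are assumptions for illustration.

```python
import tensorflow as tf

batch, randomDim, embeddingDim = 16, 32, 32
noise = tf.random.normal([batch, randomDim])
y_input = tf.ones([batch, 1])    # outcome condition
z_input = tf.zeros([batch, 1])   # protected-attribute condition

# Condition the generator input on y and z, then project to the embedding size.
conditioned = tf.concat([noise, y_input, z_input], axis=1)
hidden = tf.keras.layers.Dense(32, activation='relu')(conditioned)
embedding = tf.keras.layers.Dense(embeddingDim, activation='tanh')(hidden)
```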

buildGeneratorTest

Method

buildGeneratorTest(x_input, y_input, z_input, bn_train)

Builds the generator used after model training. Generates the x data given the y outcome and z protected attribute once training is complete. Applies multi-head self-attention to the x data using the MultiHeadSelfAttention class.

Parameters

  • x_input [tf.Tensor]: tensor of x input data
  • y_input [tf.Tensor]: tensor of y outcome data
  • z_input [tf.Tensor]: tensor of z protected attribute
  • bn_train [tf.Tensor]: boolean tensor specifying whether the generator is in the training phase

Return type

  • output [tf.Tensor]: generated x data

getDiscriminatorResults

Method

getDiscriminatorResults(x_input, y_bool, keepRate, z_mask0, z_mask1, z_mask2, z_mask3, reuse=False)

Calculates the discriminator predictions.

Parameters

  • x_input [tf.Tensor]: tensor of x input data
  • y_bool [tf.Tensor]: boolean tensor representing the boolean value of outcome y
  • keepRate [float]: dropout keep rate of the discriminator
  • z_mask0 [tf.Tensor]: boolean tensor which is True where the z protected attribute is 0
  • z_mask1 [tf.Tensor]: boolean tensor which is True where the z protected attribute is 1
  • z_mask2 [tf.Tensor]: boolean tensor which is True where the z protected attribute is 2
  • z_mask3 [tf.Tensor]: boolean tensor which is True where the z protected attribute is 3
  • reuse [bool]: whether or not to reuse TensorFlow variables

Return type

  • f_hat [tf.Tensor]: probabilities of the generated x data being real/fake based on the protected attribute z
  • y_hat [tf.Tensor]: probabilities of the generated y classified by the classifier based on the generated x data
  • z_hat [tf.Tensor]: probabilities of the protected attribute z based on the generated y
  • g_hat [tf.Tensor]: probabilities of the generated x input

buildDiscriminator

Method

buildDiscriminator(x_real, y_real, x_fake, y_fake, yb_real, yb_fake, keepRate, decodeVariables, z_r_mask0, z_r_mask1, z_r_mask2, z_r_mask3, z_r_mask4, z_f_mask0, z_f_mask1, z_f_mask2, z_f_mask3, z_f_mask4)

Builds the discriminator.

Parameters

  • x_real [tf.Tensor]: tensor of x real input
  • y_real [tf.Tensor]: tensor of y real outcome
  • x_fake [tf.Tensor]: tensor of x fake input
  • y_fake [tf.Tensor]: tensor of y fake outcome
  • yb_real [tf.Tensor]: boolean tensor which is True where the real y outcome is 0
  • yb_fake [tf.Tensor]: boolean tensor which is True where the fake y outcome is 0
  • keepRate [float]: dropout keep rate of the discriminator
  • decodeVariables [dict]: variable that stores weights and biases of decompressed x input
  • z_r_mask0 [tf.Tensor]: boolean tensor which is True where the real z protected attribute is 0
  • z_r_mask1 [tf.Tensor]: boolean tensor which is True where the real z protected attribute is 1
  • z_r_mask2 [tf.Tensor]: boolean tensor which is True where the real z protected attribute is 2
  • z_r_mask3 [tf.Tensor]: boolean tensor which is True where the real z protected attribute is 3
  • z_r_mask4 [tf.Tensor]: boolean tensor which is True where the real z protected attribute is 4
  • z_f_mask0 [tf.Tensor]: boolean tensor which is True where the fake z protected attribute is 0
  • z_f_mask1 [tf.Tensor]: boolean tensor which is True where the fake z protected attribute is 1
  • z_f_mask2 [tf.Tensor]: boolean tensor which is True where the fake z protected attribute is 2
  • z_f_mask3 [tf.Tensor]: boolean tensor which is True where the fake z protected attribute is 3
  • z_f_mask4 [tf.Tensor]: boolean tensor which is True where the fake z protected attribute is 4

Return type

  • tensors [tf.Tensor]: decoded/predicted x, plus the losses and probabilities of the real and fake variables f, y, z, and g

print2file

Method

print2file(buf, outFile)

Writes training metrics to log file.

Parameters

  • buf [str]: data to write to file
  • outFile [str]: file path to model output

Return type

None

generateData

Method

generateData(nSamples=100, modelFile='model', batchSize=100, outFile='out', p_z=[], p_y=[])

Generates less-biased data using the trained model and saves it to the specified output path.

Parameters

  • nSamples [int]: size of entire original dataset
  • modelFile [str]: path to trained Fair Transformer GAN model
  • batchSize [int]: size of each batch
  • outFile [str]: path to generated data files in numpy format
  • p_z [list]: probability distribution of protected attribute
  • p_y [list]: probability distribution of outcome

Return type

None
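A short usage sketch; the paths and the p_z / p_y distributions below are placeholders chosen by the caller (one probability per protected attribute class and per outcome value, respectively).

```python
# Generate 10,000 samples with an assumed 60/40 protected-attribute split
# and a 70/30 outcome split, saving numpy files under the given output path.
model.generateData(nSamples=10000, modelFile='out/model', batchSize=100,
                   outFile='out/generated', p_z=[0.6, 0.4], p_y=[0.7, 0.3])
```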

calculateDiscAuc

Method

calculateDiscAuc(preds_real, preds_fake)

Calculates discriminator AUC from real and fake predictions.

Parameters

  • preds_real [numpy.ndarray]: array of real predictions
  • preds_fake [numpy.ndarray]: array of fake predictions

Return type

  • auc [float]: discriminator AUC
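Here is a minimal sketch of the computation described above, assuming real predictions are labelled 1 and fake predictions 0, with AUC taken over the pooled scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def disc_auc(preds_real, preds_fake):
    # Label real predictions as 1 and fake predictions as 0, then score.
    labels = np.concatenate([np.ones_like(preds_real), np.zeros_like(preds_fake)])
    scores = np.concatenate([preds_real, preds_fake])
    return roc_auc_score(labels, scores)
```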

calculateDiscAccuracy

Method

calculateDiscAccuracy(preds_real, preds_fake)

Calculates discriminator accuracy from real and fake predictions.

Parameters

  • preds_real [numpy.ndarray]: array of real predictions
  • preds_fake [numpy.ndarray]: array of fake predictions

Return type

  • acc [float]: discriminator accuracy

calculateGenAccuracy

Method

calculateGenAccuracy(preds_real, preds_fake)

Calculates generator accuracy from real and fake predictions.

Parameters

  • preds_real [numpy.ndarray]: array of real predictions
  • preds_fake [numpy.ndarray]: array of fake predictions

Return type

  • acc [float]: generator accuracy

pair_rd

Method

pair_rd(y_real, z_real)

Helper function to calculate total pairwise risk difference across all z protected attribute classes.

Parameters

  • y_real [numpy.ndarray]: array of y outcome values
  • z_real [numpy.ndarray]: array of z protected attribute values

Return type

  • risk_diff [float]: total risk difference score across all z protected attribute classes.
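The sketch below illustrates one common definition of total pairwise risk difference, assuming it is the sum of |P(y=1 | z=a) − P(y=1 | z=b)| over all pairs of protected attribute classes; the exact aggregation used by the class may differ.

```python
import itertools
import numpy as np

def pairwise_risk_difference(y, z):
    # Positive-outcome rate per protected-attribute class, then sum of
    # absolute differences across all class pairs.
    classes = np.unique(z)
    pos_rate = {c: y[z == c].mean() for c in classes}
    return sum(abs(pos_rate[a] - pos_rate[b])
               for a, b in itertools.combinations(classes, 2))

# Example: class 0 has positive rate 0.8, class 1 has 0.5 -> risk difference ~0.3
y = np.array([1, 1, 1, 1, 0, 1, 1, 0, 0])
z = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])
print(pairwise_risk_difference(y, z))
```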

calculateRD

Method

calculateRD(y_real, z_real)

Calculates risk difference score across all z protected attribute classes during training. Calls the pair_rd() function.

Parameters

  • y_real [numpy.ndarray]: array of original y outcome values
  • z_real [numpy.ndarray]: array of original z protected attribute values

Return type

  • risk_diff [float]: total risk difference score across all z protected attribute classes

calculateClassifierAccuracy

Method

calculateClassifierAccuracy(preds_real, y_real)

Calculates classifier accuracy between real y outcome and predicted y.

Parameters

  • preds_real [numpy.ndarray]: array of predicted y based on x data generated from real x data
  • y_real [numpy.ndarray]: array of original y outcome values

Return type

  • acc [float]: classifier accuracy

calculateClassifierRD

Method

calculateClassifierRD(preds_real, z_real, y_real)

Calculates classifier risk difference score across all z protected attribute classes during training.

Parameters

  • preds_real [numpy.ndarray]: array of predicted y based on x data generated from real x data
  • z_real [numpy.ndarray]: array of original z protected attribute values
  • y_real [numpy.ndarray]: array of original y outcome values

Return type

  • rd [float]: total risk difference score across all z protected attribute classes
  • rd1 [float]: risk difference score across all z protected attribute classes when y outcome = 1
  • rd0 [float]: risk difference score across all z protected attribute classes when y outcome = 0

create_z_masks

Method

create_z_masks(z_arr)

Creates a z_mask for each protected attribute class (up to 5) in the z array. Each boolean mask is True at the indices where that class appears in the z array.

Parameters

  • z_arr [numpy.ndarray]: array of z protected attribute values

Return type

  • z_mask0 [numpy.ndarray]: array of z_mask for protected attribute class 0
  • z_mask1 [numpy.ndarray]: array of z_mask for protected attribute class 1
  • z_mask2 [numpy.ndarray]: array of z_mask for protected attribute class 2
  • z_mask3 [numpy.ndarray]: array of z_mask for protected attribute class 3
  • z_mask4 [numpy.ndarray]: array of z_mask for protected attribute class 4
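A minimal sketch of the masking described above: one boolean mask per protected attribute class value, up to 5 classes.

```python
import numpy as np

z_arr = np.array([0, 1, 2, 1, 0])
# One mask per class value 0..4; each mask is True where that class appears.
z_mask0, z_mask1, z_mask2, z_mask3, z_mask4 = [(z_arr == k) for k in range(5)]
print(z_mask1)  # [False  True False  True False]
```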

train

Method

train(dataPath='data', modelPath='', outPath='out', pretrainEpochs=100, nEpochs=300, generatorTrainPeriod=1, discriminatorTrainPeriod=2, pretrainBatchSize=100, batchSize=1000, saveMaxKeep=0, p_z=[], p_y=[])

Train the Fair Transformer GAN model and save it to the output path specified.

Parameters

  • dataPath [str]: path to input dataset
  • modelPath [str]: path to existing model, if it exists
  • outPath [str]: path to store model output and logs
  • nEpochs [int]: number of epochs to train the model
  • discriminatorTrainPeriod [int]: number of periods to train the discriminator per batch per epoch
  • generatorTrainPeriod [int]: number of periods to train the generator per batch per epoch
  • pretrainBatchSize [int]: size of pretraining batch
  • batchSize [int]: size of training batch
  • pretrainEpochs [int]: number of epochs to pretrain the model
  • saveMaxKeep [int]: number of checkpoint files to save
  • p_z [list]: probability distribution of protected attribute
  • p_y [list]: probability distribution of outcome

Return type

None
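The schematic, self-contained sketch below shows how the two train-period parameters are commonly interpreted in GAN training loops: per batch, the discriminator takes discriminatorTrainPeriod update steps and the generator takes generatorTrainPeriod update steps. The update helpers are stand-ins, not project code.

```python
nEpochs, discriminatorTrainPeriod, generatorTrainPeriod = 2, 2, 1
batches = range(3)  # placeholder batch iterator

def discriminator_update(batch):  # stand-in for a real discriminator optimizer step
    pass

def generator_update(batch):      # stand-in for a real generator optimizer step
    pass

for epoch in range(nEpochs):
    for batch in batches:
        for _ in range(discriminatorTrainPeriod):
            discriminator_update(batch)
        for _ in range(generatorTrainPeriod):
            generator_update(batch)
```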

MultiHeadSelfAttention class

Transformer multi-head self-attention block used in the generator of FairTransformerGAN. Used by buildGenerator and buildGeneratorTest of the FairTransformerGAN class.

| Function | Description |
| --- | --- |
| init | Initializes instance of MultiHeadSelfAttention class |
| call | Calculates the attention of the input data |

MultiHeadSelfAttention

CLASS

MultiHeadSelfAttention()

init

Method

__init__(num_heads, input_dim, dropout_rate=0.0)

Initializes multi-head self-attention block.

Parameters

  • num_heads [int]: number of heads of self-attention
  • input_dim [int]: size of input layer
  • dropout_rate [float]: proportion of neurons to drop in a layer

Return type

None

call

Method

call(inputs, mask=None, training=None)

Calculates the multi-head self-attention of the inputs.

Parameters

  • inputs [tf.Tensor]: input layer
  • mask [tf.Tensor]: mask layer
  • training [bool]: whether dropout should be applied during the attention calculation

Return type

  • attention_output [tf.Tensor]: self-attention output resized back to the input dimension
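A short usage sketch, assuming MultiHeadSelfAttention behaves like a standard Keras layer and that inputs are shaped (batch, sequence_length, input_dim); both assumptions are for illustration only.

```python
import tensorflow as tf

# Hypothetical shapes: batch of 8, sequence length 10, input_dim 32.
attn = MultiHeadSelfAttention(num_heads=4, input_dim=32, dropout_rate=0.1)
x = tf.random.normal([8, 10, 32])
attention_output = attn(x, training=True)  # trailing dimension matches input_dim
```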