Model Training for Markov Chains

Markov Chains are inherently suited to predicting the next state in a sequence given the previous state. A Markov Chain is defined by an Initial Probability for each state and a Transition Probability from one state to another.
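
As a point of reference, here is a minimal sketch of how these two structures can be represented in Python; the state names and probability values are purely illustrative and are not taken from the actual model:

# Illustrative only: toy states and probabilities.
initial_probability = {
    "d18O": 0.4,      # P(first state = "d18O")
    "Mg/Ca": 0.6,     # P(first state = "Mg/Ca")
}

transition_probability = {
    "d18O":  {"d18O": 0.1, "Mg/Ca": 0.9},   # P(next | current = "d18O")
    "Mg/Ca": {"d18O": 0.7, "Mg/Ca": 0.3},   # P(next | current = "Mg/Ca")
}

# Probability of observing the sequence ["Mg/Ca", "d18O"]:
p = initial_probability["Mg/Ca"] * transition_probability["Mg/Ca"]["d18O"]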

Functions

fill_q0(in_dict, dict_type):

Add initial probabilities for all the items in the dataset to the initial probability dict, e.g. items in proxyObservationType, units, interpretation/variable and interpretation/variableDetail.

Parameters:

in_dict : dict

Initial probability dict

dict_type : dict

Dict to iterate over; its values are added to the initial probability dict.

Returns:

None.
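
A minimal sketch of what this function might look like, assuming dict_type maps field names to single items or lists of items; the real implementation may count occurrences or normalise differently:

def fill_q0(in_dict, dict_type):
    # Sketch only: register every value found in dict_type in the
    # initial probability dict with a placeholder count of 0.
    for values in dict_type.values():
        items = values if isinstance(values, list) else [values]
        for item in items:
            in_dict.setdefault(item, 0)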

calc_freq_multiple(dataframe_obj, ini_map, *argv):

Calculate the frequency of items across all the columns in argv, i.e. the counts behind the conditional probability of the last column given all the preceding columns.

Parameters:

dataframe_obj : pandas dataframe

Dataframe object containing training data.

ini_map : dict

Contains all the items to be considered for the model.

*argv : list

Contains the names for the columns that are being considered for calculating frequency.

Returns:

counter_dict : dict

Containing counts of all the items that appear against each item in the last column.
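
A rough sketch of the counting step, under the assumption that the conditioning columns are combined into a comma-joined key and that only items present in ini_map are counted:

import collections

def calc_freq_multiple(dataframe_obj, ini_map, *argv):
    # Sketch only: count how often each item in the last column of argv
    # appears for every combination of items in the preceding columns.
    counter_dict = {}
    cols = list(argv)
    given_cols, target_col = cols[:-1], cols[-1]
    for _, row in dataframe_obj.iterrows():
        target = row[target_col]
        if target not in ini_map:
            continue  # keep only items known to the model
        key = ",".join(str(row[c]) for c in given_cols)
        counter_dict.setdefault(key, collections.Counter())[target] += 1
    return counter_dict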

calc_freq(dataframe_obj, col1, col2, ini_map):

Calculate the frequency of items in col2 for each item in col1, i.e. the counts behind the conditional probability of col2 given col1.

Parameters:

dataframe_obj : pandas dataframe

Dataframe object containing training data.

col1 : str

Column for which the counts are calculated, i.e. the conditioning column.

col2 : str

Column whose count is being taken.

ini_map : dict

Contains all the items to be considered for the model.

Returns:

counter_dict : dict

Containing counts of all the items that appear against each item in col1.
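
A rough sketch of the pairwise counting, assuming ini_map acts as a filter for the items that belong to the model:

import collections

def calc_freq(dataframe_obj, col1, col2, ini_map):
    # Sketch only: raw counts behind P(col2 | col1). Filtering both
    # columns through ini_map is an assumption.
    counter_dict = {}
    for _, row in dataframe_obj.iterrows():
        item1, item2 = row[col1], row[col2]
        if item1 not in ini_map or item2 not in ini_map:
            continue  # ignore items outside the model vocabulary
        counter_dict.setdefault(item1, collections.Counter())[item2] += 1
    return counter_dict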

add_extra_keys(all_keys, transition_matrix):

Add missing items so that each single key has a transition entry to every item in the dataset.

Parameters:

all_keys : set

Contains all the items that should be in the transition dict for each item.

transition_matrix : dict

Transition dict object according to the chain type.

Returns:

transition_mat : dict

Updated transition dict after adding all the missing items for each key.
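
A minimal sketch, assuming missing destinations are filled with a count of 0 so that the subsequent smoothing step can handle them:

def add_extra_keys(all_keys, transition_matrix):
    # Sketch only: ensure every item has a transition entry to every key
    # in all_keys; missing entries start with a count of 0.
    transition_mat = transition_matrix
    for destinations in transition_mat.values():
        for key in all_keys:
            destinations.setdefault(key, 0)
    return transition_mat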

add_one_smoothing(transition_matrix):

Apply add-one smoothing to prevent zero transition probabilities from one item to the next, and convert the counts to log probabilities.

Parameters:

transition_matrix : dict

Transition dict for all items.

Returns:

transition_mat : dict

Updated transition dict with log probabilities.
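
A minimal sketch of add-one (Laplace) smoothing followed by the conversion to log probabilities:

import math

def add_one_smoothing(transition_matrix):
    # Sketch only: add 1 to every count, renormalise, and store
    # log probabilities so no transition ends up with probability 0.
    transition_mat = transition_matrix
    for destinations in transition_mat.values():
        total = sum(destinations.values()) + len(destinations)  # +1 for every destination
        for dest, count in destinations.items():
            destinations[dest] = math.log((count + 1) / total)
    return transition_mat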

Usage

To run the code, change the directory to /training/markovchain/ and execute the following commands:

cd /training/markovchain/
python mctrain.py

Extensions

  1. Introduction of new fieldTypes to the sequence

    All the new items for this fieldType will need to be added to the Initial Probability Dict. Apart from this, the transition from one fieldType to another will change as well. The code changes will need to call calc_freq() or calc_freq_multiple() to generate the transition counts for the required columns, and these transition counts will then be added to the main Transition Probability Dict. Calling the add_one_smoothing() method will ensure that there are no 0 probabilities in the Transition Probability Dict, as sketched below.
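
A rough sketch of how such an extension might be wired together; the column name "archiveType" and the variables train_df, transition_model and archive_type_items are hypothetical placeholders, not names from the actual code:

# Hypothetical wiring for a new fieldType column.
q0 = {}
fill_q0(q0, archive_type_items)                  # add the new items to the initial probability dict

counts = calc_freq(train_df, "archiveType", "proxyObservationType", q0)
counts = add_extra_keys(set(q0.keys()), counts)  # complete the transition row for every known item
counts = add_one_smoothing(counts)               # no zero probabilities, log space

transition_model.update(counts)                  # merge into the main transition probability dict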