Metalearner Utils

convert_pd_to_np

convert_pd_to_np(*args)
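Only the signature is documented here. Below is a minimal sketch of what a helper with this name plausibly does. It assumes that any argument exposing a `to_numpy()` method (as pandas Series and DataFrames do) is converted, that everything else passes through unchanged, and that a single argument is returned unwrapped. All three behaviors are assumptions, and `_FakeSeries` is a hypothetical stand-in so the example runs without pandas:

```python
import numpy as np


def convert_pd_to_np(*args):
    """Illustrative sketch (not the library source): convert pandas-like inputs to NumPy."""
    # Objects with .to_numpy() (e.g. pandas.Series/DataFrame) are converted;
    # NumPy arrays, None and scalars pass through untouched.
    output = [obj.to_numpy() if hasattr(obj, "to_numpy") else obj for obj in args]
    # Assumption: a single argument comes back unwrapped rather than as a list.
    return output if len(output) > 1 else output[0]


class _FakeSeries:
    """Hypothetical stand-in for pandas.Series so the example needs only NumPy."""

    def __init__(self, values):
        self._values = values

    def to_numpy(self):
        return np.asarray(self._values)


X, y = convert_pd_to_np(_FakeSeries([1, 2, 3]), np.array([0, 1, 0]))
```

Unpacking `X, y = ...` works because several arguments come back as a list in the order they were passed.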

check_treatment_vector

check_treatment_vector(treatment, control_name=None)
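The page gives no description for this check. A sketch of the validation its name suggests follows. It assumes the function requires at least two distinct treatment levels and, when `control_name` is given, that the control label actually occurs in the vector. Both conditions and the error messages are assumptions rather than documented behavior:

```python
import numpy as np


def check_treatment_vector(treatment, control_name=None):
    """Illustrative sketch: raise AssertionError for an unusable treatment vector."""
    levels = np.unique(treatment)
    # A causal comparison needs at least one control and one treatment level.
    assert levels.shape[0] > 1, "Treatment vector must have at least two levels."
    if control_name is not None:
        # The designated control label must be among the observed levels.
        assert control_name in levels, f"Control group level {control_name!r} not found."


check_treatment_vector(np.array(["control", "A", "B", "control"]), control_name="control")
```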

check_p_conditions

check_p_conditions(p, t_groups)
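Here `p` is presumably a vector of propensity scores and `t_groups` the non-control treatment groups. The following sketch shows checks that would make sense under that reading, all of which are assumptions. `p` may be a single array only when there is exactly one treatment group, or otherwise a dict keyed by group. Every score must lie strictly inside (0, 1). This sketch accepts only NumPy arrays and dicts:

```python
import numpy as np


def check_p_conditions(p, t_groups):
    """Illustrative sketch: validate propensity scores against the treatment groups."""
    eps = np.finfo(float).eps
    assert isinstance(p, (np.ndarray, dict)), "p must be an array or a dict of arrays"
    if isinstance(p, np.ndarray):
        # A single array is only unambiguous when there is one non-control group.
        assert len(t_groups) == 1, "Pass p as a dict when there are several treatment groups."
        p = {t_groups[0]: p}
    for group in t_groups:
        scores = np.asarray(p[group])
        # Scores of exactly 0 or 1 break estimators that divide by p or by 1 - p.
        assert np.all((scores > eps) & (scores < 1 - eps)), f"p[{group!r}] must lie in (0, 1)"


check_p_conditions(np.array([0.2, 0.5, 0.9]), t_groups=np.array(["treatment"]))
check_p_conditions({"A": np.array([0.3]), "B": np.array([0.6])}, t_groups=["A", "B"])
```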

check_explain_conditions

check_explain_conditions(method, models, X=None, treatment=None, y=None)

clean_xgboost_objective

clean_xgboost_objective(objective)

Translate an objective name so that it is compatible with the installed xgboost version.

Args

objective : string
    The objective to translate.

Returns

The translated objective, or original if no translation was required.
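A translation like this is needed because xgboost renamed its squared-error regression objective from `reg:linear` to `reg:squarederror`. The sketch below maps between the two names depending on the installed version. Two details are assumptions made so the example runs without xgboost: the cutoff `(0, 83)`, and the extra `installed_version` tuple parameter, which replaces reading the real xgboost version:

```python
# Hypothetical cutoff: first version assumed to accept "reg:squarederror".
RENAME_VERSION = (0, 83)


def clean_xgboost_objective(objective, installed_version):
    """Illustrative sketch: return the objective name the installed xgboost understands.

    installed_version is a tuple such as (1, 7); this parameter is an assumption
    of the sketch and not part of the documented signature.
    """
    if installed_version < RENAME_VERSION:
        old_names = {"reg:squarederror": "reg:linear"}
        return old_names.get(objective, objective)
    new_names = {"reg:linear": "reg:squarederror"}
    return new_names.get(objective, objective)


print(clean_xgboost_objective("reg:linear", (1, 7)))  # reg:squarederror
print(clean_xgboost_objective("reg:squarederror", (0, 82)))  # reg:linear
```

Objectives that were never renamed, such as `binary:logistic`, come back unchanged.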

get_xgboost_objective_metric

get_xgboost_objective_metric(objective)

Get the xgboost version-compatible objective and evaluation metric from a potentially version-incompatible input.

Args

objective : string
    An xgboost objective that may be incompatible with the installed version.

Returns

A tuple with the translated objective and evaluation metric.
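One way to pair a version-compatible objective with an evaluation metric is sketched below. It translates the objective first and then looks the result up in a table keyed by the translated names. The specific pairs (`reg:squarederror` with `rmse`, `rank:pairwise` with `auc`), the `(0, 83)` cutoff and the `installed_version` parameter are all assumptions of this sketch:

```python
RENAME_VERSION = (0, 83)  # hypothetical cutoff for the reg:linear -> reg:squarederror rename


def _translate(objective, installed_version):
    # Map the squared-error objective to whichever name the installed version uses.
    if installed_version < RENAME_VERSION:
        return {"reg:squarederror": "reg:linear"}.get(objective, objective)
    return {"reg:linear": "reg:squarederror"}.get(objective, objective)


def get_xgboost_objective_metric(objective, installed_version):
    """Illustrative sketch: return (version-compatible objective, evaluation metric)."""
    # Assumed objective -> metric pairs, keyed by the names this version understands.
    metric_by_objective = {
        _translate("reg:squarederror", installed_version): "rmse",
        _translate("rank:pairwise", installed_version): "auc",
    }
    objective = _translate(objective, installed_version)
    if objective not in metric_by_objective:
        raise ValueError(f"unsupported objective: {objective}")
    return objective, metric_by_objective[objective]


print(get_xgboost_objective_metric("reg:linear", (1, 7)))  # ('reg:squarederror', 'rmse')
```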

ape

ape(y, p)

Absolute Percentage Error (APE).

Args:
    y (float): target
    p (float): prediction

Returns:
    e (float): APE
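APE measures the error of one prediction relative to the size of its target, |y - p| / |y|. A minimal NumPy sketch follows. The explicit refusal of a (near-)zero target and the tolerance `EPS` are assumptions about how such a function should behave:

```python
import numpy as np

EPS = 1e-15  # assumed tolerance below which a target counts as zero


def ape(y, p):
    """Absolute percentage error of a single prediction p against target y."""
    # APE is undefined for a zero target, so refuse it explicitly.
    assert np.abs(y) > EPS, "APE is undefined when the target is zero"
    return np.abs(1 - p / y)  # equivalent to |y - p| / |y|


print(ape(2.0, 1.5))  # 0.25
```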

mape

mape(y, p)

Mean Absolute Percentage Error (MAPE).

Args:
    y (numpy.array): target
    p (numpy.array): prediction

Returns:
    e (numpy.float64): MAPE
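MAPE is the mean of the per-row APE values. The sketch below skips rows whose target is (near) zero instead of letting them produce an infinite error. That choice and the tolerance are assumptions of the sketch:

```python
import numpy as np

EPS = 1e-15  # assumed tolerance for treating a target as zero


def mape(y, p):
    """Mean absolute percentage error over the rows with a non-zero target."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    # Assumption: rows with a (near-)zero target are skipped rather than yielding inf.
    keep = np.abs(y) > EPS
    return np.mean(np.abs(1 - p[keep] / y[keep]))


print(mape([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))  # 0.25, the mean of 0.5, 0.0 and 0.25
```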

smape

smape(y, p)

Symmetric Mean Absolute Percentage Error (sMAPE).

Args:
    y (numpy.array): target
    p (numpy.array): prediction

Returns:
    e (numpy.float64): sMAPE
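Several definitions of sMAPE are in use. The sketch below uses `2 * mean(|y - p| / (|y| + |p|))`. Because each term is at most 1, this form is bounded in [0, 2]. Whether the documented function uses exactly this variant is an assumption:

```python
import numpy as np


def smape(y, p):
    """Symmetric MAPE: 2 * mean(|y - p| / (|y| + |p|)), bounded in [0, 2]."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    # A row where both y and p are 0 gives 0/0 = nan; this sketch does not guard it.
    return 2.0 * np.mean(np.abs(y - p) / (np.abs(y) + np.abs(p)))


print(smape([1.0, 1.0], [1.0, 3.0]))  # 0.5
```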

rmse

rmse(y, p)

Root Mean Squared Error (RMSE).

Args:
    y (numpy.array): target
    p (numpy.array): prediction

Returns:
    e (numpy.float64): RMSE
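RMSE is the square root of the mean squared difference between targets and predictions. A direct NumPy sketch follows. The shape check reflects the assumption that `y` and `p` are aligned row by row:

```python
import numpy as np


def rmse(y, p):
    """Root mean squared error between targets y and predictions p."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    assert y.shape == p.shape, "y and p must have the same shape"
    return np.sqrt(np.mean((y - p) ** 2))


print(rmse([0.0, 0.0, 0.0, 0.0], [2.0, -2.0, 2.0, -2.0]))  # 2.0
```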

gini

gini(y, p)

Normalized Gini Coefficient.

Args:
    y (numpy.array): target
    p (numpy.array): prediction

Returns:
    e (numpy.float64): normalized Gini coefficient
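A normalized Gini coefficient scores how well the predictions rank the targets. It divides the Gini coefficient of the targets ordered by prediction by the Gini coefficient of the targets ordered by themselves. For distinct, positive-sum targets, a perfect ranking therefore scores 1 and a fully reversed ranking scores -1. The NumPy sketch below implements that Lorenz-curve construction. It is one standard formulation, and its tie handling (via `np.argsort`) may differ from the documented function:

```python
import numpy as np


def gini(y, p):
    """Normalized Gini coefficient of predictions p for targets y."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    assert y.shape == p.shape, "y and p must have the same shape"
    n = y.shape[0]

    # Targets sorted by themselves (ideal order) and by the prediction, largest first.
    true_order = y[np.argsort(y)[::-1]]
    pred_order = y[np.argsort(p)[::-1]]

    # Lorenz curves: cumulative share of the total target captured so far.
    l_true = np.cumsum(true_order) / np.sum(true_order)
    l_pred = np.cumsum(pred_order) / np.sum(pred_order)
    l_diag = np.linspace(1 / n, 1, n)  # line of equality (uninformative ordering)

    # Gini coefficients as the discrete area between each curve and the diagonal.
    g_true = np.sum(l_diag - l_true)
    g_pred = np.sum(l_diag - l_pred)
    return g_pred / g_true


y = np.array([1.0, 2.0, 3.0, 4.0])
print(gini(y, y))  # 1.0: the predictions rank the targets perfectly
print(gini(y, -y))  # approximately -1.0: the ranking is fully reversed
```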

regression_metrics

regression_metrics(y, p, w=None, metrics={'RMSE': rmse, 'sMAPE': smape, 'Gini': gini})

Log metrics for regressors.

Args:
    y (numpy.array): target
    p (numpy.array): prediction
    w (numpy.array, optional): a treatment vector (1 or True: treatment, 0 or False: control). If given, log metrics for the treatment and control groups separately.
    metrics (dict, optional): a dictionary of the metric names and functions
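The per-group logging described above can be sketched with Python's standard `logging` module. Several details are assumptions of this sketch and are not documented behavior. Unlike the documented function, whose return value is not described here, the sketch also returns the computed values in a dict so they can be inspected. It defaults to RMSE only, uses `None` instead of a mutable dict default, and chooses its own log format:

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def rmse(y, p):
    return np.sqrt(np.mean((y - p) ** 2))


def regression_metrics(y, p, w=None, metrics=None):
    """Sketch: log each metric overall, or separately for control and treatment rows."""
    metrics = metrics if metrics is not None else {"RMSE": rmse}
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    results = {}
    for name, func in metrics.items():
        if w is None:
            results[name] = func(y, p)
            logger.info("%8s: %10.4f", name, results[name])
        else:
            # 1/True marks treatment rows, 0/False marks control rows.
            treated = np.asarray(w) == 1
            results[(name, "control")] = func(y[~treated], p[~treated])
            results[(name, "treatment")] = func(y[treated], p[treated])
            logger.info("%8s   (Control): %10.4f", name, results[(name, "control")])
            logger.info("%8s (Treatment): %10.4f", name, results[(name, "treatment")])
    return results


logging.basicConfig(level=logging.INFO)
res = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 2.0, 6.0], w=[0, 0, 1, 1])
```

Splitting by `w` makes it easy to spot a model that fits one group well and the other poorly, which a pooled metric would hide.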

logloss

logloss(y, p)

Bounded log loss error.

Args:
    y (numpy.array): target
    p (numpy.array): prediction

Returns:
    bounded log loss error

classification_metrics

classification_metrics(y, p, w=None, metrics={'AUC': roc_auc_score, 'Log Loss': logloss})

Log metrics for classifiers.

Args:
    y (numpy.array): target
    p (numpy.array): prediction
    w (numpy.array, optional): a treatment vector (1 or True: treatment, 0 or False: control). If given, log metrics for the treatment and control groups separately.
    metrics (dict, optional): a dictionary of the metric names and functions

smd

smd(feature, treatment)

Calculate the standardized mean difference (SMD) of a feature between the treatment and control groups.

The definition is available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/#s11title

Args:
    feature (pandas.Series): a column of a feature to calculate SMD for
    treatment (pandas.Series): a column that indicates whether a row is in the treatment group or not

Returns:
    (float): The SMD of the feature
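For a continuous feature, the common definition divides the difference in group means by the pooled standard deviation `sqrt((s_t^2 + s_c^2) / 2)`. The NumPy sketch below follows that definition. Its use of sample variances (`ddof=1`, the pandas default) and its 1/0 coding of `treatment` are assumptions. Because the inputs go through `np.asarray`, pandas Series work as well:

```python
import numpy as np


def smd(feature, treatment):
    """Standardized mean difference of a feature between treated (1) and control (0) rows."""
    feature = np.asarray(feature, dtype=float)
    treatment = np.asarray(treatment)
    t = feature[treatment == 1]
    c = feature[treatment == 0]
    # Pooled standard deviation from the two sample variances.
    pooled_sd = np.sqrt(0.5 * (t.var(ddof=1) + c.var(ddof=1)))
    return (t.mean() - c.mean()) / pooled_sd


print(smd([1, 2, 3, 3, 4, 5], [0, 0, 0, 1, 1, 1]))  # 2.0
```

Scaling by the pooled spread makes the difference unit-free, so features measured on different scales can be compared against one balance threshold.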

create_table_one

create_table_one(data, treatment_col, features)

Report balance in input features between the treatment and control groups.

References:
    R's tableone at CRAN: https://github.com/kaz-yos/tableone
    Python's tableone at PyPI: https://github.com/tompollard/tableone

Args:
    data (pandas.DataFrame): total or matched sample data
    treatment_col (str): the column name for the treatment
    features (list of str): the column names of features

Returns:
    (pandas.DataFrame): A table with the means and standard deviations in the treatment and control groups, and the SMD between the two groups for each feature.

class NearestNeighborMatch

NearestNeighborMatch(caliper=0.2, replace=False, ratio=1, shuffle=True, random_state=None)

Propensity score matching based on the nearest neighbor algorithm.

Attributes:
    caliper (float): threshold to be considered as a match.
    replace (bool): whether to match with replacement or not
    ratio (int): ratio of control / treatment to be matched. Used only if replace=True.
    shuffle (bool): whether to shuffle the treatment group data before matching
    random_state (numpy.random.RandomState or int): RandomState or an int seed
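The sketch below performs greedy 1:1 nearest-neighbor matching without replacement on a propensity score, to show how `caliper`, `shuffle` and `random_state` interact. It is a simplification resting on several assumptions. It treats `caliper` as an absolute distance on the score scale and ignores `replace` and `ratio`. It accepts only an int seed or `None` for `random_state`. The function name `greedy_match` is hypothetical:

```python
import numpy as np


def greedy_match(treated_scores, control_scores, caliper=0.2, shuffle=True, random_state=None):
    """Hypothetical sketch: return (treated_index, control_index) pairs.

    Each control unit is used at most once (matching without replacement).
    random_state: int seed or None in this sketch.
    """
    treated_scores = np.asarray(treated_scores, dtype=float)
    control_scores = np.asarray(control_scores, dtype=float)
    order = np.arange(len(treated_scores))
    if shuffle:
        # Greedy matching is order dependent, so the visiting order is randomized.
        np.random.RandomState(random_state).shuffle(order)

    available = np.ones(len(control_scores), dtype=bool)
    pairs = []
    for i in order:
        if not available.any():
            break
        # Distance to every control; already-used controls are pushed out of reach.
        dist = np.where(available, np.abs(control_scores - treated_scores[i]), np.inf)
        j = int(np.argmin(dist))
        if dist[j] <= caliper:  # accept only matches inside the caliper
            pairs.append((int(i), j))
            available[j] = False
    return pairs


print(greedy_match([0.3, 0.7], [0.31, 0.5, 0.72], caliper=0.05, shuffle=False))
# [(0, 0), (1, 2)]
```

A treated unit with no control inside the caliper stays unmatched. That is why a tighter caliper trades sample size for closer matches.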

class MatchOptimizer

MatchOptimizer(treatment_col='is_treatment', ps_col='pihat', user_col=None, matching_covariates=['pihat'], max_smd=0.1, max_deviation=0.1, caliper_range=(0.01, 0.5), max_pihat_range=(0.95, 0.999), max_iter_per_param=5, min_users_per_group=1000, smd_cols=['pihat'], dev_cols_transformations={'pihat': np.mean}, dev_factor=1.0, verbose=True)