convert_pd_to_np(*args)
check_treatment_vector(treatment, control_name=None)
check_p_conditions(p, t_groups)
check_explain_conditions(method, models, X=None, treatment=None, y=None)
clean_xgboost_objective(objective)
Translate an objective name so that it is compatible with the installed XGBoost version.
Args:
    objective (str): the objective to translate.
Returns:
    The translated objective, or the original objective if no translation was required.
get_xgboost_objective_metric(objective)
Get the XGBoost version-compatible objective and evaluation metric from an objective name that may not be compatible with the installed version.
Args:
    objective (str): an XGBoost objective that may be incompatible with the installed version.
Returns:
    A tuple of the translated objective and its evaluation metric.
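A minimal sketch of what these two helpers do, assuming the only incompatibility is XGBoost's rename of the "reg:linear" objective to "reg:squarederror" (the old name was deprecated around XGBoost 0.90; the exact version cutoff the library checks is not stated here) and assuming an illustrative objective-to-metric table. The names translate_objective, objective_and_metric, CUTOFF and METRIC_FOR_OBJECTIVE are hypothetical, and the version is passed in as a tuple instead of being read from xgboost.__version__, so the example runs without XGBoost installed.

```python
# Hypothetical sketch: the rename table, version cutoff, and metric table are assumptions.
RENAMED_IN_NEWER = {"reg:linear": "reg:squarederror"}
RENAMED_IN_OLDER = {new: old for old, new in RENAMED_IN_NEWER.items()}
CUTOFF = (0, 90)  # assumed first version that accepts "reg:squarederror"

# Assumed objective -> evaluation metric mapping, keyed by the modern spelling.
METRIC_FOR_OBJECTIVE = {"reg:squarederror": "rmse", "rank:pairwise": "auc"}


def translate_objective(objective, version):
    """Map an objective name to the spelling accepted by `version` (a tuple)."""
    table = RENAMED_IN_NEWER if version >= CUTOFF else RENAMED_IN_OLDER
    return table.get(objective, objective)  # unknown names pass through unchanged


def objective_and_metric(objective, version):
    """Return (translated objective, evaluation metric) for `version`."""
    translated = translate_objective(objective, version)
    # Look the metric up under the modern spelling so both spellings resolve.
    modern = translate_objective(translated, CUTOFF)
    if modern not in METRIC_FOR_OBJECTIVE:
        raise ValueError(f"unsupported objective: {objective!r}")
    return translated, METRIC_FOR_OBJECTIVE[modern]


print(objective_and_metric("reg:linear", (1, 7)))         # ('reg:squarederror', 'rmse')
print(objective_and_metric("reg:squarederror", (0, 82)))  # ('reg:linear', 'rmse')
```

Returning the metric alongside the translated objective keeps the two in sync: whichever spelling the caller passes, the evaluation metric is resolved from a single table.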
ape(y, p)
Absolute Percentage Error (APE).
Args:
    y (float): target
    p (float): prediction
Returns:
    e (float): APE
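For a single target/prediction pair, APE is |y - p| / |y| = |1 - p / y|, which is undefined when the target is zero. A minimal sketch under that definition; the zero check and its tolerance are additions for illustration, not part of the documented signature.

```python
def ape(y, p, tol=1e-12):
    """Absolute percentage error |1 - p / y| for one target/prediction pair."""
    if abs(y) < tol:
        # APE divides by the target, so it is undefined at (or very near) zero.
        raise ValueError("APE is undefined when the target is zero")
    return abs(1 - p / y)


print(ape(100.0, 90.0))  # about 0.1: the prediction is 10% below the target
```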
mape(y, p)
Mean Absolute Percentage Error (MAPE).
Args:
    y (numpy.array): target
    p (numpy.array): prediction
Returns:
    e (numpy.float64): MAPE
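MAPE is the APE averaged over all samples, mean(|(y - p) / y|). A NumPy sketch, assuming no target is zero:

```python
import numpy as np


def mape(y, p):
    """Mean absolute percentage error; assumes every target is non-zero."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.mean(np.abs((y - p) / y))


print(mape([100, 200], [90, 220]))  # both predictions are 10% off, so about 0.1
```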
smape(y, p)
Symmetric Mean Absolute Percentage Error (sMAPE).
Args:
    y (numpy.array): target
    p (numpy.array): prediction
Returns:
    e (numpy.float64): sMAPE
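sMAPE scales each absolute error by the magnitudes of both the target and the prediction, so each term is bounded between 0 and 2 and a zero target alone does not cause a division by zero. Definitions differ between sources (some drop the factor of 2 or report a percentage); the sketch below assumes the form 2 * mean(|y - p| / (|y| + |p|)).

```python
import numpy as np


def smape(y, p):
    """Symmetric MAPE: 2 * mean(|y - p| / (|y| + |p|)); each term lies in [0, 2]."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    # A pair with y == p == 0 gives 0 / 0 (nan); this sketch does not special-case it.
    return 2.0 * np.mean(np.abs(y - p) / (np.abs(y) + np.abs(p)))


print(smape([1.0], [3.0]))  # 2 * (2 / 4) = 1.0
```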
rmse(y, p)
Root Mean Squared Error (RMSE).
Args:
    y (numpy.array): target
    p (numpy.array): prediction
Returns:
    e (numpy.float64): RMSE
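RMSE is the square root of the mean squared residual, sqrt(mean((y - p)^2)). A direct NumPy sketch:

```python
import numpy as np


def rmse(y, p):
    """Root mean squared error between targets y and predictions p."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.sqrt(np.mean((y - p) ** 2))


print(rmse([0.0, 0.0], [3.0, 3.0]))  # every residual is 3, so RMSE is 3.0
```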
gini(y, p)
Normalized Gini coefficient.
Args:
    y (numpy.array): target
    p (numpy.array): prediction
Returns:
    e (numpy.float64): normalized Gini coefficient
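One common construction of the normalized Gini compares the cumulative share of the total target collected when samples are sorted by prediction with the share collected under the ideal (true-value) ordering. The result is 1 for a perfect ranking, near 0 for a random ranking, and -1 for a fully reversed ranking. The sketch below follows that Lorenz-curve construction; it is not necessarily the library's exact code and assumes non-negative targets that are not all equal (otherwise the denominator is zero).

```python
import numpy as np


def gini(y, p):
    """Normalized Gini: Gini of the prediction ordering over Gini of the ideal ordering."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    n = y.shape[0]
    # Targets reordered by descending true value (ideal) and by descending prediction.
    true_order = y[np.argsort(y)[::-1]]
    pred_order = y[np.argsort(p)[::-1]]
    # Lorenz curves: cumulative share of the total target after each sample.
    l_true = np.cumsum(true_order) / np.sum(true_order)
    l_pred = np.cumsum(pred_order) / np.sum(pred_order)
    l_ones = np.linspace(1 / n, 1, n)  # cumulative share of samples (the diagonal)
    g_true = np.sum(l_ones - l_true)
    g_pred = np.sum(l_ones - l_pred)
    return g_pred / g_true


y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
print(gini(y, y))   # perfect ranking: 1.0
print(gini(y, -y))  # reversed ranking: about -1.0
```

Only the ordering of p matters here, so the metric is insensitive to any monotone rescaling of the predictions.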
regression_metrics(y, p, w=None, metrics={'RMSE': rmse, 'sMAPE': smape, 'Gini': gini})
Log metrics for regressors.
Args:
    y (numpy.array): target
    p (numpy.array): prediction
    w (numpy.array, optional): a treatment vector (1 or True: treatment, 0 or False: control). If given, metrics are logged separately for the treatment and control groups.
    metrics (dict, optional): a dictionary of metric names and functions
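The grouping behaviour described above can be sketched as follows. This is an illustration, not the library function: it returns a dictionary instead of writing to a logger, and the name metrics_by_group is hypothetical. When w is given, each metric is computed separately on the treatment rows (w truthy) and on the control rows (w falsy).

```python
import numpy as np


def metrics_by_group(y, p, metrics, w=None):
    """Apply each metric to all rows, or to treatment and control rows separately."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    if w is None:
        return {name: func(y, p) for name, func in metrics.items()}
    is_treated = np.asarray(w).astype(bool)  # 1/True: treatment, 0/False: control
    results = {}
    for group, mask in (("treatment", is_treated), ("control", ~is_treated)):
        for name, func in metrics.items():
            results[(group, name)] = func(y[mask], p[mask])
    return results


def rmse(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))


print(metrics_by_group([1, 2, 3, 4], [1, 2, 5, 4], {"RMSE": rmse}, w=[1, 1, 0, 0]))
# treatment RMSE is 0.0; control RMSE is sqrt(2)
```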
logloss(y, p)
Bounded log loss error.
Args:
    y (numpy.array): target
    p (numpy.array): prediction
Returns:
    bounded log loss error
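Binary log loss becomes infinite when a prediction of exactly 0 or 1 is wrong. The sketch below reads "bounded" as clipping predictions into [eps, 1 - eps] before computing the loss. The bound eps=1e-15 is an assumption, and the loss is computed directly with NumPy rather than through any particular library function.

```python
import numpy as np


def bounded_logloss(y, p, eps=1e-15):
    """Binary log loss with predictions clipped to [eps, 1 - eps] so log(0) never occurs."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


print(bounded_logloss([1, 0], [0.5, 0.5]))  # log(2), about 0.693
print(bounded_logloss([1, 0], [0.0, 1.0]))  # large but finite thanks to clipping
```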
classification_metrics(y, p, w=None, metrics={'AUC': roc_auc_score, 'Log Loss': logloss})
Log metrics for classifiers.
Args:
    y (numpy.array): target
    p (numpy.array): prediction
    w (numpy.array, optional): a treatment vector (1 or True: treatment, 0 or False: control). If given, metrics are logged separately for the treatment and control groups.
    metrics (dict, optional): a dictionary of metric names and functions
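The per-group logic for classifiers can be sketched the same way. To stay dependency-free, the sketch replaces sklearn.metrics.roc_auc_score with a pairwise stand-in (AUC as the probability that a random positive outscores a random negative, with ties counted as one half) and returns a dictionary instead of logging. The names pairwise_auc and auc_by_group are hypothetical.

```python
import numpy as np


def pairwise_auc(y, p):
    """AUC = P(random positive scores higher than random negative); ties count 0.5."""
    y = np.asarray(y).astype(bool)
    p = np.asarray(p, dtype=float)
    pos = p[y][:, None]   # column of positive-class scores
    neg = p[~y][None, :]  # row of negative-class scores
    # Broadcasting compares every positive with every negative.
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def auc_by_group(y, p, w=None):
    """AUC on all rows, or separately on treatment (w truthy) and control (w falsy) rows."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    if w is None:
        return {"all": pairwise_auc(y, p)}
    is_treated = np.asarray(w).astype(bool)
    # Each group needs both classes present, otherwise its AUC is undefined.
    return {"treatment": pairwise_auc(y[is_treated], p[is_treated]),
            "control": pairwise_auc(y[~is_treated], p[~is_treated])}


print(auc_by_group([1, 0, 1, 0], [0.9, 0.1, 0.2, 0.3], w=[1, 1, 0, 0]))
# {'treatment': 1.0, 'control': 0.0}
```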
smd(feature, treatment)
Calculate the standardized mean difference (SMD) of a feature between the treatment and control groups.
The definition is available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144483/#s11title
Args:
    feature (pandas.Series): a column of the feature to calculate the SMD for
    treatment (pandas.Series): a column that indicates whether each row is in the treatment group
Returns:
    (float): the SMD of the feature
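Following the linked definition, the SMD divides the difference in group means by the pooled standard deviation sqrt((s_t^2 + s_c^2) / 2). A NumPy sketch, assuming sample variances (ddof=1) and a treatment column coded 1 for treated rows:

```python
import numpy as np


def smd(feature, treatment):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2), using sample variances."""
    feature = np.asarray(feature, dtype=float)
    is_treated = np.asarray(treatment) == 1
    t, c = feature[is_treated], feature[~is_treated]
    pooled_sd = np.sqrt(0.5 * (t.var(ddof=1) + c.var(ddof=1)))
    return (t.mean() - c.mean()) / pooled_sd


print(smd([1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 0, 0]))  # (2 - 5) / 1 = -3.0
```

Because the numerator is scaled by a pooled spread, the SMD is unit-free, which makes it possible to compare balance across features measured on different scales.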
create_table_one(data, treatment_col, features)
Report balance in input features between the treatment and control groups.
References:
    R's tableone at CRAN: https://github.com/kaz-yos/tableone
    Python's tableone at PyPI: https://github.com/tompollard/tableone
Args:
    data (pandas.DataFrame): total or matched sample data
    treatment_col (str): the column name for the treatment
    features (list of str): the column names of the features
Returns:
    (pandas.DataFrame): a table with the means and standard deviations in the treatment and control groups, and the SMD between the two groups for each feature.
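A simplified sketch of such a balance table with pandas: one row per feature, mean (SD) in each group, and the SMD computed with the pooled-standard-deviation formula used for smd. The function name table_one_sketch and the column labels are hypothetical, and the real table may contain additional rows (for example group sizes).

```python
import numpy as np
import pandas as pd


def table_one_sketch(data, treatment_col, features):
    """Per-feature mean (SD) in each group plus the SMD between groups."""
    is_treated = data[treatment_col] == 1
    rows = {}
    for col in features:
        t = data.loc[is_treated, col]
        c = data.loc[~is_treated, col]
        pooled_sd = np.sqrt(0.5 * (t.var() + c.var()))  # pandas var/std use ddof=1
        rows[col] = {
            "Control": f"{c.mean():.2f} ({c.std():.2f})",
            "Treatment": f"{t.mean():.2f} ({t.std():.2f})",
            "SMD": round((t.mean() - c.mean()) / pooled_sd, 4),
        }
    # One row per feature, one column per summary.
    return pd.DataFrame.from_dict(rows, orient="index")


df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "treat": [1, 1, 1, 0, 0, 0]})
print(table_one_sketch(df, "treat", ["x"]))
```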
class NearestNeighborMatch(caliper=0.2, replace=False, ratio=1, shuffle=True, random_state=None)
Propensity score matching based on the nearest neighbor algorithm.
Attributes:
    caliper (float): threshold to be considered as a match.
    replace (bool): whether to match with replacement or not
    ratio (int): ratio of control / treatment to be matched. Used only if replace=True.
    shuffle (bool): whether to shuffle the treatment group data before matching
    random_state (numpy.random.RandomState or int): RandomState or an int seed
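The core idea can be sketched as greedy 1:1 matching without replacement: visit treated units in (optionally shuffled) order, pair each with the closest unused control, and keep the pair only if the score distance is within the caliper. This is an illustration, not the class's implementation. The function greedy_nn_match is hypothetical, it does not handle replace=True or ratio > 1, and it applies the caliper directly to the score distance (whether the library rescales the caliper, for example by the score's standard deviation, is not stated here).

```python
import numpy as np


def greedy_nn_match(ps_treat, ps_control, caliper=0.2, shuffle=True, random_state=None):
    """Greedy 1:1 nearest-neighbor matching on a score, without replacement.

    Returns (treatment_index, control_index) pairs whose score distance is <= caliper.
    """
    rng = np.random.RandomState(random_state)
    ps_treat = np.asarray(ps_treat, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    order = np.arange(len(ps_treat))
    if shuffle:
        rng.shuffle(order)  # greedy results depend on the order treated units are visited
    available = np.ones(len(ps_control), dtype=bool)
    pairs = []
    for i in order:
        if not available.any():
            break
        dist = np.abs(ps_control - ps_treat[i])
        dist[~available] = np.inf  # a control can be used at most once
        j = int(np.argmin(dist))
        if dist[j] <= caliper:
            pairs.append((int(i), j))
            available[j] = False
    return pairs


print(greedy_nn_match([0.3, 0.7], [0.31, 0.69, 0.1], caliper=0.05, shuffle=False))
# [(0, 0), (1, 1)]
```

Shuffling the treated units, as the shuffle attribute suggests, avoids systematically giving the best controls to whichever treated units happen to come first in the data.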
class MatchOptimizer(treatment_col='is_treatment', ps_col='pihat', user_col=None, matching_covariates=['pihat'], max_smd=0.1, max_deviation=0.1, caliper_range=(0.01, 0.5), max_pihat_range=(0.95, 0.999), max_iter_per_param=5, min_users_per_group=1000, smd_cols=['pihat'], dev_cols_transformations={'pihat': np.mean}, dev_factor=1.0, verbose=True)