revscoring.scoring.statistics

Statistics represent the fitness of a revscoring.Model. They can be fit() to scores and labels and then output using format(). Once initialized, a Statistics instance behaves like a dict of statistics values.

Classification

Classification statistics can be generated for “Classifiers” – models that produce factors (aka levels) as an output, e.g. True and False, or “A”, “B”, or “C”.

class revscoring.scoring.statistics.Classification(labels, multilabel=False, prediction_key='prediction', decision_key=None, threshold_ndigits=None, population_rates=None, **kwargs)[source]
fit(score_labels)[source]

Fit to scores and labels.

Parameters:
score_labels : [( dict, mixed )]

A collection of score-label pairs generated using revscoring.Model.score. Note that fitting is usually done using data withheld during model training.
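
As a rough illustration of the expected input shape, the sketch below builds a few score-label pairs by hand and splits them into one-vs-rest boolean lists for a target label. The score dicts are made up; only the 'prediction' key is taken from the default prediction_key in the signature above, and to_booleans is a hypothetical helper, not part of revscoring.

```python
# Hand-made (score, label) pairs. Real pairs would come from applying
# revscoring.Model.score to data withheld during training.
score_labels = [
    ({'prediction': True}, True),
    ({'prediction': True}, False),
    ({'prediction': False}, False),
    ({'prediction': False}, True),
    ({'prediction': True}, True),
]


def to_booleans(score_labels, target, prediction_key='prediction'):
    """Hypothetical helper: one-vs-rest booleans for a target label."""
    y_preds = [score[prediction_key] == target for score, _ in score_labels]
    y_trues = [label == target for _, label in score_labels]
    return y_preds, y_trues


y_preds, y_trues = to_booleans(score_labels, target=True)
```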

format_json(path_tree, **kwargs)[source]

Formats a json-able dictionary including rounding to at most ndigits.

format_str(path_tree, **kwargs)[source]

Formats the path tree into a table, rounding to at most ndigits.

lookup(path)[source]

Looks up a specific information value based on either a string pattern or a path.

For example, the pattern “stats.roc_auc.labels.true” is the same as the path ['stats', 'roc_auc', 'labels', True].

Parameters:
path : str | list

The location of the information to look up.
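
The equivalence between a string pattern and a path can be sketched in plain Python. This is not the library's implementation: parse_pattern and the standalone lookup function below are hypothetical, the statistics tree is made up, and the real pattern syntax may support more than dotted keys and boolean literals.

```python
def parse_pattern(pattern):
    """Hypothetical helper: split a dotted pattern into path keys."""
    literals = {'true': True, 'false': False}
    return [literals.get(part, part) for part in pattern.split('.')]


def lookup(tree, path):
    """Walk a nested dict using either a dotted pattern or a list of keys."""
    if isinstance(path, str):
        path = parse_pattern(path)
    value = tree
    for key in path:
        value = value[key]
    return value


# Made-up statistics tree, for illustration only.
tree = {'stats': {'roc_auc': {'labels': {True: 0.91, False: 0.88}}}}

by_pattern = lookup(tree, "stats.roc_auc.labels.true")
by_path = lookup(tree, ['stats', 'roc_auc', 'labels', True])
```

Both calls reach the same leaf, since the string "true" is converted to the boolean key True before the tree is walked.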

class revscoring.scoring.statistics.classification.Counts(labels, score_labels, prediction_key)[source]
class revscoring.scoring.statistics.classification.Rates(counts, population_rates=None)[source]
class revscoring.scoring.statistics.classification.MicroMacroStats(stats, field)[source]
class revscoring.scoring.statistics.classification.ScaledPredictionStatistics(y_preds=None, y_trues=None, counts=None, population_rate=None)[source]
accuracy()[source]

The proportion of predictions that were right.

accuracy = correct / n
f1()[source]

A statistic that balances precision with recall (sensitivity): the harmonic mean of the two.

filter_rate()[source]

The proportion of observations that are not matched.

filter-rate = 1 - match-rate
fpr()[source]

False-positive rate. The proportion of non-target class items that are matched.

fpr = false-positives / !target-class
match_rate()[source]

The proportion of observations that are matched in prediction.

match-rate = positives / n
precision()[source]

The proportion of matched observations that are correctly matched. AKA “positive predictive value”.

precision = true-positives / true-predictions
recall()[source]

The proportion of the target class that the classifier matches. AKA “true-positive rate” and “sensitivity”.

recall = true-positives / target-class
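
The formulas above can be tied together in one small function over confusion-matrix counts. This is a sketch, not the class's own code: prediction_statistics is a hypothetical name, f1 uses the standard harmonic-mean definition (an assumption, since no formula is given for it here), and empty denominators are not guarded against.

```python
def prediction_statistics(tp, fp, tn, fn):
    """Compute the statistics described above from confusion-matrix counts."""
    n = tp + fp + tn + fn
    positives = tp + fp       # observations matched in prediction
    target_class = tp + fn    # observations that belong to the target class
    non_target = fp + tn      # observations outside the target class

    match_rate = positives / n
    precision = tp / positives
    recall = tp / target_class
    return {
        'accuracy': (tp + tn) / n,
        'match_rate': match_rate,
        'filter_rate': 1 - match_rate,
        'fpr': fp / non_target,
        'precision': precision,
        'recall': recall,
        # Standard F1 definition: harmonic mean of precision and recall.
        'f1': 2 * precision * recall / (precision + recall),
    }


stats = prediction_statistics(tp=40, fp=10, tn=45, fn=5)
```
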
class revscoring.scoring.statistics.classification.ScaledThresholdStatistics(y_decisions, y_trues, population_rate=None, threshold_ndigits=None)[source]
class revscoring.scoring.statistics.classification.ScaledClassificationMatrix(y_preds=None, y_trues=None, counts=None, population_rate=None)[source]
fit(y_preds, y_trues)[source]
Parameters:
y_preds : [ bool ]

Predictions where True represents a prediction of the target class

y_trues : [ bool ]

Labels where True represents a label matching the target class
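
A minimal sketch of how two parallel boolean lists reduce to the four cells of a classification matrix. The function name confusion_counts is hypothetical and the input data is made up.

```python
def confusion_counts(y_preds, y_trues):
    """Tally (tp, fp, tn, fn) from parallel lists of booleans."""
    tp = fp = tn = fn = 0
    for pred, true in zip(y_preds, y_trues):
        if pred and true:
            tp += 1      # predicted target, labeled target
        elif pred and not true:
            fp += 1      # predicted target, labeled non-target
        elif not pred and not true:
            tn += 1      # predicted non-target, labeled non-target
        else:
            fn += 1      # predicted non-target, labeled target
    return tp, fp, tn, fn


y_preds = [True, True, False, False, True]
y_trues = [True, False, False, True, True]
counts = confusion_counts(y_preds, y_trues)
```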

rescale(tp, fp, tn, fn)[source]

Re-scale a matrix based on sample counts

Parameters:
tp : int

True positives

fp : int

False positives

tn : int

True negatives

fn : int

False negatives
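
One plausible reading of re-scaling, given the population_rate argument in the constructor, is that the negative cells are scaled so the target class makes up population_rate of the matrix. The sketch below works under that assumption only; it is a standalone function that takes population_rate explicitly and may differ from what the method actually does.

```python
def rescale(tp, fp, tn, fn, population_rate):
    """Scale the negative cells so positives make up population_rate.

    Assumption: only fp and tn are scaled; tp and fn keep their sample counts.
    """
    sample_odds = (tp + fn) / (fp + tn)   # positives per negative in the sample
    population_odds = population_rate / (1 - population_rate)
    scalar = sample_odds / population_odds
    return tp, fp * scalar, tn * scalar, fn


# A balanced sample (50 positives, 50 negatives) re-scaled to a 10% target class.
tp, fp, tn, fn = rescale(40, 5, 45, 10, population_rate=0.1)
scaled_rate = (tp + fn) / (tp + fp + tn + fn)
```

Under this scaling, recall is unchanged while precision falls, because false positives grow along with the rest of the non-target class.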

class revscoring.scoring.statistics.classification.ThresholdOptimization(maximize, target_stat, cond_stat, greater, cond_value)[source]
get_optimal(threshold_statistics)[source]

Generates an optimized value by scanning a sequence of ScaledThresholdStatistics for the best threshold that matches the conditional criteria. This function returns the entire ScaledPredictionStatistics mapping at the optimal threshold.

optimize_from(threshold_statistics)[source]

Generates an optimized value by scanning a sequence of ScaledThresholdStatistics for the best threshold that matches the conditional criteria. This function returns the value of the optimized target statistic (or None).
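
The scan described by get_optimal and optimize_from can be sketched as a filter followed by a max or min. Several things here are assumptions: threshold statistics are represented as (threshold, dict) pairs, greater=True is read as ">=" and greater=False as "<=", and the standalone functions only borrow the method names for readability.

```python
import operator


def get_optimal(threshold_stats, maximize, target_stat, cond_stat,
                greater, cond_value):
    """Return the (threshold, stats) pair with the best target_stat among
    those that satisfy the condition, or None if none qualifies."""
    compare = operator.ge if greater else operator.le
    candidates = [(threshold, stats) for threshold, stats in threshold_stats
                  if compare(stats[cond_stat], cond_value)]
    if not candidates:
        return None
    pick = max if maximize else min
    return pick(candidates, key=lambda pair: pair[1][target_stat])


def optimize_from(threshold_stats, **criteria):
    """Return only the optimized target statistic (or None)."""
    optimal = get_optimal(threshold_stats, **criteria)
    return None if optimal is None else optimal[1][criteria['target_stat']]


# Made-up per-threshold statistics, for illustration only.
threshold_stats = [
    (0.2, {'precision': 0.60, 'recall': 0.95}),
    (0.5, {'precision': 0.91, 'recall': 0.80}),
    (0.8, {'precision': 0.97, 'recall': 0.55}),
]
criteria = dict(maximize=True, target_stat='recall',
                cond_stat='precision', greater=True, cond_value=0.9)
best_recall = optimize_from(threshold_stats, **criteria)
best_threshold, best_stats = get_optimal(threshold_stats, **criteria)
```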

classmethod parse(pattern)[source]

Parse a formatted string representing a threshold optimization. E.g. ‘maximum recall @ precision >= 0.9’ or ‘minimum match_rate @ recall >= 0.9’.

Parameters:
pattern : str

The optimization pattern to parse
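
To make the pattern format concrete, here is a hypothetical regex-based parser for the two examples above. It is not the classmethod's implementation: the returned dict simply mirrors the constructor arguments (maximize, target_stat, cond_stat, greater, cond_value), and accepting "<=" alongside ">=" is an assumption.

```python
import re

# <maximum|minimum> <target stat> @ <conditional stat> <op> <value>
PATTERN = re.compile(
    r"^(?P<direction>maximum|minimum)\s+(?P<target>\w+)\s+@\s+"
    r"(?P<cond>\w+)\s*(?P<op>>=|<=)\s*(?P<value>[0-9.]+)$")


def parse(pattern):
    """Hypothetical parser for a threshold optimization pattern."""
    match = PATTERN.match(pattern.strip())
    if match is None:
        raise ValueError("Cannot parse optimization pattern: %r" % pattern)
    return {
        'maximize': match.group('direction') == 'maximum',
        'target_stat': match.group('target'),
        'cond_stat': match.group('cond'),
        'greater': match.group('op') == '>=',
        'cond_value': float(match.group('value')),
    }


parsed = parse('maximum recall @ precision >= 0.9')
```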

Abstract base class

class revscoring.scoring.Statistics(*args, **kwargs)[source]
fit(score_labels)[source]

Fit to scores and labels.

Parameters:
score_labels : [( dict, mixed )]

A collection of score-label pairs generated using revscoring.Model.score. Note that fitting is usually done using data withheld during model training.