revscoring.scoring.models

This module contains a collection of models that implement a simple function: score(). Currently, all models are subclasses of revscoring.scoring.models.Learned, which means that they also implement train() and cross_validate().

Gradient Boosting

A collection of Gradient Boosting type classifier models.

class revscoring.scoring.models.GradientBoosting(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]

Implements a Gradient Boosting model.

Estimator

alias of sklearn.ensemble.gradient_boosting.GradientBoostingClassifier

Naive Bayes

A collection of Naive Bayes type classifier models.

class revscoring.scoring.models.GaussianNB(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]

Implements a Gaussian Naive Bayes model.

Estimator

alias of sklearn.naive_bayes.GaussianNB

class revscoring.scoring.models.MultinomialNB(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]

Implements a Multinomial Naive Bayes model.

Estimator

alias of sklearn.naive_bayes.MultinomialNB

class revscoring.scoring.models.BernoulliNB(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]

Implements a Bernoulli Naive Bayes model.

Estimator

alias of sklearn.naive_bayes.BernoulliNB

Linear Regression

A collection of linear classifier models.

class revscoring.scoring.models.LogisticRegression(*args, label_weights=None, **kwargs)[source]

Implements a Logistic Regression model.

Estimator

alias of sklearn.linear_model.logistic.LogisticRegression

Support Vector

A collection of Support Vector Machine type classifier models.

class revscoring.scoring.models.LinearSVC(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]

Implements a Support Vector Classifier model with a Linear kernel.

class revscoring.scoring.models.RBFSVC(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]

Implements a Support Vector Classifier model with an RBF kernel.

class revscoring.scoring.models.SVC(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]

Implements a Support Vector Classifier model.

Estimator

alias of sklearn.svm.classes.SVC

Random Forest

A collection of Random Forest type classifier models.

class revscoring.scoring.models.RandomForest(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]

Implements a Random Forest model.

Estimator

alias of sklearn.ensemble.forest.RandomForestClassifier

Abstract classes

All scoring models are an implementation of revscoring.Model.

class revscoring.scoring.models.Learned(*args, scale=False, center=False, **kwargs)[source]
cross_validate(values_labels, folds=10, processes=1)[source]

Trains and tests the model against folds of labeled data.

Parameters:
values_labels : [( <feature_values>, <label> )]

An iterable of labeled data, where <feature_values> is an ordered collection of predictive values that correspond to the Features provided to the constructor.

processes : int

When set to 1, cross-validation will run in the parent thread. When set to 2 or greater, a multiprocessing.Pool will be created.
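To illustrate what "folds of labeled data" means, here is a minimal sketch using only the standard library. The helper `k_fold_splits` is hypothetical and is not revscoring's implementation; it only shows one common way to split `values_labels` so that every observation is held out for testing exactly once.

```python
def k_fold_splits(values_labels, folds=10):
    """Yield (train_set, test_set) pairs, one per fold.

    Hypothetical helper for illustration; not part of revscoring.
    """
    data = list(values_labels)
    for i in range(folds):
        # Every `folds`-th observation, starting at offset i, is held out.
        test_set = data[i::folds]
        train_set = [obs for j, obs in enumerate(data) if j % folds != i]
        yield train_set, test_set


# Toy labeled data in the ( <feature_values>, <label> ) shape.
values_labels = [([float(n), n % 3 == 0], n % 2 == 0) for n in range(20)]

for train_set, test_set in k_fold_splits(values_labels, folds=5):
    # A model would be trained on train_set and tested on test_set here.
    print(len(train_set), len(test_set))
```

With more than one worker process, each fold could in principle be handled independently, since no fold depends on the result of another.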

fit_scaler_and_transform(fv_vectors)[source]

Fits the internal scaler to labeled data.

Parameters:
fv_vectors : iterable (( <feature_values>, <label> ))

An iterable of labeled data, where <feature_values> is an ordered collection of predictive values that correspond to the Features provided to the constructor.

Returns:

A dictionary of model statistics.
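As a rough illustration of what fitting a scaler and transforming feature values can involve, the sketch below centers and scales each feature column with the standard library. It assumes plain mean and standard-deviation standardization and a list of bare feature vectors; the helpers `fit_scaler` and `transform` are hypothetical, and the scaler revscoring actually uses may behave differently.

```python
from statistics import mean, pstdev


def fit_scaler(feature_vectors, center=True, scale=True):
    """Learn a per-column offset and divisor from the feature vectors.

    Hypothetical helper; plain mean/standard-deviation standardization is
    assumed here purely for illustration.
    """
    columns = list(zip(*feature_vectors))
    offsets = [mean(col) if center else 0.0 for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    divisors = [(pstdev(col) or 1.0) if scale else 1.0 for col in columns]
    return offsets, divisors


def transform(feature_vectors, offsets, divisors):
    """Apply the learned offsets and divisors to each feature vector."""
    return [
        [(v - o) / d for v, o, d in zip(vector, offsets, divisors)]
        for vector in feature_vectors
    ]


feature_vectors = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
offsets, divisors = fit_scaler(feature_vectors)
scaled = transform(feature_vectors, offsets, divisors)
```

The `center` and `scale` flags mirror the constructor arguments of the same names; with both set to False the transform leaves the values unchanged.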

train(values_labels)[source]

Fits the model using labeled data by learning its shape.

Parameters:
values_labels : [( <feature_values>, <label> )]

An iterable of labeled data, where <feature_values> is an ordered collection of predictive values that correspond to the revscoring.Feature objects provided to the constructor.
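The shape of `values_labels` can be sketched with plain Python data. The observations below are made up for illustration (two numeric features and a boolean label each); real feature values would be extracted for the Features given to the constructor.

```python
# Hypothetical observations: two numeric features and a boolean label each.
values_labels = [
    ([12.0, 0.25], True),
    ([3.0, 0.90], False),
    ([7.5, 0.10], True),
]

# Separate the feature vectors from their labels, preserving order.
feature_vectors, labels = zip(*values_labels)

# Every vector must have one value per feature, in the same order.
assert len({len(vector) for vector in feature_vectors}) == 1
```

Keeping each label paired with its feature values in a single tuple makes it harder for the two to drift out of alignment when the data is shuffled or split.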

class revscoring.scoring.models.Classifier(features, labels, multilabel=False, population_rates=None, **kwargs)[source]

SciKit Learn-based models

Implements the basics of all sklearn based models.

class revscoring.scoring.models.sklearn.Classifier(features, labels, multilabel=False, version=None, label_weights=None, population_rates=None, scale=False, center=False, statistics=None, estimator=None, **estimator_params)[source]
score(feature_values)[source]

Generates a score for a single revision based on a set of extracted feature_values.

Parameters:
feature_values : collection(mixed)

an ordered collection of values that correspond to the Features provided to the constructor

Returns:

A dict with the fields:

  • prediction – The most likely class
score_many(feature_values)[source]

Generates scores for a batch of revisions based on a set of extracted feature_values.

Parameters:
feature_values : collection(mixed)

an ordered collection of values that correspond to the Features provided to the constructor

Returns:

A dict with the fields:

  • prediction – The most likely class
train(values_labels, **kwargs)[source]

Fits the internal model to the provided values_labels.

Returns:

A dictionary with the fields:

  • seconds_elapsed – Time in seconds spent fitting the model
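A statistic like `seconds_elapsed` can be produced by timing the fitting call. The wrapper below is a hypothetical sketch, not revscoring's code; `fit` stands in for whatever callable fits the underlying estimator.

```python
import time


def timed_fit(fit, *args, **kwargs):
    """Call fit(*args, **kwargs) and report how long it took.

    Hypothetical wrapper; `fit` stands in for whatever fits the estimator.
    """
    start = time.perf_counter()
    fit(*args, **kwargs)
    return {"seconds_elapsed": time.perf_counter() - start}


# `sorted` is only a placeholder workload for the example.
stats = timed_fit(sorted, [3, 1, 2])
print(stats["seconds_elapsed"])
```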
class revscoring.scoring.models.sklearn.ProbabilityClassifier(features, labels, multilabel=False, statistics=None, population_rates=None, threshold_ndigits=None, **kwargs)[source]
score(feature_values)[source]

Generates a score for a single revision based on a set of extracted feature_values.

Parameters:
feature_values : collection(mixed)

an ordered collection of values that correspond to the Features provided to the constructor

Returns:

A dict with the fields:

  • prediction – The most likely class
  • probability – A mapping of probabilities for input classes
    corresponding to the classes the classifier was trained on. Generating this probability is slower than a simple prediction.
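The two fields described above might fit together as in the following sketch. The helper `build_score` and the example probabilities are hypothetical; the sketch assumes the prediction is simply the label with the highest probability, which may not hold if the model applies thresholds or label weights.

```python
def build_score(probability):
    """Build a score dict from a {label: probability} mapping.

    Hypothetical helper; it assumes the prediction is simply the label
    with the highest probability.
    """
    prediction = max(probability, key=probability.get)
    return {"prediction": prediction, "probability": probability}


# Made-up probabilities for a classifier trained on the labels True/False.
score = build_score({True: 0.87, False: 0.13})
print(score["prediction"])
print(score["probability"][True])
```

A caller that only needs the most likely class can read `prediction` directly; the `probability` mapping is useful when a custom decision threshold is applied downstream.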
score_many(feature_values)[source]

Generates scores for a batch of revisions based on a set of extracted feature_values.

Parameters:
feature_values : array(collection(mixed))

an ordered collection of values that correspond to the Features provided to the constructor

Returns:

A dict with the fields:

  • prediction – The most likely class
  • probability – A mapping of probabilities for input classes
    corresponding to the classes the classifier was trained on. Generating this probability is slower than a simple prediction.