revscoring.languages.features

Dictionary
Implements a feature set based on dictionary lookup.

class revscoring.languages.features.Dictionary(name, dictionary_check)
Parameters:
- name : str
  A name for the collection.
- dictionary_check : func
  A function that, given a word, performs a dictionary check and returns True if the word exists.
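
The `dictionary_check` argument can be any callable that takes a word and returns a bool. The following is a minimal sketch that assumes a plain in-memory word list and case-insensitive matching. A real language module might instead pass a spellchecker method such as PyEnchant's `Dict.check`. `make_dictionary_check` is a hypothetical helper and is not part of revscoring.

```python
from typing import Callable, Iterable


def make_dictionary_check(words: Iterable[str]) -> Callable[[str], bool]:
    """Hypothetical helper: build a case-insensitive dictionary check."""
    # Store case-folded forms so "The" and "the" are treated alike
    # (case-insensitivity is an assumption of this sketch).
    vocabulary = frozenset(word.casefold() for word in words)

    def dictionary_check(word: str) -> bool:
        # True if the word exists in the vocabulary.
        return word.casefold() in vocabulary

    return dictionary_check


check = make_dictionary_check(["the", "cat", "sat"])
print(check("Cat"), check("xyzzy"))  # True False
```

You can then pass the resulting function as the `dictionary_check` argument of `Dictionary`.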

Supporting classes

class revscoring.languages.features.dictionary.Revision(name, revision_datasources)
Attributes:
- dict_words : int
  A count of the number of dictionary words in the revision.
- non_dict_words : int
  A count of the number of non-dictionary words in the revision.
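
To make the two Revision counts concrete, the following plain-Python sketch applies a dictionary check to a revision's words. It makes two assumptions: the words are already tokenized into a list (in revscoring that is the role of `revision_datasources`), and every occurrence of a word counts separately. `count_dictionary_words` is a hypothetical name.

```python
from typing import Callable, Iterable, Tuple


def count_dictionary_words(
    words: Iterable[str], dictionary_check: Callable[[str], bool]
) -> Tuple[int, int]:
    """Return (dict_words, non_dict_words) for a tokenized revision."""
    dict_words = non_dict_words = 0
    for word in words:
        # Every word lands in exactly one of the two counts.
        if dictionary_check(word):
            dict_words += 1
        else:
            non_dict_words += 1
    return dict_words, non_dict_words


known = {"the", "cat", "sat", "on", "mat"}
words = ["The", "cat", "sat", "on", "teh", "mat"]
print(count_dictionary_words(words, lambda w: w.lower() in known))  # (5, 1)
```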

class revscoring.languages.features.dictionary.Diff(name, diff_datasources)
Attributes:
- dict_words_added : int
  A count of the number of dictionary words added.
- dict_words_removed : int
  A count of the number of dictionary words removed.
- non_dict_words_added : int
  A count of the number of non-dictionary words added.
- non_dict_words_removed : int
  A count of the number of non-dictionary words removed.
- dict_word_delta_sum : int
  The sum of word frequency deltas for dictionary words.
- dict_word_delta_increase : int
  The sum of word frequency delta increases for dictionary words.
- dict_word_delta_decrease : int
  The sum of word frequency delta decreases for dictionary words.
- non_dict_word_delta_sum : int
  The sum of word frequency deltas for non-dictionary words.
- non_dict_word_delta_increase : int
  The sum of word frequency delta increases for non-dictionary words.
- non_dict_word_delta_decrease : int
  The sum of word frequency delta decreases for non-dictionary words.
- dict_word_prop_delta_sum : float
  The sum of proportional word frequency deltas for dictionary words.
- dict_word_prop_delta_increase : float
  The sum of proportional word frequency delta increases for dictionary words.
- dict_word_prop_delta_decrease : float
  The sum of proportional word frequency delta decreases for dictionary words.
- non_dict_word_prop_delta_sum : float
  The sum of proportional word frequency deltas for non-dictionary words.
- non_dict_word_prop_delta_increase : float
  The sum of proportional word frequency delta increases for non-dictionary words.
- non_dict_word_prop_delta_decrease : float
  The sum of proportional word frequency delta decreases for non-dictionary words.
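
The integer delta features compare each word's frequency before and after an edit. This minimal sketch rests on three assumptions: a word's delta is its new count minus its old count, `*_delta_increase` sums the positive deltas, and `*_delta_decrease` sums the negative ones (so it is zero or negative). `word_delta_features` and these definitions are illustrative and are not revscoring's code. Passing a negated predicate gives the non-dictionary counterparts.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def word_delta_features(
    old_words: Iterable[str],
    new_words: Iterable[str],
    keep: Callable[[str], bool],
) -> Dict[str, int]:
    """Sketch of *_delta_sum, *_delta_increase and *_delta_decrease."""
    old = Counter(w for w in old_words if keep(w))
    new = Counter(w for w in new_words if keep(w))
    # Per-word frequency delta: count after the edit minus count before it.
    deltas = [new[w] - old[w] for w in old.keys() | new.keys()]
    return {
        "delta_sum": sum(deltas),
        "delta_increase": sum(d for d in deltas if d > 0),
        "delta_decrease": sum(d for d in deltas if d < 0),
    }


known = {"the", "cat", "sat", "dog"}
before = ["the", "cat", "sat", "the", "mat"]
after = ["the", "dog", "sat", "dog", "matt"]
dict_deltas = word_delta_features(before, after, lambda w: w in known)
non_dict_deltas = word_delta_features(before, after, lambda w: w not in known)
print(dict_deltas)      # {'delta_sum': 0, 'delta_increase': 2, 'delta_decrease': -2}
print(non_dict_deltas)  # {'delta_sum': 0, 'delta_increase': 1, 'delta_decrease': -1}
```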

RegexMatches
Implements a feature set based on a set of regexes applied to strings.

class revscoring.languages.features.RegexMatches(name, regexes, exclusions=None, wrapping=('\b', '\b'), text_preprocess=None)
Parameters:
- name : str
  A name for the collection.
- regexes : list(str)
  A list of regex patterns to match.
- exclusions : list(str)
  A list of terms to explicitly not match.
- wrapping : tuple(str, str)
  A pair of strings inserted before and after the patterns in the regular expression; the default places a word boundary (\b) on each side.
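
Python's `re` module shows the effect of `wrapping`. This sketch assumes the patterns are joined into one alternation group with the two wrapping strings placed on either side. `build_pattern` is a hypothetical helper, and revscoring's own pattern construction may differ in detail. The wrapping strings are raw regex text, so `r"\b"` means a word boundary, not a backspace character.

```python
import re
from typing import Sequence, Tuple


def build_pattern(
    regexes: Sequence[str], wrapping: Tuple[str, str] = (r"\b", r"\b")
) -> re.Pattern:
    """Hypothetical helper: join patterns into one group and wrap it."""
    prefix, suffix = wrapping
    # A non-capturing group keeps the alternation together between the wrappers.
    return re.compile(prefix + "(?:" + "|".join(regexes) + ")" + suffix)


text = "A cat can concatenate strings."
bounded = build_pattern(["cat"])                      # default word boundaries
unbounded = build_pattern(["cat"], wrapping=("", ""))
print(len(bounded.findall(text)), len(unbounded.findall(text)))  # 1 2
```

With word boundaries, only the standalone "cat" matches. Without them, the "cat" inside "concatenate" is counted as well.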

excluding(exclusions, name=None)
Returns a new RegexMatches that includes a set of exclusions.
Parameters:
- exclusions : list(str)
  A list of terms to explicitly not match.
- name : str
  A new name for the collection. If unspecified, the old name will be used.
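
The `excluding` method returns a new object instead of modifying the existing one. The frozen dataclass below is a hypothetical stand-in, not revscoring's class, and mimics only that contract, including the fallback to the old name. The description above does not say whether new exclusions are merged with existing ones or replace them, so this sketch simply replaces them.

```python
import dataclasses
from typing import Iterable, Optional, Tuple


@dataclasses.dataclass(frozen=True)
class RegexMatchesSketch:
    """Hypothetical stand-in holding only some constructor arguments."""
    name: str
    regexes: Tuple[str, ...]
    exclusions: Optional[Tuple[str, ...]] = None

    def excluding(
        self, exclusions: Iterable[str], name: Optional[str] = None
    ) -> "RegexMatchesSketch":
        # Build a modified copy; keep the old name when none is given.
        return dataclasses.replace(
            self,
            name=self.name if name is None else name,
            exclusions=tuple(exclusions),
        )


animals = RegexMatchesSketch("animals", (r"cat\w*",))
same_name = animals.excluding(["catalog"])
renamed = animals.excluding(["catalog"], name="animals_no_catalog")
print(same_name.name, renamed.name, animals.exclusions)  # animals animals_no_catalog None
```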

Supporting classes

class revscoring.languages.features.regex_matches.Revision(name, regexes, revision_datasources)
Attributes:
- matches : int
  A count of the number of matches found in the text.
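
A count like `matches` can be sketched as the number of non-overlapping regex hits in the text, minus any hits that equal one of the exclusion terms. This sketch assumes that exclusions are whole-match comparisons and that matching is case-insensitive. `count_matches` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional, Sequence


def count_matches(
    text: str,
    regexes: Sequence[str],
    exclusions: Optional[Iterable[str]] = None,
) -> int:
    """Count word-bounded matches of any pattern, skipping excluded terms."""
    pattern = re.compile(r"\b(?:" + "|".join(regexes) + r")\b", re.IGNORECASE)
    excluded = {term.lower() for term in (exclusions or [])}
    # finditer yields non-overlapping matches from left to right.
    return sum(
        1 for m in pattern.finditer(text) if m.group(0).lower() not in excluded
    )


text = "The cat and the caterpillar ignored the catalog."
print(count_matches(text, [r"cat\w*"]))               # 3
print(count_matches(text, [r"cat\w*"], ["catalog"]))  # 2
```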

class revscoring.languages.features.regex_matches.Diff(name, regexes, diff_datasources)
Attributes:
- matches_added : int
  The number of matches added in the edit.
- matches_removed : int
  The number of matches removed in the edit.
- match_delta_sum : int
  The sum of frequency deltas for matched strings.
- match_delta_increase : int
  The sum of frequency delta increases for matched strings.
- match_delta_decrease : int
  The sum of frequency delta decreases for matched strings.
- match_prop_delta_sum : float
  The sum of proportional frequency deltas for matched strings.
- match_prop_delta_increase : float
  The sum of proportional frequency delta increases for matched strings.
- match_prop_delta_decrease : float
  The sum of proportional frequency delta decreases for matched strings.
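
`matches_added` and `matches_removed` count hits in the text an edit inserted and deleted. This sketch assumes those added and removed segments are already available as lists of strings (presumably derived from `diff_datasources`). It also assumes the patterns are wrapped in word boundaries. `matches_added_removed` is a hypothetical helper. Because segments are searched one at a time, a match cannot straddle two separate segments.

```python
import re
from typing import Iterable, Sequence, Tuple


def matches_added_removed(
    added_segments: Iterable[str],
    removed_segments: Iterable[str],
    regexes: Sequence[str],
) -> Tuple[int, int]:
    """Return (matches_added, matches_removed) for one edit."""
    pattern = re.compile(r"\b(?:" + "|".join(regexes) + r")\b")
    added = sum(len(pattern.findall(segment)) for segment in added_segments)
    removed = sum(len(pattern.findall(segment)) for segment in removed_segments)
    return added, removed


# An edit that replaced "cat" with "dog" and appended a sentence about a dog.
result = matches_added_removed(["dog", " The dog barked."], ["cat"], ["cat", "dog"])
print(result)  # (2, 1)
```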

Stopwords
Implements a feature set based on filtering words for stopwords.

class revscoring.languages.features.Stopwords(name, stopword_set)
Parameters:
- name : str
  A name for the collection.
- stopword_set : set(str)
  A set of stopwords.
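
Because `stopword_set` is a set, membership tests take constant time on average. The following minimal sketch builds one from a newline-delimited list. It assumes one word per line, ignores `#` comments and blank lines, and lower-cases every entry. `load_stopwords` is a hypothetical helper.

```python
import io
from typing import Set, TextIO


def load_stopwords(stream: TextIO) -> Set[str]:
    """Read one stopword per line, skipping blank lines and # comments."""
    words = set()
    for line in stream:
        # Drop any trailing comment, surrounding whitespace and case.
        word = line.split("#", 1)[0].strip().lower()
        if word:
            words.add(word)
    return words


source = io.StringIO("# English stopwords\nthe\nA\n\nof  # preposition\n")
stopwords = load_stopwords(source)
print(sorted(stopwords))  # ['a', 'of', 'the']
```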

Supporting classes

class revscoring.languages.features.stopwords.Revision(name, revision_datasources)
Attributes:
- stopwords : int
  A count of the number of stopwords in the content.
- non_stopwords : int
  A count of the number of non-stopwords in the content.

class revscoring.languages.features.stopwords.Diff(name, diff_datasources)
Attributes:
- stopwords_added : int
  A count of stopwords added.
- stopwords_removed : int
  A count of stopwords removed.
- non_stopwords_added : int
  A count of non-stopwords added.
- non_stopwords_removed : int
  A count of non-stopwords removed.
- stopword_delta_sum : int
  The sum of word frequency deltas for stopwords.
- stopword_delta_increase : int
  The sum of word frequency delta increases for stopwords.
- stopword_delta_decrease : int
  The sum of word frequency delta decreases for stopwords.
- non_stopword_delta_sum : int
  The sum of word frequency deltas for non-stopwords.
- non_stopword_delta_increase : int
  The sum of word frequency delta increases for non-stopwords.
- non_stopword_delta_decrease : int
  The sum of word frequency delta decreases for non-stopwords.
- stopword_prop_delta_sum : float
  The sum of proportional word frequency deltas for stopwords.
- stopword_prop_delta_increase : float
  The sum of proportional word frequency delta increases for stopwords.
- stopword_prop_delta_decrease : float
  The sum of proportional word frequency delta decreases for stopwords.
- non_stopword_prop_delta_sum : float
  The sum of proportional word frequency deltas for non-stopwords.
- non_stopword_prop_delta_increase : float
  The sum of proportional word frequency delta increases for non-stopwords.
- non_stopword_prop_delta_decrease : float
  The sum of proportional word frequency delta decreases for non-stopwords.
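
The first four Diff counts split the words an edit added and removed by stopword membership. This sketch assumes the added and removed words are already available as token lists and that matching is case-insensitive. `stopword_diff_counts` is a hypothetical name.

```python
from typing import Dict, Iterable, Set


def stopword_diff_counts(
    words_added: Iterable[str],
    words_removed: Iterable[str],
    stopword_set: Set[str],
) -> Dict[str, int]:
    """Count added/removed words on each side of the stopword split."""
    counts = {
        "stopwords_added": 0,
        "stopwords_removed": 0,
        "non_stopwords_added": 0,
        "non_stopwords_removed": 0,
    }
    for direction, words in (("added", words_added), ("removed", words_removed)):
        for word in words:
            kind = "stopwords" if word.lower() in stopword_set else "non_stopwords"
            counts[f"{kind}_{direction}"] += 1
    return counts


result = stopword_diff_counts(
    words_added=["The", "quick", "fox"],
    words_removed=["a", "slow", "of", "fox"],
    stopword_set={"the", "a", "of"},
)
print(result)
```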

Stemmed
Implements a feature set based on a stemmer applied to words.

class revscoring.languages.features.Stemmed(name, stem_word)
Parameters:
- name : str
  A name for the collection.
- stem_word : func
  A function that, given a word, returns a stemmed version of that word.
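
`stem_word` can be any callable that maps a word to its stem. For real text you would use a proper stemmer, for example NLTK's `SnowballStemmer("english").stem`. The toy suffix stripper below is a hypothetical stand-in that only shows the expected shape of the function (str in, str out). It is not linguistically sound.

```python
from typing import Tuple

# Longest suffixes first so "ness" is tried before "s".
SUFFIXES: Tuple[str, ...] = ("ness", "ing", "ed", "ly", "s")


def stem_word(word: str) -> str:
    """Toy stemmer: lower-case and strip one common English suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print([stem_word(w) for w in ["Jumping", "jumped", "jumps", "kindness", "is"]])
# ['jump', 'jump', 'jump', 'kind', 'is']
```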

Supporting classes

class revscoring.languages.features.stemmed.Revision(name, revision_datasources)
Attributes:
- unique_stems : int
  A count of unique stemmed words.
- stem_chars : int
  A count of characters in stemmed words.
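
Given a `stem_word` function, the two Revision features can be sketched as follows. The description does not specify whether `stem_chars` counts every stemmed token or only distinct stems, so this sketch assumes every token. It also uses `str.lower` as a trivial stand-in stemmer to keep the example short. `stem_features` is a hypothetical name.

```python
from typing import Callable, Iterable, Tuple


def stem_features(
    words: Iterable[str], stem_word: Callable[[str], str]
) -> Tuple[int, int]:
    """Return (unique_stems, stem_chars) for a tokenized revision."""
    stems = [stem_word(word) for word in words]
    unique_stems = len(set(stems))
    # Assumption: characters are summed over all stemmed tokens, repeats included.
    stem_chars = sum(len(stem) for stem in stems)
    return unique_stems, stem_chars


print(stem_features(["Cat", "cat", "dog"], str.lower))  # (2, 9)
```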

class revscoring.languages.features.stemmed.Diff(name, diff_datasources)
Attributes:
- stem_delta_sum : int
  The sum of frequency deltas for stemmed words.
- stem_delta_increase : int
  The sum of frequency delta increases for stemmed words.
- stem_delta_decrease : int
  The sum of frequency delta decreases for stemmed words.
- stem_prop_delta_sum : float
  The sum of proportional frequency deltas for stemmed words.
- stem_prop_delta_increase : float
  The sum of proportional frequency delta increases for stemmed words.
- stem_prop_delta_decrease : float
  The sum of proportional frequency delta decreases for stemmed words.
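
The stem deltas work on stems rather than surface words, so inflected forms of the same word are pooled before counting. The sketch below computes the proportional-delta sums. It assumes a stem's proportional delta is its count change divided by its old count, or by 1 if the stem was absent before the edit. This may not match revscoring's exact definition. `stem_prop_deltas` and the toy `strip_plural` stemmer are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def stem_prop_deltas(
    old_words: Iterable[str],
    new_words: Iterable[str],
    stem_word: Callable[[str], str],
) -> Dict[str, float]:
    """Sketch of stem_prop_delta_sum / _increase / _decrease."""
    old = Counter(stem_word(w) for w in old_words)
    new = Counter(stem_word(w) for w in new_words)
    # Proportional change per stem (assumed definition); unchanged stems are skipped.
    props = [
        (new[s] - old[s]) / old.get(s, 1)
        for s in old.keys() | new.keys()
        if new[s] != old[s]
    ]
    return {
        "stem_prop_delta_sum": sum(props),
        "stem_prop_delta_increase": sum(p for p in props if p > 0),
        "stem_prop_delta_decrease": sum(p for p in props if p < 0),
    }


def strip_plural(word: str) -> str:
    # Toy stemmer: "cats" and "cat" share the stem "cat".
    return word[:-1] if word.endswith("s") else word


result = stem_prop_deltas(["cat", "cats", "dog"], ["cat", "cats", "cats", "bird"], strip_plural)
print(result)
```

Here "cat" goes from 2 to 3 (+0.5), "dog" from 1 to 0 (-1.0) and "bird" from 0 to 1 (+1.0).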