revscoring.datasources.meta
Meta-datasources are classes that extend Datasource and implement common operations on other Datasources.
dicts
These meta-datasources operate on revscoring.Datasources that return dicts.
class revscoring.datasources.meta.dicts.keys(dict_datasource, name=None)
    Generates a set of dict keys.
    Parameters:
    - dict_datasource : revscoring.Datasource
        A datasource that generates a dict.
    - name : str
        A name for the new datasource.
class revscoring.datasources.meta.dicts.values(dict_datasource, name=None)
    Generates a list of dict values.
    Parameters:
    - dict_datasource : revscoring.Datasource
        A datasource that generates a dict.
    - name : str
        A name for the new datasource.
extractors
These meta-datasources operate on revscoring.Datasources that return a str or a list of str and extract information from them.
class revscoring.datasources.meta.extractors.regex(regexes, text_datasource, regex_flags=re.IGNORECASE, wrapping=('\b', '\b'), exclusions=None, name=None)
    Generates a list of strings that match any of a set of provided regexes.
    Parameters:
    - regexes : list ( str )
        A list of regexes to find in the text.
    - text_datasource : revscoring.Datasource
        A datasource that returns a str or a list of str.
    - regex_flags : int
        A set of regex flags to use in matching.
    - wrapping : ( str, str )
        A (prefix, suffix) pair that wraps all regexes. This is useful for languages that have word boundaries.
    - name : str
        A name for the new datasource.
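The matching behaviour described above can be approximated with the standard `re` module. This is only a sketch: it assumes the regexes are joined into a single alternation and wrapped once with the `wrapping` pair, which may differ from what revscoring does internally. `extract_matches` is a hypothetical helper name, and the undocumented `exclusions` argument is not modelled.

```python
import re


def extract_matches(regexes, text, regex_flags=re.IGNORECASE,
                    wrapping=(r"\b", r"\b")):
    """Return every substring of `text` that matches any of `regexes`."""
    start, end = wrapping
    # Assumption: alternatives are joined into one non-capturing group and
    # the wrapping pair is applied once around the whole group.
    pattern = re.compile(start + "(?:" + "|".join(regexes) + ")" + end,
                         regex_flags)
    # Accept either a single str or a list of str, as the datasource does.
    texts = [text] if isinstance(text, str) else text
    return [m.group(0) for t in texts for m in pattern.finditer(t)]


matches = extract_matches(["foo", "ba+r"], "Foo baar food")
```

With the default word-boundary wrapping, "food" is not reported because the match for "foo" would end in the middle of a word.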
filters
These meta-datasources operate on revscoring.Datasources that return lists and produce sub-lists.
class revscoring.datasources.meta.filters.filter(include, items_datasource, inverse=False, name=None)
    Generates a filtered list of items.
    Parameters:
    - include : func
        A function that returns True when an item should be included.
    - items_datasource : revscoring.Datasource
        A datasource that generates a list of items.
    - name : str
        A name for the datasource.
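As a rough illustration of the filtering step in plain Python (not the library's code): `filter_items` is a hypothetical name, and the handling of `inverse` is an assumption, since the parameter is not documented above.

```python
def filter_items(include, items, inverse=False):
    """Keep the items for which include(item) is truthy."""
    # Assumption: inverse=True flips the test and keeps the rejected items.
    return [item for item in items if bool(include(item)) != inverse]


def is_even(n):
    return n % 2 == 0


evens = filter_items(is_even, [1, 2, 3, 4])
odds = filter_items(is_even, [1, 2, 3, 4], inverse=True)
```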
class revscoring.datasources.meta.filters.regex_matching(regex, strs_datasource, name=None)
    Generates a filtered list of the strings that match a regular expression.
    Parameters:
    - regex : str | compiled re
        A regular expression to match (case-insensitive if a str is provided).
    - strs_datasource : revscoring.Datasource
        A datasource that generates a list of str.
    - name : str
        A name for the datasource.
class revscoring.datasources.meta.filters.positive(numbers_datasource, name=None)
    Generates a filtered list of positive numbers from a list of numbers.
    Parameters:
    - numbers_datasource : revscoring.Datasource
        A datasource that generates a list of numbers.
    - name : str
        A name for the datasource.
class revscoring.datasources.meta.filters.negative(numbers_datasource, name=None)
    Generates a filtered list of negative numbers from a list of numbers.
    Parameters:
    - numbers_datasource : revscoring.Datasource
        A datasource that generates a list of numbers.
    - name : str
        A name for the datasource.
frequencies
These meta-datasources operate on revscoring.Datasources that return lists of items and produce frequency tables.
class revscoring.datasources.meta.frequencies.table(items_datasource, name=None)
    Generates a frequency table for a list of items generated by another datasource.
    Parameters:
    - items_datasource : revscoring.Datasource
        A datasource that generates a list of some hashable item.
    - name : str
        A name for the datasource.
class revscoring.datasources.meta.frequencies.delta(old_ft_datasource, new_ft_datasource, name=None)
    Generates a frequency table diff by comparing two frequency tables.
    Parameters:
    - old_ft_datasource : revscoring.Datasource
        A datasource that generates the old frequency table.
    - new_ft_datasource : revscoring.Datasource
        A datasource that generates the new frequency table.
    - name : str
        A name for the datasource.
class revscoring.datasources.meta.frequencies.prop_delta(old_ft_datasource, delta_datasource, name=None)
    Generates a proportional frequency table diff by comparing a frequency table diff with an old frequency table.
    Parameters:
    - old_ft_datasource : revscoring.Datasource
        A frequency table datasource.
    - delta_datasource : revscoring.Datasource
        A frequency table diff datasource.
    - name : str
        A name for the datasource.
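The relationship between table, delta and prop_delta can be sketched in plain Python. The functions below reuse the names for readability but operate on plain lists and dicts rather than datasources. Two details are assumptions, not something the reference above states: unchanged keys are dropped from the diff, and a key that is absent from the old table is divided by 1.

```python
from collections import Counter


def table(items):
    """Count how often each hashable item occurs."""
    return dict(Counter(items))


def delta(old_ft, new_ft):
    """Per-key change in frequency between two tables."""
    diff = {}
    for key in old_ft.keys() | new_ft.keys():
        change = new_ft.get(key, 0) - old_ft.get(key, 0)
        if change != 0:  # assumption: unchanged keys are omitted
            diff[key] = change
    return diff


def prop_delta(old_ft, delta_ft):
    """Change in frequency relative to the old frequency."""
    # Assumption: keys missing from the old table use a denominator of 1.
    return {key: change / old_ft.get(key, 1)
            for key, change in delta_ft.items()}


old = table(["a", "a", "b"])        # {"a": 2, "b": 1}
new = table(["a", "b", "b", "c"])   # {"a": 1, "b": 2, "c": 1}
diff = delta(old, new)
prop = prop_delta(old, diff)
```

Here "a" loses one of its two occurrences, so its proportional change is -0.5, while "b" doubles and "c" is new.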
class revscoring.datasources.meta.frequencies.positive(table_datasource, name=None)
    Filters a table (counts, delta, prop_delta, etc.) for positive values.
    Parameters:
    - table_datasource : revscoring.Datasource
        A frequency table datasource.
    - name : str
        A name for the datasource.
class revscoring.datasources.meta.frequencies.negative(table_datasource, absolute=False, name=None)
    Filters a table (counts, delta, prop_delta, etc.) for negative values.
    Parameters:
    - table_datasource : revscoring.Datasource
        A frequency table datasource.
    - absolute : bool
        Make negative values positive.
    - name : str
        A name for the datasource.
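A plain-Python sketch of the two table filters, operating on a dict rather than a datasource. It assumes zero values belong to neither result.

```python
def positive(ft):
    """Keep only the entries with a value above zero."""
    return {key: value for key, value in ft.items() if value > 0}


def negative(ft, absolute=False):
    """Keep only the entries with a value below zero."""
    # With absolute=True the sign is dropped, so removals can be treated
    # as counts.
    return {key: (abs(value) if absolute else value)
            for key, value in ft.items() if value < 0}


diff = {"a": -2, "b": 1, "c": 0}
added = positive(diff)
removed = negative(diff)
removed_counts = negative(diff, absolute=True)
```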
gramming
These meta-datasources operate on revscoring.Datasources that return a list of strings (i.e. “tokens”) and produce a list of ngram/skipgram sequences.
class revscoring.datasources.meta.gramming.gram(items_datasource, grams=[(0,)], name=None)
    Converts a sequence of items into ngrams.
    Parameters:
    - items_datasource : revscoring.Datasource
        A datasource that generates a list of some item.
    - grams : list ( tuple ( int ) )
        A list of ngram and/or skipgram sequences to produce.
    - name : str
        A name for the datasource.
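One way to read the `grams` argument is as a list of offset tuples: `(0, 1)` pairs each item with its successor (a bigram), and `(0, 2)` pairs each item with the one two positions ahead (a skipgram). The sketch below follows that reading. It is an assumption about the semantics, and it always returns tuples, which may not match the library's output for the default `(0,)`.

```python
def gram(items, grams=((0,),)):
    """Build ngram/skipgram tuples from a sequence of items."""
    result = []
    for i in range(len(items)):
        for offsets in grams:
            # Skip grams that would run past the end of the sequence.
            if i + max(offsets) < len(items):
                result.append(tuple(items[i + offset] for offset in offsets))
    return result


tokens = ["the", "quick", "brown", "fox"]
unigrams = gram(tokens)
bigrams = gram(tokens, grams=[(0, 1)])
skipgrams = gram(tokens, grams=[(0, 2)])
```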
hashing
These meta-datasources operate on revscoring.Datasources that return a list of items and produce a list of hashes.
class revscoring.datasources.meta.hashing.hash(items_datasource, n=1048576, name=None)
    Converts a sequence of items into a sequence of portable hashes (int) based on the result of applying str(), e.g. str(["foo"]) gives "['foo']".
    Parameters:
    - items_datasource : revscoring.Datasource
        A datasource that generates a list of items to be hashed.
    - n : int
        The number of potential hashes that can be produced.
    - name : str
        A name for the datasource.
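The idea of a portable hash bounded by `n` can be sketched as follows. `zlib.crc32` is a stand-in chosen for this example (an assumption, not necessarily the function revscoring uses); Python's built-in `hash()` is unsuitable here because str hashes are salted per process. `hash_items` is a hypothetical name.

```python
import zlib


def hash_items(items, n=2 ** 20):
    """Map each item to an int in range(n) via its str() representation."""
    # crc32 gives the same value on every run and platform, so the
    # resulting buckets are stable across processes.
    return [zlib.crc32(str(item).encode("utf-8")) % n for item in items]


hashes = hash_items(["foo", ("foo", "bar"), 42])
```

The default of 2 ** 20 equals the 1048576 shown in the signature above. Distinct items can collide in the same bucket, and a smaller `n` makes that more likely.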
indexable
These meta-datasources operate on revscoring.Datasources that return lists and tuples.
class revscoring.datasources.meta.indexable.index(i, datasources, default=NotImplemented, name=None)
    Generates a datasource that returns the value that appears at i.
    Parameters:
    - i : int
        The index of a value to return.
    - default : mixed
        The value to return if no value exists at i. If not specified, an IndexError will be raised.
    - name : str
        A name for the new datasource.
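The `default` behaviour can be illustrated with a small plain-Python function. It mirrors the signature's use of `NotImplemented` as the "no default given" sentinel, but it works on a plain list or tuple and is only a sketch of the described behaviour.

```python
def index(i, values, default=NotImplemented):
    """Return values[i], falling back to `default` when i is out of range."""
    try:
        return values[i]
    except IndexError:
        if default is NotImplemented:
            # No default was supplied, so the error propagates.
            raise
        return default


second = index(1, ["a", "b"])
fallback = index(5, ("a",), default=None)
```

Using a sentinel rather than `None` lets callers pass `None` as a legitimate default.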
mappers
These meta-datasources operate on revscoring.Datasources that return lists and apply a specific function to each item.
class revscoring.datasources.meta.mappers.map(apply, items_datasource, name=None)
    Returns a revscoring.Datasource that applies a function over a set of items generated by another datasource.
    Parameters:
    - apply : func
        A function to apply to each item generated by items_datasource.
    - items_datasource : revscoring.Datasource
        A datasource that generates a list of some item.
    - name : str
        A name for the datasource.
class revscoring.datasources.meta.mappers.lower_case(strs_datasource, name=None)
    Returns a revscoring.Datasource that lower-cases a list of str returned by another datasource.
    Parameters:
    - strs_datasource : revscoring.Datasource
        A datasource that generates a list of str.
    - name : str
        A name for the datasource.
class revscoring.datasources.meta.mappers.derepeat(strs_datasource, name=None)
    Returns a revscoring.Datasource that collapses runs of repeated characters in each str of a list (e.g. “foo” -> “fo”).
    Parameters:
    - strs_datasource : revscoring.Datasource
        A datasource that generates a list of str.
    - name : str
        A name for the datasource.
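The “foo” -> “fo” example corresponds to collapsing each run of a repeated character into a single character. A minimal sketch with the standard `re` module, operating on a plain list of str; whether the library uses this exact pattern is an assumption.

```python
import re

# A character followed by one or more copies of itself.
REPEATED = re.compile(r"(.)\1+")


def derepeat(strs):
    """Collapse runs of the same character in each string."""
    return [REPEATED.sub(r"\1", s) for s in strs]


result = derepeat(["foo", "heeellooo", "bar"])
```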
class revscoring.datasources.meta.mappers.abs(numbers_datasource, name=None)
    Returns a revscoring.Datasource that converts a list of numeric values into a list of absolute numeric values.
    Parameters:
    - numbers_datasource : revscoring.Datasource
        A datasource that generates a list of numeric values.
    - name : str
        A name for the datasource.
selectors
These meta-datasources operate on revscoring.Datasources that return a flat dict of key-value pairs (aka a “table”) and filter (“select”) keys and/or weight values.
class revscoring.datasources.meta.selectors.tfidf(table_datasource, max_terms=None, weight=True, boolean=False, name=None)
    Selects a subset of a frequency table based on term utility and applies TF-iDF weighting.
    Parameters:
    - table_datasource : revscoring.Datasource
        A datasource that generates a dict of term frequency counts.
    - max_terms : int
        The maximum number of terms that will be selected. The terms with the highest proportional representation in a label class are selected.
    - weight : bool
        Should TF-iDF weighting be applied to output counts?
    - boolean : bool
        Normalize counts to 0 (not in document) and 1 (in document). Note that negative frequencies will be converted to -1.
    - name : str
        A name for the datasource.
class revscoring.datasources.meta.selectors.filter_keys(table_datasource, keys, name=None)
    Selects a subset of features (key/values) based on a set of keys.
    Parameters:
    - table_datasource : revscoring.Datasource
        A datasource that generates a table; the output includes only the specified keys.
    - keys : iterable ( hashable )
        The keys to select from the table.
    - name : str
        A name for the datasource.
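Key selection on a plain dict can be sketched as below. The function reuses the name `filter_keys` for readability but is not the library class, and dropping requested keys that are absent from the table (rather than filling them with a default) is an assumption.

```python
def filter_keys(table, keys):
    """Return the entries of `table` whose key is in `keys`."""
    wanted = set(keys)
    # Assumption: requested keys that are missing from the table are omitted.
    return {key: value for key, value in table.items() if key in wanted}


selected = filter_keys({"a": 3, "b": 1, "c": 7}, ["a", "c", "z"])
```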
timestamp
These meta-datasources operate on revscoring.Datasources that return a timestamp string and produce the corresponding mwtypes.Timestamp.