revscoring.languages¶
This module implements a set of languages as collections of features that are language specific.
feature collections¶
Languages implement a subset of feature collections (e.g.
Dictionary
,
Stopwords
,
Stemmed
and
RegexMatches
) based on what
language assets are available. See revscoring.languages.features
.
albanian¶
-
revscoring.languages.albanian.
badwords
= {albanian.badwords}¶ RegexMatches
features via a list of badword detecting regexes.
-
revscoring.languages.albanian.
informals
= {albanian.informals}¶ RegexMatches
features via a list of informal word detecting regexes.
arabic¶
bengali¶
-
revscoring.languages.bengali.
badwords
= {bengali.badwords}¶ RegexMatches
features via a list of badword detecting regexes.
-
revscoring.languages.bengali.
informals
= {bengali.informals}¶ RegexMatches
features via a list of informal word detecting regexes.
-
revscoring.languages.bengali.
name
= 'bengali'¶ - try:
- import enchant dictionary = enchant.Dict(“bn”)
- except enchant.errors.DictNotFoundError:
- raise ImportError(“No enchant-compatible dictionary found for ‘bn’. ” +
- “Consider installing ‘aspell-bn’.”)
dictionary = Dictionary(name + “.dictionary”, dictionary.check) “””
Dictionary
features viaenchant.Dict
“bn”. Provided by aspell-bn “”“
-
revscoring.languages.bengali.
stopwords
= {bengali.stopwords}¶ Stopwords
features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=16626444
bosnian¶
catalan¶
chinese¶
-
revscoring.languages.chinese.
badwords
= {chinese.badwords}¶ RegexMatches
features via a list of badword detecting regexes.
-
revscoring.languages.chinese.
informals
= {chinese.informals}¶ RegexMatches
features via a list of informal word detecting regexes.
-
revscoring.languages.chinese.
words_to_watch
= {chinese.words_to_watch}¶ RegexMatches
features via a list of advertising language regexes.
croatian¶
czech¶
dutch¶
english¶
estonian¶
finnishswedish
french¶
galician¶
german¶
greek¶
hebrew¶
hindi¶
hungarian¶
indonesian¶
italian¶
japanese¶
-
revscoring.languages.japanese.
badwords
= {japanese.badwords}¶ RegexMatches
features via a list of badword detecting regexes.
-
revscoring.languages.japanese.
informals
= {japanese.informals}¶ RegexMatches
features via a list of informal word detecting regexes.
korean¶
-
revscoring.languages.korean.
badwords
= {korean.badwords}¶ RegexMatches
features via a list of badword detecting regexes.
-
revscoring.languages.korean.
informals
= {korean.informals}¶ RegexMatches
features via a list of informal word detecting regexes.
latvian¶
norwegian¶
persian¶
portuguese¶
romanian¶
spanish¶
swedish¶
tamil¶
-
revscoring.languages.tamil.
badwords
= {tamil.badwords}¶ RegexMatches
features via a list of badword detecting regexes.
-
revscoring.languages.tamil.
informals
= {tamil.informals}¶ RegexMatches
features via a list of informal word detecting regexes.
-
revscoring.languages.tamil.
name
= 'tamil'¶ - try:
- import enchant dictionary = enchant.Dict(“ta”)
- except enchant.errors.DictNotFoundError:
- raise ImportError(“No enchant-compatible dictionary found for ‘ta’. ” +
- “Consider installing ‘aspell-ta’.”)
dictionary = Dictionary(name + “.dictionary”, dictionary.check) “””
Dictionary
features viaenchant.Dict
“ta”. Provided by aspell-ta. “”“