revscoring.languages¶
This module implements a set of languages as collections of features that are language specific.
feature collections¶
Languages implement a subset of feature collections (e.g.
Dictionary,
Stopwords,
Stemmed and
RegexMatches) based on what
language assets are available. See revscoring.languages.features.
albanian¶
-
revscoring.languages.albanian.badwords= {albanian.badwords}¶ RegexMatchesfeatures via a list of badword detecting regexes.
-
revscoring.languages.albanian.informals= {albanian.informals}¶ RegexMatchesfeatures via a list of informal word detecting regexes.
arabic¶
bengali¶
-
revscoring.languages.bengali.badwords= {bengali.badwords}¶ RegexMatchesfeatures via a list of badword detecting regexes.
-
revscoring.languages.bengali.informals= {bengali.informals}¶ RegexMatchesfeatures via a list of informal word detecting regexes.
-
revscoring.languages.bengali.name= 'bengali'¶ - try:
- import enchant dictionary = enchant.Dict(“bn”)
- except enchant.errors.DictNotFoundError:
- raise ImportError(“No enchant-compatible dictionary found for ‘bn’. ” +
- “Consider installing ‘aspell-bn’.”)
dictionary = Dictionary(name + “.dictionary”, dictionary.check) “””
Dictionaryfeatures viaenchant.Dict“bn”. Provided by aspell-bn “”“
-
revscoring.languages.bengali.stopwords= {bengali.stopwords}¶ Stopwordsfeatures copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=16626444
bosnian¶
catalan¶
chinese¶
-
revscoring.languages.chinese.badwords= {chinese.badwords}¶ RegexMatchesfeatures via a list of badword detecting regexes.
-
revscoring.languages.chinese.informals= {chinese.informals}¶ RegexMatchesfeatures via a list of informal word detecting regexes.
-
revscoring.languages.chinese.words_to_watch= {chinese.words_to_watch}¶ RegexMatchesfeatures via a list of advertising language regexes.
croatian¶
czech¶
dutch¶
english¶
estonian¶
finnishswedish
french¶
galician¶
german¶
greek¶
hebrew¶
hindi¶
hungarian¶
indonesian¶
italian¶
japanese¶
-
revscoring.languages.japanese.badwords= {japanese.badwords}¶ RegexMatchesfeatures via a list of badword detecting regexes.
-
revscoring.languages.japanese.informals= {japanese.informals}¶ RegexMatchesfeatures via a list of informal word detecting regexes.
korean¶
-
revscoring.languages.korean.badwords= {korean.badwords}¶ RegexMatchesfeatures via a list of badword detecting regexes.
-
revscoring.languages.korean.informals= {korean.informals}¶ RegexMatchesfeatures via a list of informal word detecting regexes.
latvian¶
norwegian¶
persian¶
portuguese¶
romanian¶
spanish¶
swedish¶
tamil¶
-
revscoring.languages.tamil.badwords= {tamil.badwords}¶ RegexMatchesfeatures via a list of badword detecting regexes.
-
revscoring.languages.tamil.informals= {tamil.informals}¶ RegexMatchesfeatures via a list of informal word detecting regexes.
-
revscoring.languages.tamil.name= 'tamil'¶ - try:
- import enchant dictionary = enchant.Dict(“ta”)
- except enchant.errors.DictNotFoundError:
- raise ImportError(“No enchant-compatible dictionary found for ‘ta’. ” +
- “Consider installing ‘aspell-ta’.”)
dictionary = Dictionary(name + “.dictionary”, dictionary.check) “””
Dictionaryfeatures viaenchant.Dict“ta”. Provided by aspell-ta. “”“