revscoring.languages

This module implements a set of languages, each as a collection of language-specific features.

feature collections

Languages implement a subset of feature collections (e.g. Dictionary, Stopwords, Stemmed and RegexMatches) based on what language assets are available. See revscoring.languages.features.
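Because each language exposes only the collections it has assets for, calling code may want to check what is present before using it. The following is a minimal sketch of that check, not part of revscoring itself. The stand-in objects mirror the attributes listed in this document for chinese and bengali, and the attribute name `stemmed` is an assumption.

```python
from types import SimpleNamespace

# Attribute names taken from the listings in this document; "stemmed" is assumed.
KNOWN_COLLECTIONS = ("badwords", "informals", "words_to_watch",
                     "dictionary", "stopwords", "stemmed")


def available_collections(language):
    """Return the names of the feature collections a language object exposes."""
    return [name for name in KNOWN_COLLECTIONS if hasattr(language, name)]


# Hypothetical stand-ins for the real language modules, so the sketch runs
# without revscoring or any dictionaries installed.
chinese = SimpleNamespace(badwords=object(), informals=object(),
                          words_to_watch=object())
bengali = SimpleNamespace(badwords=object(), informals=object(),
                          dictionary=object(), stopwords=object())

print(available_collections(chinese))
print(available_collections(bengali))
```

With revscoring installed, the same function could presumably be applied to an imported language module instead of the stand-ins.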

albanian

revscoring.languages.albanian.badwords = {albanian.badwords}

RegexMatches features via a list of badword detecting regexes.
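To illustrate what "a list of badword detecting regexes" can look like in practice, here is a minimal sketch using only the standard library. It is not revscoring's implementation: the helper `build_matcher`, the word-boundary wrapping, and the placeholder patterns are all assumptions made for the example.

```python
import re


def build_matcher(regexes):
    """Combine word regexes into one case-insensitive, word-bounded pattern."""
    pattern = re.compile(r"\b(?:" + "|".join(regexes) + r")\b", re.IGNORECASE)

    def matches(text):
        # Return every matched word, in order of appearance.
        return [m.group(0) for m in pattern.finditer(text)]

    return matches


# Placeholder patterns, not taken from any real language's word list.
placeholder_regexes = [r"stupid\w*", r"dumb+"]
find_badwords = build_matcher(placeholder_regexes)

print(find_badwords("A Stupidly dumb edit"))
```

A feature would then typically be a count or proportion derived from the list of matches.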

revscoring.languages.albanian.informals = {albanian.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.albanian.stopwords = {albanian.stopwords}

Stopwords features copied from

arabic

bengali

revscoring.languages.bengali.badwords = {bengali.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.bengali.informals = {bengali.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.bengali.name = 'bengali'

    # Dictionary is one of the collections in revscoring.languages.features;
    # `name` is the module-level attribute shown above ('bengali').
    from revscoring.languages.features import Dictionary

    try:
        import enchant
        dictionary = enchant.Dict("bn")
    except enchant.errors.DictNotFoundError:
        raise ImportError("No enchant-compatible dictionary found for 'bn'. " +
                          "Consider installing 'aspell-bn'.")

    dictionary = Dictionary(name + ".dictionary", dictionary.check)

Dictionary features via enchant.Dict "bn". Provided by aspell-bn.
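The Dictionary collection above is built from a name and a `check` callable that says whether a word is in the dictionary. The sketch below shows the kind of split such a callable enables. It is an illustration only: `partition_words` is a hypothetical helper, and a small word set stands in for `enchant.Dict("bn").check` so the example runs without enchant or aspell-bn.

```python
def partition_words(words, check):
    """Split words into (dictionary words, non-dictionary words) using check()."""
    dict_words, non_dict_words = [], []
    for word in words:
        if check(word):
            dict_words.append(word)
        else:
            non_dict_words.append(word)
    return dict_words, non_dict_words


# Stand-in for enchant.Dict("bn").check: any callable mapping str -> bool works.
KNOWN_WORDS = {"the", "river", "flows"}


def check(word):
    return word.lower() in KNOWN_WORDS


known, unknown = partition_words(["The", "river", "flowz"], check)
print(known, unknown)
```

Passing the bound method rather than the dictionary object keeps the collection independent of where the word list comes from.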

revscoring.languages.bengali.stopwords = {bengali.stopwords}

Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=16626444
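A stopword list like this is usually consulted by simple membership tests on tokens. As a rough sketch of one feature that could be derived from it, the function below computes the share of tokens that are stopwords. The function name and the English placeholder list are assumptions for illustration, not the Bengali list or revscoring's API.

```python
def stopword_ratio(tokens, stopwords):
    """Return the fraction of tokens that appear in the stopword set."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in stopwords)
    return hits / len(tokens)


# Placeholder stopword set for the example.
example_stopwords = {"the", "and", "of"}

print(stopword_ratio(["The", "cat", "and", "dog"], example_stopwords))
```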

bosnian

catalan

chinese

revscoring.languages.chinese.badwords = {chinese.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.chinese.informals = {chinese.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.chinese.words_to_watch = {chinese.words_to_watch}

RegexMatches features via a list of advertising language regexes.

croatian

czech

dutch

english

estonian

finnishswedish

french

galician

german

greek

hebrew

hindi

hungarian

indonesian

italian

japanese

revscoring.languages.japanese.badwords = {japanese.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.japanese.informals = {japanese.informals}

RegexMatches features via a list of informal word detecting regexes.

korean

revscoring.languages.korean.badwords = {korean.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.korean.informals = {korean.informals}

RegexMatches features via a list of informal word detecting regexes.

latvian

norwegian

persian

portuguese

romanian

spanish

swedish

tamil

revscoring.languages.tamil.badwords = {tamil.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.tamil.informals = {tamil.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.tamil.name = 'tamil'

    # Dictionary is one of the collections in revscoring.languages.features;
    # `name` is the module-level attribute shown above ('tamil').
    from revscoring.languages.features import Dictionary

    try:
        import enchant
        dictionary = enchant.Dict("ta")
    except enchant.errors.DictNotFoundError:
        raise ImportError("No enchant-compatible dictionary found for 'ta'. " +
                          "Consider installing 'aspell-ta'.")

    dictionary = Dictionary(name + ".dictionary", dictionary.check)

Dictionary features via enchant.Dict "ta". Provided by aspell-ta.

turkish

ukrainian

vietnamese