Diffstat (limited to 'extensions/SpamBlacklist/README')
-rw-r--r-- | extensions/SpamBlacklist/README | 165 |
1 files changed, 165 insertions, 0 deletions
diff --git a/extensions/SpamBlacklist/README b/extensions/SpamBlacklist/README
new file mode 100644
index 00000000..370a90b3
--- /dev/null
+++ b/extensions/SpamBlacklist/README
@@ -0,0 +1,165 @@
+MediaWiki extension: SpamBlacklist
+----------------------------------
+
+SpamBlacklist is a simple edit filter extension. When someone tries to save the
+page, it checks the text against a potentially very large list of "bad"
+hostnames. If there is a match, it displays an error message to the user and
+refuses to save the page.
+
+To enable it, first download a copy of the SpamBlacklist directory and put it
+into your extensions directory. Then put the following at the end of your
+LocalSettings.php:
+
+require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );
+
+The list of bad URLs can be drawn from multiple sources. These sources are
+configured with the $wgSpamBlacklistFiles global variable. This global variable
+can be set in LocalSettings.php, AFTER including SpamBlacklist.php.
+
+$wgSpamBlacklistFiles is an array, each value containing either a URL, a filename
+or a database location. Specifying a database location allows you to draw the
+blacklist from a page on your wiki. The format of the database location
+specifier is "DB: <db name> <title>".
+
+Example:
+
+require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );
+$wgSpamBlacklistFiles = array(
+    "$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimedia's list
+
+//  database title
+    "DB: wikidb My_spam_blacklist",
+);
+
+The local pages [[MediaWiki:Spam-blacklist]] and [[MediaWiki:Spam-whitelist]]
+will always be used, whatever additional files are listed.
+
+Compatibility
+-------------
+
+This extension is primarily maintained to run on the latest release version
+of MediaWiki (1.22.x as of this writing) and development versions; however,
+the current version should also work with MediaWiki 1.21.
+
+If you are using an older version of MediaWiki, you can check out an
+older release branch; for example, MediaWiki 1.20 would use REL1_20.
+
+For even older versions, you may be able to dig a working version out of the
+Git repository, but if you use Wikimedia's blacklist file you will likely
+see failures, because old versions of the code cannot handle the large
+size of the blacklist.
+
+
+File format
+-----------
+
+In simple terms:
+ * Everything from a "#" character to the end of the line is a comment
+ * Every non-blank line is a regex fragment which will only match inside URLs
+
+Internally, a regex is formed which looks like this:
+
+  !http://[a-z0-9\-.]*(line 1|line 2|line 3|....)!Si
+
+A few notes about this format. It's not necessary to add www to the start of
+hostnames; the regex is designed to match any subdomain. Don't add patterns
+to your file which may run off the end of the URL, e.g. anything containing
+".*". Unlike in some similar systems, the line-end metacharacter "$" will not
+assert the end of the hostname; it'll assert the end of the page.
+
+Performance
+-----------
+
+This extension uses a small "loader" file, to avoid loading all the code on
+every page view. This means that page view performance will not be affected
+even if you are not running a PHP bytecode cache such as Turck MMCache. Note
+that a bytecode cache is strongly recommended for any MediaWiki installation.
+
+The regex match itself generally adds an insignificant overhead to page saves,
+on the order of 100ms in our experience. However, loading the spam file from disk
+or the database, and constructing the regex, may take a significant amount of
+time depending on your hardware. If you find that enabling this extension slows
+down saves excessively, try installing MemCached or another supported data
+caching solution. The SpamBlacklist extension will cache the constructed regex
+if such a system is present.
+
+Caching behavior
+----------------
+
+Blacklist files loaded from remote web sites are cached locally, in the cache
+subsystem used for MediaWiki's localization. (This usually means the objectcache
+table on a default install.)
+
+By default, the list is cached for 15 minutes (if successfully fetched) or
+10 minutes (if the network fetch failed), after which point it will be fetched
+again when next requested. This should be a decent balance between avoiding
+too-frequent fetches if your site is frequently used and staying up to date.
+
+Fully-processed blacklist data may be cached in memcached or another shared
+memory cache if it's been configured in MediaWiki.
+
+
+Stability
+---------
+
+This extension has not been widely tested outside Wikimedia. Although it has
+been in production on Wikimedia websites since December 2004, it should be
+considered experimental. Its design is simple, with little input validation, so
+unexpected behavior due to incorrect regular expression input or non-standard
+configuration is entirely possible.
+
+Obtaining or making blacklists
+------------------------------
+
+The primary source for a MediaWiki-compatible blacklist file is the Wikimedia
+spam blacklist on meta:
+
+    http://meta.wikimedia.org/wiki/Spam_blacklist
+
+In the default configuration, the extension loads this list from our site
+once every 10-15 minutes.
+
+The Wikimedia spam blacklist can only be edited by trusted administrators.
+Wikimedia hosts large, diverse wikis with many thousands of external links,
+hence the Wikimedia blacklist is comparatively conservative in the links it
+blocks. You may want to add your own keyword blocks or even ccTLD blocks.
+You may suggest modifications to the Wikimedia blacklist at:
+
+    http://meta.wikimedia.org/wiki/Talk:Spam_blacklist
+
+To make maintenance of local lists easier, you may wish to add a DB: source to
+$wgSpamBlacklistFiles and hence create a blacklist on your wiki. If you do this,
+it is strongly recommended that you protect the page from general editing.
+Besides the obvious danger that someone may add a regex that matches everything,
+please note that an attacker with the ability to input arbitrary regular
+expressions may be able to generate segfaults in the PCRE library.
+
+Whitelisting
+------------
+
+You may sometimes find that a site listed in a centrally-maintained blacklist
+contains something you nonetheless want to link to.
+
+A local whitelist can be maintained by creating a [[MediaWiki:Spam-whitelist]]
+page and listing hostnames in it, using the same format as the blacklists.
+URLs matching the whitelist will be ignored locally.
+
+Logging
+-------
+
+To aid with tracking which domains are being spammed, this extension has
+multiple logging features. By default, hits are included in the standard
+debug log (controlled by $wgDebugLogFile). You can grep for 'SpamBlacklistHit',
+which includes the IP of the user and the URL they tried to submit. This
+file is only available to people with server access and includes private info.
+
+You can also enable logging to [[Special:Log]] by setting $wgLogSpamBlacklistHits to
+true. This will include the account which tripped the blacklist, the page title the
+edit was attempted on, and the specific URL. By default this log is only viewable
+by wiki administrators, and you can grant other groups access by giving them the
+"spamblacklistlog" permission.
+
+Copyright
+---------
+This extension and this documentation were written by Tim Starling and are
+ambiguously licensed.
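
Reviewer note (not part of the patch): the Logging section of the README above describes two settings but shows no configuration. The fragment below is a minimal sketch of what the corresponding LocalSettings.php lines might look like. The log file path and the group name "spamwatchers" are hypothetical placeholders, and the use of MediaWiki's standard $wgGroupPermissions array to grant the "spamblacklistlog" right is an assumption about how that permission would normally be assigned, not something the README states.

```php
<?php
// Sketch only: assumed to sit in LocalSettings.php AFTER the require_once
// line that loads SpamBlacklist.php, as the README requires for its settings.

// Debug log that receives 'SpamBlacklistHit' entries.
// The path is a hypothetical example; keep it private, since the log
// contains user IP addresses.
$wgDebugLogFile = '/var/log/mediawiki/debug.log';

// Also record hits in [[Special:Log]] (the README says this is opt-in).
$wgLogSpamBlacklistHits = true;

// Let an additional group view that log. 'spamwatchers' is a hypothetical
// group name; 'spamblacklistlog' is the permission named in the README.
$wgGroupPermissions['spamwatchers']['spamblacklistlog'] = true;
```

With something like this in place, hits should be searchable in the debug log by grepping for 'SpamBlacklistHit', as the README describes.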