author | Pierre Schmitz <pierre@archlinux.de> | 2006-10-11 18:12:39 +0000 |
---|---|---|
committer | Pierre Schmitz <pierre@archlinux.de> | 2006-10-11 18:12:39 +0000 |
commit | 183851b06bd6c52f3cae5375f433da720d410447 (patch) | |
tree | a477257decbf3360127f6739c2f9d0ec57a03d39 /docs |
MediaWiki 1.7.1 restored
Diffstat (limited to 'docs')
-rw-r--r-- | docs/.htaccess | 1 | ||||
-rw-r--r-- | docs/README | 17 | ||||
-rw-r--r-- | docs/database.txt | 174 | ||||
-rw-r--r-- | docs/deferred.txt | 19 | ||||
-rw-r--r-- | docs/design.txt | 128 | ||||
-rw-r--r-- | docs/export-0.1.xsd | 76 | ||||
-rw-r--r-- | docs/export-0.2.xsd | 100 | ||||
-rw-r--r-- | docs/export-0.3.xsd | 154 | ||||
-rw-r--r-- | docs/export-demo.xml | 115 | ||||
-rw-r--r-- | docs/globals.txt | 74 | ||||
-rw-r--r-- | docs/hooks.txt | 502 | ||||
-rw-r--r-- | docs/html/README | 4 | ||||
-rw-r--r-- | docs/language.txt | 24 | ||||
-rw-r--r-- | docs/linkcache.txt | 18 | ||||
-rw-r--r-- | docs/magicword.txt | 44 | ||||
-rw-r--r-- | docs/memcached.txt | 132 | ||||
-rw-r--r-- | docs/php-memcached/ChangeLog | 45 | ||||
-rw-r--r-- | docs/php-memcached/Documentation | 258 | ||||
-rw-r--r-- | docs/schema.txt | 6 | ||||
-rw-r--r-- | docs/skin.txt | 48 | ||||
-rw-r--r-- | docs/title.txt | 72 | ||||
-rw-r--r-- | docs/user.txt | 63 |
22 files changed, 2074 insertions, 0 deletions
diff --git a/docs/.htaccess b/docs/.htaccess
new file mode 100644
index 00000000..b63d4018
--- /dev/null
+++ b/docs/.htaccess
@@ -0,0 +1 @@
+Deny from All
diff --git a/docs/README b/docs/README
new file mode 100644
index 00000000..43ac8ef5
--- /dev/null
+++ b/docs/README
@@ -0,0 +1,17 @@
+[July 5th 2005]
+
+The 'docs' directory contain various text files that should help you
+understand the most importants classes in MediaWiki.
+
+API documentation is sometime generated and uploaded at:
+  http://svn.wikimedia.org/doc/
+
+You can get a fresh version using 'make doc' or mwdocgen.php
+in the ../maintenance/ directory.
+
+
+
+For end user / administrators, most of the documentation
+is located online at:
+  http://meta.wikimedia.org/wiki/Help:Help
+
diff --git a/docs/database.txt b/docs/database.txt
new file mode 100644
index 00000000..679492a1
--- /dev/null
+++ b/docs/database.txt
@@ -0,0 +1,174 @@
+Some information about database access in MediaWiki.
+By Tim Starling, January 2006.
+
+------------------------------------------------------------------------
+    API
+------------------------------------------------------------------------
+
+For a database API reference, please see the auto-generated
+documentation:
+
+http://wikipedia.sourceforge.net/doc/MediaWiki/Database.html
+
+To make a read query, something like this usually suffices:
+
+$dbr =& wfGetDB( DB_SLAVE );
+$res = $dbr->select( /* ...see docs... */ );
+while ( $row = $dbr->fetchObject( $res ) ) {
+    ...
+}
+$dbr->freeResult( $res );
+
+Note the assignment operator in the while condition.
+
+For a write query, use something like:
+
+$dbw =& wfGetDB( DB_MASTER );
+$dbw->insert( /* ...see docs... */ );
+
+We use the convention $dbr for read and $dbw for write to help you keep
+track of whether the database object is a slave (read-only) or a master
+(read/write). If you write to a slave, the world will explode. Or to be
+precise, a subsequent write query which succeeded on the master may fail
+when replicated to the slave due to a unique key collision. Replication
+on the slave will stop and it may take hours to repair the database and
+get it back online. Setting read_only in my.cnf on the slave will avoid
+this scenario, but given the dire consequences, we prefer to have as
+many checks as possible.
+
+We provide a query() function for raw SQL, but the wrapper functions
+like select() and insert() are usually more convenient. They take care
+of things like table prefixes and escaping for you. If you really need
+to make your own SQL, please read the documentation for tableName() and
+addQuotes(). You will need both of them.
+
+
+------------------------------------------------------------------------
+    Basic query optimisation
+------------------------------------------------------------------------
+
+MediaWiki developers who need to write DB queries should have some
+understanding of databases and the performance issues associated with
+them. Patches containing unacceptably slow features will not be
+accepted. Unindexed queries are generally not welcome in MediaWiki,
+except in special pages derived from QueryPage. It's a common pitfall
+for new developers to submit code containing SQL queries which examine
+huge numbers of rows. Remember that COUNT(*) is O(N), counting rows in a
+table is like counting beans in a bucket.
+
+
+------------------------------------------------------------------------
+    Replication
+------------------------------------------------------------------------
+
+The largest installation of MediaWiki, Wikimedia, uses a large set of
+slave MySQL servers replicating writes made to a master MySQL server. It
+is important to understand the issues associated with this setup if you
+want to write code destined for Wikipedia.
+
+It's often the case that the best algorithm to use for a given task
+depends on whether or not replication is in use. Due to our unabashed
+Wikipedia-centrism, we often just use the replication-friendly version,
+but if you like, you can use $wgLoadBalancer->getServerCount() > 1 to
+check to see if replication is in use.
+
+=== Lag ===
+
+Lag primarily occurs when large write queries are sent to the master.
+Writes on the master are executed in parallel, but they are executed in
+serial when they are replicated to the slaves. The master writes the
+query to the binlog when the transaction is committed. The slaves poll
+the binlog and start executing the query as soon as it appears. They can
+service reads while they are performing a write query, but will not read
+anything more from the binlog and thus will perform no more writes. This
+means that if the write query runs for a long time, the slaves will lag
+behind the master for the time it takes for the write query to complete.
+
+Lag can be exacerbated by high read load. MediaWiki's load balancer will
+stop sending reads to a slave when it is lagged by more than 30 seconds.
+If the load ratios are set incorrectly, or if there is too much load
+generally, this may lead to a slave permanently hovering around 30
+seconds lag.
+
+If all slaves are lagged by more than 30 seconds, MediaWiki will stop
+writing to the database. All edits and other write operations will be
+refused, with an error returned to the user. This gives the slaves a
+chance to catch up. Before we had this mechanism, the slaves would
+regularly lag by several minutes, making review of recent edits
+difficult.
+
+In addition to this, MediaWiki attempts to ensure that the user sees
+events occuring on the wiki in chronological order. A few seconds of lag
+can be tolerated, as long as the user sees a consistent picture from
+subsequent requests. This is done by saving the master binlog position
+in the session, and then at the start of each request, waiting for the
+slave to catch up to that position before doing any reads from it. If
+this wait times out, reads are allowed anyway, but the request is
+considered to be in "lagged slave mode". Lagged slave mode can be
+checked by calling $wgLoadBalancer->getLaggedSlaveMode(). The only
+practical consequence at present is a warning displayed in the page
+footer.
+
+=== Lag avoidance ===
+
+To avoid excessive lag, queries which write large numbers of rows should
+be split up, generally to write one row at a time. Multi-row INSERT ...
+SELECT queries are the worst offenders should be avoided altogether.
+Instead do the select first and then the insert.
+
+=== Working with lag ===
+
+Despite our best efforts, it's not practical to guarantee a low-lag
+environment. Lag will usually be less than one second, but may
+occasionally be up to 30 seconds. For scalability, it's very important
+to keep load on the master low, so simply sending all your queries to
+the master is not the answer. So when you have a genuine need for
+up-to-date data, the following approach is advised:
+
+1) Do a quick query to the master for a sequence number or timestamp 2)
+Run the full query on the slave and check if it matches the data you got
+from the master 3) If it doesn't, run the full query on the master
+
+To avoid swamping the master every time the slaves lag, use of this
+approach should be kept to a minimum. In most cases you should just read
+from the slave and let the user deal with the delay.
+
+
+------------------------------------------------------------------------
+    Lock contention
+------------------------------------------------------------------------
+
+Due to the high write rate on Wikipedia (and some other wikis),
+MediaWiki developers need to be very careful to structure their writes
+to avoid long-lasting locks. By default, MediaWiki opens a transaction
+at the first query, and commits it before the output is sent. Locks will
+be held from the time when the query is done until the commit. So you
+can reduce lock time by doing as much processing as possible before you
+do your write queries. Update operations which do not require database
+access can be delayed until after the commit by adding an object to
+$wgPostCommitUpdateList.
+
+Often this approach is not good enough, and it becomes necessary to
+enclose small groups of queries in their own transaction. Use the
+following syntax:
+
+$dbw =& wfGetDB( DB_MASTER );
+$dbw->immediateBegin();
+/* Do queries */
+$dbw->immediateCommit();
+
+There are functions called begin() and commit() but they don't do what
+you would expect. Don't use them.
+
+Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They
+are poorly implemented in InnoDB and will cause regular deadlock errors.
+It's also surprisingly easy to cripple the wiki with lock contention. If
+you must use them, define a new flag for $wgAntiLockFlags which allows
+them to be turned off, because we'll almost certainly need to do so on
+the Wikimedia cluster.
+
+Instead of locking reads, combine your existence checks into your write
+queries, by using an appropriate condition in the WHERE clause of an
+UPDATE, or by using unique indexes in combination with INSERT IGNORE.
+Then use the affected row count to see if the query succeeded.
+
diff --git a/docs/deferred.txt b/docs/deferred.txt
new file mode 100644
index 00000000..445eb0e4
--- /dev/null
+++ b/docs/deferred.txt
@@ -0,0 +1,19 @@
+
+deferred.txt
+
+A few of the database updates required by various functions here
+can be deferred until after the result page is displayed to the
+user. For example, updating the view counts, updating the
+linked-to tables after a save, etc. PHP does not yet have any
+way to tell the server to actually return and disconnect while
+still running these updates (as a Java servelet could), but it
+might have such a feature in the future.
+
+We handle these by creating a deferred-update object (in a real
+O-O language these would be classes that implement an interface)
+and putting those objects on a global list, then executing the
+whole list after the page is displayed. We don't do anything
+smart like collating updates to the same table or such because
+the list is almost always going to have just one item on it, if
+that, so it's not worth the trouble.
+
diff --git a/docs/design.txt b/docs/design.txt
new file mode 100644
index 00000000..5fff9fd0
--- /dev/null
+++ b/docs/design.txt
@@ -0,0 +1,128 @@
+This is a brief overview of the new design.
+
+Primary source files/objects:
+
+  index.php
+    Main script. It creates the necessary global objects and parses
+    the URL to determine what to do, which it then generally passes
+    off to somebody else (depending on the action to be taken).
+
+    All of the functions to which it might delegate generally do
+    their job by sending content to the $wgOut object. After returning,
+    the script flushes that out by calling $wgOut->output(). If there
+    are any changes that need to be made to the database that can be
+    deferred until after page display, those happen at the end.
+
+    Note that the order in the includes is touchy; Language uses
+    some global functions, etc. Likewise with the creation of the
+    global variables. Don't move them around without some forethought.
+
+  User
+    Encapsulates the state of the user viewing/using the site.
+    Can be queried for things like the user's settings, name, etc.
+    Handles the details of getting and saving to the "user" table
+    of the database, and dealing with sessions and cookies.
+    More details in USER.TXT.
+
+  OutputPage
+    Encapsulates the entire HTML page that will be sent in
+    response to any server request. It is used by calling its
+    functions to add text, headers, etc., in any order, and then
+    calling output() to send it all. It could be easily changed
+    to send incrementally if that becomes useful, but I prefer
+    the flexibility. This should also do the output encoding.
+    The system allocates a global one in $wgOut. This class
+    also handles converting wikitext format to HTML.
+
+  Title
+    Represents the title of an article, and does all the work
+    of translating among various forms such as plain text, URL,
+    database key, etc. For convenience, and for historical
+    reasons, it also represents a few features of articles that
+    don't involve their text, such as access rights.
+
+  Article
+    Encapsulates access to the "cur" table of the database. The
+    object represents a an article, and maintains state such as
+    text (in Wikitext format), flags, etc.
+
+  Skin
+    Encapsulates a "look and feel" for the wiki. All of the
+    functions that render HTML, and make choices about how to
+    render it, are here, and are called from various other
+    places when needed (most notably, OutputPage::addWikiText()).
+    The StandardSkin object is a complete implementation, and is
+    meant to be subclassed with other skins that may override
+    some of its functions. The User object contains a reference
+    to a skin (according to that user's preference), and so
+    rather than having a global skin object we just rely on the
+    global User and get the skin with $wgUser->getSkin().
+
+  Language
+    Represents the language used for incidental text, and also
+    has some character encoding functions and other locale stuff.
+    A global one is allocated in $wgLang.
+
+  LinkCache
+    Keeps information on existence of articles. See LINKCACHE.TXT.
+
+Naming/coding conventions:
+
+  These are meant to be descriptive, not dictatorial; I won't
+  presume to tell you how to program, I'm just describing the
+  methods I chose to use for myself. If you do choose to
+  follow these guidelines, it will probably be easier for you
+  to collaborate with others on the project, but if you want
+  to contribute without bothering, by all means do so (and don't
+  be surprised if I reformat your code).
+
+  - I have the code indented with tabs to save file size and
+    so that users can set their tab stops to any depth they like.
+    I use 4-space tab stops, which work well. I also use K&R brace
+    matching style. I know that's a religious issue for some,
+    so if you want to use a style that puts opening braces on the
+    next line, that's OK too, but please don't use a style where
+    closing braces don't align with either the opening brace on
+    its own line or the statement that opened the block--that's
+    confusing as hell.
+
+  - PHP doesn't have "private" member variables of functions,
+    so I've used the comment "/* private */" in some places to
+    indicate my intent. Don't access things marked that way
+    from outside the class def--use the accessor functions (or
+    make your own if you need them). Yes, even some globals
+    are marked private, because PHP is broken and doesn't
+    allow static class variables.
+
+  - Member variables are generally "mXxx" to distinguish them.
+    This should make it easier to spot errors of forgetting the
+    required "$this->", which PHP will happily accept by creating
+    a new local variable rather than complaining.
+
+  - Globals are particularly evil in PHP; it sets a lot of them
+    automatically from cookies, query strings, and such, leading to
+    namespace conflicts; when a variable name is used in a function,
+    it is silently declared as a new local masking the global, so
+    you'll get weird error because you forgot the global declaration;
+    lack of static class member variables means you have to use
+    globals for them, etc. Evil, evil.
+
+    I think I've managed to pare down the number of globals we use
+    to a scant few dozen or so, and I've prefixed them all with "wg"
+    so you can spot errors better (odds are, if you see a "wg"
+    variable being used in a function that doesn't declare it global,
+    that's probably an error).
+
+    Other conventions: Top-level functions are wfFuncname(), names
+    of session variables are wsName, cookies wcName, and form field
+    values wpName ("p" for "POST").
+
+  - Be kind to your release manager and don't use CVS keywords (Id,
+    Revision, etc.) to mark file versions. They make merging code
+    between different branches a pain for CVS, and are kind of sketchy
+    for versions after that. (Yes, you can use the '-kk' flag so that
+    merges ignore keywords, but that messes up binary files. See
+    https://www.cvshome.org/docs/manual/cvs-1.11.18/cvs_5.html#SEC64).
+
+
+
diff --git a/docs/export-0.1.xsd b/docs/export-0.1.xsd
new file mode 100644
index 00000000..0b3eb179
--- /dev/null
+++ b/docs/export-0.1.xsd
@@ -0,0 +1,76 @@
+<?xml version="1.0" encoding="UTF-8" ?>
+<!--
+  This is an XML Schema description of the format
+  output by MediaWiki's Special:Export system.
+
+  The canonical URL to the schema document is:
+  http://www.mediawiki.org/xml/export-0.1.xsd
+
+  Use the namespace:
+  http://www.mediawiki.org/xml/export-0.1/
+-->
+<schema xmlns="http://www.w3.org/2001/XMLSchema"
+        xmlns:mw="http://www.mediawiki.org/xml/export-0.1/"
+        targetNamespace="http://www.mediawiki.org/xml/export-0.1/"
+        elementFormDefault="qualified">
+
+  <annotation>
+    <documentation xml:lang="en">
+      MediaWiki's page export format
+    </documentation>
+  </annotation>
+
+  <!-- Need this to reference xml:lang -->
+  <import namespace="http://www.w3.org/XML/1998/namespace"
+          schemaLocation="http://www.w3.org/2001/xml.xsd"/>
+
+  <!-- Our root element -->
+  <element name="mediawiki" type="mw:MediaWikiType"/>
+
+  <complexType name="MediaWikiType">
+    <sequence>
+      <element name="page" type="mw:PageType"
+               minOccurs="0" maxOccurs="unbounded"/>
+    </sequence>
+    <attribute name="version" type="string" use="required"/>
+    <attribute ref="xml:lang" use="required"/>
+  </complexType>
+
+  <complexType name="PageType">
+    <sequence>
+      <!-- Title in text form. (Using spaces, not underscores; with namespace ) -->
+      <element name="title" type="string"/>
+
+      <!-- optional page ID number -->
+      <element name="id" type="positiveInteger" minOccurs="0"/>
+
+      <!-- comma-separated list of string tokens, if present -->
+      <element name="restrictions" type="string" minOccurs="0"/>
+
+      <!-- Zero or more sets of revision data -->
+      <element name="revision" type="mw:RevisionType"
+               minOccurs="0" maxOccurs="unbounded"/>
+    </sequence>
+  </complexType>
+
+  <complexType name="RevisionType">
+    <sequence>
+      <element name="id" type="positiveInteger" minOccurs="0"/>
+      <element name="timestamp" type="dateTime"/>
+      <element name="contributor" type="mw:ContributorType"/>
+      <element name="minor" minOccurs="0" />
+      <element name="comment" type="string" minOccurs="0"/>
+      <element name="text" type="string"/>
+    </sequence>
+  </complexType>
+
+  <complexType name="ContributorType">
+    <sequence>
+      <element name="username" type="string" minOccurs="0"/>
+      <element name="id" type="positiveInteger" minOccurs="0" />
+
+      <element name="ip" type="string" minOccurs="0"/>
+    </sequence>
+  </complexType>
+
+</schema>
diff --git a/docs/export-0.2.xsd b/docs/export-0.2.xsd
new file mode 100644
index 00000000..8acbf543
--- /dev/null
+++ b/docs/export-0.2.xsd
@@ -0,0 +1,100 @@
+<?xml version="1.0" encoding="UTF-8" ?>
+<!--
+  This is an XML Schema description of the format
+  output by MediaWiki's Special:Export system.
+
+  Version 0.2 adds optional basic file upload info support,
+  which is used by our OAI export/import submodule.
+ + The canonical URL to the schema document is: + http://www.mediawiki.org/xml/export-0.2.xsd + + Use the namespace: + http://www.mediawiki.org/xml/export-0.2/ +--> +<schema xmlns="http://www.w3.org/2001/XMLSchema" + xmlns:mw="http://www.mediawiki.org/xml/export-0.2/" + targetNamespace="http://www.mediawiki.org/xml/export-0.2/" + elementFormDefault="qualified"> + + <annotation> + <documentation xml:lang="en"> + MediaWiki's page export format + </documentation> + </annotation> + + <!-- Need this to reference xml:lang --> + <import namespace="http://www.w3.org/XML/1998/namespace" + schemaLocation="http://www.w3.org/2001/xml.xsd"/> + + <!-- Our root element --> + <element name="mediawiki" type="mw:MediaWikiType"/> + + <complexType name="MediaWikiType"> + <sequence> + <element name="page" type="mw:PageType" + minOccurs="0" maxOccurs="unbounded"/> + </sequence> + <attribute name="version" type="string" use="required"/> + <attribute ref="xml:lang" use="required"/> + </complexType> + + <complexType name="PageType"> + <sequence> + <!-- Title in text form. 
(Using spaces, not underscores; with namespace ) --> + <element name="title" type="string"/> + + <!-- optional page ID number --> + <element name="id" type="positiveInteger" minOccurs="0"/> + + <!-- comma-separated list of string tokens, if present --> + <element name="restrictions" type="string" minOccurs="0"/> + + <!-- Zero or more sets of revision or upload data --> + <choice minOccurs="0" maxOccurs="unbounded"> + <element name="revision" type="mw:RevisionType" /> + <element name="upload" type="mw:UploadType" /> + </choice> + </sequence> + </complexType> + + <complexType name="RevisionType"> + <sequence> + <element name="id" type="positiveInteger" minOccurs="0"/> + <element name="timestamp" type="dateTime"/> + <element name="contributor" type="mw:ContributorType"/> + <element name="minor" minOccurs="0" /> + <element name="comment" type="string" minOccurs="0"/> + <element name="text" type="string"/> + </sequence> + </complexType> + + <complexType name="ContributorType"> + <sequence> + <element name="username" type="string" minOccurs="0"/> + <element name="id" type="positiveInteger" minOccurs="0" /> + + <element name="ip" type="string" minOccurs="0"/> + </sequence> + </complexType> + + <complexType name="UploadType"> + <sequence> + <!-- Revision-style data... --> + <element name="timestamp" type="dateTime"/> + <element name="contributor" type="mw:ContributorType"/> + <element name="comment" type="string" minOccurs="0"/> + + <!-- Filename. (Using underscores, not spaces. No 'Image:' namespace marker.) 
--> + <element name="filename" type="string"/> + + <!-- URI at which this resource can be obtained --> + <element name="src" type="anyURI"/> + + <element name="size" type="positiveInteger" /> + + <!-- TODO: add other metadata fields --> + </sequence> + </complexType> + +</schema> diff --git a/docs/export-0.3.xsd b/docs/export-0.3.xsd new file mode 100644 index 00000000..1e0b7c88 --- /dev/null +++ b/docs/export-0.3.xsd @@ -0,0 +1,154 @@ +<?xml version="1.0" encoding="UTF-8" ?> +<!-- + This is an XML Schema description of the format + output by MediaWiki's Special:Export system. + + Version 0.2 adds optional basic file upload info support, + which is used by our OAI export/import submodule. + + Version 0.3 adds some site configuration information such + as a list of defined namespaces. + + The canonical URL to the schema document is: + http://www.mediawiki.org/xml/export-0.3.xsd + + Use the namespace: + http://www.mediawiki.org/xml/export-0.3/ +--> +<schema xmlns="http://www.w3.org/2001/XMLSchema" + xmlns:mw="http://www.mediawiki.org/xml/export-0.3/" + targetNamespace="http://www.mediawiki.org/xml/export-0.3/" + elementFormDefault="qualified"> + + <annotation> + <documentation xml:lang="en"> + MediaWiki's page export format + </documentation> + </annotation> + + <!-- Need this to reference xml:lang --> + <import namespace="http://www.w3.org/XML/1998/namespace" + schemaLocation="http://www.w3.org/2001/xml.xsd"/> + + <!-- Our root element --> + <element name="mediawiki" type="mw:MediaWikiType"/> + + <complexType name="MediaWikiType"> + <sequence> + <element name="siteinfo" type="mw:SiteInfoType" + minOccurs="0" maxOccurs="1"/> + <element name="page" type="mw:PageType" + minOccurs="0" maxOccurs="unbounded"/> + </sequence> + <attribute name="version" type="string" use="required"/> + <attribute ref="xml:lang" use="required"/> + </complexType> + + <complexType name="SiteInfoType"> + <sequence> + <element name="sitename" type="string" minOccurs="0" /> + <element name="base" 
type="anyURI" minOccurs="0" /> + <element name="generator" type="string" minOccurs="0" /> + <element name="case" type="mw:CaseType" minOccurs="0" /> + <element name="namespaces" type="mw:NamespacesType" minOccurs="0" /> + </sequence> + </complexType> + + <simpleType name="CaseType"> + <restriction base="NMTOKEN"> + <!-- Cannot have two titles differing only by case of first letter. --> + <!-- Default behavior through 1.5, $wgCapitalLinks = true --> + <enumeration value="first-letter" /> + + <!-- Complete title is case-sensitive --> + <!-- Behavior when $wgCapitalLinks = false --> + <enumeration value="case-sensitive" /> + + <!-- Cannot have two titles differing only by case. --> + <!-- Not yet implemented as of MediaWiki 1.5 --> + <enumeration value="case-insensitive" /> + </restriction> + </simpleType> + + <complexType name="NamespacesType"> + <sequence> + <element name="namespace" type="mw:NamespaceType" + minOccurs="0" maxOccurs="unbounded" /> + </sequence> + </complexType> + + <complexType name="NamespaceType"> + <simpleContent> + <extension base="string"> + <attribute name="key" type="integer" /> + </extension> + </simpleContent> + </complexType> + + <complexType name="PageType"> + <sequence> + <!-- Title in text form. 
(Using spaces, not underscores; with namespace ) --> + <element name="title" type="string"/> + + <!-- optional page ID number --> + <element name="id" type="positiveInteger" minOccurs="0"/> + + <!-- comma-separated list of string tokens, if present --> + <element name="restrictions" type="string" minOccurs="0"/> + + <!-- Zero or more sets of revision or upload data --> + <choice minOccurs="0" maxOccurs="unbounded"> + <element name="revision" type="mw:RevisionType" /> + <element name="upload" type="mw:UploadType" /> + </choice> + </sequence> + </complexType> + + <complexType name="RevisionType"> + <sequence> + <element name="id" type="positiveInteger" minOccurs="0"/> + <element name="timestamp" type="dateTime"/> + <element name="contributor" type="mw:ContributorType"/> + <element name="minor" minOccurs="0" /> + <element name="comment" type="string" minOccurs="0"/> + <element name="text" type="mw:TextType" /> + </sequence> + </complexType> + + <complexType name="TextType"> + <simpleContent> + <extension base="string"> + <attribute ref="xml:space" use="optional" default="preserve" /> + </extension> + </simpleContent> + </complexType> + + <complexType name="ContributorType"> + <sequence> + <element name="username" type="string" minOccurs="0"/> + <element name="id" type="positiveInteger" minOccurs="0" /> + + <element name="ip" type="string" minOccurs="0"/> + </sequence> + </complexType> + + <complexType name="UploadType"> + <sequence> + <!-- Revision-style data... --> + <element name="timestamp" type="dateTime"/> + <element name="contributor" type="mw:ContributorType"/> + <element name="comment" type="string" minOccurs="0"/> + + <!-- Filename. (Using underscores, not spaces. No 'Image:' namespace marker.) 
--> + <element name="filename" type="string"/> + + <!-- URI at which this resource can be obtained --> + <element name="src" type="anyURI"/> + + <element name="size" type="positiveInteger" /> + + <!-- TODO: add other metadata fields --> + </sequence> + </complexType> + +</schema> diff --git a/docs/export-demo.xml b/docs/export-demo.xml new file mode 100644 index 00000000..1b4bd7cf --- /dev/null +++ b/docs/export-demo.xml @@ -0,0 +1,115 @@ +<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en"> + +<!-- Optional global configuration info --> +<siteinfo> + <!-- Site name, as set in $wgSitename --> + <sitename>DemoWiki</sitename> + + <!-- Forgot where you got this set? --> + <base>http://example.com/wiki/Main_Page</base> + + <!-- Source software version --> + <generator>MediaWiki 1.5.0</generator> + + <!-- Title case sensitivity options of the wiki this data came from --> + <!-- May be 'first-letter', 'case-sensitive', or 'case-insensitive' --> + <case>first-letter</case> + + <!-- Defined namespace keys on the source wiki. 
--> + <!-- Titles can be substring-split to obtain the symbolic numeric key --> + <namespaces> + <namespace key="-2">Media</namespace> + <namespace key="-1">Special</namespace> + <namespace key="0"></namespace> + <namespace key="1">Talk</namespace> + <namespace key="2">User</namespace> + <namespace key="3">User talk</namespace> + <namespace key="4">DemoWiki</namespace> + <namespace key="5">DemoWIki talk</namespace> + <namespace key="6">Image</namespace> + <namespace key="7">Image talk</namespace> + <namespace key="8">MediaWiki</namespace> + <namespace key="9">MediaWiki talk</namespace> + <namespace key="10">Template</namespace> + <namespace key="11">Template talk</namespace> + <namespace key="12">Help</namespace> + <namespace key="13">Help talk</namespace> + <namespace key="14">Category</namespace> + <namespace key="15">Category talk</namespace> + </namespaces> +</siteinfo> + +<!-- The rest of the data will be a series of page records --> +<page> + <!-- Titles are listed here in text form, with namespace prefix --> + <!-- if any, and spaces rather than the underscores used in URLs. --> + <title>Page title</title> + + <!-- The page's immutable page_id number in the source database. --> + <!-- Page ID numbers are kept across page moves, but may change --> + <!-- if a page is deleted and recreated. --> + <id>1</id> + + <!-- If restricted, the ACL is listed here raw. --> + <restrictions>edit=sysop:move=sysop</restrictions> + + <!-- With a series of revision records... --> + + <!-- Remember this is XML; if you must use a regex-based extractor --> + <!-- in place of a standard XML parser, be very careful. --> + <!-- * Don't forget to decode character entities! --> + <!-- * If using a 'loose' XML parser, ensure that whitespace is --> + <!-- preserved in the <text> elements. --> + <revision> + <!-- Unique revision ID number (rev_id) in the source database. --> + <!-- This number uniquely identifies the revision on that wiki. 
--> + <id>100</id> + + <timestamp>2001-01-15T13:15:00Z</timestamp> + <contributor><username>Foobar</username><id>42</id></contributor> + <minor /> + <comment>I have just one thing to say!</comment> + <text xml:space="preserve">A bunch of [[text]] here.</text> + </revision> + + <revision> + <timestamp>2001-01-15T13:10:27Z</timestamp> + <contributor><ip>10.0.0.2</ip></contributor> + <comment>new!</comment> + <text xml:space="preserve">An earlier [[revision]].</text> + </revision> +</page> + +<page> + <title>Talk:Page title</title> + <id>2</id> + <revision> + <id>101</id> + <timestamp>2001-01-15T14:03:00Z</timestamp> + <contributor><ip>10.0.0.2</ip></contributor> + <comment>hey</comment> + <text xml:space="preserve">WHYD YOU LOCK PAGE??!!! i was editing that jerk</text> + </revision> +</page> + +<page> + <title>Image:Some image.jpg</title> + <id>3</id> + <revision> + <id>102</id> + <timestamp>2001-01-15T20:34:12Z</timestamp> + <contributor><username>Foobar</username><id>42</id></contributor> + <comment>My awesomeest image!</comment> + <text xml:space="preserve">This is an awesome little imgae. I lurves it. {{PD}}</text> + </revision> + <upload> + <timestamp>2001-01-15T20:34:12Z</timestamp> + <contributor><username>Foobar</username><id>42</id></contributor> + <comment>My awesomeest image!</comment> + <filename>Some_image.jpg</filename> + <src>http://upload.wikimedia.org/commons/2/22/Some_image.jpg</src> + <size>12345</size> + </upload> +</page> + +</mediawiki> diff --git a/docs/globals.txt b/docs/globals.txt new file mode 100644 index 00000000..ecc5ab33 --- /dev/null +++ b/docs/globals.txt @@ -0,0 +1,74 @@ +globals.txt + +Globals are evil. The original MediaWiki code relied on +globals for processing context far too often. MediaWiki +development since then has been a story of slowly moving +context out of global variables and into objects. Storing +processing context in object member variables allows those +objects to be reused in a much more flexible way. 
Consider
+the elegance of:
+
+    # Generate the article HTML as if viewed by a web request
+    $article = new Article( Title::newFromText( $t ) );
+    $article->view();
+
+versus
+
+    # Save current globals
+    $oldTitle = $wgTitle;
+    $oldArticle = $wgArticle;
+
+    # Generate the HTML
+    $wgTitle = Title::newFromText( $t );
+    $wgArticle = new Article;
+    $wgArticle->view();
+
+    # Restore globals
+    $wgTitle = $oldTitle;
+    $wgArticle = $oldArticle;
+
+Some of the current MediaWiki developers have an idle
+fantasy that some day, globals will be eliminated from
+MediaWiki entirely, replaced by an application object which
+would be passed to constructors. Whether that would be an
+efficient, convenient solution remains to be seen, but
+certainly PHP 5 makes such object-oriented programming
+models easier than they were in previous versions.
+
+For the time being though, MediaWiki programmers will have
+to work in an environment with some global context. At the
+time of writing, 418 globals were initialised on startup by
+MediaWiki. 304 of these were configuration settings, which
+are documented in DefaultSettings.php. There is no
+comprehensive documentation for the remaining 114 globals;
+however, some of the most important ones are listed below.
+They are typically initialised either in index.php or in
+Setup.php.
+
+
+$wgOut
+    OutputPage object for HTTP response.
+
+$wgUser
+    User object for the user associated with the current
+    request.
+
+$wgTitle
+    Title object created from the request URL.
+
+$wgLang
+    Language object selected by user preferences.
+
+$wgContLang
+    Language object associated with the wiki being
+    viewed.
+
+$wgArticle
+    Article object corresponding to $wgTitle.
+
+$wgParser
+    Parser object. Parser extensions register their
+    hooks here.
+
+$wgLoadBalancer
+    A LoadBalancer object, manages database connections.
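The contrast drawn in globals.txt, context held in objects rather than in globals, can be tried out without MediaWiki. The sketch below is only an illustration under stated assumptions: DemoTitle and DemoArticle are hypothetical stand-ins, not the real Title and Article classes, and view() returns a string instead of writing to an output object.

```php
<?php
// Hypothetical stand-ins; neither class is part of MediaWiki.
class DemoTitle {
    public $text;

    public function __construct( $text ) {
        $this->text = $text;
    }
}

class DemoArticle {
    private $title;

    // The processing context arrives through the constructor,
    // not through a global variable.
    public function __construct( DemoTitle $title ) {
        $this->title = $title;
    }

    // Returns a heading for the page; a real view() would do far more.
    public function view() {
        return '<h1>' . htmlspecialchars( $this->title->text ) . '</h1>';
    }
}

// Two articles rendered in one request; nothing is saved or restored.
$first = new DemoArticle( new DemoTitle( 'Main Page' ) );
$second = new DemoArticle( new DemoTitle( 'Help:Contents' ) );
$html = $first->view() . $second->view();
```

Because each object carries its own title, rendering a second page needs no save-and-restore step around the call.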
diff --git a/docs/hooks.txt b/docs/hooks.txt new file mode 100644 index 00000000..4dd68f5f --- /dev/null +++ b/docs/hooks.txt @@ -0,0 +1,502 @@ +hooks.txt + +This document describes how event hooks work in MediaWiki; how to add +hooks for an event; and how to run hooks for an event. + +==Glossary== + +event + Something that happens with the wiki. For example: a user logs + in. A wiki page is saved. A wiki page is deleted. Often there are + two events associated with a single action: one before the code + is run to make the event happen, and one after. Each event has a + name, preferably in CamelCase. For example, 'UserLogin', + 'ArticleSave', 'ArticleSaveComplete', 'ArticleDelete'. + +hook + A clump of code and data that should be run when an event + happens. This can be either a function and a chunk of data, or an + object and a method. + +hook function + The function part of a hook. + +==Rationale== + +Hooks allow us to decouple optionally-run code from code that is run +for everyone. It allows MediaWiki hackers, third-party developers and +local administrators to define code that will be run at certain points +in the mainline code, and to modify the data run by that mainline +code. Hooks can keep mainline code simple, and make it easier to +write extensions. Hooks are a principled alternative to local patches. + +Consider, for example, two options in MediaWiki. One reverses the +order of a title before displaying the article; the other converts the +title to all uppercase letters. Currently, in MediaWiki code, we +would handle this as follows (note: not real code, here): + + function showAnArticle($article) { + global $wgReverseTitle, $wgCapitalizeTitle; + + if ($wgReverseTitle) { + wfReverseTitle($article); + } + + if ($wgCapitalizeTitle) { + wfCapitalizeTitle($article); + } + + # code to actually show the article goes here + } + +An extension writer, or a local admin, will often add custom code to +the function -- with or without a global variable. 
For example,
+someone wanting email notification when an article is shown may add:
+
+    function showAnArticle($article) {
+        global $wgReverseTitle, $wgCapitalizeTitle, $wgNotifyArticle;
+
+        if ($wgReverseTitle) {
+            wfReverseTitle($article);
+        }
+
+        if ($wgCapitalizeTitle) {
+            wfCapitalizeTitle($article);
+        }
+
+        # code to actually show the article goes here
+
+        if ($wgNotifyArticle) {
+            wfNotifyArticleShow($article);
+        }
+    }
+
+Using a hook-running strategy, we can avoid having all this
+option-specific stuff in our mainline code. Using hooks, the function
+becomes:
+
+    function showAnArticle($article) {
+
+        if (wfRunHooks('ArticleShow', array(&$article))) {
+
+            # code to actually show the article goes here
+
+            wfRunHooks('ArticleShowComplete', array(&$article));
+        }
+    }
+
+We've cleaned up the code here by removing clumps of weird,
+infrequently used code and moving them off somewhere else. It's much
+easier for someone working with this code to see what's _really_ going
+on, and make changes or fix bugs.
+
+In addition, we can take all the code that deals with the little-used
+title-reversing options (say) and put it in one place. Instead of
+having little title-reversing if-blocks spread all over the codebase
+in showAnArticle, deleteAnArticle, exportArticle, etc., we can
+concentrate it all in an extension file:
+
+    function reverseArticleTitle($article) {
+        # ...
+    }
+
+    function reverseForExport($article) {
+        # ...
+    }
+
+The setup function for the extension just has to add its hook
+functions to the appropriate events:
+
+    function setupTitleReversingExtension() {
+        global $wgHooks;
+
+        $wgHooks['ArticleShow'][] = 'reverseArticleTitle';
+        $wgHooks['ArticleDelete'][] = 'reverseArticleTitle';
+        $wgHooks['ArticleExport'][] = 'reverseForExport';
+    }
+
+Having all this code related to the title-reversion option in one
+place means that it's easier to read and understand; you don't have to
+do a grep-find to see where the $wgReverseTitle variable is used, say.
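The dispatch idea behind the hook calls above can be sketched in a few lines of standalone PHP. This is a simplified illustration, not MediaWiki's wfRunHooks(): the names demoRunHooks, demoReverseTitle and $demoHooks are made up for the example, the registry is passed in as a parameter rather than read from a global, and only plain function names are handled.

```php
<?php
// Hypothetical runner: calls every function registered for $event, in order.
// Returns false as soon as one hook returns false, true otherwise.
function demoRunHooks( array $hooks, $event, array $args ) {
    if ( !isset( $hooks[$event] ) ) {
        return true; // nothing registered: the caller carries on
    }
    foreach ( $hooks[$event] as $callback ) {
        if ( call_user_func_array( $callback, $args ) === false ) {
            return false;
        }
    }
    return true;
}

// Hypothetical hook function; it changes the caller's data by reference.
function demoReverseTitle( &$title ) {
    $title = strrev( $title );
    return true;
}

// Registration mirrors the $wgHooks['EventName'][] = ... pattern.
$demoHooks = array();
$demoHooks['ArticleShow'][] = 'demoReverseTitle';

$title = 'Main Page';
// The argument array holds a reference so the hook can modify $title.
$shown = demoRunHooks( $demoHooks, 'ArticleShow', array( &$title ) );
```

Passing the arguments as an array of references is what lets the hook modify the caller's variable, which is the reason the examples above write array(&$article).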
+ +If the code is well enough isolated, it can even be excluded when not +used -- making for some slight savings in memory and load-up +performance at runtime. Admins who want to have all the reversed +titles can add: + + require_once('extensions/ReverseTitle.php'); + +...to their LocalSettings.php file; those of us who don't want or need +it can just leave it out. + +The extensions don't even have to be shipped with MediaWiki; they +could be provided by a third-party developer or written by the admin +him/herself. + +==Writing hooks== + +A hook is a chunk of code run at some particular event. It consists of: + + * a function with some optional accompanying data, or + * an object with a method and some optional accompanying data. + +Hooks are registered by adding them to the global $wgHooks array for a +given event. All the following are valid ways to define hooks: + + $wgHooks['EventName'][] = 'someFunction'; # function, no data + $wgHooks['EventName'][] = array('someFunction', $someData); + $wgHooks['EventName'][] = array('someFunction'); # weird, but OK + + $wgHooks['EventName'][] = $object; # object only + $wgHooks['EventName'][] = array($object, 'someMethod'); + $wgHooks['EventName'][] = array($object, 'someMethod', $someData); + $wgHooks['EventName'][] = array($object); # weird but OK + +When an event occurs, the function (or object method) will be called +with the optional data provided as well as event-specific parameters. 
+The above examples would result in the following code being executed +when 'EventName' happened: + + # function, no data + someFunction($param1, $param2) + # function with data + someFunction($someData, $param1, $param2) + + # object only + $object->onEventName($param1, $param2) + # object with method + $object->someMethod($param1, $param2) + # object with method and data + $object->someMethod($someData, $param1, $param2) + +Note that when an object is the hook, and there's no specified method, +the default method called is 'onEventName'. For different events this +would be different: 'onArticleSave', 'onUserLogin', etc. + +The extra data is useful if we want to use the same function or object +for different purposes. For example: + + $wgHooks['ArticleSaveComplete'][] = array('ircNotify', 'TimStarling'); + $wgHooks['ArticleSaveComplete'][] = array('ircNotify', 'brion'); + +This code would result in ircNotify being run twice when an article is +saved: once for 'TimStarling', and once for 'brion'. + +Hooks can return three possible values: + + * true: the hook has operated successfully + * "some string": an error occurred; processing should + stop and the error should be shown to the user + * false: the hook has successfully done the work + necessary and the calling function should skip + +The last result would be for cases where the hook function replaces +the main functionality. For example, if you wanted to authenticate +users to a custom system (LDAP, another PHP program, whatever), you +could do: + + $wgHooks['UserLogin'][] = array('ldapLogin', $ldapServer); + + function ldapLogin($username, $password) { + # log user into LDAP + return false; + } + +Returning false makes less sense for events where the action is +complete, and will normally be ignored. + +==Using hooks== + +A calling function or method uses the wfRunHooks() function to run +the hooks related to a particular event, like so: + + class Article { + # ... 
+ function protect() { + global $wgUser; + if (wfRunHooks('ArticleProtect', array(&$this, &$wgUser))) { + # protect the article + wfRunHooks('ArticleProtectComplete', array(&$this, &$wgUser)); + } + } + +wfRunHooks() returns true if the calling function should continue +processing (the hooks ran OK, or there are no hooks to run), or false +if it shouldn't (an error occurred, or one of the hooks handled the +action already). Checking the return value matters more for "before" +hooks than for "complete" hooks. + +Note that hook parameters are passed in an array; this is a necessary +inconvenience to make it possible to pass reference values (that can +be changed) into the hook code. Also note that earlier versions of +wfRunHooks took a variable number of arguments; the array() calling +protocol came about after MediaWiki 1.4rc1. + +==Events and parameters== + +This is a list of known events and parameters; please add to it if +you're going to add events to the MediaWiki code. + +'AbortNewAccount': Return false to cancel account creation. +$user: the User object about to be created (read-only, incomplete) +$message: out parameter: error message to display on abort + +'AddNewAccount': after a user account is created +$user: the User object that was created. 
(Parameter added in 1.7) + +'ArticleDelete': before an article is deleted +$article: the article (object) being deleted +$user: the user (object) deleting the article +$reason: the reason (string) the article is being deleted + +'ArticleDeleteComplete': after an article is deleted +$article: the article that was deleted +$user: the user that deleted the article +$reason: the reason the article was deleted + +'ArticleProtect': before an article is protected +$article: the article being protected +$user: the user doing the protection +$protect: boolean whether this is a protect or an unprotect +$reason: Reason for protect +$moveonly: boolean whether this is for move only or not + +'ArticleProtectComplete': after an article is protected +$article: the article that was protected +$user: the user who did the protection +$protect: boolean whether it was a protect or an unprotect +$reason: Reason for protect +$moveonly: boolean whether it was for move only or not + +'ArticleSave': before an article is saved +$article: the article (object) being saved +$user: the user (object) saving the article +$text: the new article text +$summary: the article summary (comment) +$isminor: minor flag +$iswatch: watch flag +$section: section # + +'ArticleSaveComplete': after an article is saved +$article: the article (object) saved +$user: the user (object) who saved the article +$text: the new article text +$summary: the article summary (comment) +$isminor: minor flag +$iswatch: watch flag +$section: section # + +'AutoAuthenticate': called to authenticate users on external/environmental means +$user: writes user object to this parameter + +'BadImage': When checking against the bad image list +$name: Image name being checked +&$bad: Whether or not the image is "bad" + +Change $bad and return false to override. If an image is "bad", it is not +rendered inline in wiki pages or galleries in category pages. 
+ +'BlockIp': before an IP address or user is blocked +$block: the Block object about to be saved +$user: the user _doing_ the block (not the one being blocked) + +'BlockIpComplete': after an IP address or user is blocked +$block: the Block object that was saved +$user: the user who did the block (not the one being blocked) + +'DiffViewHeader': called before diff display +$diff: DifferenceEngine object that's calling +$oldRev: Revision object of the "old" revision (may be null/invalid) +$newRev: Revision object of the "new" revision + +'EditFormPreloadText': Allows population of the edit form when creating new pages +&$text: Text to preload with +&$title: Title object representing the page being created + +'EditFilter': Perform checks on an edit +$editor: Edit form (see includes/EditPage.php) +$text: Contents of the edit box +$section: Section being edited +&$error: Error message to return + +Return false to halt editing; you'll need to handle error messages, etc. yourself. +Alternatively, modifying $error and returning true will cause the contents of $error +to be echoed at the top of the edit form as wikitext. Return true without altering +$error to allow the edit to proceed. + +'EmailConfirmed': When checking that the user's email address is "confirmed" +$user: User being checked +$confirmed: Whether or not the email address is confirmed +This runs before the other checks, such as anonymity and the real check; return +true to allow those checks to occur, and false if checking is done. 
+
+'EmailUser': before sending email from one user to another
+$to: address of receiving user
+$from: address of sending user
+$subject: subject of the mail
+$text: text of the mail
+
+'EmailUserComplete': after sending email from one user to another
+$to: address of receiving user
+$from: address of sending user
+$subject: subject of the mail
+$text: text of the mail
+
+'FetchChangesList': When fetching the ChangesList derivative for a particular user
+&$user: User the list is being fetched for
+&$skin: Skin object to be used with the list
+&$list: List object (defaults to NULL; change it to an object instance and return
+false to override the list derivative used)
+
+'GetInternalURL': modify fully-qualified URLs used for squid cache purging
+$title: Title object of page
+$url: string value as output (out parameter, can modify)
+$query: query options passed to Title::getInternalURL()
+
+'GetLocalURL': modify local URLs as output into page links
+$title: Title object of page
+$url: string value as output (out parameter, can modify)
+$query: query options passed to Title::getLocalURL()
+
+'GetFullURL': modify fully-qualified URLs used in redirects/export/offsite data
+$title: Title object of page
+$url: string value as output (out parameter, can modify)
+$query: query options passed to Title::getFullURL()
+
+'LogPageValidTypes': action being logged. DEPRECATED: Use $wgLogTypes
+&$type: array of strings
+
+'LogPageLogName': name of the logging page(s). DEPRECATED: Use $wgLogNames
+&$typeText: array of strings
+
+'LogPageLogHeader': strings used by wfMsg as a header. DEPRECATED: Use $wgLogHeaders
+&$headerText: array of strings
+
+'LogPageActionText': strings used by wfMsg as a header.
DEPRECATED: Use $wgLogActions +&$actionText: array of strings + +'MarkPatrolled': before an edit is marked patrolled +$rcid: ID of the revision to be marked patrolled +$user: the user (object) marking the revision as patrolled +$wcOnlySysopsCanPatrol: config setting indicating whether the user + needs to be a sysop in order to mark an edit patrolled + +'MarkPatrolledComplete': after an edit is marked patrolled +$rcid: ID of the revision marked as patrolled +$user: user (object) who marked the edit patrolled +$wcOnlySysopsCanPatrol: config setting indicating whether the user + must be a sysop to patrol the edit + +'MathAfterTexvc': after texvc is executed when rendering mathematics +$mathRenderer: instance of MathRenderer +$errmsg: error message, in HTML (string). Nonempty indicates failure + of rendering the formula + +'OutputPageBeforeHTML': a page has been processed by the parser and +the resulting HTML is about to be displayed. +$parserOutput: the parserOutput (object) that corresponds to the page +$text: the text that will be displayed, in HTML (string) + +'PageRenderingHash': alter the parser cache option hash key + A parser extension which depends on user options should install + this hook and append its values to the key. +$hash: reference to a hash key string which can be modified + +'PersonalUrls': Alter the user-specific navigation links (e.g. "my page, +my talk page, my contributions" etc). + +&$personal_urls: Array of link specifiers (see SkinTemplate.php) +&$title: Title object representing the current page + +'SiteNoticeBefore': Before the sitenotice/anonnotice is composed +&$siteNotice: HTML returned as the sitenotice +Return true to allow the normal method of notice selection/rendering to work, +or change the value of $siteNotice and return false to alter it. + +'SiteNoticeAfter': After the sitenotice/anonnotice is composed +&$siteNotice: HTML sitenotice +Alter the contents of $siteNotice to add to/alter the sitenotice/anonnotice. 
+
+'TitleMoveComplete': after moving an article (title)
+$old: old title
+$nt: new title
+$user: user who did the move
+$pageid: database ID of the page that's been moved
+$redirid: database ID of the created redirect
+
+'UnknownAction': An unknown "action" has occurred (useful for defining
+    your own actions)
+$action: action name
+$article: article "acted on"
+
+'UnwatchArticle': before a watch is removed from an article
+$user: user watching
+$article: article object to be removed
+
+'UnwatchArticleComplete': after a watch is removed from an article
+$user: user that was watching
+$article: article object removed
+
+'UploadVerification': additional chances to reject an uploaded file
+string $saveName: destination file name
+string $tempName: filesystem path to the temporary file for checks
+string &$error: output: HTML error to show if upload canceled by returning false
+
+'UploadComplete': Upon completion of a file upload
+$image: Image object representing the file that was uploaded
+
+'UserCan': To interrupt/advise the "user can do X to Y article" check
+$title: Title object being checked against
+$user : Current user object
+$action: Action being checked
+$result: Pointer to result returned if hook returns false. If null is returned,
+    UserCan checks are continued by internal code
+
+'UserCreateForm': change to manipulate the login form
+$template: SimpleTemplate instance for the form
+
+'UserLoginComplete': after a user has logged in
+$user: the user object that was created on login
+
+'UserLoginForm': change to manipulate the login form
+$template: SimpleTemplate instance for the form
+
+'UserLogout': before a user logs out
+$user: the user object that is about to be logged out
+
+'UserLogoutComplete': after a user has logged out
+$user: the user object _after_ logout (won't have name, ID, etc.)
+ +'UserRights': After a user's group memberships are changed +$user : User object that was changed +$add : Array of strings corresponding to groups added +$remove: Array of strings corresponding to groups removed + +'WatchArticle': before a watch is added to an article +$user: user that will watch +$article: article object to be watched + +'WatchArticleComplete': after a watch is added to an article +$user: user that watched +$article: article object watched + +'UnwatchArticleComplete': after a watch is removed from an article +$user: user that watched +$article: article object that was watched + +'CategoryPageView': before viewing a categorypage in CategoryPage::view +$catpage: CategoryPage instance + +'SkinTemplateContentActions': after building the $content_action array right + before returning it, see content_action.php in + the extension module for a demonstration of how + to use this hook. +$content_actions: The array of content actions + +'BeforePageDisplay': Called just before outputting a page (all kinds of, + articles, special, history, preview, diff, edit, ...) + Can be used to set custom CSS/JS +$out: OutputPage object + +More hooks might be available but undocumented, you can execute +./maintenance/findhooks.php to find hidden one. diff --git a/docs/html/README b/docs/html/README new file mode 100644 index 00000000..d25b803d --- /dev/null +++ b/docs/html/README @@ -0,0 +1,4 @@ +This directory is for the auto-generated phpdoc documentation. +Run 'php mwdocgen.php' in the maintenance subdirectory to build the docs. + +Get phpDocumentor from http://phpdoc.org/ diff --git a/docs/language.txt b/docs/language.txt new file mode 100644 index 00000000..9d6a0db3 --- /dev/null +++ b/docs/language.txt @@ -0,0 +1,24 @@ +language.txt + +The Language object handles all readable text produced by the +software. The most used function is getMessage(), usually +called with the wrapper function wfMsg() which calls that method +on the global language object. 
It just returns a piece of text +given a text key. It is recommended that you use each key only +once--bits of text in different contexts that happen to be +identical in English may not be in other languages, so it's +better to add new keys than to reuse them a lot. Likewise, +if there is text that gets combined with things like names and +titles, it is better to put markers like "$1" inside a piece +of text and use str_replace() than to compose such messages in +code, because their order may change in other languages too. + +While the system is running, there will be one global language +object, which will be a subtype of Language. The methods in +these objects will return the native text requested if available, +otherwise they fall back to sending English text (which is why +the LanguageEn object has no code at all--it just inherits the +English defaults of the Language base class). + +The names of the namespaces are also contained in the language +object, though the numbers are fixed. diff --git a/docs/linkcache.txt b/docs/linkcache.txt new file mode 100644 index 00000000..3e9799c3 --- /dev/null +++ b/docs/linkcache.txt @@ -0,0 +1,18 @@ +linkcache.txt + +The LinkCache class maintains a list of article titles and +the information about whether or not the article exists in +the database. This is used to mark up links when displaying +a page. If the same link appears more than once on any page, +then it only has to be looked up once. In most cases, link +lookups are done in batches with the LinkBatch class, or the +equivalent in Parser::replaceLinkHolders(), so the link +cache is mostly useful for short snippets of parsed text +(such as the site notice), and for links in the navigation +areas of the skin. + +The link cache was formerly used to track links used in a +document for the purposes of updating the link tables. This +application is now deprecated. 
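The look-up-once behaviour that linkcache.txt describes can be illustrated with a small memoizing wrapper. This is a sketch under assumptions, not the real LinkCache API: DemoLinkCache, its exists() method and the $pages array standing in for the database are all hypothetical.

```php
<?php
// Hypothetical class; it only illustrates caching of existence checks.
class DemoLinkCache {
    private $known = array(); // title => bool, filled on first lookup
    private $lookup;          // callable that performs the expensive check
    public $lookups = 0;      // number of times the callable actually ran

    public function __construct( $lookup ) {
        $this->lookup = $lookup;
    }

    public function exists( $title ) {
        if ( !array_key_exists( $title, $this->known ) ) {
            $this->lookups++;
            $this->known[$title] = (bool)call_user_func( $this->lookup, $title );
        }
        return $this->known[$title];
    }
}

// Stand-in for a database query.
$pages = array( 'Main Page' => true );
$cache = new DemoLinkCache( function ( $title ) use ( $pages ) {
    return isset( $pages[$title] );
} );

$a = $cache->exists( 'Main Page' );
$b = $cache->exists( 'Main Page' );    // answered from the cache
$c = $cache->exists( 'No such page' ); // negative results are cached too
```

A link that appears several times on a page costs one lookup here, which is the saving the text attributes to the link cache.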
+
+
diff --git a/docs/magicword.txt b/docs/magicword.txt
new file mode 100644
index 00000000..74e49cff
--- /dev/null
+++ b/docs/magicword.txt
@@ -0,0 +1,44 @@
+magicword.txt
+
+Magic words are phrases used in the wikitext. They are defined in several arrays:
+* $magicWords (includes/MagicWord.php) includes their internal names ('MAG_XXX').
+* $wgVariableIDs (includes/MagicWord.php) includes their IDs (MAG_XXX, which are constants),
+  after their internal names are used for "define()".
+* Localized arrays (languages/LanguageXX.php) include their different names to be used by the users.
+
+The keys of the localized arrays are the internal IDs, and each value is an array that holds the
+case-sensitivity flag and the alias forms. The first form defined is used by the program, for example,
+when moving a page and its old name should include #REDIRECT.
+
+Adding magic words should be done using several hooks:
+* "MagicWordMagicWords" should be used to add the internal name ('MAG_XXX') to $magicWords.
+* "MagicWordwgVariableIDs" should be used to add the ID (MAG_XXX constant) to $wgVariableIDs.
+* "LanguageGetMagic" should be used to add the different names of the magic word. Use both
+  the localized name and the English name.
Get the language code by the parameter $langCode; + +For example: + +$wgHooks['MagicWordMagicWords'][] = 'wfAddCustomMagicWord'; +$wgHooks['MagicWordwgVariableIDs'][] = 'wfAddCustomMagicWordID'; +$wgHooks['LanguageGetMagic'][] = 'wfAddCustomMagicWordLang'; + +function wfAddCustomMagicWord( &$magicWords ) { + $magicWords[] = 'MAG_CUSTOM'; + return true; +} + +function wfAddCustomMagicWordID( &$magicWords ) { + $magicWords[] = MAG_CUSTOM; + return true; +} + +function wfAddCustomMagicWordLang( &$magicWords, $langCode ) { + switch ( $langCode ) { + case 'es': + $magicWords[MAG_CUSTOM] = array( 0, "#aduanero", "#custom" ); + break; + default: + $magicWords[MAG_CUSTOM] = array( 0, "#custom" ); + } + return true; +} diff --git a/docs/memcached.txt b/docs/memcached.txt new file mode 100644 index 00000000..6752e9c8 --- /dev/null +++ b/docs/memcached.txt @@ -0,0 +1,132 @@ +memcached support for MediaWiki: + +From ca August 2003, MediaWiki has optional support for memcached, a +"high-performance, distributed memory object caching system". +For general information on it, see: http://www.danga.com/memcached/ + +Memcached is likely more trouble than a small site will need, but +for a larger site with heavy load, like Wikipedia, it should help +lighten the load on the database servers by caching data and objects +in memory. + +== Requirements == + +* PHP must be compiled with --enable-sockets + +* libevent: http://www.monkey.org/~provos/libevent/ + (as of 2003-08-11, 0.7a is current) + +* optionally, epoll-rt patch for Linux kernel: + http://www.xmailserver.org/linux-patches/nio-improve.html + +* memcached: http://www.danga.com/memcached/download.bml + (as of this writing, 1.1.9 is current) + +Memcached and libevent are under BSD-style licenses. + +The server should run on Linux and other Unix-like systems... 
you +can run multiple servers on one machine or on multiple machines on +a network; storage can be distributed across multiple servers, and +multiple web servers can use the same cache cluster. + + +********************* W A R N I N G ! ! ! ! ! *********************** +Memcached has no security or authentication. Please ensure that your +server is appropriately firewalled, and that the port(s) used for +memcached servers are not publicly accessible. Otherwise, anyone on +the internet can put data into and read data from your cache. + +An attacker familiar with MediaWiki internals could use this to give +themselves developer access and delete all data from the wiki's +database, as well as getting all users' password hashes and e-mail +addresses. +********************* W A R N I N G ! ! ! ! ! *********************** + +== Setup == + +If you want to start small, just run one memcached on your web +server: + + memcached -d -l 127.0.0.1 -p 11000 -m 64 + +(to run in daemon mode, accessible only via loopback interface, +on port 11000, using up to 64MB of memory) + +In your LocalSettings.php file, set: + + $wgUseMemCached = true; + $wgMemCachedServers = array( "127.0.0.1:11000" ); + +The wiki should then use memcached to cache various data. To use +multiple servers (physically separate boxes or multiple caches +on one machine on a large-memory x86 box), just add more items +to the array. To increase the weight of a server (say, because +it has twice the memory of the others and you want to spread +usage evenly), make its entry a subarray: + + $wgMemCachedServers = array( + "127.0.0.1:11000", # one gig on this box + array("192.168.0.1:11000", 2) # two gigs on the other box + ); + + +== PHP client for memcached == + +As of this writing, MediaWiki includes version 1.0.10 of the PHP +memcached client by Ryan Gilfether <hotrodder@rocketmail.com>. +You'll find some documentation for it in the 'php-memcached' +subdirectory under the present one. 
+
+We intend to track updates, but if you want to check for the latest
+released version, see http://www.danga.com/memcached/apis.bml
+
+If you don't set $wgUseMemCached, we still create a MemCacheClient,
+but requests to it are no-ops and we always fall through to the
+database. If the cache daemon can't be contacted, it should also
+disable itself fairly smoothly.
+
+== Keys used ==
+
+User:
+    key: $wgDBname:user:id:$sId
+    ex: wikidb:user:id:51
+    stores: instance of class User
+    set in: User::loadFromSession()
+    cleared by: User::saveSettings(), UserTalkUpdate::doUpdate()
+
+Newtalk:
+    key: $wgDBname:newtalk:ip:$ip
+    ex: wikidb:newtalk:ip:123.45.67.89
+    stores: integer, 0 or 1
+    set in: User::loadFromDatabase()
+    cleared by: User::saveSettings() # ?
+    expiry set to 30 minutes
+
+LinkCache:
+    key: $wgDBname:lc:title:$title
+    ex: wikidb:lc:title:Wikipedia:Welcome,_Newcomers!
+    stores: cur_id of page, or 0 if page does not exist
+    set in: LinkCache::addLink()
+    cleared by: LinkCache::clearBadLink()
+    should be cleared on page deletion and rename
+MediaWiki namespace:
+    key: $wgDBname:messages
+    ex: wikidb:messages
+    stores: an array where the keys are DB keys and the values are messages
+    set in: wfMsg(), Article::editUpdates() both call wfLoadAllMessages()
+    cleared by: nothing
+
+Watchlist:
+    key: $wgDBname:watchlist:id:$userID
+    ex: wikidb:watchlist:id:4635
+    stores: HTML string
+    cleared by: nothing, expiry time $wgWLCacheTimeout (1 hour)
+    note: emergency optimisation only
+
+IP blocks:
+    key: $wgDBname:ipblocks
+    ex: wikidb:ipblocks
+    stores: array of arrays, for the BlockCache class
+    cleared by: BlockCache::clear()
+
+... more to come ...
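All the keys listed above share one shape: the database name followed by colon-separated parts. A minimal sketch of that convention follows; demoMemcKey is a hypothetical helper written for this illustration, and nothing here says MediaWiki builds its keys through such a function.

```php
<?php
// Hypothetical helper: joins the database name and the key parts with colons.
function demoMemcKey( $dbName, ...$parts ) {
    return $dbName . ':' . implode( ':', $parts );
}

// Reproduces two of the example keys from the list above.
$userKey = demoMemcKey( 'wikidb', 'user', 'id', 51 );
$blocksKey = demoMemcKey( 'wikidb', 'ipblocks' );
```

Prefixing every key with the database name keeps several wikis that share one cache cluster from reading each other's entries.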
diff --git a/docs/php-memcached/ChangeLog b/docs/php-memcached/ChangeLog new file mode 100644 index 00000000..86792f60 --- /dev/null +++ b/docs/php-memcached/ChangeLog @@ -0,0 +1,45 @@ +Release 1.0.10 +-------------- +* bug fix: changes hashing function to crc32, sprintf %u +* feature: optional compression + +Release 1.0.9 +------------- +* protocol parsing bug + +Release 1.0.8 +------------- +* whitespace/punctuation/wording cleanups + +Release 1.0.7 +------------- +* added 3 functions which handle error reporting + error() - returns error number of last error generated, else returns 0 + error_string() - returns a string description of error number retuned + error_clear() - clears the last error number and error string +* removed call to preg_match() in _loaditems() +* only non-scalar values are serialize() before being + sent to the server +* added the optional timestamp argument for delete() + read Documentation file for details +* PHPDocs/PEAR style comments added +* abstract debugging (Brion Vibber <brion@pobox.com>) + +Release 1.0.6 +------------- +* removed all array_push() calls +* applied patch provided by Stuart Herbert<stuart@gentoo.org> + corrects possible endless loop. 
Available at + http://bugs.gentoo.org/show_bug.cgi?id=25385 +* fixed problem with storing large binary files +* added more error checking, specifically on all socket functions +* added support for the INCR and DECR commands + which increment or decrement a value stored in MemCached +* Documentation removed from source and is now available + in the file Documentation + +Release 1.0.4 +------------- +* initial release, version numbers kept + in sync with MemCached version +* capable of storing any datatype in MemCached diff --git a/docs/php-memcached/Documentation b/docs/php-memcached/Documentation new file mode 100644 index 00000000..4782807b --- /dev/null +++ b/docs/php-memcached/Documentation @@ -0,0 +1,258 @@ +Ryan Gilfether <hotrodder@rocketmail.com> +http://www.gilfether.com +This module is Copyright (c) 2003 Ryan Gilfether. +All rights reserved. + +You may distribute under the terms of the GNU General Public License +This is free software. IT COMES WITHOUT WARRANTY OF ANY KIND. + +See the memcached website: http://www.danga.com/memcached/ + + +// Takes one parameter, a array of options. The most important key is +// options["servers"], but that can also be set later with the set_servers() +// method. The servers must be an array of hosts, each of which is +// either a scalar of the form <10.0.0.10:11211> or an array of the +// former and an integer weight value. (the default weight if +// unspecified is 1.) It's recommended that weight values be kept as low +// as possible, as this module currently allocates memory for bucket +// distribution proportional to the total host weights. +// $options["debug"] turns the debugging on if set to true +MemCachedClient::MemCachedClient($options); + +// sets up the list of servers and the ports to connect to +// takes an array of servers in the same format as in the constructor +MemCachedClient::set_servers($servers); + +// Retrieves a key from the memcache. 
Returns the value (automatically +// unserialized, if necessary) or FALSE if it fails. +// The $key can optionally be an array, with the first element being the +// hash value, if you want to avoid making this module calculate a hash +// value. You may prefer, for example, to keep all of a given user's +// objects on the same memcache server, so you could use the user's +// unique id as the hash value. +// Possible errors set are: +// MC_ERR_GET +MemCachedClient::get($key); + +// just like get(), but takes an array of keys, returns FALSE on error +// Possible errors set are: +// MC_ERR_NOT_ACTIVE +MemCachedClient::get_multi($keys) + +// Unconditionally sets a key to a given value in the memcache. Returns true +// if it was stored successfully. +// The $key can optionally be an array, with the first element being the +// hash value, as described above. +// returns TRUE on success else FALSE +// Possible errors set are: +// MC_ERR_NOT_ACTIVE +// MC_ERR_GET_SOCK +// MC_ERR_SOCKET_WRITE +// MC_ERR_SOCKET_READ +// MC_ERR_SET +MemCachedClient::set($key, $value, $exptime); + +// Like set(), but only stores in memcache if the key doesn't already exist. +// returns TRUE on success else FALSE +// Possible errors set are: +// MC_ERR_NOT_ACTIVE +// MC_ERR_GET_SOCK +// MC_ERR_SOCKET_WRITE +// MC_ERR_SOCKET_READ +// MC_ERR_SET +MemCachedClient::add($key, $value, $exptime); + +// Like set(), but only stores in memcache if the key already exists. +// returns TRUE on success else FALSE +// Possible errors set are: +// MC_ERR_NOT_ACTIVE +// MC_ERR_GET_SOCK +// MC_ERR_SOCKET_WRITE +// MC_ERR_SOCKET_READ +// MC_ERR_SET +MemCachedClient::replace($key, $value, $exptime); + +// removes the key from the MemCache +// $time is the amount of time in seconds (or Unix time) until which +// the client wishes the server to refuse "add" and "replace" commands +// with this key. 
For this amount of time, the item is put into a +// delete queue, which means that it won't be possible to retrieve it by +// the "get" command, but "add" and "replace" commands with this key +// will also fail (the "set" command will succeed, however). After the +// time passes, the item is finally deleted from server memory. +// The parameter $time is optional, and, if absent, defaults to 0 +// (which means that the item will be deleted immediately and further +// storage commands with this key will succeed). +// returns TRUE on success else returns FALSE +// Possible errors set are: +// MC_ERR_NOT_ACTIVE +// MC_ERR_GET_SOCK +// MC_ERR_SOCKET_WRITE +// MC_ERR_SOCKET_READ +// MC_ERR_DELETE +MemCachedClient::delete($key, $time = 0); + +// Sends a command to the server to atomically increment the value for +// $key by $value, or by 1 if $value is undefined. Returns FALSE if $key +// doesn't exist on server, otherwise it returns the new value after +// incrementing. Value should be zero or greater. Overflow on server +// is not checked. Be aware of values approaching 2**32. See decr. +// Possible errors set are: +// MC_ERR_NOT_ACTIVE +// MC_ERR_GET_SOCK +// MC_ERR_SOCKET_WRITE +// MC_ERR_SOCKET_READ +// returns new value on success, else returns FALSE +// ONLY WORKS WITH NUMERIC VALUES +MemCachedClient::incr($key[, $value]); + +// Like incr, but decrements. Unlike incr, underflow is checked and new +// values are capped at 0. If server value is 1, a decrement of 2 +// returns 0, not -1. 
+// Possible errors set are: +// MC_ERR_NOT_ACTIVE +// MC_ERR_GET_SOCK +// MC_ERR_SOCKET_WRITE +// MC_ERR_SOCKET_READ +// returns new value on success, else returns FALSE +// ONLY WORKS WITH NUMERIC VALUES +MemCachedClient::decr($key[, $value]); + +// disconnects from all servers +MemCachedClient::disconnect_all(); + +// if $do_debug is set to true, will print out +// debugging info, else debug is turned off +MemCachedClient::set_debug($do_debug); + +// remove all cached hosts that are no longer good +MemCachedClient::forget_dead_hosts(); + +// When a function returns FALSE, an error code is set. +// This function will return the error code. +// See error_string() +// returns last error code set +MemCachedClient::error() + +// Returns a string describing the error set in error() +// See error() +// returns a string describing the error code given +MemCachedClient::error_string() + +// Resets the error number and error string +MemCachedClient::error_clear() + +Error codes are as follows: +MC_ERR_NOT_ACTIVE // no active servers +MC_ERR_SOCKET_WRITE // socket_write() failed +MC_ERR_SOCKET_READ // socket_read() failed +MC_ERR_SOCKET_CONNECT // failed to connect to host +MC_ERR_DELETE // delete() did not receive DELETED response +MC_ERR_HOST_FORMAT // sock_to_host() invalid host format +MC_ERR_HOST_DEAD // sock_to_host() host is dead +MC_ERR_GET_SOCK // get_sock() failed to find a valid socket +MC_ERR_SET // _set() failed to receive the STORED response +MC_ERR_LOADITEM_HEADER // _load_items failed to receive valid data header +MC_ERR_LOADITEM_END // _load_items failed to receive END response +MC_ERR_LOADITEM_BYTES // _load_items bytes read larger than bytes available +MC_ERR_GET // failed to get value associated with key + +// Turns compression on or off; 0=off, 1=on +MemCachedClient::set_compression($setting) + +EXAMPLE: +<?php +require("MemCachedClient.inc.php"); + +// set the servers, with the last one having an integer weight value of 3 +$options["servers"] = 
array("10.0.0.15:11000","10.0.0.16:11001",array("10.0.0.17:11002", 3)); +$options["debug"] = false; + +$memc = new MemCachedClient($options); + + +/*********************** + * STORE AN ARRAY + ***********************/ +$myarr = array("one","two", 3); +$memc->set("key_one", $myarr); +$val = $memc->get("key_one"); +print $val[0]."\n"; // prints 'one' +print $val[1]."\n"; // prints 'two' +print $val[2]."\n"; // prints 3 + + +print "\n"; + + +/*********************** + * STORE A CLASS + ***********************/ +class tester +{ + var $one; + var $two; + var $three; +} + +$t = new tester; +$t->one = "one"; +$t->two = "two"; +$t->three = 3; +$memc->set("key_two", $t); +$val = $memc->get("key_two"); +print $val->one."\n"; +print $val->two."\n"; +print $val->three."\n"; + + +print "\n"; + + +/*********************** + * STORE A STRING + ***********************/ +$memc->set("key_three", "my string"); +$val = $memc->get("key_three"); +print $val; // prints 'my string' + +$memc->delete("key_one"); +$memc->delete("key_two"); +$memc->delete("key_three"); + +$memc->disconnect_all(); + + + +print "\n"; + + +/*********************** + * STORE A BINARY FILE + ***********************/ + + // first read the file and save it in memcache +$fp = fopen( "./image.jpg", "rb" ) ; +if ( !$fp ) +{ + print "Could not open ./image.jpg!\n" ; + exit ; +} +$data = fread( $fp, filesize( "./image.jpg" ) ) ; +fclose( $fp ) ; +print "Data length is " . strlen( $data ) . "\n" ; +$memc->set( "key", $data ) ; + +// now open a file for writing and write the data +// retrieved from memcache +$fp = fopen("./test.jpg","wb"); +$data = $memc->get( "key" ) ; +print "Data length is " . strlen( $data ) . 
"\n" ; +fwrite($fp,$data,strlen( $data )); +fclose($fp); + + +?> + + diff --git a/docs/schema.txt b/docs/schema.txt new file mode 100644 index 00000000..f7348462 --- /dev/null +++ b/docs/schema.txt @@ -0,0 +1,6 @@ +The most up-to-date schema for the tables in the database +will always be "tables.sql" in the maintenance directory, +which is called from the installation script. + +That file has been commented with details of the usage for +each table and field. diff --git a/docs/skin.txt b/docs/skin.txt new file mode 100644 index 00000000..82a5b72e --- /dev/null +++ b/docs/skin.txt @@ -0,0 +1,48 @@ + +skin.txt + +This document describes the overall architecture of MediaWiki's HTML rendering +code as well as some history about the skin system. It is placed here rather +than in comments in the code itself to help reduce the code size. + +== Version 1.4 == + +MediaWiki still uses the PHPTal skin system introduced in version 1.3 but some +changes have been made to the file organisation. + +PHP classes and PHPTal templates have been moved to /skins/ (respectively from +/includes/ and /templates/). This way skin designers and end users just stick to +one directory. + +Two samples are provided to start with, one for PHPTal use (SkinPHPTal.sample) +and one without (Skin.sample). + + +== Version 1.3 == + +The following might help a bit though. + +Firstly, there's Skin.php; this file will check various settings, and it +contains a base class from which new skins can be derived. + +Before version 1.3, each skin had its own PHP file (with a subclass of Skin) +to generate the output. The files are: + * SkinCologneBlue.php + * SkinNostalgia.php + * SkinStandard.php + * SkinWikimediaWiki.php +If you want to change those skins, you have to edit these PHP files. + +Since 1.3 a new special skin file is available: SkinPHPTal.php. It makes use of +the PHPTal template engine and allows you to separate code and layout of the +pages. 
The default 1.3 skin is MonoBook and it uses the SkinPHPTal class. + +To change the layout, just edit the PHPTal template (templates/xhtml_slim.pt) +as well as the stylesheets (stylesheets/monobook/*). + + +== pre 1.3 version == + +Unfortunately there isn't any documentation, and the code's in a bit of a mess +right now during the transition from the old skin code to the new template-based +skin code in 1.3. diff --git a/docs/title.txt b/docs/title.txt new file mode 100644 index 00000000..b404bd3c --- /dev/null +++ b/docs/title.txt @@ -0,0 +1,72 @@ +title.txt + +The MediaWiki software's "Title" class represents article +titles, which are used for many purposes: as the human-readable +text title of the article, in the URL used to access the article, +the wikitext link to the article, the key into the article +database, and so on. The class is instantiated from one of +these forms and can be queried for the others, and for other +attributes of the title. This is intended to be an +immutable "value" class, so there are no mutator functions. + +To get a new instance, call one of the static factory +methods Title::newFromURL(), Title::newFromDBKey(), +or Title::newFromText(). Once instantiated, the +other non-static accessor methods can be used, such as +getText(), getDBKey(), getNamespace(), etc. + +The prefix rules: a title consists of an optional Interwiki +prefix (such as "m:" for meta or "de:" for German), followed +by an optional namespace, followed by the remainder of the +title. Both Interwiki prefixes and namespace prefixes have +the same rules: they contain only letters, digits, space, and +underscore, must start with a letter, are case insensitive, +and spaces and underscores are interchangeable. Prefixes end +with a ":". A prefix is only recognized if it is one of those +specifically allowed by the software. 
For example, "de:name" +is a link to the article "name" in the German Wikipedia, because +"de" is recognized as one of the allowable interwikis. The +title "talk:name" is a link to the article "name" in the "talk" +namespace of the current wiki, because "talk" is a recognized +namespace. Both may be present, and if so, the interwiki must +come first, for example, "m:talk:name". If a title begins with +a colon as its first character, no prefixes are scanned for, +and the colon is just removed. Note that because of these +rules, it is possible to have articles with colons in their +names. "E. Coli 0157:H7" is a valid title, as is "2001: A Space +Odyssey", because "E. Coli 0157" and "2001" are not valid +interwikis or namespaces. Likewise, ":de:name" is a link to +the article "de:name"--even though "de" is a valid interwiki, +the initial colon stops all prefix matching. + +Character mapping rules: Once prefixes have been stripped, the +rest of the title is processed this way: spaces and underscores are +treated as equivalent and each is converted to the other in the +appropriate context (underscore in URL and database keys, spaces +in plain text). "Extended" characters in the 0x80..0xFF range +are allowed in all places, and are valid characters. They are +encoded in URLs. Other characters may be ASCII letters, digits, +hyphen, comma, period, apostrophe, parentheses, and colon. No +other ASCII characters are allowed; any that are found will be deleted +(they will probably cause a browser to misinterpret the URL). +Extended characters are _not_ urlencoded when used as text or +database keys. + +Character encoding rules: TODO + +Canonical forms: the canonical form of a title will always be +returned by the object. In this form, the first (and only the +first) character of the namespace and title will be uppercased; +the rest of the namespace will be lowercased, while the title +will be left as is. The text form will use spaces, the URL and +DBkey forms will use underscores. 
Interwiki prefixes are all +lowercase. The namespace will use underscores when returned +alone; it will use spaces only when attached to the text title. + +getArticleID() needs some explanation: for "internal" articles, +it should return the "cur_id" field if the article exists, else +it returns 0. For all external articles it returns 0. All of +the IDs for all instances of Title created during a request are +cached, so they can be looked up quickly while rendering wiki +text with lots of internal links. + diff --git a/docs/user.txt b/docs/user.txt new file mode 100644 index 00000000..3f1c8202 --- /dev/null +++ b/docs/user.txt @@ -0,0 +1,63 @@ + +user.txt + +Documenting the MediaWiki User object. + +(DISCLAIMER: The documentation is not guaranteed to be in sync with +the code at all times. If in doubt, check the table definitions +and User.php.) + +Database fields: + + user_id + Unique integer identifier; primary key. Sent to user in + cookie "{$wgDBname}UserID". + + user_name + Text of full user name; title of "user:" page. Displayed + on change lists, etc. Sent to user as cookie "{$wgDBname}UserName". + Note that user names can contain spaces, while these are + converted to underscores in page titles. + + user_rights + Comma-separated list of rights. Right now, only "sysop", + "developer", "bureaucrat", and "bot" have meaning. + + user_password + Salted md5 hash of md5-hashed user login password. If user option to + remember password is set, an md5 password hash is stored in cookie + "{$wgDBname}UserPassword". The original password and the hashed password + can be compared to the salted-hashed-hashed password. + + user_newpassword + Hash for randomly generated password sent on 'send me a new password'. + If a match is made on login, the new password will replace the old one. + + user_email + User's e-mail address. (Optional, used for user-to-user + e-mail and password recovery.) 
+ + user_options + A urlencoded string of name=value pairs to set various + user options. + + user_touched + Timestamp updated when the user logs in, changes preferences, alters + watchlist, or when someone edits their user talk page or they clear + the new-talk field by viewing it. Used to invalidate old cached pages + from the user's browser cache. + + user_real_name + "Real name" optionally used in some metadata lists. + +The user object encapsulates all of the settings, and client +classes use the getXXX() functions to access them. These functions +do all the work of determining whether the user is logged in, +whether the requested option can be satisfied from cookies or +whether a database query is needed. Most of the settings needed +for rendering normal pages are set in the cookie to minimize use +of the database. + +Options + The user_options field is a list of name-value pairs. The + following option names are used at various points in the system: |