From 183851b06bd6c52f3cae5375f433da720d410447 Mon Sep 17 00:00:00 2001
From: Pierre Schmitz
Date: Wed, 11 Oct 2006 18:12:39 +0000
Subject: MediaWiki 1.7.1 restored

---
 docs/database.txt | 174 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 174 insertions(+)
 create mode 100644 docs/database.txt

diff --git a/docs/database.txt b/docs/database.txt
new file mode 100644
index 00000000..679492a1
--- /dev/null
+++ b/docs/database.txt
@@ -0,0 +1,174 @@
+Some information about database access in MediaWiki.
+By Tim Starling, January 2006.
+
+------------------------------------------------------------------------
+                                  API
+------------------------------------------------------------------------
+
+For a database API reference, please see the auto-generated
+documentation:
+
+http://wikipedia.sourceforge.net/doc/MediaWiki/Database.html
+
+To make a read query, something like this usually suffices:
+
+$dbr =& wfGetDB( DB_SLAVE );
+$res = $dbr->select( /* ...see docs... */ );
+while ( $row = $dbr->fetchObject( $res ) ) {
+	...
+}
+$dbr->freeResult( $res );
+
+Note the assignment operator in the while condition.
+
+For a write query, use something like:
+
+$dbw =& wfGetDB( DB_MASTER );
+$dbw->insert( /* ...see docs... */ );
+
+We use the convention $dbr for read and $dbw for write to help you keep
+track of whether the database object is a slave (read-only) or a master
+(read/write). If you write to a slave, the world will explode. Or to be
+precise, a subsequent write query which succeeded on the master may fail
+when replicated to the slave due to a unique key collision. Replication
+on the slave will stop and it may take hours to repair the database and
+get it back online. Setting read_only in my.cnf on the slave will avoid
+this scenario, but given the dire consequences, we prefer to have as
+many checks as possible.
+
+We provide a query() function for raw SQL, but the wrapper functions
+like select() and insert() are usually more convenient. They take care
+of things like table prefixes and escaping for you. If you really need
+to make your own SQL, please read the documentation for tableName() and
+addQuotes(). You will need both of them.
+
+
+------------------------------------------------------------------------
+                        Basic query optimisation
+------------------------------------------------------------------------
+
+MediaWiki developers who need to write DB queries should have some
+understanding of databases and the performance issues associated with
+them. Patches containing unacceptably slow features will not be
+accepted. Unindexed queries are generally not welcome in MediaWiki,
+except in special pages derived from QueryPage. It's a common pitfall
+for new developers to submit code containing SQL queries which examine
+huge numbers of rows. Remember that COUNT(*) is O(N): counting rows in
+a table is like counting beans in a bucket.
+
+
+------------------------------------------------------------------------
+                              Replication
+------------------------------------------------------------------------
+
+The largest installation of MediaWiki, Wikimedia, uses a large set of
+slave MySQL servers replicating writes made to a master MySQL server. It
+is important to understand the issues associated with this setup if you
+want to write code destined for Wikipedia.
+
+It's often the case that the best algorithm to use for a given task
+depends on whether or not replication is in use. Due to our unabashed
+Wikipedia-centrism, we often just use the replication-friendly version,
+but if you like, you can use $wgLoadBalancer->getServerCount() > 1 to
+check.
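+
+For instance, a bulk update might batch its writes differently when
+slaves are present. Here is a minimal sketch of that check. The
+function name, the table and the batching policy are invented for
+illustration; it also assumes insert() accepts an array of rows for a
+multi-row insert:
+
+function doBulkUpdate( $rows ) {
+	# NB: doBulkUpdate() and 'my_table' are placeholders, not part
+	# of MediaWiki
+	global $wgLoadBalancer;
+	$dbw =& wfGetDB( DB_MASTER );
+	if ( $wgLoadBalancer->getServerCount() > 1 ) {
+		# Replication in use: write one row at a time, so the
+		# slaves never have to replay one long-running query
+		foreach ( $rows as $row ) {
+			$dbw->insert( 'my_table', $row );
+		}
+	} else {
+		# No slaves: a single multi-row insert is fine
+		$dbw->insert( 'my_table', $rows );
+	}
+}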
+
+=== Lag ===
+
+Lag primarily occurs when large write queries are sent to the master.
+Writes on the master are executed in parallel, but they are replayed
+serially on the slaves. The master writes the query to the binlog when
+the transaction is committed. The slaves poll the binlog and start
+executing the query as soon as it appears. They can service reads while
+they are performing a write query, but will not read anything more from
+the binlog and thus will perform no more writes. This means that if the
+write query runs for a long time, the slaves will lag behind the master
+for the time it takes for the write query to complete.
+
+Lag can be exacerbated by high read load. MediaWiki's load balancer will
+stop sending reads to a slave when it is lagged by more than 30 seconds.
+If the load ratios are set incorrectly, or if there is too much load
+generally, this may lead to a slave permanently hovering around 30
+seconds lag.
+
+If all slaves are lagged by more than 30 seconds, MediaWiki will stop
+writing to the database. All edits and other write operations will be
+refused, with an error returned to the user. This gives the slaves a
+chance to catch up. Before we had this mechanism, the slaves would
+regularly lag by several minutes, making review of recent edits
+difficult.
+
+In addition to this, MediaWiki attempts to ensure that the user sees
+events occurring on the wiki in chronological order. A few seconds of
+lag can be tolerated, as long as the user sees a consistent picture from
+subsequent requests. This is done by saving the master binlog position
+in the session, and then at the start of each request, waiting for the
+slave to catch up to that position before doing any reads from it. If
+this wait times out, reads are allowed anyway, but the request is
+considered to be in "lagged slave mode". Lagged slave mode can be
+checked by calling $wgLoadBalancer->getLaggedSlaveMode(). The only
+practical consequence at present is a warning displayed in the page
+footer.
+
+=== Lag avoidance ===
+
+To avoid excessive lag, queries which write large numbers of rows should
+be split up, generally to write one row at a time. Multi-row INSERT ...
+SELECT queries are the worst offenders and should be avoided altogether.
+Instead, do the select first and then the insert.
+
+=== Working with lag ===
+
+Despite our best efforts, it's not practical to guarantee a low-lag
+environment. Lag will usually be less than one second, but may
+occasionally be up to 30 seconds. For scalability, it's very important
+to keep load on the master low, so simply sending all your queries to
+the master is not the answer. When you have a genuine need for
+up-to-date data, the following approach is advised:
+
+1) Do a quick query to the master for a sequence number or timestamp
+2) Run the full query on the slave and check if it matches the data you
+   got from the master
+3) If it doesn't, run the full query on the master
+
+To avoid swamping the master every time the slaves lag, use of this
+approach should be kept to a minimum. In most cases you should just read
+from the slave and let the user deal with the delay; a sketch of the
+fallback pattern follows below.
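+
+As an illustration, here is a hedged sketch of that three-step pattern
+for re-reading a page row. wfGetDB() is described above; selectField()
+and selectRow() are convenience wrappers from the same Database class.
+The use of page_touched as the freshness check and the $id variable
+are assumptions made for this example:
+
+$dbw =& wfGetDB( DB_MASTER );
+$dbr =& wfGetDB( DB_SLAVE );
+
+# 1) Cheap query to the master for a timestamp ($id is assumed to
+#    hold a page_id)
+$touched = $dbw->selectField( 'page', 'page_touched',
+	array( 'page_id' => $id ) );
+
+# 2) Full query on the slave, compared against the master's value
+$row = $dbr->selectRow( 'page', '*', array( 'page_id' => $id ) );
+
+# 3) If the slave copy is stale, run the full query on the master
+if ( !$row || $row->page_touched != $touched ) {
+	$row = $dbw->selectRow( 'page', '*', array( 'page_id' => $id ) );
+}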
+
+
+------------------------------------------------------------------------
+                            Lock contention
+------------------------------------------------------------------------
+
+Due to the high write rate on Wikipedia (and some other wikis),
+MediaWiki developers need to be very careful to structure their writes
+to avoid long-lasting locks. By default, MediaWiki opens a transaction
+at the first query, and commits it before the output is sent. Locks will
+be held from the time when the query is done until the commit. So you
+can reduce lock time by doing as much processing as possible before you
+do your write queries. Update operations which do not require database
+access can be delayed until after the commit by adding an object to
+$wgPostCommitUpdateList.
+
+Often this approach is not good enough, and it becomes necessary to
+enclose small groups of queries in their own transaction. Use the
+following syntax:
+
+$dbw =& wfGetDB( DB_MASTER );
+$dbw->immediateBegin();
+/* Do queries */
+$dbw->immediateCommit();
+
+There are functions called begin() and commit(), but they don't do what
+you would expect. Don't use them.
+
+Locking reads (e.g. the FOR UPDATE clause) are not advised: they are
+poorly implemented in InnoDB and will cause regular deadlock errors.
+It's also surprisingly easy to cripple the wiki with lock contention.
+If you must use them, define a new flag for $wgAntiLockFlags which
+allows them to be turned off, because we'll almost certainly need to
+do so on the Wikimedia cluster.
+
+Instead of locking reads, combine your existence checks into your write
+queries, by using an appropriate condition in the WHERE clause of an
+UPDATE, or by using unique indexes in combination with INSERT IGNORE.
+Then use the affected row count to see if the query succeeded.
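+
+For example, here is a sketch of the INSERT IGNORE pattern. The table,
+its unique index and the field names are hypothetical; insert()'s
+'IGNORE' option and affectedRows() are the wrapper features being
+illustrated:
+
+$dbw =& wfGetDB( DB_MASTER );
+
+# Assumes a hypothetical table 'example' with a unique index on
+# ex_name: if the row already exists, the insert is silently dropped
+$dbw->insert( 'example',
+	array( 'ex_name' => $name, 'ex_value' => $value ),
+	'Example::setValue', array( 'IGNORE' ) );
+
+if ( $dbw->affectedRows() ) {
+	# The row was new; safe to do follow-up work here
+} else {
+	# Another request inserted it first; nothing was written
+}
+
-- cgit v1.2.3-54-g00ecf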