diff options
Diffstat (limited to 'docs/title.txt')
-rw-r--r-- | docs/title.txt | 72 |
1 files changed, 72 insertions, 0 deletions
diff --git a/docs/title.txt b/docs/title.txt new file mode 100644 index 00000000..b404bd3c --- /dev/null +++ b/docs/title.txt @@ -0,0 +1,72 @@ +title.txt + +The MediaWiki software's "Title" class represents article +titles, which are used for many purposes: as the human-readable +text title of the article, in the URL used to access the article, +the wikitext link to the article, the key into the article +database, and so on. The class in instantiated from one of +these forms and can be queried for the others, and for other +attributes of the title. This is intended to be an +immutable "value" class, so there are no mutator functions. + +To get a new instance, call one of the static factory +methods WikiTitle::newFromURL(), WikiTitle::newFromDBKey(), +or WikiTitle::newFromText(). Once instantiated, the +other non-static accessor methods can be used, such as +getText(), getDBKey(), getNamespace(), etc. + +The prefix rules: a title consists of an optional Interwiki +prefix (such as "m:" for meta or "de:" for German), followed +by an optional namespace, followed by the remainder of the +title. Both Interwiki prefixes and namespace prefixes have +the same rules: they contain only letters, digits, space, and +underscore, must start with a letter, are case insensitive, +and spaces and underscores are interchangeable. Prefixes end +with a ":". A prefix is only recognized if it is one of those +specifically allowed by the software. For example, "de:name" +is a link to the article "name" in the German Wikipedia, because +"de" is recognized as one of the allowable interwikis. The +title "talk:name" is a link to the article "name" in the "talk" +namespace of the current wiki, because "talk" is a recognized +namespace. Both may be present, and if so, the interwiki must +come first, for example, "m:talk:name". If a title begins with +a colon as its first character, no prefixes are scanned for, +and the colon is just removed. Note that because of these +rules, it is possible to have articles with colons in their +names. "E. Coli 0157:H7" is a valid title, as is "2001: A Space +Odyssey", because "E. Coli 0157" and "2001" are not valid +interwikis or namespaces. Likewise, ":de:name" is a link to +the article "de:name"--even though "de" is a valid interwiki, +the initial colon stops all prefix matching. + +Character mapping rules: Once prefixes have been stripped, the +rest of the title processed this way: spaces and underscores are +treated as equivalent and each is converted to the other in the +appropriate context (underscore in URL and database keys, spaces +in plain text). "Extended" characters in the 0x80..0xFF range +are allowed in all places, and are valid characters. They are +encoded in URLs. Other characters may be ASCII letters, digits, +hyphen, comma, period, apostrophe, parentheses, and colon. No +other ASCII characters are allowed, and will be deleted if found +(they will probably cause a browser to misinterpret the URL). +Extended characters are _not_ urlencoded when used as text or +database keys. + +Character encoding rules: TODO + +Canonical forms: the canonical form of a title will always be +returned by the object. In this form, the first (and only the +first) character of the namespace and title will be uppercased; +the rest of the namespace will be lowercased, while the title +will be left as is. The text form will use spaces, the URL and +DBkey forms will use underscores. Interwiki prefixes are all +lowercase. The namespace will use underscores when returned +alone; it will use spaces only when attached to the text title. + +getArticleID() needs some explanation: for "internal" articles, +it should return the "cur_id" field if the article exists, else +it returns 0. For all external articles it returns 0. All of +the IDs for all instances of Title created during a request are +cached, so they can be looked up wuickly while rendering wiki +text with lots of internal links. + |