diff options
Diffstat (limited to 'docs/title.txt')
-rw-r--r-- | docs/title.txt | 123 |
1 files changed, 57 insertions, 66 deletions
diff --git a/docs/title.txt b/docs/title.txt index fd449c54..d2d91c9c 100644 --- a/docs/title.txt +++ b/docs/title.txt @@ -1,76 +1,67 @@ title.txt -The MediaWiki software's "Title" class represents article -titles, which are used for many purposes: as the human-readable -text title of the article, in the URL used to access the article, -the wikitext link to the article, the key into the article -database, and so on. The class in instantiated from one of -these forms and can be queried for the others, and for other -attributes of the title. This is intended to be an -immutable "value" class, so there are no mutator functions. +The MediaWiki software's "Title" class represents article titles, which are used +for many purposes: as the human-readable text title of the article, in the URL +used to access the article, the wikitext link to the article, the key into the +article database, and so on. The class in instantiated from one of these forms +and can be queried for the others, and for other attributes of the title. This +is intended to be an immutable "value" class, so there are no mutator functions. -To get a new instance, call one of the static factory -methods Title::newFromURL(), Title::newFromDBKey(), -or Title::newFromText(). Once instantiated, the -other non-static accessor methods can be used, such as -getText(), getDBKey(), getNamespace(), etc. +To get a new instance, call Title::newFromText(). Once instantiated, the +non-static accessor methods can be used, such as getText(), getDBKey(), +getNamespace(), etc. Note that Title::newFromText() may return false if the text +is illegal according to the rules below. -The prefix rules: a title consists of an optional interwiki -prefix (such as "m:" for meta or "de:" for German), followed -by an optional namespace, followed by the remainder of the -title. Both interwiki prefixes and namespace prefixes have -the same rules: they contain only letters, digits, space, and -underscore, must start with a letter, are case insensitive, -and spaces and underscores are interchangeable. Prefixes end -with a ":". A prefix is only recognized if it is one of those -specifically allowed by the software. For example, "de:name" -is a link to the article "name" in the German Wikipedia, because -"de" is recognized as one of the allowable interwikis. The -title "talk:name" is a link to the article "name" in the "talk" -namespace of the current wiki, because "talk" is a recognized -namespace. Both may be present, and if so, the interwiki must -come first, for example, "m:talk:name". If a title begins with -a colon as its first character, no prefixes are scanned for, -and the colon is just removed. Note that because of these -rules, it is possible to have articles with colons in their -names. "E. Coli 0157:H7" is a valid title, as is "2001: A Space -Odyssey", because "E. Coli 0157" and "2001" are not valid -interwikis or namespaces. +The prefix rules: a title consists of an optional interwiki prefix (such as "m:" +for meta or "de:" for German), followed by an optional namespace, followed by +the remainder of the title. Both interwiki prefixes and namespace prefixes have +the same rules: they contain only letters, digits, space, and underscore, must +start with a letter, are case insensitive, and spaces and underscores are +interchangeable. Prefixes end with a ":". A prefix is only recognized if it is +one of those specifically allowed by the software. For example, "de:name" is a +link to the article "name" in the German Wikipedia, because "de" is recognized +as one of the allowable interwikis. The title "talk:name" is a link to the +article "name" in the "talk" namespace of the current wiki, because "talk" is a +recognized namespace. Both may be present, and if so, the interwiki must +come first, for example, "m:talk:name". If a title begins with a colon as its +first character, no prefixes are scanned for, and the colon is just removed. +Note that because of these rules, it is possible to have articles with colons in +their names. "E. Coli 0157:H7" is a valid title, as is "2001: A Space Odyssey", +because "E. Coli 0157" and "2001" are not valid interwikis or namespaces. -It is not possible to have an article whose bare name includes -a namespace or interwiki prefix. +It is not possible to have an article whose bare name includes a namespace or +interwiki prefix. -An initial colon in a title listed in wiki text may however -suppress special handling for interlanguage links, image links, -and category links. +An initial colon in a title listed in wiki text may however suppress special +handling for interlanguage links, image links, and category links. It is also +used to indicate the main namespace in template inclusions. -Character mapping rules: Once prefixes have been stripped, the -rest of the title processed this way: spaces and underscores are -treated as equivalent and each is converted to the other in the -appropriate context (underscore in URL and database keys, spaces -in plain text). "Extended" characters in the 0x80..0xFF range -are allowed in all places, and are valid characters. They are -encoded in URLs. Other characters may be ASCII letters, digits, -hyphen, comma, period, apostrophe, parentheses, and colon. No -other ASCII characters are allowed, and will be deleted if found -(they will probably cause a browser to misinterpret the URL). -Extended characters are _not_ urlencoded when used as text or -database keys. +Once prefixes have been stripped, the rest of the title processed this way: -Character encoding rules: TODO +* Spaces and underscores are treated as equivalent and each is converted to the + other in the appropriate context (underscore in URL and database keys, spaces + in plain text). +* Multiple consecutive spaces are converted to a single space. +* Leading or trailing space is removed. +* If $wgCapitalLinks is enabled (the default), the first letter is capitalised, + using the capitalisation function of the content language object. +* The unicode characters LRM (U+200E) and RLM (U+200F) are silently stripped. +* Invalid UTF-8 sequences or instances of the replacement character (U+FFFD) are + considered illegal. +* A percent sign followed by two hexadecimal characters is illegal +* Anything that looks like an XML/HTML character reference is illegal +* Any character not matched by the $wgLegalTitleChars regex is illegal +* Zero-length titles (after whitespace stripping) are illegal -Canonical forms: the canonical form of a title will always be -returned by the object. In this form, the first (and only the -first) character of the namespace and title will be uppercased; -the rest of the namespace will be lowercased, while the title -will be left as is. The text form will use spaces, the URL and -DBkey forms will use underscores. Interwiki prefixes are all -lowercase. The namespace will use underscores when returned -alone; it will use spaces only when attached to the text title. +All titles except special pages must be less than 255 bytes when encoded with +UTF-8, because that is the size of the database field. Special page titles may +be up to 512 bytes. -getArticleID() needs some explanation: for "internal" articles, -it should return the "page_id" field if the article exists, else -it returns 0. For all external articles it returns 0. All of -the IDs for all instances of Title created during a request are -cached, so they can be looked up quickly while rendering wiki -text with lots of internal links. +Note that Unicode Normal Form C (NFC) is enforced by MediaWiki's user interface +input functions, and so titles will typically be in this form. + +getArticleID() needs some explanation: for "internal" articles, it should return +the "page_id" field if the article exists, else it returns 0. For all external +articles it returns 0. All of the IDs for all instances of Title created during +a request are cached, so they can be looked up quickly while rendering wiki text +with lots of internal links. See linkcache.txt. |