From 183851b06bd6c52f3cae5375f433da720d410447 Mon Sep 17 00:00:00 2001
From: Pierre Schmitz
Date: Wed, 11 Oct 2006 18:12:39 +0000
Subject: MediaWiki 1.7.1 restored

---
 includes/normal/README | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 includes/normal/README

(limited to 'includes/normal/README')

diff --git a/includes/normal/README b/includes/normal/README
new file mode 100644
index 00000000..f8207a1b
--- /dev/null
+++ b/includes/normal/README
@@ -0,0 +1,55 @@
+This directory contains some Unicode normalization routines. These routines
+are meant to be reusable in other projects, so I'm not tying them to the
+MediaWiki utility functions.
+
+The main function to care about is UtfNormal::toNFC(); this will convert
+a given UTF-8 string to Normalization Form C if it's not already such.
+The function assumes that the input string is already valid UTF-8; if there
+are corrupt characters this may produce erroneous results.
+
+To also check for illegal characters, use UtfNormal::cleanUp(). This will
+strip illegal UTF-8 sequences and characters that are illegal in XML, and
+if necessary convert to normalization form C.
+
+Performance is kind of stinky in absolute terms, though it should be speedy
+on pure ASCII text. ;) On text that can be determined quickly to already be
+in NFC it's not too awful but it can quickly get uncomfortably slow,
+particularly for Korean text (the hangul decomposition/composition code is
+extra slow).
+
+
+== Regenerating data tables ==
+
+UtfNormalData.inc and UtfNormalDataK.inc are generated from the Unicode
+Character Database by the script UtfNormalGenerate.php. On a *nix system
+'make' should fetch the necessary files and regenerate them if the scripts
+have been changed or you remove them.
+
+
+== Testing ==
+
+'make test' will run the conformance test (UtfNormalTest.php), fetching the
+data from the net if necessary. If it reports failure, something is
+going wrong!
+
+
+== Benchmarks ==
+
+Run 'make bench' to download some sample texts from Wikipedia and run some
+cheap benchmarks of some of the functions. Take all numbers with large
+grains of salt.
+
+
+== PHP module extension ==
+
+There's an experimental PHP extension module which wraps the ICU library's
+normalization functions. This is *MUCH* faster than doing this work in pure
+PHP code. It is in the 'normal' directory in MediaWiki's CVS extensions
+module. It is known to work with PHP 4.3.8 and 5.0.2 on Linux/x86 but hasn't
+been thoroughly tested on other configurations.
+
+If the php_normal.so module is loaded in php.ini, the normalization functions
+will automatically use it. If you can't (or don't want to) load it in php.ini,
+you may be able to load it using the dl() function before include()ing or
+require()ing UtfNormal.php, and it will be picked up.
+
--
cgit v1.2.3-54-g00ecf
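
As a quick illustration of the API the README above describes, here is a minimal
PHP usage sketch. It is not part of the patch; it assumes UtfNormal.php from
includes/normal/ is reachable via the include path, and the variable names are
made up for the example.

    <?php
    // Minimal sketch of UtfNormal usage, assuming UtfNormal.php from
    // includes/normal/ can be included directly.
    require_once 'UtfNormal.php';

    // Valid UTF-8 that is not in Normalization Form C:
    // "A" (U+0041) followed by COMBINING RING ABOVE (U+030A).
    $decomposed = "A\xCC\x8A";

    // toNFC() assumes the input is valid UTF-8 and returns the NFC form,
    // here the single precomposed character U+00C5.
    $nfc = UtfNormal::toNFC( $decomposed );
    echo bin2hex( $nfc ), "\n";   // c385

    // cleanUp() also copes with corrupt input: it handles illegal UTF-8
    // sequences and XML-illegal characters, then normalizes to NFC.
    $dirty = "A\xCC\x8A\xFF";     // trailing 0xFF is not valid UTF-8
    $clean = UtfNormal::cleanUp( $dirty );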
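
And a sketch of the dl() fallback mentioned in the README's last paragraph. The
extension name 'normal' is an assumption based on the text above, and dl() is
restricted or unavailable in many PHP configurations, so treat this only as an
illustration of the load-before-include idea.

    <?php
    // Try to load the experimental ICU wrapper before UtfNormal.php so it
    // can be picked up automatically (sketch only; names are assumed).
    if ( !extension_loaded( 'normal' ) && function_exists( 'dl' ) ) {
        @dl( 'php_normal.so' );
    }
    require_once 'UtfNormal.php';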