diff options
Diffstat (limited to 'includes/normal/README')
-rw-r--r-- | includes/normal/README | 59 |
1 files changed, 0 insertions, 59 deletions
diff --git a/includes/normal/README b/includes/normal/README deleted file mode 100644 index 0f718d2c..00000000 --- a/includes/normal/README +++ /dev/null @@ -1,59 +0,0 @@ -This directory contains some Unicode normalization routines. These routines -are meant to be reusable in other projects, so I'm not tying them to the -MediaWiki utility functions. - -The main function to care about is UtfNormal::toNFC(); this will convert -a given UTF-8 string to Normalization Form C if it's not already such. -The function assumes that the input string is already valid UTF-8; if there -are corrupt characters this may produce erroneous results. - -To also check for illegal characters, use UtfNormal::cleanUp(). This will -strip illegal UTF-8 sequences and characters that are illegal in XML, and -if necessary convert to normalization form C. - -Performance is kind of stinky in absolute terms, though it should be speedy -on pure ASCII text. ;) On text that can be determined quickly to already be -in NFC it's not too awful but it can quickly get uncomfortably slow, -particularly for Korean text (the hangul decomposition/composition code is -extra slow). - - -== Regenerating data tables == - -UtfNormalData.inc and UtfNormalDataK.inc are generated from the Unicode -Character Database by the script UtfNormalGenerate.php. On a *nix system -'make' should fetch the necessary files and regenerate it if the scripts -have been changed or you remove it. - - -== Testing == - -'make test' will run the conformance test (UtfNormalTest.php), fetching the -data from from the net if necessary. If it reports failure, something is -going wrong! - -You may have to set up PHPUnit first. - -$ pear channel-discover pear.phpunit.de -$ pear install phpunit/PHPUnit - -== Benchmarks == - -Run 'make bench' to download some sample texts from Wikipedia and run some -cheap benchmarks of some of the functions. Take all numbers with large -grains of salt. - - -== PHP module extension == - -There's an experimental PHP extension module which wraps the ICU library's -normalization functions. This is *MUCH* faster than doing this work in pure -PHP code. This is at https://git.wikimedia.org/summary/mediawiki%2Fextensions%2Fnormal.git. -It is used by the WMF, which currently runs PHP 5.3.10 on Linux. It hasn't been -thoroughly tested on other configurations, but may work. - -If the php_normal.so module is loaded in php.ini, the normalization functions -will automatically use it. If you can't (or don't want to) load it in php.ini, -you may be able to load it using the dl() function before the inclusion of -UtfNormal.php, and it will be picked up. - |