PHP Documentation for: \SmartUnicode

<?php

// Usage example:
 SmartUnicode::some_method_of_this_class(...);

  //-----------------------------------------------------------------------------------------------------
  //-----------------------------------------------------------------------------------------------------
  // SAFE / MultiByte functions Reference with Replacements (Smart.Framework UTF-8)
  // MORE INFO AT: http://www.phpwact.org/php/i18n/utf-8
  //-----------------------------------------------------------------------------------------------------
  // FUNCTION NAME                BETTER / SAFER REPLACEMENT              STATUS      NOTICE
  //-----------------------------------------------------------------------------------------------------
  //--
  //(int)                         Smart::format_number_int()              [ok]        The replacement function has a second parameter that also allows unsigned integers
  //number_format()               Smart::format_number_dec()              [ok]        It is easier to use and relies on the framework
  //htmlspecialchars()            Smart::escape_html()                    [ok]        The replacement function takes into account whether unicode strings are used or not
  //--
  //mail()                        SmartUnicode::mailsend()                [ok]        The PHP mail() is not unicode safe
  //--
  //split() / str_split()         explode()                               [ok]        Use the explode() function ; avoid using split() or str_split() because they are not binary safe and can break unicode strings
  //join()                        implode()                               [ok]        The join() is just an alias of implode()
  //substr_replace()              str_replace()                           [ok]        It is not certified to be 100% safe with unicode strings ; try to use the replacement function
  //str_ireplace()                * str_ireplace()                        [ok!]       Will fail to replace unicode accented characters if the case differs (lower vs. upper)
  //--
  //substr_count()                SmartUnicode::substr_count()            [ok]        the PHP substr_count() is not unicode safe
  //strlen()                      SmartUnicode::str_len() / strlen()      [ok!]       the PHP strlen() is not unicode safe ; to count bytes in a string always use strlen() ; to count characters in a string always use SmartUnicode::str_len() in a unicode environment
  //substr()                      SmartUnicode::sub_str()                 [ok]        the PHP substr() is not unicode safe
  //--
  //strstr()                      SmartUnicode::str_str()                 [ok]        the PHP strstr() is not unicode safe
  //stristr()                     SmartUnicode::stri_str()                [ok]        the PHP stristr() is not unicode safe
  //--
  //strpos()                      SmartUnicode::str_pos()                 [ok]        the PHP strpos() is not unicode safe
  //stripos()                     SmartUnicode::str_ipos()                [ok]        the PHP stripos() is not unicode safe
  //strrpos()                     SmartUnicode::str_rpos()                [ok]        the PHP strrpos() is not unicode safe
  //strripos()                    SmartUnicode::str_ripos()               [ok]        the PHP strripos() is not unicode safe
  //--
  //strtolower()                  SmartUnicode::str_tolower()             [ok]        the PHP strtolower() is not unicode safe and will not lowercase accented characters
  //strtoupper()                  SmartUnicode::str_toupper()             [ok]        the PHP strtoupper() is not unicode safe and will not uppercase accented characters
  //--
  //utf8_decode()                 SmartUnicode::utf8_to_iso()             [ok]        it may break strings that are used in unicode environments, so the strings need to be re-encoded ; if not re-encoded back to unicode, regex with the /u modifier will fail in strange ways ...
  //utf8_encode()                 SmartUnicode::iso_to_utf8()             [!!]        there is a risk of double encoding the string and breaking it if it is not ISO ; use only for ISO strings !!
  //wordwrap()                    SmartUnicode::word_wrap()               [ok]        the PHP wordwrap() is not unicode safe
  //strip_tags()                  Smart::stripTags()                      [ok+]       the PHP strip_tags() will not replace some extra things like &nbsp; and many other html entities
  //--
  //printf()                      * printf()                              [!+]        Will not take care of real multibyte string length and may return unexpected results
  //sprintf()                     * sprintf()                             [!+]        Will not take care of real multibyte string length and may return unexpected results
  //vsprintf()                    * vsprintf()                            [!+]        Will not take care of real multibyte string length and may return unexpected results
  //strcasecmp()                  * strcasecmp()                          [!+]        Will not take care of real multibyte string length and may return unexpected results
  //strcspn()                     * strcspn()                             [!+]        Will not take care of real multibyte string length and may return unexpected results
  //strspn()                      * strspn()                              [!+]        Will not take care of real multibyte string length and may return unexpected results
  //--
  //strtr()                                                               [!!]        Use only with non-unicode characters, will break the unicode strings if unicode characters are multibyte
  //strrev()                                                              [!!]        Use only with non-unicode characters, will break the unicode strings if unicode characters are multibyte
  //--
  //chunk_split()                 -                                       [!!-]       This really breaks unicode strings ... Use only with non-unicode characters, will break the unicode strings if unicode characters are multibyte
  //--
  //-----------------------------------------------------------------------------------------------------
  //---------------   DEPRECATED Functions, AVOID USING THEM ...          ---------------
  //---------------   They are not binary safe and were removed in PHP 7  ---------------
  // Replacement hint: ereg("^(5|6)$", 'some value') [=] preg_match("/^(5|6)$/", 'some value')
  //-----------------------------------------------------------------------------------------------------
  //ereg()                        preg_match*()                           [ok]        Will work after regex pattern conversion ; if the string is unicode, then using the /u regex modifier is required
  //eregi()                       preg_match*()                           [ok]        Will work after regex pattern conversion ; if the string is unicode, then using the /u regex modifier is required
  //ereg_replace()                preg_replace*()                         [ok]        Will work after regex pattern conversion ; if the string is unicode, then using the /u regex modifier is required
  //eregi_replace()               preg_replace*()                         [ok]        Will work after regex pattern conversion ; if the string is unicode, then using the /u regex modifier is required
  //-----------------------------------------------------------------------------------------------------
  //-----------------------------------------------------------------------------------------------------

// #end php code
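
To make the byte versus character distinction in the table concrete, here is a minimal sketch. It uses the native PHP mbstring functions rather than the SmartUnicode methods themselves, on the assumption (not confirmed by this page) that they behave comparably for UTF-8 input. The variable names are illustrative only.

```php
<?php
// "café" in UTF-8: the "é" (U+00E9) occupies two bytes.
$text = "caf\u{00E9}";

$byteCount = strlen($text);              // counts bytes
$charCount = mb_strlen($text, 'UTF-8');  // counts characters

// The multibyte variant also changes the case of the accented letter.
$upper = mb_strtoupper($text, 'UTF-8');

// strrev() reverses bytes, which splits the two-byte "é"
// and leaves a string that is no longer valid UTF-8.
$reversed = strrev($text);
$reversedIsValid = mb_check_encoding($reversed, 'UTF-8');

// mb_substr() cuts by character offset, not by byte offset.
$lastChar = mb_substr($text, 3, 1, 'UTF-8');
```

Here strlen() reports 5 while mb_strlen() reports 4, which is why the table keeps strlen() for byte counts only.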

Class: SmartUnicode - provides string utility functions for working safely with Unicode (Multibyte) Strings / Characters.
Compatible with: UTF-8 (Unicode), ISO-8859-1 (latin base), ISO-8859-* (latin extended, greek, cyrillic, ...), Japanese, ...
Take a look at the replacement table above.

Unicode Safe ucwords() :: Make the first letter of each word in a Unicode string uppercase
Notice: this is only partially compatible with PHP ucwords(): it makes the first letter of each word uppercase and forces the remaining letters of the word to lowercase, as it complies with MB_CASE_TITLE
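
The difference from the native ucwords() can be sketched with mb_convert_case() and MB_CASE_TITLE, which the notice above names. This is an illustration using the mbstring function directly, not the SmartUnicode method; the variable names are hypothetical.

```php
<?php
// Native ucwords() only touches the first letter of each word.
$native = ucwords("hELLO wORLD");

// MB_CASE_TITLE uppercases the first letter and lowercases the rest.
$title = mb_convert_case("hELLO wORLD", MB_CASE_TITLE, 'UTF-8');

// Accented first letters are handled as well ("élan vital").
$accented = mb_convert_case("\u{00E9}lan vital", MB_CASE_TITLE, 'UTF-8');
```

The first call yields 'HELLO WORLD', the second 'Hello World', so code that relies on ucwords() leaving the rest of each word untouched will see different output.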

Convert the CharSet Encoding of a String
NOTICE: If the target charset is different than UTF-8 (unicode), the string must be converted back to UTF-8 before using it in this framework
NOTICE: If used to convert to the HTML-ENTITIES charset, this function will consume a lot of memory and may run out of memory for large strings (> 10% of the memory_limit set in init.php)
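
The round trip described in the first notice might look like the following sketch. It calls the native mb_convert_encoding() directly as a stand-in; the actual signature of the SmartUnicode conversion method is not shown on this page, so nothing here should be read as its API.

```php
<?php
// "café" as UTF-8 (5 bytes).
$utf8 = "caf\u{00E9}";

// Convert to ISO-8859-1: the "é" becomes the single byte 0xE9.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// The converted string is no longer valid UTF-8 ...
$latin1IsValidUtf8 = mb_check_encoding($latin1, 'UTF-8');

// ... so it is converted back before being used as UTF-8 again.
$restored = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
```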

Safely convert Unicode ISO to UTF-8; the string can also be normalized from Unicode.
It will remove all invalid characters except latin1.
NOTICE: When $normalize is set to FALSE it will behave exactly like utf8_enc()
When $normalize is set to TRUE it will assume the string is Unicode ISO and will decode it first
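
The double-encoding risk that motivates a "safe" ISO to UTF-8 conversion can be shown with plain mbstring calls. The helper to_utf8_once() below is hypothetical and is not part of SmartUnicode; it only sketches one possible guard, and the validity check is a heuristic (a short Latin-1 string can happen to be valid UTF-8).

```php
<?php
// Hypothetical guard: convert only when the input is not already valid UTF-8.
function to_utf8_once(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already UTF-8, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$utf8 = "\u{00E9}"; // "é" as UTF-8: bytes C3 A9

// Treating UTF-8 bytes as ISO-8859-1 encodes each byte again.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

$guarded    = to_utf8_once($utf8);   // unchanged
$fromLatin1 = to_utf8_once("\xE9");  // real ISO-8859-1 input is converted
```

After the unguarded call the single character has grown to four bytes (C3 83 C2 A9), which renders as "Ã©".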

Safely convert UTF-8 to ISO; the string can also be normalized to Unicode.
It will remove all invalid characters except latin1.
NOTICE: It converts the string back to unicode, since all the strings in the framework are unicode (UTF-8), to avoid breaking regex patterns that use the /u modifier over those strings !!!
Never use just a single utf8_enc() when the framework is in UTF-8 mode, otherwise regex with the /u modifier will fail over those strings ...
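
The regex failure mentioned above can be reproduced with native PHP alone. This is a small sketch assuming a Latin-1 string reaches a pattern that uses the /u modifier; the variable names are illustrative.

```php
<?php
// "café" in ISO-8859-1: the byte 0xE9 alone is not valid UTF-8.
$latin1 = "caf\xE9";

// With /u the subject must be valid UTF-8, otherwise preg_match() returns false.
$result = preg_match('/^\p{L}+$/u', $latin1);
$error  = preg_last_error();

// Once re-encoded to UTF-8 the same pattern matches.
$utf8    = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$matched = preg_match('/^\p{L}+$/u', $utf8);
```

Note that the first call returns false rather than 0, so a loose check such as `if (!preg_match(...))` cannot tell a failed match from an encoding error; preg_last_error() reports PREG_BAD_UTF8_ERROR in that case.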

final class \SmartUnicode
{ } ::

class Methods

class Properties

class Constants

Sample code: PHP