Class: SmartUnicode - provides string utility functions for working safely with Unicode (Multibyte) Strings / Characters.
Compatible with: UTF-8 (Unicode), ISO-8859-1 (latin base), ISO-8859-* (latin extended, greek, cyrillic, ...), Japanese, ...
Take a look at the replacement table below.
Language: PHP
Located at: lib/framework/lib_unicode.php
Package: @Core
Namespace: \
Class Name: SmartUnicode
Version: v.20230101
Depends: extensions: PHP MBString, PHP XML ; constants: SMART_FRAMEWORK_CHARSET, SMART_FRAMEWORK_SECURITY_FILTER_INPUT
Hints: Always distinguish what you intend to do with a string. If you need the length in bytes, use strlen() with all strings, including Unicode ones. If you need the number of characters instead, use SmartUnicode::str_len() for all variables / strings you know (or merely suppose) to be Unicode.
Usage: static object: Class::method() - This class provides only STATIC methods
class Methods
public static function str_len (
string $ytext
) {} :: INT
@return: {INTEGER} The number of characters in a string
@param: {STRING} $ytext: The string
Unicode Safe strlen() :: Get string length as number of characters in the string, which may differ from number of bytes in a string if Unicode (Multibyte) string is used
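Example (sketch): the difference between byte length and character length. It assumes that PHP's mb_strlen() behaves like SmartUnicode::str_len(); the framework's own implementation is not shown on this page.

```php
<?php
// Assumption: mb_strlen() stands in for SmartUnicode::str_len().
$text = 'naïve café'; // 'ï' and 'é' each take 2 bytes in UTF-8

$bytes = strlen($text);             // number of bytes
$chars = mb_strlen($text, 'UTF-8'); // number of characters

echo "bytes: {$bytes}, characters: {$chars}\n"; // bytes: 12, characters: 10
```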
@return: {STRING} The sub-string
@param: {STRING} $ystr: The string
@param: {INTEGER} $ystart: The start offset
@param: {INTEGER} $ylen: OPTIONAL :: The number of characters to use, starting from start offset
Unicode Safe substr() :: Get part of string
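Example (sketch): why a byte-based substr() is risky on multibyte text. mb_substr() is used here as an assumed stand-in for the Unicode-safe method described above.

```php
<?php
// Assumption: mb_substr() approximates the Unicode-safe substring method.
$str = 'Ünïcödé';

// Byte-based cut: 4 bytes = 'Ü' (2) + 'n' (1) + the first half of 'ï'
$byteCut = substr($str, 0, 4);

// Character-based cut: exactly 4 characters
$charCut = mb_substr($str, 0, 4, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): broken sequence
var_dump($charCut);                             // string 'Ünïc'
```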
public static function substr_count (
string $ystr,
string $ysubstr
) {} :: INT
@return: {INTEGER} The number of times the piece sub-string occurs in the string being checked
@param: {STRING} $ystr: The string being checked
@param: {STRING} $ysubstr: The string to be found
Unicode Safe substr_count() :: Count the number of substring occurrences
@return: {STRING} The processed string with replacements if the needle is found
@param: {STRING} $needle: The sub-string being searched for
@param: {STRING} $replace: The replacement value that replaces found search values
@param: {STRING} $haystack: The string on which to make the replacements
@param: {INTEGER} $count: The number of replacements to operate
Unicode Safe str_replace() with Limit :: Replace a fixed (by count) number of occurrences of the search string with the replacement string
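Example (sketch): one possible way to replace only a limited number of occurrences in a UTF-8 string. The helper name limited_replace() is hypothetical and the approach (preg_replace_callback() with a limit) is an assumption, not the framework's code.

```php
<?php
// Hypothetical helper: replace at most $count occurrences of $needle.
function limited_replace(string $needle, string $replace, string $haystack, int $count): string {
    if ($needle === '' || $count <= 0) {
        return $haystack; // nothing to do
    }
    // preg_quote() makes the needle a literal; /u treats pattern and subject as UTF-8
    $pattern = '/' . preg_quote($needle, '/') . '/u';
    // A callback avoids having to escape '$' and '\' in the replacement text
    $result = preg_replace_callback(
        $pattern,
        static function (array $match) use ($replace): string {
            return $replace;
        },
        $haystack,
        $count
    );
    return $result ?? $haystack; // NULL signals a PCRE error (e.g. invalid UTF-8)
}

echo limited_replace('é', 'e', 'été étoilé', 2), "\n"; // ete étoilé
```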
@return: {INTEGER} / FALSE :: The numeric position of the first occurrence of piece in the string. If not found, it returns FALSE
@param: {STRING} $ystr: The string to search in
@param: {STRING} $ysubstr: The sub-string to be found
@param: {INTEGER} $offset: The search offset. If it is not specified, 0 is used
Unicode Safe strpos() :: Find position of first occurrence of string in a string, Case Sensitive
@return: {INTEGER} / FALSE :: The numeric position of the first occurrence of piece in the string. If not found, it returns FALSE
@param: {STRING} $ystr: The string to search in
@param: {STRING} $ysubstr: The sub-string to be found
@param: {INTEGER} $offset: The search offset. If it is not specified, 0 is used
Unicode Safe stripos() :: Find position of first occurrence of string in a string, Case Insensitive
@return: {INTEGER} / FALSE :: The numeric position of the last occurrence of piece in the string. If not found, it returns FALSE
@param: {STRING} $ystr: The string to search in
@param: {STRING} $ysubstr: The sub-string to be found
@param: {INTEGER} $offset: The search offset. If it is not specified, 0 is used
Unicode Safe strrpos() :: Find position of last occurrence of string in a string, Case Sensitive
@return: {INTEGER} / FALSE :: The numeric position of the last occurrence of piece in the string or FALSE if not found
@param: {STRING} $ystr: The string to search in
@param: {STRING} $ysubstr: The sub-string to be found
@param: {INTEGER} $offset: The search offset. If it is not specified, 0 is used
Unicode Safe strripos() :: Find position of last occurrence of string in a string, Case Insensitive
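Example (sketch): character offsets versus byte offsets for the four position functions above. The mb_* functions are assumed equivalents; parameter names and defaults of the SmartUnicode methods may differ.

```php
<?php
// Assumption: the mb_* functions stand in for the SmartUnicode position methods.
$str = 'Ärger über ärger';

$bytePos = strpos($str, 'über');                   // 7  (byte offset, 'Ä' takes 2 bytes)
$charPos = mb_strpos($str, 'über', 0, 'UTF-8');    // 6  (character offset)
$firstCi = mb_stripos($str, 'ärger', 0, 'UTF-8');  // 0  (case-insensitive, matches 'Ärger')
$lastCs  = mb_strrpos($str, 'ärger', 0, 'UTF-8');  // 11 (case-sensitive, last occurrence)
$lastCi  = mb_strripos($str, 'ÄRGER', 0, 'UTF-8'); // 11 (case-insensitive, last occurrence)

// A match at position 0 is falsy, so always compare against FALSE strictly.
$found = ($firstCi !== false);
```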
@return: {STRING} / FALSE :: Returns the portion of string starting with first match or FALSE if not found
@param: {STRING} $ystring: The string to search in
@param: {STRING} $ypart: The sub-string to search for in the string
Unicode Safe strstr() :: Finds first occurrence of a string within another, Case Sensitive
@return: {STRING} / FALSE :: Returns the portion of string starting with first match or FALSE if not found
@param: {STRING} $ystring: The string to search in
@param: {STRING} $ypart: The sub-string to search for in the string
Unicode Safe stristr() :: Finds first occurrence of a string within another, Case Insensitive
public static function str_contains (
string $ystring,
string $ypart
) {} :: BOOL
@return: {BOOLEAN} Returns TRUE if found or FALSE if not found
@param: {STRING} $ystring: The string to check
@param: {STRING} $ypart: The sub-string to search for in the string
Unicode Safe :: Check if a string contains another, Case Sensitive
public static function str_icontains (
string $ystring,
string $ypart
) {} :: BOOL
@return: {BOOLEAN} Returns TRUE if found or FALSE if not found
@param: {STRING} $ystring: The string to check
@param: {STRING} $ypart: The sub-string to search for in the string
Unicode ~ Safe * :: Check if a string contains another, Case Insensitive
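Example (sketch): a contains check built on the position functions. The helper contains_sketch() is hypothetical, and returning FALSE for an empty needle is an assumption; the behavior of the framework methods for that case is not documented here.

```php
<?php
// Hypothetical helper illustrating a case-sensitive / case-insensitive contains check.
function contains_sketch(string $haystack, string $needle, bool $caseSensitive = true): bool {
    if ($needle === '') {
        return false; // assumption: an empty needle is treated as "not found"
    }
    $pos = $caseSensitive
        ? mb_strpos($haystack, $needle, 0, 'UTF-8')
        : mb_stripos($haystack, $needle, 0, 'UTF-8');
    return $pos !== false; // strict check: position 0 is a valid match
}

var_dump(contains_sketch('Ärger', 'Är'));        // bool(true)
var_dump(contains_sketch('Ärger', 'är'));        // bool(false)
var_dump(contains_sketch('Ärger', 'är', false)); // bool(true)
```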
public static function str_wordcount (
string $str
) {} :: INT
@return: {INTEGER} Returns the number of words found
@param: {STRING} $str: The string in which to count words
Unicode ~ Safe * :: Find the approximate number of words in a Unicode string
public static function str_tolower (
string $ystr
) {} :: STRING
@return: {STRING} The processed string as lowercase string
@param: {STRING} $ystr: The string
Unicode Safe strtolower() :: Make a Unicode string lowercase
public static function str_toupper (
string $ystr
) {} :: STRING
@return: {STRING} The processed string as uppercase string
@param: {STRING} $ystr: The string
Unicode Safe strtoupper() :: Make a Unicode string uppercase
public static function uc_words (
string $ystr
) {} :: STRING
@return: {STRING} The processed string as uppercase on each word
@param: {STRING} $ystr: The string
Unicode Safe ucwords() :: Make a Unicode string uppercase on each word
Notice: this is only partially compatible with PHP ucwords(): it uppercases the first letter of each word and forces the remaining letters of the word to lowercase, in line with MB_CASE_TITLE
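Example (sketch): case conversion with mbstring, including the MB_CASE_TITLE behavior mentioned in the notice. These mb_* calls are assumed to approximate str_tolower(), str_toupper() and uc_words(); the ucwords() comparison uses ASCII only to keep the result locale-independent.

```php
<?php
// Assumption: mb_* functions approximate the SmartUnicode case methods.
$str = 'éCOLE normale';

$lower = mb_strtolower($str, 'UTF-8');                  // 'école normale'
$upper = mb_strtoupper($str, 'UTF-8');                  // 'ÉCOLE NORMALE'
$title = mb_convert_case($str, MB_CASE_TITLE, 'UTF-8'); // 'École Normale'

// Difference to PHP ucwords(): the rest of each word is left untouched there
$plain   = ucwords('hELLO wORLD');                                // 'HELLO WORLD'
$mbTitle = mb_convert_case('hELLO wORLD', MB_CASE_TITLE, 'UTF-8'); // 'Hello World'
```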
public static function uc_first (
string $ystr
) {} :: STRING
@return: {STRING} The processed string with its first character in uppercase
@param: {STRING} $ystr: The string
Unicode Safe ucfirst() :: Make a Unicode string uppercase on first character
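Example (sketch): a multibyte-aware ucfirst(). The helper ucfirst_sketch() is hypothetical and only illustrates the idea; it is not the framework's implementation.

```php
<?php
// Hypothetical helper: uppercase only the first character, keep the rest as-is.
function ucfirst_sketch(string $str, string $encoding = 'UTF-8'): string {
    if ($str === '') {
        return '';
    }
    $first = mb_substr($str, 0, 1, $encoding);    // first character (may be multibyte)
    $rest  = mb_substr($str, 1, null, $encoding); // everything after it
    return mb_strtoupper($first, $encoding) . $rest;
}

echo ucfirst_sketch('élan vital'), "\n"; // Élan vital
```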
@return: {STRING} The processed string
@param: {STRING} $ystr: The string
@param: {ENUM} $ychar_from: Empty to detect / Select one or many from the list of: class::CONVERSION_IMPLICIT_CHARSETS
@param: {ENUM} $ychar_to: Empty to use the framework internal charset defined in SMART_FRAMEWORK_CHARSET / Select one of the: UTF-8, HTML-ENTITIES or another valid charset
@param: {BOOLEAN} $normalize: Normalize (Default is TRUE) - will normalize the string into the default framework charset, otherwise the string will be incompatible with the current encoding ... ; Setting this to FALSE must be done with great care !!!
Convert the CharSet Encoding of a String
NOTICE: If the target charset is different from UTF-8 (Unicode), the string must be re-converted before it is used in this framework
NOTICE: When converting to the HTML-ENTITIES charset, this function consumes a lot of memory and may run out of memory for large strings (> 10% of the memory_limit set in init.php)
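Example (sketch): a charset round trip with mb_convert_encoding(), illustrating why a string converted to a non-UTF-8 charset has to be converted back before further use in a UTF-8 environment. This uses plain mbstring and is only an assumed approximation of what the method above does.

```php
<?php
// 'złoty' encoded as ISO-8859-2: 0xB3 is 'ł' in that charset
$latin2 = "z\xB3oty";

// Not valid UTF-8 as-is, so it must be converted before use in a UTF-8 context
$isUtf8Before = mb_check_encoding($latin2, 'UTF-8');

$utf8 = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2'); // 'złoty'
$back = mb_convert_encoding($utf8, 'ISO-8859-2', 'UTF-8');   // original bytes again

var_dump($isUtf8Before, $utf8, $back === $latin2);
```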
@return: {STRING} The fixed string
@param: {STRING} $str: The string
@param: {BOOL} $detect: If set to TRUE will force from the same unicode ; Default is FALSE
Safe Fix the string to contain only characters valid in the current charset.
All the unsafe characters will be replaced by: ?
This is not necessary after SmartUnicode::convert_charset() as it will already fix it
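Example (sketch): replacing invalid byte sequences with "?" using mbstring. Converting from UTF-8 to UTF-8 is an assumed technique for illustration; how the framework performs the fix internally is not shown on this page.

```php
<?php
// 0xFF can never appear in valid UTF-8
$broken = "abc\xFFdef";

// Make the substitution character explicit: 0x3F is '?'
mb_substitute_character(0x3F);

// UTF-8 to UTF-8 conversion replaces each invalid sequence with the substitute
$fixed = mb_convert_encoding($broken, 'UTF-8', 'UTF-8');

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump($fixed);                              // string 'abc?def'
```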
public static function utf8_enc (
string $str
) {} :: STRING
@return: {STRING} The UTF-8 encoded string
@param: {STRING} $str: An ISO-8859-1 string
Converts a string from ISO-8859-1 to UTF-8
Replacement for utf8_encode() which is deprecated since PHP 8.2
public static function utf8_dec (
string $str
) {} :: STRING
@return: {STRING} The decoded string as ISO-8859-1 having all invalid characters replaced with ?
@param: {STRING} $str: A UTF-8 string
Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters with ?
Replacement for utf8_decode() which is deprecated since PHP 8.2
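Example (sketch): mbstring-based stand-ins for the deprecated utf8_encode() / utf8_decode(). The helper names latin1_to_utf8() and utf8_to_latin1() are hypothetical; whether utf8_enc() / utf8_dec() are implemented this way is an assumption.

```php
<?php
// Hypothetical helper: ISO-8859-1 to UTF-8 (every Latin-1 byte is representable)
function latin1_to_utf8(string $str): string {
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: UTF-8 to ISO-8859-1, unrepresentable characters become '?'
function utf8_to_latin1(string $str): string {
    $previous = mb_substitute_character(); // remember the global setting
    mb_substitute_character(0x3F);         // 0x3F is '?'
    $out = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
    mb_substitute_character($previous);    // restore it
    return $out;
}

var_dump(latin1_to_utf8("\xE9") === 'é');    // bool(true)
var_dump(utf8_to_latin1('é') === "\xE9");    // bool(true)
var_dump(utf8_to_latin1("\u{20AC}5"));       // string '?5' (no euro sign in Latin-1)
```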
@return: {STRING} The processed string
@param: {STRING} $str: The string
@param: {BOOLEAN} $normalize: Normalize (Default is FALSE) - will normalize the string from the default framework charset, otherwise the string will be incompatible with the current encoding ... ; Setting this to TRUE must be done with great care, depending on context !!!
Safe Convert Unicode ISO to UTF-8 that can be also normalized from Unicode.
It will remove all invalid characters except latin1.
NOTICE: When $normalize is set to FALSE it behaves exactly like utf8_enc()
When $normalize is set to TRUE it assumes the string is Unicode ISO and decodes it first
@return: {STRING} The processed string
@param: {STRING} $str: The string
@param: {BOOLEAN} $normalize: Normalize (Default is TRUE) - will normalize the string into the default framework charset, otherwise the string will be incompatible with the current encoding ... ; Setting this to FALSE must be done with great care, depending on context !!!
Safe Convert UTF-8 to ISO that can be also normalized to Unicode.
It will remove all invalid characters except latin1.
NOTICE: It converts the string back to Unicode, since all strings in the framework are Unicode (UTF-8), to avoid breaking regular expressions that use the /u modifier over those strings !!!
Never use a bare utf8_enc() on its own when the framework is in UTF-8 mode, otherwise regular expressions with the /u modifier will fail over those strings ...
@return: {STRING} The processed string
@param: {STRING} $str: The string
@param: {BOOLEAN} $normalize: Normalize (Default is TRUE) - will normalize the string by forcing the ISO-8859-1 character set
De-Accent a latin-based Unicode string :: replaces all accented characters in UTF-8 / ISO-8859-* with their unaccented versions in ISO-8859-1
@return: {STRING} The processed string
@param: {STRING} $str: The string
@param: {ENUM} $encoding: A valid MB Encoding (Ex: UTF-8, ISO-8859-1, ...) or empty string to try detect
@param: {BOOL} $normalize: Default is FALSE ; if TRUE will normalize the conversion by forcing all ISO-8859-1 (may break some remaining UTF-8 characters)
Safe Convert the Unicode Accented Characters to Safe HTML Entities
@return: {STRING} The processed string
@param: {STRING} $str: The string to be wrapped
@param: {INTEGER+} $width: OPTIONAL :: The number of characters at which the string will be wrapped ; Min:1 ; Max:2048 ; Default:75
@param: {STRING} $break: OPTIONAL :: The line is broken using the optional break parameter ; Default is: \n ; will also add the optional visual break by default '¬'
@param: {BOOLEAN} $cut: OPTIONAL :: If cut is set to TRUE, the string is always wrapped at or before the specified width ; when FALSE, the function does not split a word, even if the word is longer than the width. Default is FALSE.
@param: {STRING} $visualbreak: OPTIONAL :: Visual Break String Character ; Default is '¬'
Unicode Safe wordwrap() :: Wraps a string to a given number of characters
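Example (sketch): a simplified character-based word wrap. The helper wrap_sketch() is hypothetical; it splits on single spaces only and omits the visual break character, so it illustrates the idea rather than reproducing the method documented above. It needs PHP 7.4+ for mb_str_split().

```php
<?php
// Hypothetical helper: wrap by character count instead of byte count.
function wrap_sketch(string $str, int $width = 75, string $break = "\n", bool $cut = false): string {
    $width = max(1, min(2048, $width)); // same bounds as documented for $width
    $lines = [];
    $line  = '';
    foreach (explode(' ', $str) as $word) {
        // With $cut, overlong words are split into chunks of $width characters
        $parts = ($cut && mb_strlen($word, 'UTF-8') > $width)
            ? mb_str_split($word, $width, 'UTF-8')
            : [$word];
        foreach ($parts as $part) {
            $candidate = ($line === '') ? $part : $line . ' ' . $part;
            if ($line !== '' && mb_strlen($candidate, 'UTF-8') > $width) {
                $lines[] = $line; // current line is full, start a new one
                $line = $part;
            } else {
                $line = $candidate;
            }
        }
    }
    $lines[] = $line;
    return implode($break, $lines);
}

echo wrap_sketch('žluťoučký kůň úpěl', 10), "\n";
```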
//-----------------------------------------------------------------------------------------------------
//-----------------------------------------------------------------------------------------------------
// SAFE / MultiByte functions Reference with Replacements (Smart.Framework UTF-8)
// MORE INFO AT: http://www.phpwact.org/php/i18n/utf-8
//-----------------------------------------------------------------------------------------------------
// FUNCTION NAME          BETTER / SAFER REPLACEMENT          STATUS  NOTICE
//-----------------------------------------------------------------------------------------------------
//--
//(int)                   Smart::format_number_int()          [ok]    The replacement function has a second parameter to also allow unsigned integers
//number_format()         Smart::format_number_dec()          [ok]    It is easier to use and relies on the framework
//htmlspecialchars()      Smart::escape_html()                [ok]    The replacement function takes into account whether unicode strings are used or not
//--
//mail()                  SmartUnicode::mailsend()            [ok]    The PHP mail() is not unicode safe
//--
//split() / str_split()   explode()                           [ok]    Use the explode() function ; avoid split() or str_split() because they are not binary safe and can break unicode strings
//join()                  implode()                           [ok]    The join() is deprecated and alias to implode()
//substr_replace()        str_replace()                       [ok]    It is not certified to be 100% safe with unicode strings ; try to use the replacement function
//str_ireplace()          * str_ireplace()                    [ok!]   Will fail when replacing unicode accented characters if case differs (lower vs. upper)
//--
//substr_count()          SmartUnicode::substr_count()        [ok]    the PHP substr_count() is not unicode safe
//strlen()                SmartUnicode::str_len() / strlen()  [ok!]   the PHP strlen() is not unicode safe ; but for counting bytes in a string always use strlen() ; for counting characters in a string always use SmartUnicode::str_len() in a unicode environment
//substr()                SmartUnicode::sub_str()             [ok]    the PHP substr() is not unicode safe
//--
//strstr()                SmartUnicode::str_str()             [ok]    the PHP strstr() is not unicode safe
//stristr()               SmartUnicode::stri_str()            [ok]    the PHP stristr() is not unicode safe
//--
//strpos()                SmartUnicode::str_pos()             [ok]    the PHP strpos() is not unicode safe
//stripos()               SmartUnicode::str_ipos()            [ok]    the PHP stripos() is not unicode safe
//strrpos()               SmartUnicode::str_rpos()            [ok]    the PHP strrpos() is not unicode safe
//strripos()              SmartUnicode::str_ripos()           [ok]    the PHP strripos() is not unicode safe
//--
//strtolower()            SmartUnicode::str_tolower()         [ok]    the PHP strtolower() is not unicode safe and will not lowercase accented characters
//strtoupper()            SmartUnicode::str_toupper()         [ok]    the PHP strtoupper() is not unicode safe and will not uppercase accented characters
//--
//utf8_decode()           SmartUnicode::utf8_to_iso()         [ok]    it may break strings that are used in unicode environments, thus the strings need to be re-encoded ; if not re-encoded back to unicode, regex with the /u modifier will fail in strange ways ...
//utf8_encode()           SmartUnicode::iso_to_utf8()         [!!]    there is a risk to double encode the string and break it if it is not ISO ; use just for ISO strings !!
//wordwrap()              SmartUnicode::word_wrap()           [ok]    the PHP wordwrap() is not unicode safe
//strip_tags()            Smart::stripTags()                  [ok+]   the PHP strip_tags() will not replace some extra things such as various html entities
//--
//printf()                * printf()                          [!+]    Will not take care of real multibyte string length and may return unexpected results
//sprintf()               * sprintf()                         [!+]    Will not take care of real multibyte string length and may return unexpected results
//vsprintf()              * vsprintf()                        [!+]    Will not take care of real multibyte string length and may return unexpected results
//strcasecmp()            * strcasecmp()                      [!+]    Will not take care of real multibyte string length and may return unexpected results
//strcspn()               * strcspn()                         [!+]    Will not take care of real multibyte string length and may return unexpected results
//strspn()                * strspn()                          [!+]    Will not take care of real multibyte string length and may return unexpected results
//--
//strtr()                                                     [!!]    Use only with non-unicode characters, will break the unicode strings if unicode characters are multibyte
//strrev()                                                    [!!]    Use only with non-unicode characters, will break the unicode strings if unicode characters are multibyte
//--
//chunk_split()           -                                   [!!-]   This really breaks unicode strings ... Use only with non-unicode characters, will break the unicode strings if unicode characters are multibyte
//--
//-----------------------------------------------------------------------------------------------------
//--------------- DEPRECATED Functions, AVOID TO USE THEM ... ---------------
//--------------- They are not binary safe and will be OFF since PHP7 ---------------
// Replacement hint: ereg("^(5|6)$", 'some value') [=] preg_match("/^(5|6)$/", 'some value')
//-----------------------------------------------------------------------------------------------------
//ereg()                  preg_match*()                       [ok]    Will work after regex pattern conversion ; If String is unicode, then using the /u regex modifier is required
//eregi()                 preg_match*()                       [ok]    Will work after regex pattern conversion ; If String is unicode, then using the /u regex modifier is required
//ereg_replace()          preg_replace*()                     [ok]    Will work after regex pattern conversion ; If String is unicode, then using the /u regex modifier is required
//eregi_replace()         preg_replace*()                     [ok]    Will work after regex pattern conversion ; If String is unicode, then using the /u regex modifier is required
//-----------------------------------------------------------------------------------------------------
//-----------------------------------------------------------------------------------------------------
// #end php code
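Example (sketch): converting an ereg() pattern to preg_match() and the effect of the u pattern modifier on a UTF-8 string, as referenced in the table above. The sample values are illustrative only.

```php
<?php
// ereg("^(5|6)$", $value) becomes preg_match() with delimiters around the pattern
$isMatch = (preg_match('/^(5|6)$/', '5') === 1);

// 'café' is 4 characters but 5 bytes in UTF-8
$withU    = preg_match('/^.{4}$/u', 'café'); // 1: with /u, '.' matches one character
$withoutU = preg_match('/^.{4}$/', 'café');  // 0: without /u, '.' matches one byte

var_dump($isMatch, $withU, $withoutU);
```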
documentation generated on: 2023-10-19 23:15:42 +0000