Four Ways to Check Whether a String Is UTF-8 Encoded in PHP

Author: 袖梨, 2022-06-24

Example 1

The code is as follows:

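/**
 * Check whether a string is UTF-8 encoded.
 * This is a structural check only: it does not reject overlong forms or surrogates.
 * @param string $str The string to check
 * @return boolean
 */
function is_utf8($str){
    $len = strlen($str);
    for($i = 0; $i < $len; $i++){
        $c = ord($str[$i]);
        if ($c >= 128) { // non-ASCII byte: must start a multi-byte sequence
            if ($c > 247) return false;
            elseif ($c > 239) $bytes = 4;
            elseif ($c > 223) $bytes = 3;
            elseif ($c > 191) $bytes = 2;
            else return false; // stray continuation byte (0x80-0xBF)
            if (($i + $bytes) > $len) return false; // sequence is truncated
            while ($bytes > 1) {
                $i++;
                $b = ord($str[$i]);
                if ($b < 128 || $b > 191) return false; // not a continuation byte
                $bytes--;
            }
        }
    }
    return true;
}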

Example 2

The code is as follows:

function is_utf8($string) {
    // preg_match() returns 1, 0, or false, so compare against 1 to get a boolean
    return preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string) === 1;
}

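Its accuracy is essentially the same as mb_detect_encoding(): when one is right, so is the other, and when one is wrong, both are.
Encoding detection can never be 100% accurate, but this approach is good enough for most needs.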
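
To illustrate why detection can never be fully accurate, here is a minimal sketch, assuming the mbstring extension is installed. The variable names are made up for this illustration. The same two bytes pass a UTF-8 validity check and are also a legal ISO-8859-1 string, so a validity check alone cannot tell which encoding the author intended.

```php
<?php
// The bytes 0xC3 0xA9 spell "é" in UTF-8, but "Ã©" in ISO-8859-1.
$bytes = "\xC3\xA9";

// The sequence is well-formed UTF-8 ...
$validUtf8 = mb_check_encoding($bytes, 'UTF-8');

// ... and ISO-8859-1 assigns a character to both bytes, so it is valid there too.
$validLatin1 = mb_check_encoding($bytes, 'ISO-8859-1');

// A lone 0xE9 ("é" in ISO-8859-1) is a truncated 3-byte sequence in UTF-8.
$loneByteIsUtf8 = mb_check_encoding("\xE9", 'UTF-8');

var_dump($validUtf8, $validLatin1, $loneByteIsUtf8);
```

In practice this means a "true" result only says the bytes could be UTF-8, while a "false" result reliably says they are not.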

Example 3

The code is as follows:
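function mb_is_utf8($string) {
    // Newly discovered approach. The third argument enables strict mode, so invalid input yields false.
    return mb_detect_encoding($string, 'UTF-8', true) === 'UTF-8';
}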
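
As an alternative to the mb_detect_encoding() approach, PHP's built-in validators can do the same job directly. The sketch below is only an illustration: the helper names is_utf8_mb and is_utf8_pcre are hypothetical, the first assumes the mbstring extension, and the second relies on preg_match() failing when the /u modifier is given an invalid UTF-8 subject.

```php
<?php
// Hypothetical helper: validate with mbstring.
function is_utf8_mb(string $s): bool {
    return mb_check_encoding($s, 'UTF-8');
}

// Hypothetical helper: validate with PCRE.
// With the /u modifier, an invalid UTF-8 subject makes preg_match() return
// false, while the empty pattern matches any valid subject and returns 1.
function is_utf8_pcre(string $s): bool {
    return preg_match('//u', $s) === 1;
}

var_dump(is_utf8_mb("\xE4\xB8\xAD"));   // complete 3-byte sequence (U+4E2D)
var_dump(is_utf8_mb("\xE4\xB8"));       // truncated sequence
var_dump(is_utf8_pcre("\xE4\xB8\xAD"));
var_dump(is_utf8_pcre("\xFF"));         // 0xFF never appears in UTF-8
```

Both helpers check validity only. Like the other examples here, they cannot prove that a valid byte string was actually meant to be UTF-8.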

Example 4

The code is as follows:

// Heuristic check: returns true if $word starts with, ends with, or contains two
// or more consecutive 3-byte sequences whose lead byte is 0xE4-0xE9 (U+4000 to
// U+9FFF, which covers most common Chinese characters). It does not validate
// UTF-8 in general and returns false for pure ASCII text.
function is_utf8($word)
{
    // One 3-byte character: lead byte 0xE4-0xE9 followed by two continuation bytes
    $char = '[\xE4-\xE9][\x80-\xBF]{2}';

    return preg_match('/^' . $char . '/', $word) === 1
        || preg_match('/' . $char . '$/', $word) === 1
        || preg_match('/(?:' . $char . '){2,}/', $word) === 1;
} // function is_utf8
