A very common problem for web developers is to maintain a normalized link (a string to identify a page) for their webpages when that string is from a foreign language that contains diacritic. For example, the string “ştiaţi că…” (in Romanian means “did you know”), should be converted to “stiati ca…”, so romanian diacritics are converted to it’s latin representation. That means that “ă” becomes “a”, “ş” becomes “s” and “ţ” becomes “t”. This transformation is useful because some search engines and some browsers don’t get well with these diacritics.
There is a mysql stored routine to convert such characters. That procedure uses a mysql table that contain about 400 characters and their latin representation. This is a mapping table between different alphabets and latin alphabet. The table has been created based on standard mysql charsets (see Collation chart for utf8_general_ci, European alphabets). Procedure listing is shown below. There is a text file with SQL queries to create utf8_translation and it’s content.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | CREATE FUNCTION `util_utf8_to_ascii`(`_str` VARCHAR(255)) RETURNS varchar(255) LANGUAGE SQL SQL SECURITY INVOKER COMMENT 'Convert UTF8 to ASCII based on utf8_translation' BEGIN DECLARE cutf8 VARCHAR(5); DECLARE cascii CHAR(1); DECLARE newStr VARCHAR(255); DECLARE done INT DEFAULT 0; DECLARE cur1 CURSOR FOR SELECT u.c_utf8, u.c_ascii FROM utf8_translation AS u; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1; OPEN cur1; SET newStr = _str; read_loop: LOOP FETCH cur1 INTO cutf8, cascii; IF done THEN LEAVE read_loop; END IF; IF (LOCATE(cutf8, _str)) THEN SET newStr = REPLACE(newStr, cutf8, cascii); END IF; END LOOP; RETURN newStr; END |

