Florin Chiş

MySQL snippets, tips and tricks

  • Home
  • MySQL Tips
  • Quiz time ?
  • MySQL Tools
  • About me
RSS

Convert UTF8 diacritics to ASCII in MySQL for use in website links

Posted on 28/12/2011 by admin
Comments off

A very common problem for web developers is to maintain a normalized link (a string to identify a page) for their webpages when that string is from a foreign language that contains diacritic. For example, the string “ştiaţi că…” (in Romanian means “did you know”), should be converted to “stiati ca…”, so romanian diacritics are converted to it’s latin representation. That means that “ă” becomes “a”, “ş” becomes “s” and “ţ” becomes “t”. This transformation is useful because some search engines and some browsers don’t get well with these diacritics.
There is a mysql stored routine to convert such characters. That procedure uses a mysql table that contain about 400 characters and their latin representation. This is a mapping table between different alphabets and latin alphabet. The table has been created based on standard mysql charsets (see Collation chart for utf8_general_ci, European alphabets). Procedure listing is shown below. There is a text file with SQL queries to create utf8_translation and it’s content.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
CREATE FUNCTION `util_utf8_to_ascii`(`_str` VARCHAR(255))
    RETURNS varchar(255)
    LANGUAGE SQL
    SQL SECURITY INVOKER
    COMMENT 'Convert UTF8 to ASCII based on utf8_translation'
BEGIN
    DECLARE cutf8 VARCHAR(5);
    DECLARE cascii CHAR(1);
    DECLARE newStr VARCHAR(255);
    DECLARE done INT DEFAULT 0;
    DECLARE cur1 CURSOR FOR SELECT u.c_utf8, u.c_ascii FROM utf8_translation AS u;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN cur1;
    SET newStr = _str;
    read_loop: LOOP
        FETCH cur1 INTO cutf8, cascii;
        IF done THEN
            LEAVE read_loop;
        END IF;
        IF (LOCATE(cutf8, _str)) THEN
            SET newStr = REPLACE(newStr, cutf8, cascii);
        END IF;
    END LOOP;
    RETURN newStr;
END
Categories: MySQL small projects | Tags: conversion, latin, link, procedure, translation, utf8

Generate table with random text from dictionary

Posted on 27/12/2011 by admin
43 comments

For testing purposes, it is very useful sometimes to generate some random content in a mysql table. Such table simulates a real table with content generated in long time. For instance, suppose you have a mysql table with comments from users. Each day, users post a number of comments. Thus, i a year or so, there have been accumulated many comments (let’s say, about 100.000 comments). But you would like to test some database queries in a real environment before users can post their comments. So, this post will explain how to generate a big table with comments (automatically generated) using mysql stored routines.

First, I’ve got an English dictionary with words, which I’ve imported into a simple MySQL table with structure described below:

1
2
3
4
5
6
7
8
9
10
CREATE TABLE `dict_en` (
    `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `word` VARCHAR(50) NOT NULL DEFAULT '',
    PRIMARY KEY (`id`),
    INDEX `word` (`word`)
)
COMMENT='English dictionary'
COLLATE='utf8_general_ci'
ENGINE=MyISAM
ROW_FORMAT=DEFAULT;

Second, I’ve created a simple table that will host so-called articles (randomly created):

1
2
3
4
5
6
7
8
9
CREATE TABLE `articles` (
    `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
    `content` MEDIUMTEXT NULL,
    PRIMARY KEY (`id`)
)
COMMENT='articole'
COLLATE='utf8_general_ci'
ENGINE=MyISAM
ROW_FORMAT=DEFAULT;

Now, you should import a simple text file with dictionary data, each word on a single line. This could be done with LOAD DATA INFILE command like this:

1
2
3
4
LOAD DATA INFILE 'my_dictionary_file.txt' INTO TABLE dict_en
    FIELDS TERMINATED BY "\t"
    LINES TERMINATED BY "\n"
    (`word`);

At this moment, we use the dictionary table to generate some variable-width random generated text that simulated a comment. For this example, let’s generate a comment with a random number of words between 1 and 1000, words separated by SPACE. This can be accomplished with MySQL function described below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
CREATE FUNCTION `create_content_by_wordlist`()
    RETURNS mediumtext
    LANGUAGE SQL
    DETERMINISTIC
    COMMENT 'Creates a text with random words from dictionary table'
BEGIN
    DECLARE retVal MEDIUMTEXT DEFAULT '';
    DECLARE iter INT(10) DEFAULT 0;
    DECLARE randLength INT(10) DEFAULT 0;
    SET randLength = FLOOR(1000 * rand());
    SET iter=0;
    SET retVal = '';
    SELECT COUNT(`id`) INTO @cntWL FROM dict_en;
    WHILE (iter<randLength) DO
        SET @curId = FLOOR(@cntWL * RAND());
        SELECT word INTO @wrd FROM dict_en WHERE `id`=@curId LIMIT 1;
        SET retVal = CONCAT(retVal,' ', @wrd);
        SET iter = iter+1;
    END WHILE;
    RETURN retVal;
END

Now it’s time to fill the comments table, which is named “articles” for simplicity. Next stored routine starts filling that table with content. It accepts a single parameter: number of comments (or articles):

1
2
3
4
5
6
7
8
9
10
11
12
13
CREATE PROCEDURE `create_articles`(IN `_records` INT)
    LANGUAGE SQL
    NOT DETERMINISTIC
    COMMENT 'Fill articles table'
BEGIN
    DECLARE contor INT(10) DEFAULT 0;
    SET contor = 0;
#    TRUNCATE TABLE articles;
    WHILE (contor<_records) DO
        INSERT INTO articles SET `content`=create_content_by_wordlist();
        SET contor = contor+1;
    END WHILE;
END

All prepared now! So, let’s put those stored routines to work. All you have to do is just create table for dictionary, fill it with data (I’ve used a text file with 350k english words). Finally, put mysql to hard work: run create_articles() procedure:

1
CALL create_articles(10000);

Be aware, a greater number of articles make articles table bigger. In my situation, for 250k articles randomly generated, the articles table has about 5G in size. It is a good situation for testing large MySQL tables. You could use indexes, partitions or other engines to test different study cases.

Categories: MySQL small projects | Tags: article, dictionary, generate, mysql, procedure, random, text, word

Test

Posted on 24/12/2011 by admin
36 comments

WBVCBXUJM4R6

Categories: General

Repair (or optimize) all tables from a database

Posted on 23/12/2011 by admin
26 comments

This stored routime fetches all tables from database, then executes an operation of REPAIR (you could also use OPTIMIZE command also) for each one. It operates on tables in database in which procedure has been defined.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
CREATE PROCEDURE `tools_repair_all_tables`()
    LANGUAGE SQL
    SQL SECURITY INVOKER
BEGIN
    DECLARE endloop INT DEFAULT 0;
    DECLARE tableName char(100);
    DECLARE rCursor CURSOR FOR SELECT `TABLE_NAME` FROM `information_schema`.`TABLES` WHERE `TABLE_SCHEMA`=DATABASE();
    DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET endloop=1;
    OPEN rCursor;
    FETCH rCursor INTO tableName;
    WHILE endloop = 0 DO
        SET @sql = CONCAT('REPAIR TABLE `', tableName, '` QUICK;');
        PREPARE statement FROM @sql;
        EXECUTE statement;
        FETCH rCursor INTO tableName;
    END WHILE;
    CLOSE rCursor;
END
Categories: Mysql stored routines | Tags: automatic, database, mysql, repair, script, tables

Padding a string with a char to complete fixed length width

Posted on 22/12/2011 by admin
18 comments

Sometimes it is useful to add fill a string with a character to fit a specific width. Below is shown a mysql stored routine that accomplishes this. For instance, if you have number “946″ and you need to fill with zeroes before so it fits 5 chars, then function should return “00946″.
It’s parameters are:
_val: input string
_paddchar: char used for padding
_fullLength: specify full length of output string.
Here is the listing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CREATE FUNCTION `tools_padd_char`(`_val` VARCHAR(255), `_paddchar` VARCHAR(255), `_fullLength` INT(10))
RETURNS VARCHAR(255)
LANGUAGE SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE retVal VARCHAR(255) DEFAULT '';
    DECLARE repetitions INT(10) DEFAULT 0;
    SET retVal = _val;
    SET repetitions = _fullLength - LENGTH(_val);
    IF (repetitions > 0) THEN
        WHILE (repetitions > 0) DO
            SET retVal = CONCAT(_paddchar, retVal);
            SET repetitions = repetitions - 1;
        END WHILE;
    END IF;
    RETURN retVal;
END
Categories: Mysql stored routines | Tags: char, fixed, length, padding
  • Search

  • Recent posts

    • Convert UTF8 diacritics to ASCII in MySQL for use in website links
    • Generate table with random text from dictionary
    • Test
    • Repair (or optimize) all tables from a database
    • Padding a string with a char to complete fixed length width
  • What about you ?

    Do you create stored routines in MySQL ?

    View Results

    Loading ... Loading ...
    • Polls Archive
  • Categories

    • General
    • MySQL small projects
    • Mysql stored routines
  • Newsletter

    Want to receive articles in your mailbox ?

  • Meteo

    meteoblue.com
  • Legături

    • Cosmin Roman blog
    • Romania Maps
  • Tags

    article automatic char conversion database dictionary fixed generate latin length link mysql padding procedure random repair script tables text translation utf8 word

    WP Cumulus Flash tag cloud by Roy Tanck requires Flash Player 9 or better.

  • Archive

    • decembrie 2011
  • New comments

    • Parteneri

      Bloguri, Bloggeri si Cititori
      Blogarama - The Blog Directory
    Mysql tips, tricks and tests. Proudly Powered by WordPress | Nest Theme by YChong