For testing purposes, it is very useful sometimes to generate some random content in a mysql table. Such table simulates a real table with content generated in long time. For instance, suppose you have a mysql table with comments from users. Each day, users post a number of comments. Thus, i a year or so, there have been accumulated many comments (let’s say, about 100.000 comments). But you would like to test some database queries in a real environment before users can post their comments. So, this post will explain how to generate a big table with comments (automatically generated) using mysql stored routines.
First, I’ve got an English dictionary with words, which I’ve imported into a simple MySQL table with structure described below:
1 2 3 4 5 6 7 8 9 10
| CREATE TABLE `dict_en` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`word` VARCHAR(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
INDEX `word` (`word`)
)
COMMENT='English dictionary'
COLLATE='utf8_general_ci'
ENGINE=MyISAM
ROW_FORMAT=DEFAULT; |
Second, I’ve created a simple table that will host so-called articles (randomly created):
1 2 3 4 5 6 7 8 9
| CREATE TABLE `articles` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`content` MEDIUMTEXT NULL,
PRIMARY KEY (`id`)
)
COMMENT='articole'
COLLATE='utf8_general_ci'
ENGINE=MyISAM
ROW_FORMAT=DEFAULT; |
Now, you should import a simple text file with dictionary data, each word on a single line. This could be done with LOAD DATA INFILE command like this:
1 2 3 4
| LOAD DATA INFILE 'my_dictionary_file.txt' INTO TABLE dict_en
FIELDS TERMINATED BY "\t"
LINES TERMINATED BY "\n"
(`word`); |
At this moment, we use the dictionary table to generate some variable-width random generated text that simulated a comment. For this example, let’s generate a comment with a random number of words between 1 and 1000, words separated by SPACE. This can be accomplished with MySQL function described below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
| CREATE FUNCTION `create_content_by_wordlist`()
RETURNS mediumtext
LANGUAGE SQL
DETERMINISTIC
COMMENT 'Creates a text with random words from dictionary table'
BEGIN
DECLARE retVal MEDIUMTEXT DEFAULT '';
DECLARE iter INT(10) DEFAULT 0;
DECLARE randLength INT(10) DEFAULT 0;
SET randLength = FLOOR(1000 * rand());
SET iter=0;
SET retVal = '';
SELECT COUNT(`id`) INTO @cntWL FROM dict_en;
WHILE (iter<randLength) DO
SET @curId = FLOOR(@cntWL * RAND());
SELECT word INTO @wrd FROM dict_en WHERE `id`=@curId LIMIT 1;
SET retVal = CONCAT(retVal,' ', @wrd);
SET iter = iter+1;
END WHILE;
RETURN retVal;
END |
Now it’s time to fill the comments table, which is named “articles” for simplicity. Next stored routine starts filling that table with content. It accepts a single parameter: number of comments (or articles):
1 2 3 4 5 6 7 8 9 10 11 12 13
| CREATE PROCEDURE `create_articles`(IN `_records` INT)
LANGUAGE SQL
NOT DETERMINISTIC
COMMENT 'Fill articles table'
BEGIN
DECLARE contor INT(10) DEFAULT 0;
SET contor = 0;
# TRUNCATE TABLE articles;
WHILE (contor<_records) DO
INSERT INTO articles SET `content`=create_content_by_wordlist();
SET contor = contor+1;
END WHILE;
END |
All prepared now! So, let’s put those stored routines to work. All you have to do is just create table for dictionary, fill it with data (I’ve used a text file with 350k english words). Finally, put mysql to hard work: run create_articles() procedure:
1
| CALL create_articles(10000); |
Be aware, a greater number of articles make articles table bigger. In my situation, for 250k articles randomly generated, the articles table has about 5G in size. It is a good situation for testing large MySQL tables. You could use indexes, partitions or other engines to test different study cases.