Tuesday, April 5, 2011

Migrating an Adjacency List

Adjacency lists and modified preorder trees are two ways of storing nested hierarchies in a relational database. For some good background on these structures, see here.

So the adjacency list is probably one that's intuitive and easy to come up with if you are to to the field. However, once you learn the benefits of the preorder tree, you'll probably want to migrate. But the migration is not so easy as a few queries -- you'll need an algorithm to do so, and a recursive one at that.

Here's a couple stored procedures for MySQL that will migrate data from an adjacency list to a modified preorder traversal tree:


First I'll set up an adjacency list (language family data from wikipedia):

 CREATE TABLE IF NOT EXISTS `language_family_adj_list` (
`language_id` int(11) NOT NULL auto_increment,
`language` varchar(20) NOT NULL,
`parent_id` int(11) default NULL,
PRIMARY KEY (`language_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=41 ;


INSERT INTO `language_family_adj_list` (`language_id`, `language`, `parent_id`) VALUES
(1, 'Finno-Ugric', NULL),
(2, 'Hungarian', 1),
(3, 'Khanty', 1),
(4, 'Mansi', 1),
(5, 'Permic', 1),
(6, 'Mari', 1),
(7, 'Mordvinic', 1),
(8, 'Sami', 1),
(9, 'Baltic-Finnic', 1),
(10, 'Komi', 5),
(11, 'Komi-Permyak', 5),
(12, 'Udmurt', 5),
(13, 'Erzya', 7),
(14, 'Moksha', 7),
(15, 'Western Sami', 8),
(16, 'Eastern Sami', 8),
(17, 'Southern Sami', 15),
(18, 'Umi Sami', 15),
(19, 'Lule Sami', 15),
(20, 'Pite Sami', 15),
(22, 'Northern Sami', 15),
(23, 'Kemi Sami', 16),
(24, 'Inari Sami', 16),
(25, 'Akkala Sami', 16),
(26, 'Kildin Sami', 16),
(27, 'Skolt Sami', 16),
(28, 'Ter Sami', 16),
(29, 'Estonian', 9),
(30, 'Finnish', 9),
(31, 'Ingrian', 9),
(32, 'Karelian', 9),
(33, 'Livonian', 9),
(34, 'Veps', 9),
(35, 'Votic', 9),
(36, 'South Estonian', 29),
(37, 'Voro', 36),
(38, 'Karelian Proper', 32),
(39, 'Lude', 32),
(40, 'Olonets Karelian', 32);


Here's a query to demonstrate:

mysql> SELECT t1.language AS lev1, t2.language as lev2, t3.language as lev3, t4.language as lev4, t5.language AS lev5
-> FROM language_family_adj_list AS t1
-> LEFT JOIN language_family_adj_list AS t2 ON t2.parent_id = t1.language_id
-> LEFT JOIN language_family_adj_list AS t3 ON t3.parent_id = t2.language_id
-> LEFT JOIN language_family_adj_list AS t4 ON t4.parent_id = t3.language_id
-> LEFT JOIN language_family_adj_list AS t5 ON t5.parent_id = t4.language_id
-> WHERE t1.parent_id IS NULL
-> ORDER BY t1.language, t2.language, t3.language, t4.language, t5.language;
+-------------+---------------+--------------+------------------+------+
| lev1 | lev2 | lev3 | lev4 | lev5 |
+-------------+---------------+--------------+------------------+------+
| Finno-Ugric | Baltic-Finnic | Estonian | South Estonian | Voro |
| Finno-Ugric | Baltic-Finnic | Finnish | NULL | NULL |
| Finno-Ugric | Baltic-Finnic | Ingrian | NULL | NULL |
| Finno-Ugric | Baltic-Finnic | Karelian | Karelian Proper | NULL |
| Finno-Ugric | Baltic-Finnic | Karelian | Lude | NULL |
| Finno-Ugric | Baltic-Finnic | Karelian | Olonets Karelian | NULL |
| Finno-Ugric | Baltic-Finnic | Livonian | NULL | NULL |
| Finno-Ugric | Baltic-Finnic | Veps | NULL | NULL |
| Finno-Ugric | Baltic-Finnic | Votic | NULL | NULL |
| Finno-Ugric | Hungarian | NULL | NULL | NULL |
| Finno-Ugric | Khanty | NULL | NULL | NULL |
| Finno-Ugric | Mansi | NULL | NULL | NULL |
| Finno-Ugric | Mari | NULL | NULL | NULL |
| Finno-Ugric | Mordvinic | Erzya | NULL | NULL |
| Finno-Ugric | Mordvinic | Moksha | NULL | NULL |
| Finno-Ugric | Permic | Komi | NULL | NULL |
| Finno-Ugric | Permic | Komi-Permyak | NULL | NULL |
| Finno-Ugric | Permic | Udmurt | NULL | NULL |
| Finno-Ugric | Sami | Eastern Sami | Akkala Sami | NULL |
| Finno-Ugric | Sami | Eastern Sami | Inari Sami | NULL |
| Finno-Ugric | Sami | Eastern Sami | Kemi Sami | NULL |
| Finno-Ugric | Sami | Eastern Sami | Kildin Sami | NULL |
| Finno-Ugric | Sami | Eastern Sami | Skolt Sami | NULL |
| Finno-Ugric | Sami | Eastern Sami | Ter Sami | NULL |
| Finno-Ugric | Sami | Western Sami | Lule Sami | NULL |
| Finno-Ugric | Sami | Western Sami | Northern Sami | NULL |
| Finno-Ugric | Sami | Western Sami | Pite Sami | NULL |
| Finno-Ugric | Sami | Western Sami | Southern Sami | NULL |
| Finno-Ugric | Sami | Western Sami | Umi Sami | NULL |
+-------------+---------------+--------------+------------------+------+
29 rows in set (0.00 sec)


So here's the modified preorder traversal tree table schema:


CREATE TABLE language_family_mptt (
language VARCHAR(30) NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL
) COLLATE utf8;


And then here's the recursive stored procedure to migrate the data:


TRUNCATE TABLE language_family_mptt;
SET max_sp_recursion_depth = 255;
DROP PROCEDURE IF EXISTS insert_branches;
DROP PROCEDURE IF EXISTS start_tree;
DELIMITER ~~

CREATE PROCEDURE start_tree()
BEGIN
DECLARE language_field VARCHAR(100);
DECLARE done INT DEFAULT 0;
DECLARE insert_id INT;
DECLARE source_id INT;

DECLARE cursor1 CURSOR FOR SELECT language, language_id FROM language_family_adj_list WHERE parent_id IS NULL ORDER BY language;

DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

OPEN cursor1;
read_loop: LOOP

SET @my_left = 1;

FETCH cursor1 INTO language_field, source_id;
INSERT INTO language_family_mptt ( language, lft ) VALUES ( language_field, 1 );

CALL insert_branches( source_id );

UPDATE language_family_mptt SET rgt = @my_left + 1 WHERE lft = 1 AND rgt = 0;

IF done THEN
LEAVE read_loop;
END IF;

END LOOP;
CLOSE cursor1;

END; ~~

CREATE PROCEDURE insert_branches( IN source_parent_id INT )
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE next_source_parent_id INT DEFAULT NULL;
DECLARE orig_left INT DEFAULT NULL;
DECLARE language_field VARCHAR(100);
DECLARE cursor1 CURSOR FOR SELECT language_id, language FROM language_family_adj_list WHERE parent_id = source_parent_id ORDER BY language;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

OPEN cursor1;

read_loop: LOOP

FETCH cursor1 INTO next_source_parent_id, language_field;

IF done THEN
LEAVE read_loop;
END IF;

SET @my_left = @my_left + 1;

INSERT INTO language_family_mptt ( language, lft ) VALUES ( language_field, @my_left );

SET orig_left = @my_left;

CALL insert_branches( next_source_parent_id );

UPDATE language_family_mptt SET rgt = @my_left + 1 WHERE lft = orig_left AND rgt = 0 ;

SET @my_left = @my_left + 1;

END LOOP;
CLOSE cursor1;
END; ~~

DELIMITER ;


And here's the results:


mysql> SELECT CONCAT( REPEAT( ' ', (COUNT(parent.language) - 1) ), node.language) AS name FROM language_family_mptt AS node, language_family_mptt AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt GROUP BY node.language ORDER BY node.lft;
+---------------------------------+
| name |
+---------------------------------+
| Finno-Ugric |
| Baltic-Finnic |
| Estonian |
| South Estonian |
| Voro |
| Finnish |
| Ingrian |
| Karelian |
| Karelian Proper |
| Lude |
| Olonets Karelian |
| Livonian |
| Veps |
| Votic |
| Hungarian |
| Khanty |
| Mansi |
| Mari |
| Mordvinic |
| Erzya |
| Moksha |
| Permic |
| Komi |
| Komi-Permyak |
| Udmurt |
| Sami |
| Eastern Sami |
| Akkala Sami |
| Inari Sami |
| Kemi Sami |
| Kildin Sami |
| Skolt Sami |
| Ter Sami |
| Western Sami |
| Lule Sami |
| Northern Sami |
| Pite Sami |
| Southern Sami |
| Umi Sami |
+---------------------------------+
39 rows in set (0.00 sec)


:D