This is one of a series of blogs related to the automated conversion of PL/SQL to PL/pgSQL.
One of the Oracle stored procedure conversion topics that we will be presenting at the PGEast conference in New York is that of CONNECT BY query translation. In this case the issue is mainly the SQL itself. In other blogs I will discuss solutions for more general topics like Oracle Packages and Global Variables.
See our presentation: Large Customers Want PostgreSQL Too !
https://www.postgresqlconference.org/content/large-customers-want-postgresql-too
Linked List Processing
Frequently, the PL/SQL we have translated contained SELECT statements that included the CONNECT BY option. In many cases this feature was used to identify related documents in a document table where the documents were grouped together by a singly-linked list and the CONNECT BY feature was used to traverse the list.
To illustrate this concept consider the following table:
create table songList (
artistId int not null,
id int not null,
priorId int,
title varchar2(20) );
If we assign artistId of 401 to Taylor Swift, then we can map some songs from Taylor Swift’s second album into this table starting with id 201 and begin another list which will ultimately link songs from her next album:
insert into songList values (401, 201,NULL, ‘Fearless’);
insert into songList values (401, 202, 201, ‘Fifteen’);
insert into songList values (401, 203, 202, ‘Love Song’);
insert into songList values (401, 205, 203, ‘White Horse’);
insert into songList values (401, 206, 205, ‘You Belong With Me’);
insert into songList values (401, 301,NULL, ‘unknown’);
The following Oracle CONNECT BY query could be used to list the songs in Taylor Swift’s second album (that is the 200 series):
SELECT artistId, id, title
FROM songList
WHERE artistId = 401
START WITH id = 201
CONNECT BY PRIOR id = priorId;
Which generates a result of:
ARTISTID ID TITLE
========== ====== ====================
401 201 Fearless
401 202 Fifteen
401 203 Love Song
401 205 White Horse
401 206 You Belong With Me
To process the linked list in the same manner using Postgres one must use a Recursive query. My translation tool is capable of translating the prior Oracle CONNECT BY query into a Recursive query automatically. When the prior SELECT query is processed by my translator the following is produced:
WITH RECURSIVE rqName (
artistId, id, title)
AS ( SELECT == the anchor query
artistId,
id,
title
FROM
songList
WHERE
id = 201
UNION ALL
SELECT == the recursive query
tn.artistId,
tn.id,
tn.title
FROM
rqName tp, songList tn
WHERE
tp.id = tn.priorId
)
SELECT artistId, id, title
FROM rqName
WHERE artistId = 401;
The anchor query is responsible for finding the START WITH row. The recursive query then UNIONs its results which are dependent upon the join between the most recent result’s id (tp.id) and the current song’s link field (tn.priorId). The query interates through the list of ids until it encounters the id of 301 which does not have a priorId.
The restriction artistId = 401 could be added to the anchor query. I have found through experimentation that Postgres’ performance is improved when written in the described manner. This improvement, noted by using EXPLAIN ANALYZE is significant when a lot of data is involved.
Tree Hierarchy Processing
The prior recursive query as coded was sufficient to process one hierarchy because of the singly-linked list. A common requirement is that more than one hierarchy, or multiple branches of a tree, must be processed. My translator can make the Recursive query code modifications required to support this for Postgres optionally.
For example, consider the manager / employee relationships in the emp table of the Oracle demo database.
select ename, empno, mgr from emp;
ENAME EMPNO MGR
====== ========== ==========
SMITH 7369 7902
ALLEN 7499 7698
WARD 7521 7698
JONES 7566 7839
MARTIN 7654 7698
BLAKE 7698 7839
CLARK 7782 7839
SCOTT 7788 7566
KING 7839
TURNER 7844 7698
ADAMS 7876 7788
JAMES 7900 7698
FORD 7902 7566
MILLER 7934 7782
One can use an Oracle CONNECT BY to generate the manager to employee hierarchy:
SELECT ename, empno, mgr FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr;
ENAME EMPNO MGR
====== ========== ==========
KING 7839
JONES 7566 7839
SCOTT 7788 7566
ADAMS 7876 7788
FORD 7902 7566
SMITH 7369 7902
BLAKE 7698 7839
ALLEN 7499 7698
WARD 7521 7698
MARTIN 7654 7698
TURNER 7844 7698
JAMES 7900 7698
CLARK 7782 7839
MILLER 7934 7782
The equivalent Recursive query produced from the Oracle CONNECT BY query using my translator with the hierarchy option enabled is:
WITH RECURSIVE rqName (
ename, empno, mgr, arrHierarchy)
AS ( SELECT == the anchor query
ename,
empno,
mgr,
ARRAY[coalesce(empno,0)] == array for perserving the
FROM
emp
WHERE
mgr IS NULL
UNION ALL
SELECT == the recursive query
tn.ename,
tn.empno,
tn.mgr,
arrHierarchy || tn.empno == append to perserve the hierarchy
FROM
rqName tp, emp tn
WHERE
tp.empno = tn.mgr
)
SELECT ename, empno, mgr
FROM rqName
ORDER BY arrHierarchy;
The results for this query are identical to that of the Oracle CONNECT BY.
The key to processing branches of the tree hierarchy is to capture the “path” in the array named arrHierarchy then sort the results based on this path. The path was accumulated in the array using the concat operator in the code:
arrHierarchy || tn.empno
Path Display
The path will be displayed if the Oracle function SYS_CONNECT_BY_PATH is referenced in the SELECT list. The means that the array arrHierarchy will be available and probably should be cast to VARCHAR.
Level Emulation
My translator also can customize the Recursive query to display the equivalent to the Oracle LEVEL pseudo column when the SELECT list includes the Oracle LEVEL keyword. The following is the code generated for the query that evaluates the emp table when LEVEL is selected for Oracle:
SELECT ename, empno, mgr, LEVEL FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr;
ENAME EMPNO MGR LEVEL
====== ========== ========== ======
KING 7839 1
JONES 7566 7839 2
SCOTT 7788 7566 3
ADAMS 7876 7788 4
FORD 7902 7566 3
SMITH 7369 7902 4
BLAKE 7698 7839 2
ALLEN 7499 7698 3
WARD 7521 7698 3
MARTIN 7654 7698 3
TURNER 7844 7698 3
JAMES 7900 7698 3
CLARK 7782 7839 2
MILLER 7934 7782 3
The translation of the prior query with the LEVEL specification follows:
WITH RECURSIVE rqName (
ename, empno, mgr, LEVEL, arrHierarchy)
AS ( SELECT == the anchor query
ename,
empno,
mgr,
1,
ARRAY[coalesce(empno,0)] == array for perserving the hierarchy order
FROM
emp
WHERE
mgr IS NULL
UNION ALL
SELECT == the recursive query
tn.ename,
tn.empno,
tn.mgr,
tp.LEVEL + 1,
arrHierarchy || tn.empno == append to perserve the hierarchy
FROM
rqName tp, emp tn
WHERE
tp.empno = tn.mgr
)
SELECT ename, empno, mgr, LEVEL
FROM rqName
ORDER BY arrHierarchy;
Conclusion
Rewriting CONNECT BY queries can be a very challenging task to perform by hand. My translation tool can relieve much of the work involved for most of the forms of CONNECT BY that we have encountered. There can be some forms that require special hand recoding such as CONNECT BYs in a subquery (because Postgres does not support recursive queries in a subquery). Nevertheless, the Oracle CONNECT BY needn’t be viewed as an insurmountable hurdle when considering a PL/SQL to PL/pgSQL migration.