Hello,
I'm facing problem with very slow table updates. Currently I perform several 1000 updates on a table sequential. I would like to know some improvements for a better performance.
I will give a short introduce on what I'm doing exactly:
My starting situation is a table with sequences for different properties. Each value of the sequence represents one measured value for one day.
| SEQUENCE | PROPERTY |
|---|---|
| 22;25;30;24;20 | Temperature |
| 23;25;30;20;18 | Temperature |
| 60;70;50;40;60 | Humidity |
| ... | ... |
On this I perform a sequential pattern mining using the SPADE implementation in R, getting as result a frequent sequence and the support of it. Let's say the result looks like following one:
| ID | FREQ-SEQUENCE | SUPPORT | PROPERTY |
|---|---|---|---|
| 1 | 25;30 | 1 | Temperature |
| 2 | 23;25 | 0.5 | Temperature |
| 3 | 60;70;50 | 0.33 | Humidity |
| .. | .. | .. | .. |
Now I have a procedure which creates dynamical a table where each frequent sequence represents one column. The column name consist's of the property and the ID of the frequent sequence it represents (e.g "Temperature_1" stands for the freq. sequence "25:30"). Additional there are the columns sequence and property.
This table represents if a frequent sequence occurs in a sequence or not. Therefore I fill them with 1 (occurs) or 0 (occurs not). E.g. as I mentioned column "Temperature_1" stands for the freq. sequence "25;30" this occurs in the sequence "22;25;30;24;20" so it will be "1" as it occurs also in "23;25;30;20;18" the second row in column "Temperature_1" will be also "1" but the third will be "0" as it's a totally other property.
| Temperature_1 | Temperature_2 | Humidity_3 | SEQUENCE | PROPERTY |
|---|---|---|---|---|
| 1 | 0 | 0 | 22;25;30;24;20 | Temperature |
| 1 | 1 | 0 | 23;25;30;20;18 | Temperature |
| 0 | 0 | 1 | 60;70;50;40;60 | Humidity |
| ... | ... | ... | ... | ... |
Filling the table is also where I facing the performance problems.
In a first step I just insert the sequences and properties in the related columns.
To fill the table now with "0" or "1" for occurrence or absence of a frequent sequence in a sequence I use the following algorithm:
-- iteration over all columns (frequent sequences)
FOR freqSequenceID IN 1..:maxFreqSequenceID DO --get the propertie and the frequent sequence which the current column represents SELECT "PROPERTY" INTO property FROM FREQUENT_SEQUENCE_TBL WHERE "ID" = :freqSequenceID; SELECT "SEQUENCE" INTO freqSequence FROM FREQUENT_SEQUENCE_TBL WHERE "ID" = :freqSequenceID; -- concatenate the name of current column colName := CONCAT(:property,'_'); colName := CONCAT(:colName,freqSequenceID); -- update all rows in the column with "0" insertVar := CONCAT ('UPDATE OCCURENCE_TBL SET "', :colName); insertVar := CONCAT (insertVar, '" = 0'); EXECUTE IMMEDIATE (:insertVar); -- update all rows where the current frequent sequence occurs in the sequence with one -- UPDATE OCCURENCE_TBL SET OCCURENCE_TBL.colName = 1 FROM OCCURENCE_TBL -- INNER JOIN "SEQUENCE_TBL" ON OCCURENCE_TBL."SEQUENCE_ID" = SEQUENCE_TBL."SEQUENCE_ID" AND OCCURENCE_TBL."PROPERTY" = :property -- WHERE LOCATE ( SEQUENCE_TBL."SEQUENCE", :freqSequence) >= 1 AND SEQUENCE_TBL."PROPERTY" = :property insertVar := CONCAT ('UPDATE O SET O."', :colName); insertVar := CONCAT (insertVar, '" = 1 FROM OCCURENCE_TBL AS O INNER JOIN "SEQUENCE_TBL" AS S ON O."SEQUENCE_ID" = S."SEQUENCE_ID" AND O."PROPERTY" = '''); insertVar := CONCAT (insertVar, :property); insertVar := CONCAT (insertVar, '''WHERE LOCATE ( S."SEQUENCE", '''); insertVar := CONCAT (insertVar, :freqSequence); insertVar := CONCAT (insertVar,''') >= 1 AND S."PROPERTY" = '''); insertVar := CONCAT (insertVar, :property); insertVar := CONCAT (insertVar, ''''); EXECUTE IMMEDIATE (:insertVar);
END FOR;I think that the slow performance of it is because a update is sequential done for each column and not together at all in the end.
I tried already to collect all this update statement and execute them after the loop but it seems that with EXEC / EXECUTE IMMEDIATE only one statement is possible.
Has anyone an idea how to improve the update process of this table?
When it's possible it could be also a virtual table, it must be not persistent. In the next step it would be used for a classification with CART.
Thank you in advance
Matthias