Tuning Database XQuery Statements (2)

So we concluded the post “Tuning Database XQuery Statements (1)” with the following SQL statement (cost 794)…

--
EXPLAIN PLAN FOR
SELECT MAX(x1.publicatienr) AS MAX
  FROM ic_metadata m 
     , XMLTable  (xmlnamespaces(DEFAULT 'http://www.somewhere.nl/metadata'), 
                  '$p/opslag-metadata/dossierred' 
                  PASSING m.metadata AS "p"
                  COLUMNS 
                    publicatiedatum  DATE       PATH 'publicatiedatum'
                  , publicatienr     NUMBER(38) PATH 'publicatienr'
                 ) x1 
 WHERE XMLEXISTS ('declare default element namespace "http://www.somewhere.nl/metadata"; (::)
                  $p/opslag-metadata/dossierred[uitgave/@code=$b1]'
                  PASSING m.metadata AS "p"
                        , '92'       AS "b1"
                  )
   AND x1.publicatiedatum >= to_date('01-01-1964', 'DD-MM-YYYY') 
   AND x1.publicatiedatum < = to_date('31-12-1964', 'DD-MM-YYYY') 
;
--

I tried after this some “what if” scenario’s which also exposes some of the extra possibilities the XQuery language adds to SQL to solve a query problem. The following is far from what all is possible, but I wanted to check what the Oracle Optimizer does or doesn’t pick up. The following also shows some room of improvement (enhancement request/ideas already posted to the guys in charge).

😉

Anyway, follow me along…

Alternative XQuery approaches

The first alternative which came to mind was, “what if I bring everything into the XMLEXIST/XQuery section” of the statement and completely remove the date comparisons in the SQL section?

This resulted in the following statement.

--
EXPLAIN PLAN FOR
SELECT MAX(x1.publicatienr) AS MAX
  FROM ic_metadata m 
     , XMLTable  (xmlnamespaces(DEFAULT 'http://www.somewhere.nl/metadata'), 
                  '$p/opslag-metadata/dossierred' 
                  PASSING m.metadata AS "p"
                  COLUMNS 
                    -- varchar2 naar number cast direct in XMLTABLE
                    -- anders XMLCAST in SELECT clause
                    publicatienr     NUMBER(38) PATH 'publicatienr'
                    --
                 ) x1 
 WHERE XMLEXISTS (q'# declare default element namespace "http://www.somewhere.nl/metadata"; (::)
                      $p/opslag-metadata/dossierred
                      [uitgave/@code=$b1]
                      [publicatiedatum >= xs:date($d1)]
                      [publicatiedatum < = xs:date($d2)]
                    #'
                  PASSING m.metadata AS "p"
                        , '92'         AS "b1"
                        , '1964-01-01' AS "d1"
                        , '1964-12-31' AS "d2"                         
                  )
;
--

To show more detailed what I did here in the XMLEXIST section…

--
..
 WHERE XMLEXISTS (q'# declare default element namespace "http://www.kluwer.nl/metadata"; (::)
                      $p/opslag-metadata/dossierred           (: restriction only dossiered :)
                      [uitgave/@code=$b1]                     (: + check on code=92         :)
                      [publicatiedatum >= xs:date($d1)]       (: + year must be 1964        :)
                      [publicatiedatum < = xs:date($d2)]
                    #'
                  PASSING m.metadata AS "p"
                        , '92'         AS "b1"
                        , '1964-01-01' AS "d1"
                        , '1964-12-31' AS "d2"                         
                  )
--

By the way, in XQuery the

(: :)

notation is used, can be used, to add comment sections. It is also needed as its notation of “(::)” to signal programs like SQL*Plus to ignore the ” ; “-sign.

The XPath section has now become a “all in one” criteria list while using bind parameters to fill in the needed values

  • “b1” = 92
  • “d1” = “1964-01-01”
  • “d2” = “1964-12-31”

The first part of the XPath section makes sure we have the proper XML instances only which are “dossierred” related…

  • $p/opstag-metadata/dossierred

Then we make sure that attribute “code” of element “uitgave” matches only values ’92’

  • $p/opstag-metadata/dossierred/uitgave/@code=92

And we add the date criteria / predicate sections, now it has to match all the following criteria

  • $p/opstag-metadata/dossierred
  • $p/opstag-metadata/dossierred/uitgave/@code=’92’
  • $p/opstag-metadata/dossierred/publicatiedatum >= ‘1964-01-01’
  • $p/opstag-metadata/dossierred/publicatiedatum < = '1964-12-31'

Be aware of the use of xs:date that is needed for date datatype comparison.

Also check up on “Example 15” in Best Practices for XQuery processing in Oracle Database for use of xs:date or alternatives…

Capture

In all the current statement

--
EXPLAIN PLAN FOR
SELECT MAX(x1.publicatienr) AS MAX
  FROM ic_metadata m 
     , XMLTable  (xmlnamespaces(DEFAULT 'http://www.somewhere.nl/metadata'), 
                  '$p/opslag-metadata/dossierred' 
                  PASSING m.metadata AS "p"
                  COLUMNS 
                    -- varchar2 naar number cast direct in XMLTABLE
                    -- anders XMLCAST in SELECT clause
                    publicatienr     NUMBER(38) PATH 'publicatienr'
                    --
                 ) x1 
 WHERE XMLEXISTS (q'# declare default element namespace "http://www.somewhere.nl/metadata"; (::)
                      $p/opslag-metadata/dossierred
                      [uitgave/@code=$b1]
                      [publicatiedatum >= xs:date($d1)]
                      [publicatiedatum < = xs:date($d2)]
                    #'
                  PASSING m.metadata AS "p"
                        , '92'         AS "b1"
                        , '1964-01-01' AS "d1"
                        , '1964-12-31' AS "d2"                         
                  )
;
--

Resulted in the same cost (794) and explain plan output as shown at the beginning of this post (and explained at the end of the first post).

So no change via this approach.

What if we make use of XMLQUERY functionality in the SELECT section?

We could start with getting XMLTABLE out of the equation by moving the needed publicatienr value to the SELECT via using XMLQUERY. This could result in the following:

--
SELECT MAX(XMLCast(XMLQuery(q'# declare default element namespace "http://www.somewhere.nl/metadata"; (::) 
                                $p/opslag-metadata/dossierred/publicatienr
                              #' 
                        PASSING m.metadata AS "p"
                        RETURNING CONTENT
                        )
                    AS NUMBER(38)
                   )
           ) AS MAX
  FROM ic_metadata m 
 WHERE XMLEXISTS (q'# declare default element namespace "http://www.somewhere.nl/metadata"; (::)
                      $p/opslag-metadata/dossierred           
                      [uitgave/@code=$b1]                     
                      [publicatiedatum >= xs:date($d1)]       
                      [publicatiedatum < = xs:date($d2)]
                    #'
                  PASSING m.metadata AS "p"
                        , '92'         AS "b1"
                        , '1964-01-01' AS "d1"
                        , '1964-12-31' AS "d2"                         
                  )
;
--

A short test shows no difference in the COST figure (still 794) and/or the explain plan output.

Okay, lets do the rest as well and bring the whole XMLEXISTS section from the WHERE clause also in the SELECT clause section…

--
EXPLAIN PLAN FOR
SELECT MAX(XMLCast(XMLQuery(q'# declare default element namespace "http://www.somewhere.nl/metadata"; (::) 
                                $p/opslag-metadata/dossierred
                                [uitgave/@code=$b1]                     
                                [publicatiedatum >= xs:date($d1)]       
                                [publicatiedatum < = xs:date($d2)]
                                /publicatienr
                              #' 
                        PASSING m.metadata AS "p"
                              , '92'         AS "b1"
                              , '1964-01-01' AS "d1"
                              , '1964-12-31' AS "d2"   
                        RETURNING CONTENT
                        )
                    AS NUMBER(38)
                   )
           ) AS MAX
  FROM ic_metadata m;
--
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
--

Oops.

Back to, more or less, our start figures. Total costs went up to 566783 instead of still being around the 794 figure.

This is where “my” enhancement request comes into play. In principle, there should be a way, for the Optimizer or XQuery rewrite engine, to “rewrite” the XMLQUERY portion into an WHERE clause alternative. Instead, the “Optimizer/XVM/??” looses track, does a full table scan of on column METADATA in table IC_METADATA (10+ million rows…) and a tiny bit of the work via a final cost effective INDEX RANGE SCAN…

To make a long story short…In all my attempts, “COST” figures go only down when I bring selection criteria, predicates back into the WHERE clause… for example, the following will give you a cost of 794 again…

The XPath bit, besides the already noted

  • $p/opstag-metadata/dossierred
  • $p/opstag-metadata/dossierred/uitgave/@code=’92’
  • $p/opstag-metadata/dossierred/publicatiedatum >= ‘1964-01-01’
  • $p/opstag-metadata/dossierred/publicatiedatum < = '1964-12-31'

in the XMLQUERY section, now also needs and extra

  • $p/opstag-metadata/dossierred/publicatienr

which can be cast via XMLCAST to a NUMBER(38) datatype.

--
SELECT MAX(XMLCast(XMLQuery(q'# declare default element namespace "http://www.somewhere.nl/metadata"; (::) 
                                $p/opslag-metadata/dossierred
                                [uitgave/@code=$b1]                  
                                [publicatiedatum >= xs:date($d1)]    
                                [publicatiedatum < = xs:date($d2)]
                                /publicatienr           
                              #' 
                        PASSING m.metadata AS "p"
                              , '92'         AS "b1"
                              , '1964-01-01' AS "d1"
                              , '1964-12-31' AS "d2"   
                        RETURNING CONTENT
                        )
                    AS NUMBER(38)
                   )
           ) AS MAX
  FROM pcmadmin.ic_metadata m  
 WHERE XMLEXISTS (q'# declare default element namespace "http://www.somewhere.nl/metadata"; (::)
                      $p/opslag-metadata/dossierred      
                      [uitgave/@code=$b1]                
                      [publicatiedatum >= xs:date($d1)]  
                      [publicatiedatum < = xs:date($d2)]
                    #'
                  PASSING m.metadata AS "p"
                        , '92'         AS "b1"
                        , '1964-01-01' AS "d1"
                        , '1964-12-31' AS "d2"                         
                  );
--

Anyway, something to be aware of if you want to take full advantage in a (11.2.0.3) database with XMLType Object Relational storage solution. I wasn’t (yet) able to test this on a different database version and/or, for example, on XMLType Securefile Binary XML storage.

That brings me to…

Cheating via XMLType Object Relational storage

The following only applies to XMLType Object Relational storage and…

  • …Is fun to know/educational purpose,
  • …is STUPID if you actually use it
  • …could stop working if Oracle alters internal coding (or deprecates XMLType OR)
  • …wil break code and or your solution if you switch to other XMLType storage
  • …Is not optimized by Oracle
  • …Is NOT SUPPORTED by Oracle
  • …and should therefore never be used…

the above…just in case…, someone uses this kind of syntax in a production environment.

I have warned you

Brings us back on what Oracle may do rewriting, under the covers, in the case of XMLType Object Relational storage, which are in essence “plain old database objects”.

Via object notation and with the use of the pseudocolumn XMLDATA, the original query

--
SELECT MAX(to_number(extractvalue(m.metadata,'/opslag-metadata/dossierred/publicatienr'))) AS MAX
  FROM ic_metadata m
 WHERE extractvalue(m.metadata,'/opslag-metadata/dossierred/uitgave/@code')                    = '92'
   AND EXTRACT(YEAR FROM extractvalue(m.metadata,'/opslag-metadata/dossierred/publicatiedatum')) = 1964;
--

can be re-written into (NOTE the use of TABLE for the “code” attribute)

--
SELECT MAX(to_number(m.metadata.xmldata."dossierred"."publicatienr")) AS MAX
  FROM ic_metadata m
     , TABLE(m.metadata.xmldata."dossierred"."uitgave") u
 WHERE m.metadata.xmldata."dossierred"."publicatiedatum" >= to_date('01-01-1964', 'DD-MM-YYYY') 
   AND m.metadata.xmldata."dossierred"."publicatiedatum" < = to_date('31-12-1964', 'DD-MM-YYYY') 
   AND u."code" = '92';
--

which brings up back to a cost of 794…

Optimized?

Can we optimize THIS used statement in database version 11.2.0.3.?

  • We could create one FAT index based on criteria used in indexes for UITGAVE and PUBLICATIE
  • We could LIST or Equi-partition the XMLType OR or Binary XML content based on different parts of the storage definitions in “/opslag-metadata” (comments/dossiers/etc)
  • If we would switch to XMLType Binary XML (schemaless), we could create “storage” defined Structured XMLIndex (SXI) indexes
  • …and combine the above with LOCAL partitioning of those XMLIndexes
  • …and combine the above with secondary Indexes on the columns of those XMLIndexes content tables

And in Oracle 12.1.0.2 and above, we might support the statement

HTH/M.

Marco Gralike Written by: