Oracle 11g – XMLIndex (part 1)

The following is an extension on the former post about “About Table(XMLSequence()) and XMLTable”. These examples where based on a XMLType column using CLOB storage. If you have read the “Oracle 11g – XMLType Storage Options“, then you should now know that this storage model is only performing if you had a document centric environment in mind.

Oracle’s 11g addition “XMLIndex”, will give you an solution for these kinds of environments.

The XMLIndex is a domain index, specially crafted for use in a XMLDB environment. It is a logical index with 3 components (excerpt Oracle 11g beta XMLDB Developers Guide):

  • A path index – This indexes the XML tags of a document and identifies its various document fragments.
  • An order index – This indexes the hierarchical positions of the nodes in an XML document. It keeps track of parent/child, ancestor/descendant, and sibling relations.
  • A value index – This indexes the values of an XML document. It allows lookup by either value equality or value range. A value index is used for values in query predicates (WHERE clause).

Basic Structures

Using the XMLIndex index will give you the possibility to select and manipulate fragments as well, so you now also can use the same CLOB storage model (in conjunction with a XMLIndex index) to handle content centric environments.

The following from the Oracle 11g beta XMLDB Developers Guide point out the advantages of using a XMLIndex index.

Advantages of XMLIndex

An XMLIndex index presents the following advantages over other indexing methods:

  • An XMLIndex index can be used for SQL functions XMLQuery, XMLTable, XMLExists, XMLCast, extract, extractValue, and existsNode, and it is effective in any part of a query; it is not limited to use in a WHERE clause. This is not the case for any of the other kinds of indexes you might use with XML data.
  • XMLIndex can thus speed access to both SELECT list data and FROM list data, making it useful for XML fragment extraction, in particular. Function-based indexes (and CTXXPath indexes, which are deprecated) cannot be used to extract document fragments.
  • You need no prior knowledge of the XPath expressions that will be used in queries. XMLIndex is completely general. This is not the case for function-based indexes. If you do have such prior knowledge, you can often improve performance by tailoring XMLIndex indexing to those paths most queried.
  • You can use an XMLIndex index with either XML schema-based or non-schema-based data. It can be used with unstructured storage or hybrid storage. B-tree indexing is appropriate only for schema-based data stored object-relationally (structured storage); it is ineffective for XML schema-based data stored in a CLOB instance.
  • For hybrid storage of XML schema-based data, XMLIndex can handle XPath expressions that target document fragments that are stored within a CLOB instance. XPath rewrite is ineffective in such cases.
  • You can use an XMLIndex index for searches with XPath expressions that target collections, that is, nodes that occur multiple times within a document. This is not the case for functional indexes.
  • XMLIndex indexing both index creation and index maintenance can be carried out in parallel, using multiple database processes. This is not the case for function-based indexes (and CTXXPATH indexes, which are deprecated).
  • Updating of XMLIndex indexes on binary XML storage can be accomplished in a piecewise manner, improving DML performance considerably. This is not the case for any of the other kinds of indexes you might use with XML data.

Creating an XMLIndex

Lets start using it. Remember the examples from “About Table(XMLSequence()) and XMLTable”?. The table used there was created the following way:

SQL> CREATE TABLE "3020_JWB_FA" 
     ("HGO_FA_ID" NUMBER(12, 0) NOT NULL ENABLE
     , "GEGEVENS" "SYS"."XMLTYPE") 
     PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 
     STORAGE
     (INITIAL 65536 
      FREELISTS 1 FREELIST GROUPS 1 
      BUFFER_POOL DEFAULT)
     TABLESPACE "USERS" 
     LOGGING NOCOMPRESS XMLTYPE "GEGEVENS" STORE AS CLOB
       (TABLESPACE "USERS" 
        ENABLE STORAGE IN ROW CHUNK 8192 
        PCTVERSION 10 NOCACHE LOGGING
        STORAGE(INITIAL 65536 
                FREELISTS 1 FREELIST GROUPS 1 
                BUFFER_POOL DEFAULT)
       );

Via a database export dumpfile, 10 records where inserted.

The the following can be seen via user_segments:

SQL> col SEGMENT_NAME FOR a25
 
SQL> SELECT SEGMENT_NAME, segment_type, 
     tablespace_name, bytes FROM user_segments;
 
 
SEGMENT_NAME              SEGMENT_TYPE  TABLESPACE_NAME    BYTES
------------------------- ------------- ---------------- -------
SYS_IL0000070767C00003$$  LOBINDEX      USERS              65536
SYS_LOB0000070767C00003$$ LOBSEGMENT    USERS             393216
3020_JWB_FA               TABLE         USERS              65536
 
3 ROWS selected.

Creating the initial structures / XMLIndex and showing the result

SQL> CREATE INDEX hgo_xmlindex ON 3020_JWB_FA(gegevens)
  2  indextype IS xdb.xmlindex
  3  parameters ('PATH TABLE HGO_PATH_TABLE')
  4  ;
 
INDEX created.
 
SQL> SELECT SEGMENT_NAME, segment_type, 
     tablespace_name, bytes FROM user_segments;
 
SEGMENT_NAME                    SEGMENT_TYPE  TABLESPACE_NAME    BYTES
------------------------------- ------------- ---------------- -------
SYS70770_HGO_XMLIND_VALUE_IX    INDEX         USERS             131072
SYS70770_HGO_XMLIND_ORDKEY_IX   INDEX         USERS             196608
SYS70770_HGO_XMLIND_PATHID_IX   INDEX         USERS             131072
HGO_PATH_TABLE                  TABLE         USERS             196608
SYS_IL0000070767C00003$$        LOBINDEX      USERS              65536
SYS_LOB0000070767C00003$$       LOBSEGMENT    USERS             393216
3020_JWB_FA                     TABLE         USERS              65536
 
7 ROWS selected.

The following excerpt from the Oracle 11g beta XMLDB Developers Guide explains what was created (path table plus 3 indexes)

XMLIndex is implemented using a path table and a set of (local) secondary indexes corresponding to its components. The path table contains one row for each indexed node in the XML document. For each indexed node, the path table stores:

  • The corresponding rowid of the table that stores the document.
  • A locator, which provides fast access to the corresponding document fragment. For binary XML storage of XML schema-based data, it also stores data-type information.
  • An order key, to record the hierarchical position of the node in the document. You can think of this as a Dewey decimal key like that used in library cataloging and Internet protocol SNMP. In such a system, the key 3.21.5 represents the node position of the fifth child of the twenty-first child of the third child of the document root node.

You are not able to query the path table. We are only allowed to describe it as the following example shows.

SQL> DESC HGO_PATH_TABLE
 Name                    NULL?    TYPE
 ----------------------- -------- ----------------
 RID                              ROWID
 PATHID                           RAW(8)
 ORDER_KEY                        RAW(1000)
 LOCATOR                          RAW(2000)
 VALUE                            VARCHAR2(4000)
 
 
SQL> SELECT COUNT(*) FROM HGO_PATH_TABLE;
SELECT COUNT(*) FROM HGO_PATH_TABLE
                     *
ERROR at line 1:
ORA-30967: operation directly ON the Path TABLE IS disallowed

Comparison

If you compare the execution path now when a XMLIndex is in place then you will see the following difference regarding the XMLTable example given in post (hereby repeated for clarity).

 
--
-- Without XMLIndex
--
 
SQL> SELECT COUNT(*) AS "AMOUNT"
  2  FROM   3020_JWB_FA 
  3  ,      XMLTABLE(xmlnamespaces(DEFAULT 'http://www.mysite.nl/xsd/app/JWB/MYAPP-JWB-HISTORIE'),
  4                  '/MYAPP-JWB/PERSOON'
  5                  passing 3020_JWB_FA.gegevens
  6                 );
 
AMOUNT
------------------
                10
 
1 ROW selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash VALUE: 245705617
 
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                   | ROWS  | Bytes | Cost (%CPU)| TIME     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                        |     1 |  2002 |   276   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE                     |                        |     1 |  2002 |            |          |
|   2 |   NESTED LOOPS                      |                        | 81680 |   155M|   276   (1)| 00:00:04 |
|   3 |    TABLE ACCESS FULL                | 3020_JWB_FA            |    10 | 20020 |     3   (0)| 00:00:01 |
|   4 |    COLLECTION ITERATOR PICKLER FETCH| XMLSEQUENCEFROMXMLTYPE |       |       |            |          |
--------------------------------------------------------------------------------------------------------------
 
 
Statistics
----------------------------------------------------------
       1457  recursive calls
          0  db block gets
        871  consistent gets
        298  physical reads
       1872  redo SIZE
        421  bytes sent via SQL*Net TO client
        385  bytes received via SQL*Net FROM client
          2  SQL*Net roundtrips TO/FROM client
         37  sorts (memory)
          0  sorts (disk)
          1  ROWS processed
 
--
-- With XMLIndex in Place
--
 
SQL> EXEC dbms_stats.DELETE_SCHEMA_STATS('MARCO')
 
PL/SQL PROCEDURE successfully completed.
 
SQL> EXEC  dbms_stats.GATHER_SCHEMA_STATS('MARCO')
 
PL/SQL PROCEDURE successfully completed.
 
SQL> ALTER system FLUSH buffer_cache;
 
System altered.
 
SQL> ALTER system FLUSH shared_pool;
 
System altered.
 
SQL> SELECT COUNT(*) AS "AMOUNT"
  2  FROM   3020_JWB_FA 
  3  ,      XMLTABLE(xmlnamespaces(DEFAULT 'http://www.mysite.nl/xsd/app/JWB/MYAPP-JWB-HISTORIE'),
  4                  '/MYAPP-JWB/PERSOON'
  5                  passing 3020_JWB_FA.gegevens
  6                 );
 
AMOUNT
------------------
                10
 
1 ROW selected.
 
Execution Plan
----------------------------------------------------------
Plan hash VALUE: 2446817566
 
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                          | ROWS  | Bytes | Cost (%CPU)| TIME     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                               |     1 |    27 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |                               |     1 |    27 |            |          |
|   2 |   NESTED LOOPS                |                               |     1 |    27 |     6   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| HGO_PATH_TABLE                |     1 |    15 |     5   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS70770_HGO_XMLIND_PATHID_IX |     4 |       |     2   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY USER ROWID | 3020_JWB_FA                   |     1 |    12 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
 
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
 
   3 - FILTER(SYS_XMLI_LOC_ISNODE("SYS_P0"."LOCATOR")=1)
   4 - access("SYS_P0"."PATHID"=HEXTORAW('5D99') )
 
 
Statistics
----------------------------------------------------------
       4708  recursive calls
          0  db block gets
       2220  consistent gets
        201  physical reads
      20792  redo SIZE
        421  bytes sent via SQL*Net TO client
        385  bytes received via SQL*Net FROM client
          2  SQL*Net roundtrips TO/FROM client
        117  sorts (memory)
          0  sorts (disk)
          1  ROWS processed

Despite the recursive calls, the “costs” value has decreased significantly.

What’s next?

One of the disadvantages of creating a full blown XMLIndex, that is indexing on all possible values, is that the size of the XMLIndex is most of the time larger in size then the table itself.

This is one of the reasons that it is possible to make use of what is called “XMLIndex Path Subsetting“. This method will be shown in part 2, but in short, one creates only indexes on XPath locations which are needed. By default, XMLIndex indexes all possible XPath locations in your XML data, this is easy if you have no knowledge of what data will be selected. As said, the disadvantage is that this will use a lot of space.

With XMLIndex Path Subsetting one can remove all the index values that one doesn’t need after indexing all Xpath locations OR you create a skeleton structure and add all the index values on the XPath locations you actually need.

As in the relational database world, one should not build an index on every possible table column and concatenated column combinations.

But as said, wait for the next post (“Oracle 11g – XMLIndex (part 2)”) to see how this can be achieved, also other functionality like building secondary indexes on the path table will be demonstrated.

For now, I hope it was interesting.

:-)

Disclaimer

The information demonstrated and shared here is based on Oracle beta software. The following is intended to outline Oracle’s general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.