XMLIndex (Part 2) – XMLIndex Path Subsetting

As said in the former post, one of the disadvantages of creating a full blown XMLIndex, indexing on all possible values, is that the size of the XMLIndex is most of the time larger in size then the table itself.

This is one of the reasons that it is possible to make use of what is called “XMLIndex Path Subsetting“. In short, one creates only indexes on XPath locations which are needed. By default, XMLIndex indexes all possible XPath locations in your XML data, this is easy if you have no knowledge of what data will be selected. As said, the disadvantage is that this will use a lot of space.

Path Subsetting

With XMLIndex Path Subsetting one can remove all the index values that one doesn’t need after indexing all Xpath locations OR you create a skeleton structure and add all the index values on the XPath locations you actually need.

As in the relational database world, one should not build an index on every possible table column and concatenated column combinations. So, IMHO, I think it is easier to start from scratch with no index available and build only those indexes on index paths you need.

In the following example, I will demonstrate how you can do this. Given the examples from the “Oracle 11g – XMLIndex (part 1)” post, you can use XMLIndex path subsetting the following way…

SQL> drop user marco cascade;

User dropped.

SQL> create user marco identified by marco;

User created.

SQL> grant dba, xdbadmin to marco;

Grant succeeded.

SQL> connect marco/marco
Connected.

SQL> CREATE TABLE "3020_JWB_FA"
  2         ( "HGO_FA_ID" NUMBER(12, 0) NOT NULL ENABLE
  3         , "GEGEVENS" "SYS"."XMLTYPE") 
  4         PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 
  5         STORAGE
  6         (INITIAL 65536 
  7          FREELISTS 1 FREELIST GROUPS 1 
  8          BUFFER_POOL DEFAULT)
  9         TABLESPACE "USERS" 
 10         LOGGING NOCOMPRESS XMLTYPE "GEGEVENS" STORE AS CLOB
 11           (TABLESPACE "USERS" 
 12            ENABLE STORAGE IN ROW CHUNK 8192 
 13            PCTVERSION 10 NOCACHE LOGGING
 14            STORAGE(INITIAL 65536 
 15                    FREELISTS 1 FREELIST GROUPS 1 
 16                    BUFFER_POOL DEFAULT)
 17           );

Table created.

Now we create an XMLIndex without a namespace mapping (there is no XML Schema reference in use) and fake a lot of records via the DBMS_STATS.SET_TABLE_STATS method.

SQL> CREATE INDEX xmlindex_ix ON "3020_JWB_FA"(gegevens) INDEXTYPE IS XDB.XMLINDEX
  2  PARAMETERS ('PATHS (INCLUDE (/MYAPP-JWB/PERSOON))');

Index created.

SQL> exec  dbms_stats.GATHER_SCHEMA_STATS('MARCO')

PL/SQL procedure successfully completed.

SQL> begin dbms_stats.set_table_stats
  2        ( ownname => 'MARCO'
  3        , tabname => '"3020_JWB_FA"'
  4        , numrows => 1000000
  5        , numblks => 100000 
  6        );
  7  end;
  8  /

PL/SQL procedure successfully completed.

The following SQL statement will demonstrate what happens if you would use a statement with a namespace reference in it. This is a very common problem on the OTN XMLDB Forums , most of the time people use namespace references, but don’t know what the exact syntax should have been (“HELP, I don’t get no records returned and I know there are a lot of them in the table!”).

SQL> explain plan for 
  2  SELECT count(*) AS "AMOUNT"
  3  FROM   "3020_JWB_FA"
  4  ,      XMLTABLE(xmlnamespaces
  5                     (DEFAULT 'http://www.mysite.nl/xsd/app/JWB/MYAPP-JWB-HISTORIE'),
  6                      '/MYAPP-JWB/PERSOON'
  7                      passing "3020_JWB_FA".gegevens);

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display(null,null,'all'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
Plan hash value: 3485210572

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                        |     1 |   100 |    27M  (1)| 90:47:45 |
|   1 |  SORT AGGREGATE                     |                        |     1 |   100 |            |          |
|   2 |   NESTED LOOPS                      |                        |  8168M|   760G|    27M  (1)| 90:47:45 |
|   3 |    TABLE ACCESS FULL                | 3020_JWB_FA            |  1000K|    95M| 27143   (1)| 00:05:26 |
|   4 |    COLLECTION ITERATOR PICKLER FETCH| XMLSEQUENCEFROMXMLTYPE |       |       |            |          |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$C10A9E5D
   3 - SEL$C10A9E5D / 3020_JWB_FA@SEL$1

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT(*)[22]
   2 - (#keys=0) "3020_JWB_FA"."SYS_NC00003$"[LOB,4000], VALUE(A0)[40]
   3 - "3020_JWB_FA"."SYS_NC00003$"[LOB,4000]
   4 - VALUE(A0)[40]

25 rows selected.

The namespace reference is missing in our create XMLIndex statement. If a “COLLECTION ITERATOR PICKLER FETCH” is showing in our explain plan then most of the time this is the first sign that performance will be very, very, very bad. We have to address this (just like functional indexes on XML data) by adding a namespace reference in our path subsetting.

Only ONE domain index (in our case a XMLIndex) is allowed. So or we have to alter the index via a ALTER INDEX statement on “XMLINDEX_IX” or we start over. The following will drop and recreate the index. If you don’t do this, you will be faced with the following ORA-29879 error.

CREATE INDEX xmlindex_ix_02 ON "3020_JWB_FA"(gegevens) INDEXTYPE IS XDB.XMLINDEX
*
ERROR at line 1:
ORA-29879: cannot create multiple domain indexes on a column list using same indextype

So, the following drops and re-creates the XMLIndex

SQL> drop index xmlindex_ix;

Index dropped.

SQL> CREATE INDEX xmlindex_ix ON "3020_JWB_FA"(gegevens) INDEXTYPE IS XDB.XMLINDEX
  2  PARAMETERS ('PATHS (INCLUDE (/MYAPP-JWB/PERSOON)
  3                      NAMESPACE MAPPING (xmlns="http://www.mysite.nl/xsd/app/JWB/MYAPP-JWB-HISTORIE"))'); 

Index created.

SQL> begin dbms_stats.set_table_stats
  2        ( ownname => 'MARCO'
  3        , tabname => '"3020_JWB_FA"'
  4        , numrows => 1000000
  5        , numblks => 100000 
  6        );
  7  end;
  8  /

PL/SQL procedure successfully completed.

The explain plan shown now, reflects the use off the namespace reference.

SQL> explain plan for 
  2  SELECT count(*) AS "AMOUNT"
  3  FROM   "3020_JWB_FA"
  4  ,      XMLTABLE(xmlnamespaces
  5                     (DEFAULT 'http://www.mysite.nl/xsd/app/JWB/MYAPP-JWB-HISTORIE'),
  6                      '/MYAPP-JWB/PERSOON'
  7                      passing "3020_JWB_FA".gegevens);

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display(null,null,'all'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
Plan hash value: 1080528221

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                                |     1 |  1032 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |                                |     1 |  1032 |            |          |
|   2 |   NESTED LOOPS                |                                |     1 |  1032 |     2   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| SYS71111_XMLINDEX_I_PATH_TABLE |     1 |  1020 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | SYS71111_XMLINDEX_I_PATHID_IX  |     1 |       |     2   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY USER ROWID | 3020_JWB_FA                    |     1 |    12 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$D5B9B34A
   3 - SEL$D5B9B34A / SYS_P0@SEL$F702BB9B
   4 - SEL$D5B9B34A / SYS_P0@SEL$F702BB9B
   5 - SEL$D5B9B34A / 3020_JWB_FA@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(SYS_XMLI_LOC_ISNODE("SYS_P0"."LOCATOR")=1)
   4 - access("SYS_P0"."PATHID"=HEXTORAW('5F4F') )

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT(*)[22]
   2 - (#keys=0)
   3 - "SYS_P0"."RID"[ROWID,10]
   4 - "SYS_P0".ROWID[ROWID,10], "SYS_P0"."RID"[ROWID,10]

Note
-----
   - dynamic sampling used for this statement

38 rows selected.
How to enable, disable or adding extra indexes, will be handled in a future post.

Related Post

Marco Gralike Written by: