XMLDB Performance: Binary XML Storage Models

This post will show you some of the first numbers I collected regarding “Loading XML data”, while making use of different XMLType “physical storage containers”.

I also have done some initial testing with Object Relational XMLType storage, but because this method of storage has many options and extra features, I won’t describe them yet here. This topic is interesting enough to earn its on post.

If you need some background on Oracle XMLType Storage option than have a read through the “Binary-, CLOB, Object Relational Storage” Category option in the menu, the Oracle XMLDB Developers Manual or a short intro via Oracle 11g – XMLType Storage Options.

After having created an environment as described in “XMLDB Performance: Environment, Set-up, Procedure“, the following results were gathered by me while keeping values constant, for example the values for “connection.xml“, as described in the “XMLDB Performance: Environment, Set-up, Procedure” . Only the WIKI_STAGE create statements are different.

My Goal

My goal here is to see how these XMLType storage structures perform, in this case, what happens while loading XML data. Another is issue is finding out how to create them and can I deduct some extra “concepts” or rules of numb out of these testing procedures. While setting the test environment up and loading the data I noticed that there are some issues, for example, regarding altering structures or counting the data (XMLDB Performance: The Side Effects of a Simple “count(*)”).

During this learning process, beside insight in, I also had some new questions that have to be answered to understand the mechanics involved. One of this questions I don’t have an answered yet for is “What are the internal differences in bookkeeping between a Smallfile and Bigfile tablespace, besides ROWID structures” (and what is the effect on XMLType storage structures).

Create Table Statements

In this first report, regarding my loading XML data test results, I compared the following structures:

  • Binary XMLType Table using Basicfile storage
  • Binary XMLType Table using Securefile, not compressed, storage
  • Binary XMLType Table using Securefile, compressed high, storage

The results shown were measured while using the following SQL settings for create table WIKI_STAGE

Binary XML Basicfile (XMLType Table)

The following create WIKI_STAGE create table statement was used:

select dbms_metadata.get_ddl('TABLE','WIKI_STAGE','WIKI') from dual;

DBMS_METADATA.GET_DDL('TABLE','WIKI_STAGE','WIKI')
--------------------------------------------------------------------------------

  CREATE TABLE "WIKI"."WIKI_STAGE" OF "SYS"."XMLTYPE"
 OIDINDEX  ( PCTFREE 10 INITRANS 2 MAXTRANS 255
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "MEDIAWIKI_STAGE" )
 PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS1 BUFFER_POOL DEFAULT)
  TABLESPACE "MEDIAWIKI_STAGE"
 XMLTYPE COLUMN "SYS_NC_ROWINFO$" STORE AS BASICFILE BINARY XML
  (TABLESPACE "MEDIAWIKI_STAGE" ENABLE STORAGE IN ROW CHUNK 8192
  NOCACHE LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT))
 ALLOW NONSCHEMA DISALLOW ANYSCHEMA
.

Binary XML Securefile (XMLType Table) – NO COMPRESS

The following create WIKI_STAGE create table statement was used:

select dbms_metadata.get_ddl('TABLE','WIKI_STAGE','WIKI') from dual;

DBMS_METADATA.GET_DDL('TABLE','WIKI_STAGE','WIKI')
--------------------------------------------------------------------------------

 CREATE TABLE "WIKI"."WIKI_STAGE" OF "SYS"."XMLTYPE"
OIDINDEX  ( PCTFREE 10 INITRANS 2 MAXTRANS 255
 STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
 TABLESPACE "MEDIAWIKI_STAGE" )
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
 STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
 TABLESPACE "MEDIAWIKI_STAGE"
XMLTYPE COLUMN "SYS_NC_ROWINFO$" STORE AS SECUREFILE BINARY XML
(TABLESPACE "MEDIAWIKI_STAGE" ENABLE STORAGE IN ROW CHUNK 8192
 NOCACHE LOGGING  DECRYPT  NOCOMPRESS  KEEP_DUPLICATES
 STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
 PCTINCREASE 0 BUFFER_POOL DEFAULT))
ALLOW NONSCHEMA DISALLOW ANYSCHEMA
.

Binary XML Securefile (XMLType Table) – COMPRESS HIGH

The following create WIKI_STAGE create table statement was used:

select dbms_metadata.get_ddl('TABLE','WIKI_STAGE','WIKI') from dual;

DBMS_METADATA.GET_DDL('TABLE','WIKI_STAGE','WIKI')
--------------------------------------------------------------------------------

  CREATE TABLE "WIKI"."WIKI_STAGE" OF "SYS"."XMLTYPE"
 OIDINDEX  ( PCTFREE 10 INITRANS 2 MAXTRANS 255
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "MEDIAWIKI_STAGE" )
 PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "MEDIAWIKI_STAGE"
 XMLTYPE COLUMN "SYS_NC_ROWINFO$" STORE AS SECUREFILE BINARY XML (
  TABLESPACE "MEDIAWIKI_STAGE" ENABLE STORAGE IN ROW CHUNK 8192
  NOCACHE LOGGING  DECRYPT  COMPRESS HIGH  KEEP_DUPLICATES
  STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 BUFFER_POOL DEFAULT))
 ALLOW NONSCHEMA DISALLOW ANYSCHEMA
.

Segment Findings

All these storage structures, these binary XMLType tables, contain:

  • A table segment
  • An index segment
  • A LOB segment
  • A LOB index segment

While using the create table statements above, this resulted in the following used amount of bytes per segment type after 6.996.742 Mediawiki XML records where loaded.

Table Segments


Click on the picture to enlarge

Index Segments


Click on the picture to enlarge

LOB Segments


Click on the picture to enlarge

LOB Index Segments


Click on the picture to enlarge

Size Totals

Would you compare the original size on disk of these 6.996.742 Mediawiki XML files on Windows NTFS against the amount of bytes used in the database, then you would could see the following differences.

Totals Overview – Segments


Click on the picture to enlarge

and while looking at file size alone…

Totals Overview – Size on Disk, Size in the Database


Click on the picture to enlarge

A simple conclusion, while looking at the graphs, is that “standard” XML storage using Basicfile or Securefile CLOB storages are bigger that the XML size on Windows disk (while using NTFS). The other conclusion in this case is that you will regain 4.2 Gigabyte while compressing it in Securefile Binary XML format, compared to the Mediawiki XML Windows flat file.

To my surprise, I am now aware of the fact that a “Securefile” structure in Oracle 11.1.0.6.0 can be bigger that its alternative the “Basicfile” structure.

I also encountered a bug. A bug I had completely forgotten. One that I had encountered and reported during the end of beta testing of Oracle version 11.1.0.6.0 in 2007. My bug number 6364855, relates to “LOB SIZE USING SECUREFILE IS VERY LARGE”.

See the following alteration statement:

SQL> alter table wiki_stage modify lob (xmldata) (compress high)

Table altered
.

The symptoms are that if you compress or de-compress Securefile storage, the internal storage size increases, despite using “compress”. The bug was very quickly solved and will be packaged in patch version 11.1.0.7.0. So if needed, ask for a back port for Oracle database version 11.1.0.6.0.

Duration

An interesting comparison is, of course, also: “How much time was needed the load the XML data?”. Are there consequences while using one or the other storage model? And there are, as expected…


Click on the picture to enlarge

Conclusions?

What is new for me is that there seems to be a small load time trade off between the old “Basicfile” structure (the structure also used before 11g) and the new Securefile ( no compress) structure in the 11.1.0.6.0 (Oracle Enterprise Edition 32 bit for Linux), which I can’t explain. I did not yet trace / count, for instance, database server CPU time. It could be that storing data, while making us off Securefile structures in general, would consume more CPU cycles, but because I didn’t trace on this, also can’t prove this… As said, just a gut feeling.

The other differences can be explained because the can be explained via the information given in the XMLDB Developers Guide and Storage Guide

XMLType storage will be checked on XML Well-Formedness (this is default behavior) . This takes time (Memory and CPU cycle consumption)

Binary XML is a post parse storage model, this means that XML tags and information, in labor terms, will be cut in to pieces, sorted, ordered and categorized regarding “same elements”, “the same information”, usage, etc. Also the XML data will be stripped from unnecessary whitespace and the tree structure, meta data info will be optimized regarding storage, its accessibility and maintenance. All this information will used when the information is requested again and has to be reconstructed into its original form (with the exception regarding whitespace). The internal mechanism regarding the “How” is only known to Oracle.

Compressing (data) takes extra time, most of it needed for CPU processing. This explains the difference between the Securefile storage via “NOCOMPRESS” and the Securefile storage via “COMPRESS HIGH”.

Other conclusions…?

Securefile storage is smaller, so as advertised, while using compression, in the Oracle Enterprise Edition version 11.1.0.6.0. I hope that patch 11.1.0.7.0 will be released this year so I can re-test some of the features for Securefile. One of my questions, still unanswered, is “How much time is needed to “compress high” a Securefile LOB segment, using a default Securefile LOB segment (NOCOMPRESS) as a starting point”. Some internal information about Basicfile and Securefile Binary XML would be useful and could be used to explain the differences between seen in table, lob segment and lob index segments.

There are still some questions I didn’t answer yet that are related to the described storage models here, such as “What happens if the Binary XML storage model depends on XML Schemata” or “What happens if the Binary XML storage uses features like ‘de-duplication’ or different ‘schema controlled encoding’ “…

What I have described here wasn’t “exact” science. Uncontrolled functionality like AWR mechanisms were still enabled and in place, but I think some comparisons can be made and give a “statement of direction”. On the other hand, just doing it (setup, testing, loading data, thinking about “what does this figure mean…”), made me a lot wiser…

Although I believe that I only scratched the surface of some of the internals, dealing with XML storage etc., I hope that people can use this information to their advantage.

😎

Related Posts

Marco Gralike Written by:

3 Comments

  1. Thomas
    September 16

    Hi Marco!
    Very interesting comparison! Looks like high compression brings quite some space savings!
    Could you add the queries you used to get the segments and database sizes? Thanks!
    Thomas

  2. September 16

    There is a dedicated presentation dealing with securefile during Oracle Open World (if you are going). Search in the scheduler on keyword “securefile”.

  3. September 16

    Scripting, segments:

    SQL> get segment
      1  prompt
      2  PROMPT Segments:
      3  prompt
      4  column segment_name format a30         heading 'Segment'
      5  column segment_type format a15         heading 'Type'
      6  column bytes        format 99999999999 heading 'Bytes'
      7  column extents      format 9999999     heading 'Tot.Ext'
      8  column ini          format 99999999999 heading 'IniSize'
      9  column next         format 99999999999 heading 'NextSize'
     10  column min_extents  format 9999        heading 'MinE'
     11  column max_extents  format 999999        heading 'MaxE'
     12  column owner        format a15         heading 'Owner'
     13  column pct          format 9999        heading '%Inc'
     14  column tbsname      format a20         heading 'Tablespace'
     15  break on report
     16  compute sum of bytes on report
     17  undefine segment_type
     18  undefine segment_name
     19  undefine tablespace_name
     20  undefine owner
     21  undefine extents
     22  SET LINESIZE 200
     23  SET PAGESIZE 5000
     24  SET FEEDBACK OFF
     25  select   SEGMENT_NAME                      
     26  ,        SEGMENT_TYPE                      
     27  ,        BYTES                             
     28  ,        EXTENTS                   
     29  ,        INITIAL_EXTENT ini        
     30  ,        NEXT_EXTENT next          
     31  ,        PCT_INCREASE pct
     32  ,        MIN_EXTENTS                       
     33  ,        MAX_EXTENTS
     34  ,        OWNER
     35  ,        tablespace_name                   
     36  from     sys.dba_segments
     37  where    segment_name like upper('%&Segment_name%')
     38  and      segment_type like upper('%&Segment_type%')
     39  and      owner like upper('%&Owner%')
     40  and      extents >= to_number(nvl('&Extents','0'))
     41  and      tablespace_name like upper('%&Tablespace_name%')
     42  order by owner
     43  ,        extents desc
     44  ,        segment_name
     45  ,        segment_type
     46  ,        bytes
     47  /
     48  set linesize 132
     49  set pagesize 36
     50  set feedback on
     51  undefine segment_type
     52  undefine segment_name
     53  undefine tablespace_name
     54  undefine owner
     55* undefine extents
     56
    

    If appropriate for detailed LOB info, I used:

    
    SQL> get bytes
      1  SET SERVEROUTPUT ON
      2  PROMPT
      3  PROMPT
      4  UNDEFINE USERNAME
      5  UNDEFINE SEGMENTNAME
      6  UNDEFINE SEGMENTTYPE
      7  declare
      8      l_segment_name          varchar2(30);
      9      l_segment_size_blocks   number;
     10      l_segment_size_bytes    number;
     11      l_used_blocks           number;
     12      l_used_bytes            number;
     13      l_expired_blocks        number;
     14      l_expired_bytes         number;
     15      l_unexpired_blocks      number;
     16      l_unexpired_bytes       number;
     17  begin
     18  dbms_output.put_line(chr(10));
     19   dbms_space.space_usage (segment_owner           => upper('&&USERNAME'),
     20                           segment_name            => upper('&&SEGMENTNAME'),
     21                           segment_type            => upper('&&SEGMENTTYPE'),
     22                           partition_name          => NULL,
     23                           segment_size_blocks     => l_segment_size_blocks,
     24                           segment_size_bytes      => l_segment_size_bytes,
     25                           used_blocks             => l_used_blocks,
     26                           used_bytes              => l_used_bytes,
     27                           expired_blocks          => l_expired_blocks,
     28                           expired_bytes           => l_expired_bytes,
     29                           unexpired_blocks        => l_unexpired_blocks,
     30                           unexpired_bytes         => l_unexpired_bytes
     31                     );
     32     dbms_output.put_line('OBJECT_NAME = &&SEGMENTNAME');
     33     dbms_output.put_line('-----------------------------------------');
     34     dbms_output.put_line('segment_size_blocks       => '||  l_segment_size_blocks);
     35     dbms_output.put_line('segment_size_bytes        => '||  l_segment_size_bytes);
     36     dbms_output.put_line('used_blocks               => '||  l_used_blocks);
     37     dbms_output.put_line('used_bytes                => '||  l_used_bytes);
     38     dbms_output.put_line('expired_blocks            => '||  l_expired_blocks);
     39     dbms_output.put_line('expired_bytes             => '||  l_expired_bytes);
     40     dbms_output.put_line('unexpired_blocks          => '||  l_unexpired_blocks);
     41     dbms_output.put_line('unexpired_bytes           => '||  l_unexpired_bytes);
     42  end;
     43  /
     44  UNDEFINE USERNAME
     45  UNDEFINE SEGMENTNAME
     46* UNDEFINE SEGMENTTYPE
     47  
    

Comments are closed.