XMLDB Performance: Environment, Set-up, Procedure

Some time has passed since my last update on my XMLDB performance tuning adventure. As you may have read, I had problems setting up a decent test environment and loading the XML data. I am testing with a Mediawiki XML dumpfile. If there is a problem or a side effect to notice, there is a big chance it will show up, if only because of the size of this Mediawiki XML dumpfile. When I downloaded it, the dumpfile contained almost 8 million records and had a file size on Windows NTFS of 17.4 gigabytes.

Loading this data via the procedures described in my posts “HOWTO: Load Really Big XML Files” and “Setting Up an XMLDB Performance “Baseline” Environment (Part 02)” dealt with some of the issues I encountered while realizing a controllable testing set-up. Loading this amount of XML data will take some time.

Environment Set-up

My client machine is my old Windows XP Professional Dell Dimension 8300 desktop from 2003, with 1.5 GB of RAM and a single CPU.

[Screenshot: client environment]

As you can see, nothing fancy here.

I use a VMware Server Oracle Enterprise Linux virtual machine environment and an Oracle Enterprise Edition database, version 11.1.0.6.0, as my back-end tier for my testing purposes. I created an RDA output so you have an idea of what the virtual environment looks like: rda_srv01_18_102.zip.

Our business LAN is based on 100 Mb/s Ethernet.

There are probably a lot of issues that could have an effect on loading the data, storing it in the database, etc., but all methods shown here are affected by the same issues, because I only changed the storage model and kept all the other factors “as is”.

I used the following connections.xml settings (values only) for the SAX Loader program (also see “HOWTO: Load Really Big XML Files”).

  Thin
  10.252.252.102
  1521
  TEST
  TEST
  DEDICATED
  wiki
  wiki
  C:\TMP\WikiMedia_DUMP\enwiki-latest-pages-articles.xml
  page
  WIKI_STAGE
  WIKI_STAGE_ERRORS
  xsi
  http://www.mediawiki.org/xml/export-0.3/
  1000
  4

So I used a “commitcharge” of 1000 and 4 threads to load the data. While loading the data this way, my client CPU topped out at almost 70% busy.

Database Set-up

Have a look at the RDA output for the full details of the environment. Reading it back, I see that a lot could have been improved, for instance a bigger TEMP tablespace. I created a dedicated schema called “WIKI”, with default tablespace MEDIAWIKI_STAGE and temporary tablespace TEMP, and granted it the dba and xdbadmin roles.
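For reference, a minimal sketch of how such a schema could be created; the password and the quota clause are assumptions for illustration, not copied from my actual scripts:

create user wiki identified by wiki          -- password is an assumption
  default tablespace mediawiki_stage
  temporary tablespace temp
  quota unlimited on mediawiki_stage;        -- quota clause assumed

grant dba, xdbadmin to wiki;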

The MEDIAWIKI_STAGE tablespace is a bigfile tablespace only used to store two tables: a WIKI_STAGE and a WIKI_STAGE_ERRORS table.

The WIKI_STAGE table is the one that is changed / reconfigured for each load of the XML data and that will actually be used to store the XML data.

The WIKI_STAGE_ERRORS table is used to store those XML documents from the set to be loaded that contain errors, for instance because they don’t comply with XML well-formedness. There are 2 documents in my set that actually don’t comply and are therefore inserted into WIKI_STAGE_ERRORS.
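A quick sanity check after a load, assuming you are connected as the WIKI user, is simply to count the rows in the errors table:

select count(*) from wiki_stage_errors;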

SQL> select dbms_metadata.get_ddl('TABLE','WIKI_STAGE_ERRORS') from dual;

DBMS_METADATA.GET_DDL('TABLE','WIKI_STAGE_ERRORS')
--------------------------------------------------------------------------------
  CREATE TABLE "WIKI"."WIKI_STAGE_ERRORS" OF "XMLTYPE"
 OIDINDEX  ( PCTFREE 10 INITRANS 2 MAXTRANS 255
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "MEDIAWIKI_STAGE" )
 PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "MEDIAWIKI_STAGE"
 XMLTYPE COLUMN "SYS_NC_ROWINFO$" STORE AS BASICFILE CLOB (
  TABLESPACE "MEDIAWIKI_STAGE" ENABLE STORAGE
 IN ROW CHUNK 8192 PCTVERSION 10
  NOCACHE LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT))

1 row selected.

SQL> select dbms_metadata.get_ddl('TABLESPACE','TEMP') from dual;

DBMS_METADATA.GET_DDL('TABLESPACE','TEMP')
--------------------------------------------------------------------------------

  CREATE TEMPORARY TABLESPACE "TEMP" TEMPFILE
  '/oracle/oradata/TEST/temp01.dbf' SIZE 55574528
  AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1048576

1 row selected.

SQL> select dbms_metadata.get_ddl('TABLESPACE','MEDIAWIKI_STAGE') from dual;

DBMS_METADATA.GET_DDL('TABLESPACE','MEDIAWIKI_STAGE')
--------------------------------------------------------------------------------

  CREATE BIGFILE TABLESPACE "MEDIAWIKI_STAGE" DATAFILE
  '/u02/oracle/oradata/TEST/mediawiki_stage.dbf' SIZE 41003515904
  AUTOEXTEND ON NEXT 1073741824 MAXSIZE 46080M
  LOGGING ONLINE PERMANENT BLOCKSIZE 8192
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE SEGMENT
  SPACE MANAGEMENT AUTO
   ALTER DATABASE DATAFILE
  '/u02/oracle/oradata/TEST/mediawiki_stage.dbf' RESIZE 48318382080

1 row selected.

Although a lot of functionality that could have an effect on performance is still enabled or disabled, like the maintenance tasks running in the “background”, I decided to at least have control over the memory structures and therefore set these by hand (so no automatic memory management). My hope is that this will enable me to see the effects later on, when changing a specific memory structure like the java_pool or the large_pool.

The following are set by hand:

java_pool_size = 159383552
large_pool_size = 159383552
pga_aggregate_target = 157286400
db_cache_size = 503316480
shared_pool_size = 159383552

Not “top notch” values, but for now I am mainly interested in the differences…
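For completeness, a sketch of how these values can be set by hand. The memory_target / sga_target lines are my assumption of what is needed to switch off automatic memory management in 11g, so double-check them against your own configuration:

-- disable automatic (shared) memory management; assumption, verify first
alter system set memory_target        = 0         scope=spfile;
alter system set sga_target           = 0         scope=spfile;
-- the values used in this test environment
alter system set java_pool_size       = 159383552 scope=spfile;
alter system set large_pool_size      = 159383552 scope=spfile;
alter system set pga_aggregate_target = 157286400 scope=spfile;
alter system set db_cache_size        = 503316480 scope=spfile;
alter system set shared_pool_size     = 159383552 scope=spfile;
-- restart to make the spfile changes effective
shutdown immediate
startup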

Procedure

The procedure was simple.

  1. I dropped the WIKI_STAGE table via a “drop table WIKI_STAGE purge” statement
  2. Truncated the table WIKI_STAGE_ERRORS
  3. To be absolutely sure the recyclebin was empty, purged it via a “purge dba_recyclebin” statement
  4. Created an output listing all segments defined in tablespace MEDIAWIKI_STAGE (see the sketch after this list)
  5. Created the new WIKI_STAGE table
  6. Created an output listing all segments defined in tablespace MEDIAWIKI_STAGE
  7. Created a dbms_metadata output of the table structure
  8. Created an output via dbms_space.space_usage for the LOBSEGMENT belonging to the WIKI_STAGE structure (if appropriate)
  9. Ran the SAX Loader Java program to load the Mediawiki XML dumpfile
  10. Went home, to check later on whether it was finished
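For steps 4, 6 and 7, the bookkeeping boils down to queries along these lines (a sketch; the exact column list is my choice here, not the script I actually used):

-- steps 4 and 6: list all segments in the MEDIAWIKI_STAGE tablespace
select segment_name, segment_type, bytes, blocks, extents
from   dba_segments
where  tablespace_name = 'MEDIAWIKI_STAGE'
order  by segment_name;

-- step 7: extract the DDL of the freshly created WIKI_STAGE table
select dbms_metadata.get_ddl('TABLE','WIKI_STAGE','WIKI') from dual;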

Once the load had finished:

  1. Created an output listing all segments defined in tablespace MEDIAWIKI_STAGE
  2. Created an output via dbms_space.space_usage for the LOBSEGMENT belonging to the WIKI_STAGE structure (if appropriate; see the sketch after this list)
  3. Collected all logging and data and put it into Excel
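The dbms_space.space_usage output mentioned in both lists was gathered with a block roughly like the following. This is a sketch: it assumes you are connected as WIKI, it looks up the LOB segment name in user_lobs, and it only prints the “full” figures:

set serveroutput on
declare
  l_segment_name       varchar2(30);
  l_unformatted_blocks number; l_unformatted_bytes number;
  l_fs1_blocks number; l_fs1_bytes number;
  l_fs2_blocks number; l_fs2_bytes number;
  l_fs3_blocks number; l_fs3_bytes number;
  l_fs4_blocks number; l_fs4_bytes number;
  l_full_blocks number; l_full_bytes number;
begin
  -- find the LOB segment that backs the XMLType column of WIKI_STAGE
  select segment_name
    into l_segment_name
    from user_lobs
   where table_name = 'WIKI_STAGE'
     and rownum = 1;

  dbms_space.space_usage
  ( segment_owner      => 'WIKI'
  , segment_name       => l_segment_name
  , segment_type       => 'LOB'
  , unformatted_blocks => l_unformatted_blocks
  , unformatted_bytes  => l_unformatted_bytes
  , fs1_blocks         => l_fs1_blocks, fs1_bytes => l_fs1_bytes
  , fs2_blocks         => l_fs2_blocks, fs2_bytes => l_fs2_bytes
  , fs3_blocks         => l_fs3_blocks, fs3_bytes => l_fs3_bytes
  , fs4_blocks         => l_fs4_blocks, fs4_bytes => l_fs4_bytes
  , full_blocks        => l_full_blocks, full_bytes => l_full_bytes
  );

  dbms_output.put_line('LOB segment : '||l_segment_name);
  dbms_output.put_line('full blocks : '||l_full_blocks);
  dbms_output.put_line('full bytes  : '||l_full_bytes);
end;
/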

…and then restarted the procedure…

To be continued…

I split up the blog posts, so that I am able to refer back to this post when dealing with the environment, set-up, procedure and the decisions made…

So, to be continued…

Written by: Marco Gralike