This post will show you some of the first numbers I collected regarding “Loading XML data”, while making use of different XMLType “physical storage containers”.
I also have done some initial testing with Object Relational XMLType storage, but because this method of storage has many options and extra features, I won’t describe them yet here. This topic is interesting enough to earn its on post.
While setting up a baseline for my XMLDB performance tests, I noticed that my “count(*)” on a Binary XML table (using Securefile LOB storage) called “WIKI_STAGE” took an awful long time. So long, that I even had to kill the SQL*Plus session, that was executing the “count(*)”. I started wondering. Why did it take so long to come up with the result?
In the end, due to good advice from Jonathan Lewis, I came up with a solution (although probably unsupported) and a better understanding off the mechanics involved. Also, as a side effect, it triggered a really good discussion on the “Oracle-L” freelist, regarding “counting”.
I don’t mean “show and tell,” where someone claims he has improved performance at hundreds of customer sites by hundreds of percentage points [sic], so therefore he’s an expert.
I mean show your work, which means documenting a relevant baseline measurement, conducting a controlled experiment, documenting a second relevant measurement, and then showing your results openly and transparently so that your reader can follow along and even reproduce your test if he wants to.
This is more or less funny, because I read Cary’s post, be apparently I didn’t read it… I can really relate to it now.
I am in the middle of setting up a XMLDB test environment to test, among others, load times while using different kinds off XMLType storage based upon CLOB, Object Relational and Binary XML (using Basicfile / Securefile options). And although I am working on a VMware environment, I noticed that it isn’t that easy to setup a “controlled experiment“. What makes it harder is, that I am using the Mediawiki XML English dumpfile, that contains roundabout 7 million records (17 Gb of ASCII data). This makes it more interesting, and the effects more clearer, but it also takes much more time to do stuff.