Hmmm, still busy to setup a baseline environment to do some serious XMLDB performance / architecture testing. I am almost now on the point were I really can start, but until now I ran into a lot of issues… It also told me that the mediawiki dumpfile is good set of XML data to work with. A lot of variance is build in. I hope I won’t encounter too muchmore issues though…
National Language Settings
One off those issues was the almost standard issue with handling NLS, aka National Language Settings. Oracle has a good FAQ I always refer to when the item comes up again on the OTN XMLDB Forum. To avoid most of the general issues, I set the defaults for all my environments to AMERICAN_AMERICA.AL32UTF8… of course this has its trade offs.
I just read Alex post about ORA-01450 and I realized this is also a very valid reason you could encounter for Object Relational Storage in using Oracle 10g generating your XML Schema objects via DBMS_XMLSCHEMA. IOT’s are the default while registering your XML Schema via DBMS_XMLSCHEMA (and AL32UTF8 as default database characterset).
I encountered my NLS issue when I realized that my JDeveloper did show its error messages in Dutch. While processing the XML mediawiki dump file, two XML documents failed to load, showing the following messages:
Thread Writer_000001 encountered Execption -
ORA-31011: Ontleden van XML is mislukt.
ORA-19202: Fout in XML-verwerking ().
I try to be consistent regarding NLS settings, so I realized that JDeveloper was picking up my Dutch settings from the Windows XP command line environment ( I always use AMERICAN.AMERICAin the registry).
After searching a bit on the internet, I found the method to overrule this:
These settings must be set in the jdev.conf configuration file of the JDeveloper software. The file resides in the directory \\jdeveloper_home\jdev\bin.
XMLType and XML Wellformedness
Most of the time if you encounter NLS issues, while using XMLDB functionality, this manifests itself because a XML document will get an error that it is “not wellformed” XML. Probably it is, but because of your NLS characterset settings, the document will be converted if the database is using a different characterset and therefore will report that the XML document isn’t well formed anymore… So be aware of this and it won’t cost you so much time to solve the problem.
While using an XMLType column or table, there will be always a check on “XML wellformedness”, unless you disable it, for example, because of performance reasons. If you want to know how, read the OTN XMLDB Forum post called: “insert in xmltype column: extremely slow“. Beware of Mark’s response though – regarding Oracle support (“…and some subsequent operation on the table causes an ORA-00600 please do not bother calling support“).
In my post “HOWTO: Load Really Big XML Files“, I mentioned the use of an “error” table. This error table should be not to restrictive and be able to hold the exceptions. Not that I did this, but I became more aware of this while thinking about NLS in conjunction with XMLType and the errors I encountered during the load process of the Mediawiki XML dumpfile. Probably the best will be using an CLOB column for the error table.
I like the SQL*Plus Windows GUI. Can’t help myself. But in Oracle 11gR1 this GUI is not supplied anymore in the distribution. Only the windows command line version is available. There are probably zillions of small examples that different versions are incompatible, for instance an Oracle 9 client SQL*Plus client doesn’t know how to handle a “purge recyclebin” statement (the recyclebin wasn’t available in version 9). I guess I encountered such an issue again. While using a SQL*Plus 10gR2 GUI client on the error table, at last filled with XML documents, I get the following error:
SQL> select * from WIKI_STAGE_ERRORS;
attempted to use an invalid connection in OCI (object mode only)
My guess is due to Japanese signs in it – but still have to investigate – so maybe, again, NLS related, although the error is severe. The problem is caused by the XML document that has the following content: Takayuki Yamada
My Oracle 11gR1 Linux sqlplus client shows the XML document. I will try some NLS Japanese character set to see if this will avoid the problem with the 10gR2 SQL*Plus client and setting a Truetype font in the registry that hopefully supports this .
Tomorrow the first complete load will be finished. A second load process will be started in the evening. In total this will be roundabout the 10th attempt (loading time approximately 22 hours each per run) so I still can’t do a comparison because I will be on holiday to Portugal next 2 weeks, so tests with CLOB based xmltype storage, etc, will have to wait.
A baseline…Slowly, very slowly, but I am getting there…