HOWTO: Handle Complex XML Schemas (Part 1)

Currently sitting in at the Oracle Open World 2010 presentation of Sam Iducula, Consulting member of the tech. staff and Mark Drake, Sr. Product Manager for Oracle XML DB. Before getting into the more in-depth topics Sam explained XML schema usage, for validation via XML schema validators like for example XML Spy or JDeveloper. This is currently really needed because those more used XML Schema like the really big ones out there like H7, etc, are nowadays so very very big that a good XML Schema validator is really needed. XML Schema in binary XML format is stored in a post parsed binary format. This has the advantage that Oracle knows about the format when storing the XML document. Extra information can be shared by the database by registering the XML Schema in the database that validates the Binary XML content.

There can be a lot of recursive dependencies, via the import or include references in a XML Schema, which make it even more difficult to make optimal use of this information. For example in the HL7 (Health Level 7 schemas) setup this includes over 100 included XML Schemas. Oracle 11gR2 has been greatly improved performance and handling of those very huge meta data information as stored in such XML Schemas. Via streaming schema validation and adding hints via xdb:annotations, this provides the database with even more information on how to optimal handle these structures and as such performance can be improved even more. Some of those hints could be used to avoid the creation of objects, in this case while using XMLType Object Relational storage via, for instance, xdb:defaultTable=”” (providing an empty string) or store parts of the XML document information out of line. By the way for this last example you should use JDeveloper because it will annotate the XML Schema incorrectly (the bug has been reported by me). One of the improvements in 11.2.0.2.0 is huge improvements were made in cycle detection recognition, so they are handled even better in the mentioned version.

On the XMLDB home page on the Oracle OTN website a package of tools provided (“Oracle XML DB Ease of Use Tools for Structured Storage“) which can make your life easier regarding those xdb:annotation’s especially for those enormous big XML Schemas. This tool set which enables you to automate a lot of hints in XML Schema optimization you would like to make. Via XQuery or other XML DB update statements you are also able to override the by the database generated naming or storage options. Via some simple anonymous PL/SQL blocks this can be very easily done via for example, DBMS_ANNOTATE-x packages contained in this XML tool set, as said which is freely available on the XML DB OTN Oracle website.

Automation of xdb:annotations
Click the picture to enlarge

This tool set also comes with a white paper that shows and demonstrates some of the best XML handling ideas and experience gathered trough a lot of years handling customer use cases the Oracle XML DB Development team had. For example if you know it’s not applicable to your XML document you are able to switch off or alter DOM validation handling while storing or handling your XML document in the database. You can override ordering for example if it is applicable for your XML Schema, this avoids oracle checking it, which improves handling, but, be very aware, it can also be dangerous doing this if it was implemented by the person who created the XML Schema, but just didn’t care about the real life implement and/or it’s importance regarding being a actual mandatory requirement in practice.

I have experience multiple times that even with official XML Schemas the restrictions didn’t match real life use, so although automation really helps you to manage your XML registered schemas more easily, you must be aware of those exceptions. XML schemas can be created very loosely on real life implementations which can get you in a lot of problems after these storage models, based on such an XML schema, is used in your database design; those rules will be enforced via a XML schemas in the database.

As always, proper design with future needs in mind, takes time to do it properly. This is also the case regarding creating a good XML schema.

In Oracle 11you have now the possibility, via this tool set, to use DBMS_XML_MANAGE (for XMLType Object Relational storage) to rewrite table to column mapping, which figures out for you, makes it more easy, to identify and create supporting indexes on ComplexTypes. This has the advantage that you can create indexes with some more meaningful names like, for example “line_items_uniq_idx_01” or whatever the naming convention within your company might be.

In the latest XDB toolset there will be now also a XDB_ANALYZE_XMLSCHEMA package which sorts out all of the scripting and possible options, while you feed it the actual XML Schemas. As was demonstrated by Mark Drake, all the FpML schemas which have a lot of dependencies of each other where analyzed, annotated, registered and it created over 100 tables and more than 2500+ objects in minutes. Try doing this by hand…

Also while using this package it will sort out the proper XML Schema dependencies and in which order all those XML schemas have to be registered in the correct order (based on includes, imports and ref’s used by Simple- and ComplexTypes). Sometimes you have to break up column create table statements because the maximum amount of columns allowed by Oracle in one single CREATE TABLE statement is “only” 1000 columns. This package will help you figure out how much of those Object Relational storage items will have to be moved “out of line” and/or to break up on a certain level in the XML hierarchy of the XML tree to avoid this 1000 column limitation but also to provide the design info needed to get the maximum performance.

This tool set used for XMLType Object Relational storage is only useful if you XML design is highly relational. If not then, your XMLType storage module should be based on Binary XML. The advantage of using XMLType Object Relational storage is that you make full use of Oracle relational technology and optimizations, which is available since a long long time and full use of, for example, the Cost Based Optimizer will kick into effect. On the other hand, be aware if your XML design is really relational, maybe you should have created it by relational means. There should be a proper use case to work with the XML format in the first place. My adagio always is: if it is not XML, don’t use Oracle XMLDB. If it is, go for Oracle XMLDB, if not only that is a “no cost option” within your Oracle database and it has been designed, since version 9.2.0.3.0, to optimal handle XML in your database.

For further information about choosing the proper XMLType storage model and how to optimally query these structures, have a look at:

HTH

Marco

Marco Gralike Written by: