Whitepaper: Current Support of XML by the “Big Three”

Today I discovered a web page that has some cool whitepapers and thesis of Mr. Martin Necasky. Besides the work he is doing regarding conceptual modeling for XML, he also wrote a book about this topic called “Conceptual Modeling for XML” in which he, among others, discusses his own conceptual model for XML called XSEM that extends the Entity-Relationship model, he and Ms Irena Mlynkova, wrote a 13 page brief discussion on “The Current Support of XML by the ‘Big Three'” (March 2009 – Oracle 11g, IBM DB2 9, and Microsoft SQL Server 2008).

Some remarks (about Oracle only of course) on this brave attempt (no seriously you can read that it isn’t copy paste work – so a lot of work in testing was probably involved in putting this to paper)…

  1. Page 3 – “Storing XML data…”: Oracle supports 3 storage strategies: CLOB, Binary XML and Object Relational (OR). The later, OR, also can contain CLOB and is in that case referred to as “hybrid” storage. There is only one XMLType datatype with three physical representations, or you could make the distinction between “Non-Schema-Based XMLType” and a ” Schema-Based XMLType”. Differences regarding validation are explained here (beware an old pre-Binary XML post…): “Why do I get an ORA-19030 error when invoking schemaValidate() ?
  2. Page 4 – “With regard to the three strategies we have three options – nothing, CLOB or BINARY XML…”: That is…if the “STORE AS…” is not explicitly made in the create statement, if defaults to CLOB. Binary XML and Object Relational storage has to be addressed in the create statement…
  3. Page 4 – “XMLSCHEMA…”: ALLOW ANYSCHEMA or ALLOW NONSCHEMA only refer to Binary XML storage and its encoding. It has no relation to CLOB or OR.
  4. Page 4 – “If we do not specify anything, the default storage strategy is structured…”: If we do not specify anything, the default is CLOB (so unstructured storage).
  5. Page 4 – “XML collection, i.e. an element with maxOccurs > 1 can be stored either into a VARRAY or into LOB…”: Each collection, in the case of using an Object Relational approach, can be represented as a CLOB, VARRAY as LOB, VARRAY as Nested Table, VARRAY of Ref XMLType as LOB or VARRAY of Ref XMLType as Nested Table. Also see the following whitepaper for more details: “Mastering XML DB Storage in Oracle Database 10g Release 2”
  6. Page 4 – “The XML schema must be first associated with two namespaces
    http://xmlns.oracle.com/xdb and http://xmlns.oracle.com/2004/CSX…”: No, only one (in normal circumstances) should be used: http://xmlns.oracle.com/xdb. The second (/CSX) is internally used to extend the default, in the case of Binary XML and should in normal cases not be used (but there is a bug know that needs this one to be addressed as a workaround…)
  7. Page 4 – “The XMLIndex consists of three parts…”. The domain index XMLIndex structures make use of a PATH TABLE, the content is indexed via 2 B-Tree indexes and 1 Function based index. This structure that contains the by the user specified, needed, XPath values and information about nodes, is addressed / searched via a specific internal function. This function, which is based on a certain known search method (this method can’t be named by me, due to a “non disclosure agreement” with Oracle, but believe me is specially designed in handling unstructured data), is called when a user searches a value or XML fragment.
  8. Page 5 – “For the purpose of XML querying Oracle supports two options…”: What about XPath…?
  9. Page 5 – “Evaluation of the queries can be optimized by the indices as described before…”: Evaluation is also being optimized, if more is known of the XML document / instance, via a in the repository registered XML Schema.
  10. Page 5 – “Validity…or using a built-in function of XMLType…”: To clarify a bit more. While using the XMLType, the XML document is ALWAYS checked on XML well formedness, extra checks to optimize internal re-writes etc. are automatically done if a registered XML Schema is present and the XML document refers to it via a valid namespace. More restrictive validation can be done via “Lax validation” or “Strict validation”.
  11. Page 5/6 – “If we apply the classical UPDATE operation on an XMLType column of a relation, it causes replacement of the whole XML document stored in it…”, this sentence suggest within the context that Oracle XML DB doesn’t support partial updates. Partial updates are supported in at least 11g AFAIK.

To be honest I had hoped for a bit more “precision” in this only 13 page whitepaper, despite this, as said, a brave attempt. I wouldn’t dare to make the attempt. There is still a lot the learn from this and the other papers. I am really looking forward in reading the other stuff, if not only trying to grasp alternative views (the more “university” viewpoints contra my oracle “relational” views).

I will give you an update when I have read the book “Conceptual Modeling for XML“. Its content sounds very interesting and probably is worth its money.

M.

Marco Gralike Written by: