Mastering XML DB – COUNT and SUM

Working for years with Oracle’s relational environment, XML DB is a completely different kind of beast. You’re confronted with all the languages and techniques the XML environment has to offer – XML, XML Schema, XSLT, Xpath, DTD, DOM, SAX, etc,etc,etc – plus the ones you know from Oracle’s (object) relational (database) world, SQL, PL/SQL, java, OCI, java, etc.

I am currently working for a customer, who implemented the XML DB functionality in a new system. The total amount of XML data is roundabout 200 Gb. The XML data is split-up in 4 tables, 2 tables containing an XMLType column (XML Schema based, CLOB storage) and 2 XMLType tables (XML Schema based, Object Relational storage).

Some XML documents are more than 200 pages printed data and fairly complex in their structure , sometimes as big as 3 to 4 Mb. So it’s a real challenge to get the data you want, in a decent time period. Even simple questions like “give me a count of ?” or “summarize the following numbers?” can be tricky if you are dealing with complextype constructions/elements. Usage of COUNT and SUM can be tricky in an XML environment.

The following will give you more insight in Oracle SQL/XPath solutions and the need for a clear understanding of your XML Schema’s and the solutions presented by the Oracle database.

You can use Oracle’s XMLType functions to query and/or extract data. From Oracle’s XML DB developers Guide it states:

You can query XMLType data and extract portions of it using the existsNode(), extract(), or extractValue() functions. These functions use a subset of the W3C XPath recommendation to navigate the document.

  • existsNode()
  • extract()
  • extractValue()


The extract() function returns the node or nodes that match the XPath expression. Nodes are returned as an instance of XMLType. The results of extract() can be either a document or documentFragment.

SELECT extract(object_value,'/PurchaseOrder/Reference')  "Reference"
FROM PURCHASEORDER;
Reference
------------------------------
SBELL-2002100912333601PDT

1 row selected.

The extractValue() function returns the value of the text node or attribute value that matches the supplied XPath expression. The value is returned as a SQL scalar datatype. This means that the XPath expression passed to extractValue() must uniquely identify a single text node or attribute value within the document.

SELECT extractValue(object_value,'/PurchaseOrder/Reference')  
       as "Reference"
FROM PURCHASEORDER;

Reference
------------------------------
SBELL-2002100912333601PDT1 row selected.

The existsNode function evaluates whether or not a given document contains a node which matches a W3C XPath expression. The existsNode() function returns true (1) if the document contains the node specified by the XPath expression supplied to the function and false (0) if it does not.

SELECT COUNT(*)
 FROM PURCHASEORDER
 WHERE existsNode(object_value,'/PurchaseOrder/Reference') = 1;

COUNT(*)
---------
132

1 row selected.

SELECT count(*)
 FROM PURCHASEORDER
 WHERE existsNode(object_value,
       '/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]')=1;

COUNT(*)
---------
1

1 row selected.

These examples (see “Example 3-23 Searching XML Content Using the existsNode() Function” ) from the XML DB Developers Guide can be misleading if you focus to much on the count(*) expression.

The count is only correct (in the last example) because there is an explicit “where Reference = {value}” XPath declaration in it. If you would base your count, as in the statement above, in XPath terms on “where there exists a node called reference”, then the XPath expression will probably not give you the expected result.

This is because the existsNode only validates the existence off the node, not how many nodes there are of this node in this branch. If it finds a node, it validates the condition as valid – and exits the branch.

This would give you the “expected” result for the count(*), in the example below = 2 records, if the structure would be a structure like:

 
   SBELL-2002100912333601PDT
 


   SBELL-2004100912389972PDT

The statement would give you an “unexpected” result for the count(*), in the example below = 1 record, containting repeating elements like a structure as described below:


   SBELL-2004100912333601PDT
   SBELL-2004100912389972PDT

Look at the following XML Schema structure:

XMLSchema for Payments

Based on this schema we now create a table and insert some rows.

set lines 50
 set long 10000
 set trimspool on

drop table amis_xml;

create table amis_xml (data xmltype);

insert into amis_xml
 values
(xmltype('
             Marco
             Gralike
             2345
             '));

insert into amis_xml
 values
(xmltype('
             
             BertJan
             Meinders
             3456
             125
             '));

insert into amis_xml
 values
(xmltype('
             
             Chris
             Gralike
            4567
            '));

insert into amis_xml
 values
 (xmltype('
              
              Karin
              Kriebisch
              '));

commit;

describe AMIS_XML

 Name    NULL?    Type
 ------- -------- --------
 DATA             XMLTYPE

select t.data from AMIS_XML t;

DATA
 ------------------------------
 
  
    Marco
    Gralike
    2345
  
 
 
  
    BertJan
    Meinders
    3456
    125
  
 
 
  
    Chris
    Gralike
    4567
  
 
 
  
    Karin
    Kriebisch
  
 

4 rows selected.

select * from amis_xml  t
 where existsnode(t.data,'/PAYMENT/PERSON/SALARY')=1;

DATA
 ------------------------------
 
  
    Marco
    Gralike
    2345
  
 
 
  
    BertJan
    Meinders
    3456
    125
  
 
 
  
    Chris
    Gralike
    4567
  
 

3 rows selected.

select count(*) from amis_xml  t
 where existsnode(t.data,'/PAYMENT/PERSON/SALARY')=1;

COUNT(*)
---------
3

1 row selected.

Instead off the (maybe) expected 4 rows selected, existsnode only returns 3. The two salary nodes of Mr. Meinders are counted as one.

The solution to this problem is presented by the XMLSequence function.

The XMLSequence() function makes it possible to take an XMLType containing a fragment and perform SQL operations on it. It generates a collection of XMLType objects from an XMLType containing a fragment. The collection contains one XMLType for each of the root elements in the fragment. This collection of XMLType objects can then be converted into a virtual table using the SQL table() function. Converting the fragment into a virtual table makes it easier to use SQL to process the results of an extract() function that returned multiple nodes.

The examples as shown in the XML DB Developers Guide, show us examples why and when we should use XMLSequence (mark the XXX records selected).

The described structure:

 
 
    SBELL-2004100912333601PDT
    SBELL-2004100912389972PDT
 

Would give with the statement:

SELECT extract(object_value,’/PurchaseOrder/Reference’) “Reference”
FROM PURCHASEORDER;

Reference
——————————
SBELL-2004100912333601PDT
SBELL-2004100912389972PDT

1 record selected

SELECT extract(value(v),’/Reference’) “Reference”
FROM PURCHASEORDER t
, TABLE(XMLSequence(extract(object_value, ‘/PurchaseOrder/Reference’))) v
;

Reference
——————————
SBELL-2004100912333601PDT
SBELL-2004100912389972PDT

2 records selected

Knowing this, we now can re-write our SQL/XPath statements.

 select extract(value(v),'/SALARY') "Salary"
 from amis_xml t
 , TABLE(XMLSequence(extract(t.data,'/PAYMENT/PERSON/SALARY'))) v;

 Salary
 ------------------------------
 2345
 3456
 125
 4567

 4 rows selected.

select count(*)
 from amis_xml t
 , TABLE(XMLSequence(extract(t.data,'/PAYMENT/PERSON/SALARY'))) v;

 COUNT(*)
 ----------
 4

 1 row selected.

select extractvalue(value(v),'/SALARY') "Salary"
 from amis_xml t
 , TABLE(XMLSequence(extract(t.data,'/PAYMENT/PERSON/SALARY'))) v;

Salary
-------
 2345
 3456
 125
 4567

 4 rows selected.

select sum(extractvalue(value(v),'/SALARY')) "Total"
 from amis_xml t
 , TABLE(XMLSequence(extract(t.data,'/PAYMENT/PERSON/SALARY'))) v;

Total
------
10493

1 row selected.

The next release of the Oracle database, version 10g version 2, will hopefully present more flexible tools like XQuery and or better XPath possibilities like:

select extract(t.data,'count(/PAYMENT/PERSON/SALARY)')
 from amis_xml t ;

 ERROR:
 ORA-31012: Given XPATH expression not supported
Marco Gralike Written by:

13 Comments

  1. Fernando Rocha
    June 6

    good article…thanks!!

  2. June 6

    Be aware that time passes on. From of database version 10.2.x you should use the XMLTABLE syntax instead of the table(xmlsequence(extract())) construct.

  3. Mithun Goswami
    August 26

    Hi,

    I have a problem using XML. I have a following XML

    Now I want to parse the xml with PL/sql and store the value of client and message_type in two variable like v_client and v_msg_typ.

    Could you please help me sending the code of that?

    Thanks in advance
    Mithun

  4. Peter
    October 30

    Good article, it helped me. Thanks!

  5. Jeywin Lizy
    April 7

    select extract(x,’/Product/qty/no/text()’).getStringVal() into p from dual;
    and

    select extract(x,’/Product/qty/no/text()’).getStringVal() into p from dual where existsNode(x,’/Product/qty[no=”2″]’) = 1;

    returns–> 12
    Please let me know where I am going wrong

    My XML:
    ——

    1

    2

  6. Jeywin Lizy
    April 7

    My XML used in the above comment is
    ’12’

  7. April 10

    Use TABLE(XMLSequence(extract())) in the FROM clause to unnest the “strings”

  8. Pastorie
    June 24

    Hi there,

    When I try with “select extractvalue” it returns “ORA-19025. EXTRACTVALUE returns value of only one node” to me. Do you know why it happens?

    Anyway, thanks a lot for the usefull information!

  9. June 24

    EXTRACTVALUE can only return a scalar value. To find out where you get XML fragments or multiple values temporarily replace the EXTRACTVALUE with EXTRACT…

  10. Alex
    June 1

    Hello!

    Thank you a lot for your article! It was exactly what I was looking for!

    Although, I do have a problem there. By using
    SELECT count(‘/*/row’)
    FROM SAP_XML t , TABLE(XMLSequence(extract(t.xml_file,’/*/row’))) v
    where …;
    I always get “12” as return. I use the 11g Version. I understand I have to “use the XMLTABLE syntax instead of the table(xmlsequence(extract())) construct” but how?

    Thank you for your time in advance!

  11. mohamed
    January 6

    i can’t understand this expresion i hope any one explain it

    SELECT COUNT(*) FROM amis_xml t
    WHERE existsnode(t.DATA,’/PAYMENT/PERSON/SALARY’)=1;

    COUNT(*)
    ———
    3

    1 ROW selected.

    • January 8

      Find all nodes where exists a node SALARY under /PAYMENT/PERSON which is not without value/empty and then count the results.

Comments are closed.