21 Comments

  1. 7/12/2008

    Marco,

    Great post. I’ve been thinking about doing something like this myself. Now I have a link to share when people ask me how to load large XML files.

    Thanks,

    LewisC

  2. 7/12/2008

    You’re welcome, Lewis.

    ;-)

    I needed this piece for two reasons:

    1. To demonstrate to myself that I could do this task in JDeveloper, as well as via my normal routine on the command line
    2. Because I need this as a starting point for a series about XMLDB performance and test scenarios

    All the “groundwork” (and thanks) is of course due to Marchelo and Mark, but the problem for me is that those guys are far better Java programmers than me, and they probably thought that this is common knowledge (which it isn’t, at least not for me). It is almost embarrassing to see how much time I spent figuring out all the JDeveloper details.

    It was a good exercise for me in the end and I hope other people can use it.

  3. Michael
    7/21/2008

    Nice post. Any comparison of loading with respect to performance and storage as compared to just a 10g xmltype column?

  4. 7/21/2008

    I wanted to, but I had to destroy my 10.2.0.3 and 11.1.0.6 software installations because I needed to switch from the standard to the enterprise edition.

    For now I have pinned myself to the 11gR1 instance (binary XML). The moment I start comparing CLOB against Binary XML (the next step, because they are closest in some of their use and behavior), I will involve the 10.2.0.3 (probably 10.2.0.4) software again as well.

  5. 7/24/2008

    Marco,
    great posting. There is one thing I am not clear on, though: how big was the test file you used for testing the process (shown in the last picture, I guess)?
    17 GB or less?

    Another problem that has been haunting me lately is the definition of “big XML file”.
    What I still fail to understand in this definition is the relationship between the size of the XML file and the available resources of the database instance. How are they related to each other? I mean, are we talking about a logarithmic scale, an exponential scale, a linear factor?
    I was doing some tests using traditional DOM validation with not incredibly large XML documents and the object-relational storage model, and I am getting really nasty errors with just a few megabytes (ORA-03113 in particular), so I was really questioning the approach, hence the interest in the SAX approach, although I find it overly complicated if you imagine that I would like to release a 1-click installation package…

    I am still investigating what the real physical constraints in the traditional DOM approach are, because I’d like to have a clear picture before taking any decisions.

    Regards,
    Flavio

  6. 7/24/2008

    The XML file being processed is 17 GB as shown on NTFS / via Windows Explorer.

    What is “big”…

    An XML document is seen as big in the XML world if it exceeds 1.5 MB. Or at least that is my gut feeling.

    It also depends a little on the complexity of the structure and on the way complexTypes, or schemata (via, for instance, “include”s, etc.), have circular dependencies.

    The XML document can contain a structure that could represent a piece of data that has been stored in a SAP Oracle schema (with thousands of tables, etc.), with one record in each of these tables together being “the XML document”.

    So what is “big”…?

    Another problem is that, in the current state, if you want a specific piece of data or a fragment via the Document Object Model, the whole XML structure has to be built in memory before this structure can be used to search for the specified fragment or data piece wanted.

    So regarding DOM, you will have to take into account which method is used to do it: via Java (Oracle JVM kernel) or via a more optimized method built into the Oracle C kernel. So the “java_pool” can suddenly be an important factor to be aware of, besides the PGA and other memory structures…
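
    To make the DOM-versus-SAX memory point concrete: a SAX parser only sees a stream of events and never holds the whole tree, so heap use stays flat regardless of file size. A minimal, self-contained Java sketch using only the standard JAXP API (the element-counting handler is just an illustration, not the loader’s actual code):

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class SaxCount {
        // Counts elements while streaming; nothing but the current event is kept,
        // so memory use does not grow with document size.
        static long countElements(InputStream in) throws Exception {
            final long[] count = {0};
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            return count[0];
        }

        public static void main(String[] args) throws Exception {
            String xml = "<root><a/><a/><b><c/></b></root>";
            System.out.println(countElements(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
            // prints 5 (root, a, a, b, c)
        }
    }
    ```

    With DOM (e.g. `DocumentBuilder.parse`) the same input would first be materialized as a complete in-memory tree, which is where the PGA / java_pool pressure mentioned above comes from.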

    Regarding the scaling factors (a logarithmic scale, an exponential scale, a linear factor), this is still on my to-do list.

    You could test with a statement done via XQuery. In 10.2.0.2 it still has dependencies on the JVM / loaded classes. In 10.2.0.3 XQuery has been further optimized and depends on C code.

    You should be able to read up on it by searching for “xquery” / “mdrake” / “10.2” via the OTN XMLDB Forum.

  7. suresh
    1/28/2009

    Hi,

    We have XML files generated at a rate of 2 to 4 per minute, each file having 200,000 records. These need to be loaded into Oracle 10g/11g into individual columns represented by the tags. All records in the XML file have similar tags, with the ROWSET at the start and end.

    Regarding the SAX utility, do we have to go through the steps you mentioned using the GUI interface? As we will be on a Unix OS, will it be different (should we download it on Unix, etc.)? Also, can we schedule these loads just like we do with cron?

  8. 1/28/2009

    I am not absolutely sure what your question is here, but my guess (if I interpret your question correctly) would be to have a look at updatable “XMLType Views” (using an extra trigger for updates).

    I am not a very good Java programmer, but the code used can probably be made “executable” very easily by someone who knows Java.

  9. Des McCrory
    9/12/2009

    Hi Marco,

    A big thanks to you and Mark Drake. After a week or so of frustration in building a solution, getting an “out of process memory” error and going back to the drawing board to find an alternative, I had a solution and the document loaded in one hour on a Friday night, with one eye on the football on TV :-).

    I was about to try and build my own SAX loader (and in PL/SQL too, lol) when I came across your blog and the link to Mark’s code.

    Like you, it also gave me an opportunity to play with JDeveloper, with some confidence this time (thank you), and to test my limited Java skills (I use the standard DBA response: “I haven’t had time to read the Java manual yet”).

    Now I can go and give the good news to the client :-D.

    Kind regards,
    Des

  10. Bill Conniff
    10/15/2009

    New technologies are being developed to address the size and performance issues of XML. Xponent Software’s XMLMax loads any size XML into a treeview using at most 20 MB of memory and can do XSL transformations within the same memory limit. The CAX XML parser is a pull parser that can look backward through all parsed XML, thereby enabling any XML transformation with a fast pull parser and without memory constraints.

  11. 10/15/2009

    @Bill, sounds interesting, but is the XML parser content-aware…?

  12. Davide C.
    1/19/2010

    Hi Marco,
    this is a very nice post.
    For my information, how long did it take to load the file?

  13. 1/19/2010

    For what it is worth (not that much, actually, if you start thinking about it), I started the program on a, by now I guess 7-year-old, 1 GB / 1 CPU desktop to load the 17 GB Wikidump data across the network into multiple differently set up Binary XML tables. The different storage setups took approx. between 20 and 23 hours. The Binary XMLType tables were not XML Schema driven at that time, that is, not based on a registered XML Schema in the XDB Repository.
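
    A quick back-of-the-envelope calculation of what those numbers imply (assuming 17 GB and the 20–23 hour range quoted above; these are rough run-long averages that ignore network and commit overhead):

    ```java
    public class Throughput {
        public static void main(String[] args) {
            final double totalMb = 17.0 * 1024.0; // 17 GB expressed in MB
            for (double hours : new double[] {20.0, 23.0}) {
                // average sustained load rate over the whole run
                double mbPerSec = totalMb / (hours * 3600.0);
                System.out.printf("%.0f hours -> %.2f MB/s%n", hours, mbPerSec);
            }
            // 20 hours -> 0.24 MB/s
            // 23 hours -> 0.21 MB/s
        }
    }
    ```

    So the runs averaged roughly 0.2 MB/s, i.e. somewhat under 1 GB per hour on that old desktop.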

  14. Aaron C.
    3/4/2010

    Marco,
    Thanks for all you and Mark Drake have done trying to catch the rest of us up on XMLDB. Given the goal of taking an incoming large XML file, similar in size to the MediaWiki file above, and shredding it into ultimately relational (not object-relational) storage, what would your recommendation be in terms of approach on 11g? I see most seem to favor a SAX approach and/or chunking the large file into smaller files. Is anything else even viable from an XMLDB perspective, performance-wise? Given that the final goal is relational, is there an XMLDB approach that could compete with shredding into files outside of Oracle and then loading via sqlldr/external tables?

  15. doul
    5/3/2010

    Hi,

    When I run the standalone application “saxloader”, whether using the sample file “Accounts.xml” or my own files, I get this error:

    ****************************************************************************************************************************
    Thread Writer_000001 started at 3 mai 2010 22:14:37 CEST
    Thread Writer_000001 using SQL : insert into ACCOUNTS_TABLE values (xmltype(?).createSchemaBasedXML(‘http://otn.oracle.com/xmldb/account.xsd’))
    BaseApplication.getConnection(): Database Connection Established
    Thread Writer_000002 started at 3 mai 2010 22:14:37 CEST
    Thread Writer_000002 using SQL : insert into ACCOUNTS_TABLE values (xmltype(?).createSchemaBasedXML(‘http://otn.oracle.com/xmldb/account.xsd’))
    Unexpected Exception raised in thread Writer_000002
    java.sql.SQLException: ORA-30676:socket read or write failed
    ORA-06512: at “SYS.XMLTYPE”, line 272
    ORA-06512: at line 1

    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
    at oracle.jdbc.ttc7.TTIoer.processError(TTIoer.java:289)
    at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:573)
    at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1891)
    at oracle.jdbc.ttc7.TTC7Protocol.parseExecuteFetch(TTC7Protocol.java:1093)
    at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:2047)
    at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1940)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2709)
    at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:589)
    at oracle.otnsamples.xmldb.saxloader.examples.DocumentWriter.uploadDocument(DocumentWriter.java:336)
    at oracle.otnsamples.xmldb.saxloader.examples.DocumentWriter.run(DocumentWriter.java:286)

    Unexpected Exception raised in thread Writer_000001
    java.sql.SQLException: ORA-30676: socket read or write failed
    ORA-06512: at “SYS.XMLTYPE”, line 272
    ORA-06512: at line 1

    at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
    at oracle.jdbc.ttc7.TTIoer.processError(TTIoer.java:289)
    at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:573)
    at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1891)
    at oracle.jdbc.ttc7.TTC7Protocol.parseExecuteFetch(TTC7Protocol.java:1093)
    at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:2047)
    at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1940)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2709)
    at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:589)
    at oracle.otnsamples.xmldb.saxloader.examples.DocumentWriter.uploadDocument(DocumentWriter.java:336)
    at oracle.otnsamples.xmldb.saxloader.examples.DocumentWriter.run(DocumentWriter.java:286)

    Thread Writer_000001 completed at 3 mai 2010 22:14:37 CEST
    Thread Writer_000002 completed at 3 mai 2010 22:14:37 CEST
    Active Children = 0
    Documents Queued = 5. Docuuments De-Queued = 2. Documents Written = 2.
    Writer_000001,3 mai 2010 22:14:37 CEST,3 mai 2010 22:14:37 CEST,0,0
    Writer_000002,3 mai 2010 22:14:37 CEST,3 mai 2010 22:14:37 CEST,0,0
    Process exited with exit code 0.
    *************************************************************************************************************************************

    So I wonder what the problem is.

    Could you help please?

    Regards.

  16. doul
    5/3/2010

    Hi everybody,

    thank you, Marco, for your explanation… I did it eventually… now my XML fragments are right in the database.

    Doul

  17. 5/3/2010

    OK. I would have guessed that the XML Schema ‘http://otn.oracle.com/xmldb/account.xsd’ was not yet registered in the XDB Repository via DBMS_XMLSCHEMA.
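
    For anyone hitting the same error with createSchemaBasedXML: a sketch of the DBMS_XMLSCHEMA registration call the target table depends on. The repository path ‘/public/account.xsd’ and the parameter choices (local, gentypes, gentables) are assumptions for illustration only; adjust them and execute the generated block as the schema owner (e.g. via a JDBC CallableStatement or SQL*Plus) for your environment:

    ```java
    public class RegisterSchema {
        // Builds the anonymous PL/SQL block that registers an XML schema via
        // DBMS_XMLSCHEMA. The repository path passed in is a hypothetical example.
        static String buildRegisterCall(String schemaUrl, String repoPath) {
            return "BEGIN\n"
                 + "  DBMS_XMLSCHEMA.registerSchema(\n"
                 + "    schemaurl => '" + schemaUrl + "',\n"
                 + "    schemadoc => XDBURIType('" + repoPath + "').getClob(),\n"
                 + "    local     => TRUE,\n"
                 + "    gentypes  => TRUE,\n"
                 + "    gentables => FALSE);\n"
                 + "END;";
        }

        public static void main(String[] args) {
            // Print the block; run it in the database before starting the SAX loader.
            System.out.println(buildRegisterCall(
                "http://otn.oracle.com/xmldb/account.xsd", "/public/account.xsd"));
        }
    }
    ```

    Once the schema URL is registered, the loader’s `xmltype(?).createSchemaBasedXML(…)` insert should no longer fail on the unresolved schema reference.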

  18. David horton
    4/8/2011

    Rejected by admin – Commercial spam

Comments are closed.