Loading XML documents into an Oracle database (1)

Very often people want to load XML documents into an Oracle database so, for example, they can shred the needed values from those XML documents into relational tables.

In most cases, as experienced users know, the fastest way to load data into an Oracle database is via the use of an external table. This is also the case for XML, but how can you set this up? The following explains how the mechanisms work and how to achieve, amongst others automatic shredding of XML content into relational tables. This demo explains the basics, but you can of course completely automate the principles shown.

The sample scripts can be downloaded here: LoadingXMLviaExternalTables

Setting up the environment

For this example I have created an directory called “xml”, under my home directory “/u01/oracle/otn”. The database should be able to read the contents in directory “/u01/oracle/otn/xml” so check that those privileges on (in my case Linux environment) are properly set.

[oracle@localhost otn]$ pwd
/u01/oracle/otn

[oracle@localhost otn]$ ll
totaal 100
-rw-r--r--. 1 oracle oracle   165 jul 18 14:20 sampleData.ctl
-rw-r--r--. 1 oracle oracle 29101 jul 18 14:30 sampleData.dat
-rw-r--r--. 1 oracle oracle  1413 jul 18 14:34 testcase.sql
drwxr-xr-x. 2 oracle oracle 57344 jul 18 14:20 xml
-rw-r--r--. 1 oracle oracle  1646 jul 18 14:40 XT_testcase.sql
 

So on the “top level” /u01/oracle/otn, I have stored my SQL and SQLLoader files. In the directory “xml” (/u01/oracle/otn/xml), I stored my, to be loaded, XML documents.

[oracle@localhost otn]$ pwd
/u01/oracle/otn

[oracle@localhost otn]$ cd xml

[oracle@localhost xml]$ ls -l | head
totaal 3184
-rw-r--r--. 1 oracle oracle 1764 jul 18 14:19 ABANDA-20120407201438314PDT.xml
-rw-r--r--. 1 oracle oracle 1813 jul 18 14:19 ABANDA-20120417140257543PDT.xml
-rw-r--r--. 1 oracle oracle 1671 jul 18 14:19 ABANDA-20120427125050959PDT.xml
-rw-r--r--. 1 oracle oracle 1608 jul 18 14:19 ABULL-20120406132327821PDT.xml
-rw-r--r--. 1 oracle oracle 1646 jul 18 14:19 ABULL-20120406134455141PDT.xml
-rw-r--r--. 1 oracle oracle 2037 jul 18 14:19 ABULL-20120414210318287PDT.xml
-rw-r--r--. 1 oracle oracle 1981 jul 18 14:19 ABULL-2012041616035493PDT.xml
-rw-r--r--. 1 oracle oracle 2020 jul 18 14:19 ABULL-20120417140940985PDT.xml
-rw-r--r--. 1 oracle oracle 1601 jul 18 14:19 ABULL-20120418133607274PDT.xml

[oracle@localhost xml]$ ls -l | wc -l
797

Currently 797 XML documents are stored in this /u01/oracle/otn/xml directory. So far, so good.

A SQLLoader file must be created that contains the paths to the XML documents to be able to load the needed XML files. This can be achieved via the following statement (while your command prompt is in the xml directory that contains the XML documents):

[oracle@localhost xml]$ ls -l *.xml | awk '{print "\/xml\/"$9}' > /u01/oracle/otn/sampleData.dat 2>/dev/null

[oracle@localhost xml]$ cd ..

[oracle@localhost otn]$ cat sampleData.dat | head

/xml/ABANDA-20120407201438314PDT.xml
/xml/ABANDA-20120417140257543PDT.xml
/xml/ABANDA-20120427125050959PDT.xml
/xml/ABULL-20120406132327821PDT.xml
/xml/ABULL-20120406134455141PDT.xml
/xml/ABULL-20120414210318287PDT.xml
/xml/ABULL-2012041616035493PDT.xml
/xml/ABULL-20120417140940985PDT.xml
/xml/ABULL-20120418133607274PDT.xml
/xml/ABULL-20120421201211152PDT.xml

As you can see, the sampleData.dat file lists all XML files starting with “/xml/…”. This will be our fixed starting point for our Oracle Directory alias we will create later on so the “sampleData.dat” file can be used in our external table.

Setting up the user environment

In our database we need a user with the proper privileges, in my case a user called LOADTEST in my pluggable database with the TNSALIAS called “pdb1”.

SQL> grant create any directory, drop any directory, connect, resource, unlimited tablespace, alter session, create view to LOADTEST identified by LOADTEST;

SQL> alter user LOADTEST default tablespace USERS temporary tablespace TEMP;

SQL> connect LOADTEST/LOADTEST@pdb1

SQL> create or replace directory XTDIR as '/u01/oracle/otn';

Creating the external table

Now we have setup the basics, the external table can be created.

SQL> CREATE TABLE XML_XT_DOCUMENTS (
  2    FILENAME VARCHAR2(100),
  3    DOCUMENT CLOB
  4  )
  5  ORGANIZATION EXTERNAL (
  6   TYPE ORACLE_LOADER
  7   DEFAULT DIRECTORY XTDIR
  8   ACCESS PARAMETERS (
  9  	FIELDS (
 10  	  FILENAME CHAR(100)
 11  	)
 12  	COLUMN TRANSFORMS (
 13  	  DOCUMENT FROM lobfile (FILENAME)
 14  	)
 15   )
 16   LOCATION ('sampleData.dat')
 17  )
 18  /

Table created.

Via the XTDIR directory alias the external table is able to find the XML documents specified in the contents of “sampleData.dat”.

SQL> select count(*)
  2    from XML_XT_DOCUMENTS
  3  /

  COUNT(*)
----------
       796

Elapsed: 00:00:00.17

SQL> select *
  2    from XML_XT_DOCUMENTS
  3   where rownum < = 5
  4  /

FILENAME
--------------------------------------------------
DOCUMENT
--------------------------------------------------
/xml/ABANDA-20120407201438314PDT.xml

Shredding XML content into relational columns

The following can now be used to get the “REFERENCE” element value from the XML documents in the external table.

SQL> create table RELATIONAL_TABLE (
  2  	REFERENCE  VARCHAR2(100),
  3  	XML_VALUE  XMLTYPE
  4  )
  5  /

Table created.

Elapsed: 00:00:01.12
SQL> insert into RELATIONAL_TABLE
  2  select XMLCAST(
  3  	      XMLQUERY(
  4  		'/PurchaseOrder/Reference'
  5  		passing DOCUMENT
  6  		returning content
  7  	      )
  8  	      AS VARCHAR2(100)
  9  	   ),
 10  	   DOCUMENT
 11   from (
 12  	      select XMLTYPE(DOCUMENT) DOCUMENT
 13  		from XML_XT_DOCUMENTS
 14  	   )
 15  /

796 rows created.

Elapsed: 00:00:01.18
SQL> commit
  2  /

Commit complete.

The used “select XMLTYPE(DOCUMENT) DOCUMENT from XML_XT_DOCUMENTS” in the from clause will get the XML documents from the external table and feed it via the DOCUMENT column alias into the XMLQUERY bit of the statement via the PASSING clause.

The XPATH expression ‘/PurchaseOrder/Reference’ now gets the needed REFERENCE element values of the XML document input and is transformed via the XMLCAST operator into a VARCHAR2(100) datatype. Now the rest is easy because the content of column REFERENCE in the relational table is accessible as in a standard relational environment via SQL.

SQL> select count(*)
  2    from RELATIONAL_TABLE
  3  /

  COUNT(*)
----------
       796

Elapsed: 00:00:00.02

SQL> select REFERENCE
  2    from RELATIONAL_TABLE
  3   where rownum < = 5
  4  /

REFERENCE
--------------------------------------------------
ABULL-20120421201211152PDT
ABULL-20120423143056875PDT
ABULL-20120430174525177PDT
ACABRIO-20120401201015839PDT
ACABRIO-20120406143307807PDT

Elapsed: 00:00:00.00
Marco Gralike Written by:

One Comment

Comments are closed.