Quering RSS Feeds The XMLDB Way

Actually this is old stuff (2006), but it got lost in a comment section. I think this can still be useful to some and I also post this here for prosperity.

Somewhere in 2006 my colleague Lucas Jellema wrote a post on the AMIS Technology site about querying rss feeds from the database. My colleague Anton Scheffer and I commented on that article with our XMLDB functionality mindset. Later on in 2007 Lucas wrote another useful post called “Querying RSS feeds in SQL…“. Peter Wolf commented that he had also written a very nice blog post about using XMLDB functionality while Integrating Yahoo Pipes into APEX.

I think this is still useful stuff to a lot of us, although you should keep in mind that the table(xmlsequence(extract())) construct will be, in time, out lived by the XMLTABLE function. Also XMLTABLE supports XPath V2, the table(xmlsequence(extract())) doesn’t. The XMLTABLE function is available from Oracle database version 10.2.

I reconstructed my statement in the comment section from the post in 2006 and “solved” the small puzzle that I couldn’t solve at that time. Of course, especially via XQuery, there are more good solutions, but the following is a first step.

Anton’s example that is still very useful from database version 9.2.0.3 up to 10.1 (if not only remembering the “old” Oracle 9.2 syntax).

SQL> SELECT VALUE( i ).EXTRACT( '/item/dc:creator/text()'
  2                           , 'xmlns:dc="http://purl.org/dc/elements/1.1/"' ).getstringval() creator 
  3  FROM   TABLE( XMLSequence( HTTPURITYPE( 'http://technology.amis.nl/blog?feed=rss2' ).getXML().EXTRACT('//item') ) ) i 
  4  ;
 
CREATOR
--------------------------------------------------
Marco Gralike
Lucas Jellema
Lucas Jellema
Aino Andriessen
Lucas Jellema
Lucas Jellema
Lucas Jellema
Lucas Jellema
Lucas Jellema
Michiel Jonkers
 
10 ROWS selected.

Although my feeling is here that you should go for a extract(xmltype) construct instead of xmltype.extract (the latter is actively discouraged by Oracle).

The advantage of my statement in 2006 is that, performance wise, it is smart to solve as much problems as possible in the XPath section before producing XML fragments, that have to be handled via the EXTRACT function. Most of the time XML fragments will be dealt with in memory and are very resource intensive regarding CPU and memory.

SQL> SELECT * 
  2  FROM xmltable('//item' 
  3                 passing HTTPURITYPE('http://www.liberidu.com/blog/?feed=rss2').getXML() 
  4                 COLUMNS title            varchar2(4000) path '/item/title/text()', 
  5                         link             varchar2(4000) path '/item/link/text()', 
  6                         publication_date varchar2(4000) path '/item/pubDate/text()', 
  7                         creator          varchar2(4000) path '/item/*[namespace-uri()="http://purl.org/dc/elements/1.1/" 
  8                                                                      and local-name()="creator"]/text()', 
  9                         description      varchar2(4000) path '/item/description/text()',
 10                         category         XMLTYPE path '/item/category/text()' 
 11  );
 
TITLE
--------------------------------------------------------------------------------
LINK
--------------------------------------------------------------------------------
PUBLICATION_DATE
--------------------------------------------------------------------------------
CREATOR
--------------------------------------------------------------------------------
DESCRIPTION
--------------------------------------------------------------------------------
CATEGORY
--------------------------------------------------------------------------------
Quering RSS Feeds The XMLDB Way
http://feeds.feedburner.com/~r/Bloggralikecom/~3/319836447/
Wed, 25 Jun 2008 16:47:19 +0000
Marco Gralike
Actually this IS OLD stuff (2006), but it got lost IN a comment SECTION. I think
 this can still be useful TO SOME AND I also post this here FOR prosperity.
Somewhere IN 2006 my colleague Lucas Jellema wrote a post ON the AMIS Technology
 site about querying rss feeds FROM the DATABASE. My colleague [...]<img src="ht
tp://feeds.feedburner.com/~r/Bloggralikecom/~4/319836447" height="1" width="1"/>
< ![CDATA[11g]]>< ![CDATA[General]]>< ![CDATA[Howto]]>< ![CDATA[Oracle]]>< ![CDATA[RD
BMS]]>< ![CDATA[XMLDB]]>< ![CDATA[XSQL, XPath, XQuery]]>< ![CDATA[httpuritype]]>< ![
CDATA[rss]]>< ![CDATA[xmlnamespaces]]>< ![CDATA[xmltable]]>< ![CDATA[Xpath]]>< ![CDA
TA[XQuery]]>
 
Wordpress LightBox Plugin Trouble - Part II
http://feeds.feedburner.com/~r/Bloggralikecom/~3/319645225/
Wed, 25 Jun 2008 11:42:53 +0000
Marco Gralike
I just couldn&#8217;t find it. I searched a lot ON the internet but couldn&#8217
;t find a solution FOR my problems described IN my post &#8220;Wordpress Plugin
Trouble&#8220;. So hopefully am rid OF it now, while switching back TO the OLD v
ersion off Lightbox JS v2.2. Hopefully this won&#8217;t cause any issues anymore
 IN Windows Internet [...]<img src="http://feeds.feedburner.com/~r/Bloggralikeco
m/~4/319645225" height="1" width="1"/>
< ![CDATA[General]]>
 
...
.etc.

You can clean it a little bit up by addressing the namespace reference issue via the following syntax:

SQL> SELECT *
  2  FROM xmltable(XMLNAMESPACES('http://purl.org/dc/elements/1.1/' AS "MGR"),
  3                 '//item'
  4                 passing HTTPURITYPE('http://www.liberidu.com/blog/?feed=rss2').getXML()
  5                 COLUMNS title            varchar2(50) path '/item/title/text()',
  6                         link             varchar2(50) path '/item/link/text()',
  7                         publication_date varchar2(50) path '/item/pubDate/text()',
  8                         creator          varchar2(50) path '/item/MGR:creator/text()',
  9                         description      varchar2(250) path '/item/description/text()',
 10                         category         XMLTYPE      path '/item/category/text()'
 11  );
 
TITLE
--------------------------------------------------
LINK
--------------------------------------------------
PUBLICATION_DATE
--------------------------------------------------
CREATOR
--------------------------------------------------
DESCRIPTION
--------------------------------------------------------------------------------
CATEGORY
--------------------------------------------------------------------------------
Quering RSS Feeds The XMLDB Way
http://feeds.feedburner.com/~r/Bloggralikecom/~3/3
Wed, 25 Jun 2008 16:47:19 +0000
Marco Gralike
Actually this IS OLD stuff (2006), but it got lost IN a comment SECTION. I think
 this can still be useful TO SOME AND I also post this here FOR prosperity.
Somewhere IN 2006 my colleague Lucas Jellema wrote a post ON the AMIS Technology
 site about q
< ![CDATA[11g]]>< ![CDATA[General]]>< ![CDATA[Howto]]>< ![CDATA[Oracle]]>< ![CDATA[RD
BMS]]>< ![CDATA[XMLDB]]>< ![CDATA[XSQL, XPath, XQuery]]>< ![CDATA[httpuritype]]>< ![
CDATA[rss]]>< ![CDATA[xmlnamespaces]]>< ![CDATA[xmltable]]>< ![CDATA[Xpath]]>< ![CDA
TA[XQuery]]>
 
Wordpress LightBox Plugin Trouble - Part II
http://feeds.feedburner.com/~r/Bloggralikecom/~3/3
Wed, 25 Jun 2008 11:42:53 +0000
Marco Gralike
I just couldn&#8217;t find it. I searched a lot ON the internet but couldn&#8217
;t find a solution FOR my problems described IN my post &#8220;Wordpress Plugin
Trouble&#8220;. So hopefully am rid OF it now, while switching back TO the OLD v
ersion off
< ![CDATA[General]]>
 
Farwell TO One OF the Best Goalkeepers We Ever Had
http://feeds.feedburner.com/~r/Bloggralikecom/~3/3
Sat, 21 Jun 2008 22:31:09 +0000
Marco Gralike
Edwin van der Sar LAST championship game has been played. Again AS &#8220;Man OF
 the Match&#8221; regarding the Dutch team; brilliant AND needed saves ALL OVER
the place during the game against Russia. His FIRST championship MATCH was IN 19
95 AND unt
< ![CDATA[General]]>

HTH

8-)