HOWTO: Building a JSON Database API (1)

When learning new things, I always try to apply it to a hopefully useful and practical idea. Of course the fun part of the project is important as well. This not only makes the process more realistic, but you will encounter also more realistic problems and as a result the learning process is richer. In all I tried to apply some new database features and in the process found some new “features”, of which one is/was a documentation addition that will be in the next revision of the manual.

So to pick up working with, after a while already again, the new native Oracle database 12.1.0.2 JSON functionality, I set out to create a database process that should be able to “grab” content of a JSON API somewhere on the Internet, pull it into the database, store the JSON data, create views on top of the JSON content and index it.

Among others I used 12c:

And although not really 12c, you should revisit the content on Fine-Grained Access to Network Services as well, due to the fact that we are using HTTPURITYPE to scrape content outside the database. Access must be granted to avoid an ORA-24247 error (“Network access denied by access control list”).

Finding a JSON service

Finding a decent JSON service wasn’t that easy. Most, like twitter, need an authentication somehow and I wasn’t in for a lot of issues before even getting started. In the end, with a bit of searching, website “Programmable Web” made it easy. One of my favorites, and for me with well known content structures, seemed to have a (MediaWiki based) API build in: Wikipedia.

I like the content of Wikipedia, if not only anyone knows about it and I don’t have to explain “what it is”. If not only, I like the man made content which faces me always with new challenges and therefore in its own right good content to test with.

It also comes with a sandbox to try out some of the examples given: API Sandbox

I pinpointed on JSON content related to pictures, have a go at the following (click the links and see its output in your browser):

Add the JSONView plugin for Firefox or Chrome via jsonview.com if you want to see the outcome in Pretty Printed JSON.

In the end I used API URL’s, referencing pictures, like the following:

Notice the outcome from the last two, which actually reference each other.

In the end, due to Unicode issues with method getClob() of HTTPURITYPE, I needed a workaround and had to switch to the method getBlob(). These Unicode character references in Obsesion.jpg and Obsesión_(Daniela_Castillo_album).jpg broke my initial code. Although the control API characters can easily be escaped via URL Encoded character references, it actually took me a while for finding this encoding bug in httpuritype.getclob().

In the following sections I will explain JSON solutions based on both storage JSON solutions, that is CLOB and BLOB, if not only I learned a lot while implementing them. The final script with the code, which can be downloaded via the download link in the final section of these series, will be based on the BLOB implementation.

Scraping JSON content

As you might have noticed in the Obsesion.jpg and Obsesión_(Daniela_Castillo_album).jpg content,

  1. The URL parameter &aifrom=Obsesion.jpg has a JSON reference, points to via “aicontinue, that is the next picture “Obsesión_(Daniela_Castillo_album).jpg
  2. Switch the value Obsesion.jpg to Obsesión_(Daniela_Castillo_album).jpg in the URL and you get the next picture in JSON array section allimages of aicontinue.
  3. Switch the value Obsesión_(Daniela_Castillo_album).jpg to Obsesión_(Miguel_Mateos_album).jpg in the URL and you get the next picture in JSON array section allimages of aicontinue.
  4. End so forth and so forth…

In pictures outcome step 1:

In pictures outcome step 2:

In pictures outcome step 3:

So in short, a method to start with a value and then get every consecutive picture until you stop the process.

Via method HTTPURITYPE you can do the same as in your browser via

HTTPURITYPE.getClob()

You will notice, if you execute the statement via, for example, SQL*Plus or SQL*Developer, that the &-sign gets in the way because it is also used as a method to ask for bind variables.

  • To disable this behavior in SQL*Plus you can temporarily define a different (unlikely used) character, for instance, via using:  set define = “!”.
  • To disable this behavior in SQL*Developer you can disable this feature via using: set define off

So now knowing how to get the content, its time to create some control structures and needed storage tables.

JSON Storage clauses and JSON Constraints

I started with a table that should hold some configuration parameters and control values for my “SCRAPER” PL/SQL package.

--------------------------------------------------------
--  DDL for Table JSON_SCRAPER_CONFIG
--------------------------------------------------------
  DROP TABLE "JSON_SCRAPER_CONFIG" PURGE;
  --
  CREATE TABLE "JSON_SCRAPER_CONFIG" 
   (	"ID" NUMBER GENERATED ALWAYS 
             AS IDENTITY 
			 MINVALUE 1 
			 MAXVALUE 9999999999999999999999999999 
			 INCREMENT BY 1 
			 START WITH 1 
			 CACHE 20 
			 NOORDER  
			 NOCYCLE 
                         PRIMARY KEY, 
	"URL_JSON_API" VARCHAR2(4000), 
	"URL_PARAMETER_STATIC" VARCHAR2(4000),
        "URL_PARAMETER_DYNAMIC" VARCHAR2(4000), 
	"TARGET_TABLE" VARCHAR2(30), 
	"TARGET_TABLE_SCHEMA" VARCHAR2(30), 
	"STARTED" VARCHAR2(10) DEFAULT 'FALSE' 
   ) SEGMENT CREATION IMMEDIATE 
  TABLESPACE "USERS" ;
--------------------------------------------------------

The notion of this table and its contents here was that I wanted a column for:

  • ID column, functioning as primary key and with automatically generated values via the identity column functionality.
  • URL_JSON_API column, used for the base URL/URI used by the called JSON API PL/SQL code.
  • URL_PARAMETER_STATIC column, used to define the static parameter URL portion used with the JSON API PL/SQL code.
  • URL_PARAMETER_DYNAMIC column, used for fix point, the value, of the URL that dynamically had to be changed.
  • TARGET_TABLE column, to be used to store the name of the JSON content table used in the JSON API PL/SQL code.
  • TARGET_TABLE_SCHEMA column, to be used for to store the name of the schema owner of the JSON content table.
  • STARTED column, storing a value that would act as a semaphore value to show the process status but also to start or stop the process.

And I needed a simple table for storing the fetched JSON content. Initially it was based on HttpUriType.getClob() which would need the following DDL statement:

--------------------------------------------------------
--  DDL for Table JSON_SCRAPED_CONTENT
--------------------------------------------------------
  DROP TABLE "JSON_SCRAPED_CONTENT" PURGE;
  --
  CREATE TABLE "JSON_SCRAPED_CONTENT" 
   ( "ID"   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY ,
     "JSON" CLOB, 
     CONSTRAINT "VALIDATE_JSON" CHECK (JSON IS JSON)  
   )
   LOB ("JSON") STORE AS SECUREFILE "SEG_JSON_SCRAPED_CONTENT";
--------------------------------------------------------

But as mentioned I had to switch to BLOB content, using HttpUriType.getClob() which results in the following DDL statement:

--------------------------------------------------------
--  DDL for Table JSON_SCRAPED_CONTENT
--------------------------------------------------------
  DROP TABLE "JSON_SCRAPED_CONTENT" PURGE;
  --
  CREATE TABLE "JSON_SCRAPED_CONTENT" 
   ( "ID"   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY ,
     "JSON" BLOB, 
     CONSTRAINT "VALIDATE_JSON" CHECK ("JSON" IS JSON FORMAT JSON)
   )
   LOB ("JSON") STORE AS SECUREFILE "SEG_JSON_SCRAPED_CONTENT";
--------------------------------------------------------

Although the CLOB based JSON DDL statement is more or less obvious and can be found in the manual or in blog posts, the BLOB based JSON DDL statement was not. If you would simply replace CLOB with BLOB in the first DDL statement you will encounter the following error “ORA-40499: no format defined for binary data type”

  CREATE TABLE "JSON_SCRAPED_CONTENT" 
   ( "ID"   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY ,
     "JSON" BLOB, 
     CONSTRAINT "VALIDATE_JSON" CHECK (JSON IS JSON)  
   )
   LOB ("JSON") STORE AS SECUREFILE "SEG_JSON_SCRAPED_CONTENT";
--
Error at Command Line:Column:
Error report:
SQL Error: ORA-40499: no format defined for binary data type
--

As mentioned before, there is (not yet) a documented solution regarding the DDL syntax, but with a bit of guessing I came up with the constraint section as…

--  
  CONSTRAINT "VALIDATE_JSON" CHECK ("JSON" IS JSON FORMAT JSON)
--

Maybe a bit confusing to call the column also JSON but here the important part is in the last section of (“JSON” is JSON FORMAT JSON).

Regarding the string

(“JSON” is JSON FORMAT JSON)

  • “JSON”  – is the column name
  • is JSON – is the constraint forcing LAX JSON validation
  • FORMAT JSON – is defining that the BLOB is containing JSON formatted content

Apparently the documentation enhancement was already known. In the near future the documentation will mention something like:

When JSON data is stored in a BLOB column you must use keywords FORMAT JSON in queries that use Oracle SQL functions or conditions for JSON (json_value, json_query, json_table, json_exists), to declare that the data is JSON. Otherwise, an error is raised, letting you know that the JSON input data is binary and you have not specified its format.

 You can use the FORMAT and DATA_TYPE columns in views USER | ALL | DBA_JSON_COLUMNS to check if the BLOB content has value TEXT (for JSON) while using the shown syntax.

What I have already learned in these first few steps was

  • The appropriate BLOB “FORMAT JSON” syntax in the constraint call
  • That you could address a value for a securefile (which was unknown for me)
    SECUREFILE "SEG_JSON_SCRAPED_CONTENT"

Next time I will discuss how to create views displaying JSON BLOB content, the needed PL/SQL JSON “Sraper” API code & a few statements to avoid the ORA-24247 error…

Marco Gralike Written by: