Using Database In-Memory Column Store with Complex Datatypes

Oracle Database version 12.1.0.2, with the In-Memory option, hasn’t been released yet, but a lot of detail has been out there since its announcement by Mr. Ellison during Oracle OpenWorld 2013. In June this year, Mr. Ellison pushed the message again during a webcast (read the Slideshare slides here).

While the announcement of “Oracle Big Data SQL” is still fresh, I fully agree with what Mr. Mendelsohn said during the webcast: “Big Data is just data”. I am not convinced, though, that the “Big Data SQL” solution will always fit. For instance, I see more opportunities for the “Oracle XQuery for Hadoop” Big Data connector, which was also announced during Oracle OpenWorld 2013.

Not least because XQuery, with its strong foundation in semi-structured and unstructured data, fits better into the world of data and datatypes that “Big Data” deals with. Big Data, IMHO, is not so much a “big” volume issue as a problem of data mining and of the complete “un-structured-ness” of the data involved. Today’s powerful machinery, architecture and cheap hardware, for example via Hadoop, provide part of the solution, but I believe the real problem, or at least its biggest part, lies in how the data is stored by “datatype”.

When handling XML, it is all about the proper “storage” (“containerization”) to make access to the needed data smart and practical. Using SQL or XQuery Big Data connectors, that is, re-using known technology and knowledge, is a smart way of getting there. Adapting the data to the required “storage”, perhaps by putting a bit of extra effort into the data modelling so that it fits, is another part. Although the market has shown that a multipurpose database is probably not really what you need, it is also a shame to throw away, every time, all the solutions to problems we have built up over decades. Regarding query engines, the JSONiq effort is a great example of such a “don’t reinvent the wheel, re-use what we have got” approach. That is not only a smart thing to do, it also takes minimal effort to update the knowledge people already have, in this case re-using XQuery knowledge for JSON.

Although I am normally a strong believer in “use only what you need” solutions, I celebrate the fact that an Oracle database has so many options. In this environment I have the tools to solve problems, for example in the “XML realm of things”, that aren’t solved yet, and/or to apply functionality that doesn’t originate in the XML world but, for instance, in the relational or Java world (all of which is supported in an Oracle “RDBMS” database). Methods for backup & recovery, partitioning, parallelism, etc. are also already in place and can be used for (semi-)unstructured data such as XML. That is one of the reasons why I don’t agree with vendors promoting, for instance, “the native XML storage type” as the only way to go.

At the end of the day, customers don’t care (yet) how you solved it conceptually (“natively”), as long as it is fast, has integrity (is reliable), is cheap (ROI) and fits the needs of the business. One might run into “impedance mismatches”, but as long as there isn’t a real solution, that will be part of the deal. As long as we have part(s) of a decent solution at hand, I think this is a temporary concession we will have to make in order to eventually get it right (part of the first steps on our journey). The right technical and architectural approach will prove its point on its own once you have met the challenge and solved it.

Using the proper “store” matters, even with “vast” amounts of data. In the “old days” (10 years back), handling a 10 MB XML document was a performance nightmare; nowadays I fiddle around and test at home with the full Wikipedia XML dump file (around 2012 the English set was a whopping 40+ GB in size). So, all in all, “Big Data” might not be “that big” (in all its facets) when seen in its (time) context.

Oracle Database version 12.1.0.2 adds the “In-Memory Column Store” to the mix. This adds new options for attacking datatype problems that might be multidimensional in nature. But more on how it might help solve another part of the problem in the next posts of this series, once the NDA on version 12.1.0.2 has been lifted and the release is freely downloadable…

😉