Agile Data Warehouse with Dat-a-MazeHouse

About Data Warehouse

A data warehouse architecture consists of a series of data stores with Extract-Transform-Load (ETL) steps between them. Data is continuously extracted from a source, transformed into yet another format, and loaded into a new place. In practice, providing clean data for actual analytics or data discovery takes significant effort from a team of IT people, knowledge workers, domain experts, data modelers, and DW experts. Typically, bootstrapping a DW takes some 6 months, and only thereafter does the cost really take off, as each answer raises more questions.
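To make the extract, transform, and load steps concrete, here is a minimal sketch in Python. It is only an illustration: the source rows, the field names (`customer`, `amount`), the `sales` table, and the function names are all invented for this example, and a real warehouse would chain several such stages.

```python
import sqlite3

# Hypothetical source rows; the field names are invented for illustration.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "12.50"},
    {"customer": "Bob", "amount": "7"},
]

def extract():
    """Pull raw records from the source (here an in-memory list)."""
    return list(SOURCE_ROWS)

def transform(rows):
    """Convert raw records into the format the target store expects."""
    return [(r["customer"].strip(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Write the transformed records into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """One pass of extract -> transform -> load."""
    load(conn, transform(extract()))

conn = sqlite3.connect(":memory:")
run_etl(conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Even in this toy version, the transform step has to know the target format in advance, which is where much of the up-front effort goes.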

Current mainstream DW methods are not agile. From a business-value perspective, by the time the answer is finally delivered, the question is often obsolete.

About Agile Data Warehouse

The main reason that today’s mainstream DWs do not support agile analytics is their reliance on relational database technology. Regardless of whether the data warehouse (the Data Vault) is built using dimensional or normalized relational schemas, a schema must always exist before the data can be loaded into the database.
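The schema-before-load constraint is easy to see with Python's built-in `sqlite3` module. The sketch below is illustrative only; the `readings` table, its columns, and the `try_load` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def try_load(conn, row):
    """Attempt to insert one row; report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO readings (sensor, value) VALUES (?, ?)", row)
        return True
    except sqlite3.OperationalError:
        # Raised here because the target table does not exist yet.
        return False

# Loading before any schema has been declared is rejected.
loaded_before_schema = try_load(conn, ("s1", 21.5))

# Only after the schema is defined does the same load succeed.
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
loaded_after_schema = try_load(conn, ("s1", 21.5))
```

The data itself is identical in both attempts; what changes is whether someone has already decided on its structure.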

The new trend in the IT industry is to replace relational databases with NoSQL databases, on the grounds that relational databases and their query language, SQL, are not quite the right tool for highly volatile data. NoSQL databases are mainly supposed to be “schema-less”: data can be pushed into the database without defining a rigid schema first. Some allow one to define an entirely proprietary meta-structure.

NoSQL databases, however, solve one problem and create another. There are no standards, and there is no common NoSQL query language. More importantly, anarchy has never worked out in any context; freedom and anarchy are totally different things. We can talk about “schema-less” databases and storage, but that also means nobody will ever understand what the data means or how to query it (apart perhaps from the original publisher).

So rather than talking about “schema-less”, it is more appropriate to distinguish between “schema-first” and “schema-last” data management technologies. Relational databases are “schema-first”. To support agile analytics on top of data warehouses, DWs would benefit greatly if the Data Vault and Marts were “schema-last”.

To transform data warehousing into an agile discipline, analytic projects must be able to scale and run in parallel. Each knowledge worker must be able to express her own view of the data independently of others and, more importantly, without getting stuck in coordination activities. Agile DW does not eliminate the benefits or opportunity of teamwork, or the use of shared models and vocabularies. But convergence can happen naturally, as a consequence of recognizing what has proven to work, and not as a prerequisite for getting started at all.
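One way to picture “schema-last” is a store that accepts records as published and applies structure only at query time, with each analyst supplying her own view. The Python sketch below illustrates that general idea under simple assumptions; it is not a description of how Dat-a-MazeHouse works, and `raw_store`, `load_raw`, `query`, and the field names are all hypothetical.

```python
import json

# Raw store: records are kept exactly as published, with no schema up front.
raw_store = []

def load_raw(record):
    """Accept any record without checking it against a schema."""
    raw_store.append(json.dumps(record))

def query(view):
    """Apply a view (output name -> source field) at read time.

    Records that lack any field the view needs are skipped.
    """
    results = []
    for line in raw_store:
        record = json.loads(line)
        if all(field in record for field in view.values()):
            results.append({name: record[field] for name, field in view.items()})
    return results

load_raw({"cust": "Alice", "amt": 12.5, "region": "north"})
load_raw({"cust": "Bob", "amt": 7.0})
load_raw({"sensor": "s1", "value": 21.5})

# Two analysts define their own views independently, after the data is loaded.
sales_view = {"customer": "cust", "amount": "amt"}
regional_view = {"customer": "cust", "region": "region"}

sales = query(sales_view)
regional = query(regional_view)
```

Neither view had to be agreed on before loading, and adding a third view later requires no change to the stored data or to the other two.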

About Dat-a-MazeHouse

Vinge Free’s product encapsulates experience and R&D to solve the data warehouse and data analysis problems that keep these disciplines from being agile. Our patent-pending algorithms and methods are combined in a data integration and migration product that can unify data from heterogeneous sources. For a specific question or hypothesis the user wants to test, our platform provides the means to search, clean, and align data; the user may infer implicit facts, skip past and exclude unnecessary information. At the end of an analytic cycle the user can query a well-formed dataset that can then be fed into any data analysis tool. One important point is that the search for an answer may fail, either because the user’s data set does not contain anything pertinent to the question or because the posed hypothesis is simply false. In such a case, our platform helps users fail fast and cheaply.

The entire analytic cycle we support is agile because users can run their own race independently of each other, concentrating on their own particular concerns without the need for company-wide coordination efforts.