When you read about business intelligence these days, you almost inevitably end up reading about Big Data, Self-Service BI, Google, Facebook or Amazon. The authors of most of these articles, in turn, take it upon themselves to grumble about the limits of relational databases and how traditional business intelligence systems have failed to live up to their expectations.
It is normally around that point in the article where the author proclaims with great fervour, much like one would the second coming of the Messiah, the age of NoSQL and Big Data Analytics. Does this mean that we have been going around storing our data and building business intelligence the wrong way all these years?
Definitely not.
Concepts like Big Data and NoSQL (commonly read as “not only SQL”) collided with the tidy, structured and indexed world of SQL databases a few years ago, and the fallout has been growing with every public success story – LinkedIn, Netflix, Facebook – the list goes on.
The rising interest in Big Data and NoSQL surely stems from the fact that mankind is generating more data than ever before. By one widely cited estimate, 90% of the world’s data was generated in the past two years. Most of this data does not come in the form of neatly-formatted tabular data. It comes as web logs, videos, pictures and so on. Our businesses use more data than ever before, in forms that are far more complex than they have traditionally been used to, and they need the data faster than ever before.
SQL is based on set theory, which makes it effective (and, most of the time, efficient) for a number of applications and datasets. In today’s world, some datasets are too large to be handled efficiently this way, and some are too complex. Forcing such data into an ill-fitting solution can come at the cost of scalability or performance. This is exactly the problem that the growing family of NoSQL databases is here to resolve.
NoSQL databases come in many shapes and forms and cannot always be applied in the same way that SQL databases can. Those of us who are new to the 150-strong NoSQL family can easily find ourselves overwhelmed by the slew of new technologies (and the funny names that accompany them). To make matters easier, here is a short breakdown of what they are all about. Before getting into that, it helps to state the main purpose (and a working definition) of NoSQL technology: where SQL technology stores tabular data, NoSQL technology deals with non-tabular data.
At present, the NoSQL family is made up of a host of products based on the following technologies:
- Key-Value Pair Databases
- Wide-Column Store Databases
- Document Databases
- Graph Databases
Key-Value Pair Databases
Key-Value Pair (KV) databases store each entity as a pair: a unique key and the data (the value) held under that key. The value can range from a simple value (e.g. 13) to a complex object (e.g. the content and metadata of a Facebook post). Implementations vary from product to product, but on the whole they aim to be highly scalable, fast and adaptable.
- Notable example: Redis
- Notable client: Twitter
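As a rough illustration of the key-value idea, here is a minimal in-memory sketch in Python. The class `TinyKVStore` and the key names are hypothetical and made up for this example; this is not how Redis or any other product is implemented, only a toy showing that the store looks values up by key without caring about their structure.

```python
import json


class TinyKVStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is serialised to a string, so the store itself never
        # needs to understand what is inside it.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)


store = TinyKVStore()

# A simple value and a complex object live side by side under different keys.
store.put("counter:visits", 13)
store.put("post:1001", {"author": "alice", "text": "Hello", "likes": 3})

visits = store.get("counter:visits")
post = store.get("post:1001")
```

The only way in is the key: there is no query by author or by number of likes, which is part of what keeps this model simple and fast.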
Wide-Column Store Databases
Wide-Column Store databases build upon KV databases by grouping data, organising and storing it as a series of columns and super-columns (much as relational databases store data as a series of rows). Storing data this way allows queries to run faster when only a few columns are needed from a large number of rows. It also allows different columns to be stored on different machines, so that a table can effectively exist across a cluster.
- Notable example: Apache Cassandra
- Notable client: GoDaddy
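The column-by-column layout described above can be sketched in a few lines of Python. This is only a picture of the idea, using made-up sample records and a hypothetical helper `to_columns`; it does not reflect the actual storage engine of Apache Cassandra or any other product.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ann", "country": "MT", "spend": 120.0},
    {"id": 2, "name": "Ben", "country": "UK", "spend": 80.5},
    {"id": 3, "name": "Cat", "country": "MT", "spend": 42.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column.

    Assumes every record has the same fields (a simplification).
    """
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# A query that needs a single column touches only that column's list,
# no matter how many other fields each record carries.
total_spend = sum(columns["spend"])
```

Because each column is its own list, nothing stops the lists from living on different machines, which is the clustering benefit mentioned above.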
Document Databases
Document databases store data as separate objects in a database, with each object (or document) containing a set of attributes. The effect of this approach is that all the data related to an entity is stored in a single object, rather than across a number of tables, which reduces the need to write complex queries that join those tables back together. Objects are not bound by a predefined schema, which allows for very flexible storage.
- Notable example: MongoDB
- Notable client: eBay
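To make the document model more concrete, here is a small sketch in plain Python. The `orders` collection, its field names and the `find` helper are all hypothetical; real document databases such as MongoDB have their own query languages, which are not shown here.

```python
# Each order is one self-contained document: customer details and line
# items are nested inside it instead of living in separate tables.
orders = [
    {
        "_id": 1,
        "customer": {"name": "Ann", "city": "Valletta"},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    },
    {
        "_id": 2,
        "customer": {"name": "Ben"},  # no city: documents need not match
        "items": [{"sku": "A1", "qty": 5}],
        "gift_note": "Happy birthday",  # extra attribute on this one only
    },
]


def find(collection, predicate):
    """Return every document for which predicate(document) is true."""
    return [doc for doc in collection if predicate(doc)]


# All orders containing item A1, answered without joining any tables.
orders_with_a1 = find(
    orders, lambda doc: any(item["sku"] == "A1" for item in doc["items"])
)

# Only some documents carry a gift note; the rest are simply skipped.
gift_orders = find(orders, lambda doc: "gift_note" in doc)
```

The second document has a field the first one lacks, and is missing a field the first one has. That is the schema flexibility described above.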
Graph Databases
Graph databases store data as a series of interconnected nodes, forming a network of relationships. Each node represents a record and each relationship carries data. Graph databases are used when the relationship between entities bears informational value, and not just structural value. These databases are best used to store complex, non-tabular and interconnected data. A popular use of graph databases is powering recommendation engines.
- Notable example: Neo4j
- Notable client: T-Mobile
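The recommendation use case can be sketched with a tiny graph held in a Python dictionary. Everything here is an assumption made for illustration: the users, the products, the ratings and the scoring rule are invented, and a real graph database would use its own query language rather than hand-written loops.

```python
from collections import Counter

# Each entry is a relationship (user)-[BOUGHT]->(product); the value is
# data carried by the relationship itself, here a rating out of 5.
ratings = {
    ("ann", "camera"): 5,
    ("ann", "tripod"): 4,
    ("ben", "camera"): 4,
    ("ben", "lens"): 5,
    ("cat", "camera"): 3,
    ("cat", "lens"): 4,
    ("cat", "tripod"): 2,
}


def bought_by(user):
    return {product for (u, product) in ratings if u == user}


def buyers_of(product):
    return {user for (user, p) in ratings if p == product}


def recommend(user):
    """Walk user -> product -> other buyer -> product they also bought."""
    mine = bought_by(user)
    scores = Counter()
    for product in mine:
        for other in buyers_of(product) - {user}:
            for candidate in bought_by(other) - mine:
                # Weight each path by the rating stored on the relationship.
                scores[candidate] += ratings[(other, candidate)]
    return [product for product, _ in scores.most_common()]


ann_recommendations = recommend("ann")
cat_recommendations = recommend("cat")
```

The answer comes from following relationships between nodes rather than from filtering rows, and the rating stored on each relationship affects the ranking. This is what is meant by relationships bearing informational value.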
The complex problems that we face today force us to consider options that are faster, more scalable or better suited to complex data than traditional relational databases. In some cases, the analysis and processing of NoSQL data produces tabular results which, as you may have guessed, are then stored in SQL databases. NoSQL is therefore not a replacement for SQL. It is an extension of our current technology that will keep us operating in a world drenched in complex data.