Big Data – Is it a solution in search of a problem?
So, what is big data? Is it the next path-breaking technology that will change everything, or is it just hype that will die down after some time?
Let us take a realistic look at what the term big data means and what problems it can solve.
What is “Big Data”?
Here is a short explanation.
Big Data is the name given to the class of technologies you need when your data volume grows so large that RDBMS technologies can no longer handle it.
Big data spans three dimensions (as described in an IBM article):
- Variety – Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more.
- Velocity – Often time-sensitive, big data must be used as it is streaming into the enterprise in order to maximize its value to the business.
- Volume – Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
In short – if an RDBMS can handle your data volume efficiently, you NEED NOT worry about Big Data.
How did it all start?
With the advent of cloud computing, which provided easy access to massive amounts of distributed computing power, came the realization that traditional RDBMSs cannot be parallelized effectively across many machines. In fact, the CAP theorem states that a distributed data store cannot simultaneously guarantee Consistency, Availability and Partition tolerance: when the network partitions, it has to sacrifice either consistency or availability. This led to the NoSQL movement, and multiple non-relational databases sprang up.
The trigger point for Big Data came when Google published its paper on “Map-Reduce”, a programming model for processing highly distributable problems across huge datasets using a large number of computers. Map-Reduce is at the heart of Google’s search engine.
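To make the CAP trade-off concrete, here is a toy Python model of two replicas of a key-value store. `Replica`, `ToyCluster` and the `"CP"`/`"AP"` modes are hypothetical names chosen for illustration, not any real database's API. When a network partition cuts the replicas off from each other, a write can either be refused, which keeps both copies consistent but makes the system unavailable for writes, or accepted on one side only, which keeps the system available but lets the copies diverge.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Replica:
    """One copy of a hypothetical key-value store."""
    name: str
    data: Dict[str, str] = field(default_factory=dict)


class ToyCluster:
    """Two replicas that normally replicate every write synchronously.

    mode="CP": during a partition, refuse writes (keep Consistency, lose Availability).
    mode="AP": during a partition, accept writes locally (keep Availability, lose Consistency).
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True = the replicas cannot reach each other

    def write(self, replica: Replica, key: str, value: str) -> bool:
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: apply the write to both copies so they stay identical.
            replica.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            # The other copy is unreachable, so reject the write rather than let copies diverge.
            return False
        # AP: accept the write on this copy only; the replicas now disagree.
        replica.data[key] = value
        return True

    def read(self, replica: Replica, key: str) -> Optional[str]:
        return replica.data.get(key)


for mode in ("CP", "AP"):
    cluster = ToyCluster(mode)
    cluster.write(cluster.a, "x", "1")
    cluster.partitioned = True  # simulate a network partition
    accepted = cluster.write(cluster.a, "x", "2")
    print(mode, accepted, cluster.read(cluster.a, "x"), cluster.read(cluster.b, "x"))
# CP False 1 1
# AP True 2 1
```

Real NoSQL systems make this choice in more nuanced ways (for example tunable consistency levels, or conflict resolution once the partition heals); the sketch only shows why some choice has to be made.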
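Word counting is the classic example used to explain Map-Reduce. The following single-process Python sketch mimics its three stages: map each input split to `(word, 1)` pairs, shuffle the pairs so that all values for the same word end up together, then reduce each group to a total. The function names are illustrative; this is not Google's implementation or Hadoop's API. In a real cluster the map and reduce calls would run in parallel on many machines, which is what lets the model scale to huge datasets.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between the two phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all counts emitted for one word.
    return key, sum(values)


def map_reduce(documents: List[str]) -> Dict[str, int]:
    # On a cluster, each document would be mapped on a different machine and each
    # key group reduced in parallel; here everything runs sequentially in one process.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(map_reduce(["big data is big", "data is data"]))
# {'big': 2, 'data': 3, 'is': 2}
```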
In short – Big Data requires large DISTRIBUTED processing power.
Why would you want to process so much data?
Three basic assumptions are driving the big data movement:
- Faster analysis of larger operational data will help you make better decisions
- More in-depth analysis of customer data will guide you to better customer segmentation
- Insight into larger data sets will help you come up with innovative product designs
Companies that have successfully leveraged this include Google, Facebook, Amazon, Walmart and Yahoo.
In short – the ASSUMPTION is that more data and faster analytics will lead to more innovation and better decision making.
3 Prerequisites for leveraging Big Data
Let us assume that your data volume is large enough and you have access to enough distributed processing power. Will that be sufficient for you to venture into big data?
No … you need three more things.
- A business problem that you think the data at your disposal can help resolve
- A set of questions to be answered through data analysis
- An algorithm to analyze the data – this is the domain of the new field of Data Science
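As one hypothetical illustration of how these three pieces fit together: the business problem might be poorly targeted marketing, the question "do our customers fall into distinct groups by spending and visit frequency?", and the algorithm k-means clustering (Lloyd's algorithm), one common choice for the customer segmentation mentioned earlier. The data, the two features and the `kmeans` helper below are assumptions made for illustration; real segmentation work would normally scale the features and use a tested library.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20, seed: int = 0):
    """Plain k-means (Lloyd's algorithm): returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # start from k random customers
    clusters: List[List[Point]] = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (annual spend in $100s, store visits per month).
customers = [(2, 1), (3, 1), (2, 2), (20, 8), (22, 9), (21, 7)]
centroids, clusters = kmeans(customers, k=2)
for segment in sorted(sorted(c) for c in clusters):
    print(segment)
# [(2, 1), (2, 2), (3, 1)]
# [(20, 8), (21, 7), (22, 9)]
```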
Big Data will be useful only if you are equipped with all these.
Therefore, for most of us, Big Data is a solution which is in search of a problem.