Big Data – Is it a solution in search of a problem?


If you look at the predictions made for 2012, you will find a new entry which was not there last year. Be it Gartner, Forrester or McKinsey, “Big Data” finds a place in their predictions.

So, what is big data? Is it the next path-breaking technology which will change everything, or is it just hype which will die down after some time?

Let us take a realistic look at what the term big data means and what problems it can solve.

What is “Big Data”?

(The Wikipedia page on Big Data is not that good. The clearest explanation I have found is from O’Reilly Radar – here is the link)

Here is a short explanation.

Big Data is the name given to the class of technologies that need to be used when your data volume becomes so large that RDBMS technologies can no longer handle it.

Big data spans three dimensions (taken from this article of IBM):

  • Variety – Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more.
  • Velocity – Often time-sensitive, big data must be used as it is streaming in to the enterprise in order to maximize its value to the business.
  • Volume – Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.

In short – if your data volume can be handled efficiently by RDBMS you NEED NOT worry about Big Data.

How did it all start?

With the advent of cloud computing, which provided easy access to massive amounts of distributed computing power, there was a realization that an RDBMS cannot be effectively parallelized. In fact, the CAP theorem states that a distributed system cannot guarantee Consistency, Availability and Partition Tolerance all at the same time. This led to the NoSQL movement, and multiple non-relational databases sprang up.

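The trigger point for Big Data came when Google published its paper on the “Map-Reduce” programming model. It involves processing highly distributable problems across huge datasets using a large number of computers: a “map” step processes pieces of the input independently, and a “reduce” step combines the intermediate results. Google used Map-Reduce to build the index behind its search engine.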
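
To make the idea concrete, here is a minimal single-machine sketch of the Map-Reduce pattern in Python, using word counting as the example. It is only an illustration: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this post and are not Google's or Hadoop's API, and a real system would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence (hypothetical driver function)."""
    mapped = []
    # On a cluster, each document (or split of a file) would be mapped
    # on a different machine; here we simply loop.
    for doc in documents:
        mapped.extend(map_phase(doc))
    grouped = shuffle_phase(mapped)
    # Each key could likewise be reduced on a different machine.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data about data"]
    print(word_count(docs))
```

The point of the structure is that map_phase looks at one document at a time and reduce_phase looks at one key at a time, so neither step needs to see the whole dataset. That independence is what lets the work be spread over a large number of computers.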

Takeoff happened when the Apache open-source “Hadoop” project created its own implementation of Map-Reduce. The largest Hadoop implementation is probably at Facebook.

In short: Big Data requires large DISTRIBUTED processing power.

Why would you want to process so much data?

There are 3 basic assumptions which are driving the big data movement:

  1. Faster analysis of larger operational data will help you make better decisions
  2. More in-depth analysis of customer data will guide you to better customer segmentation
  3. Insight into larger data sets will help you come up with innovative product designs

Companies that have successfully leveraged this include Google, Facebook, Amazon, Walmart and Yahoo.

In short – the ASSUMPTION is that more data and faster analytics will lead to more innovation and better decision making.

3 Prerequisites for leveraging Big Data

Let us assume that your data volume is large enough and you have access to enough distributed processing power. Will that be sufficient for you to venture into big data?

No … you need three more things.

  1. A business problem which you think the data at your disposal can help resolve
  2. A set of questions to be answered through data analysis
  3. An algorithm to analyze the data – this is the domain of the new field of Data Science

Big Data will be useful only if you are equipped with all these.

Therefore, for most of us, Big Data is a solution which is in search of a problem.

Comments
13 Responses to “Big Data – Is it a solution in search of a problem?”
  1. Many thanks for sharing this fantastic web-site.

  2. pahariayogi says:

    Sounds like a data mining and OLAP tool? Is ‘Big Data’ not an extension of data mining?

    • Udayan Banerjee says:

      Big Data is not a single technology. Like most technology terms floating around it may not have a clear definition. So, you will be both right and wrong if you say that it is an extension to data mining!

      • Hello Udayan, I am revisiting this post after almost 2 yrs.

        I agree, Big data analytics is an over-hyped, poorly-defined and over-used term. Despite that, and despite the challenges outlined above, I believe that for many businesses, the opportunities presented by the big data revolution are as significant and fundamental as those presented by e-commerce 15 years ago. Large data rich companies (particularly retailers) should be bold and determined in reacting to these challenges.

        A lot of Indian IT service providers are gearing up to occupy a major share of the pie in this space. Infy, for instance, is busy doing POCs for a few large US accounts and also building proprietary products in parallel around Hadoop platforms.

        I think it could be the next growth engine and a big opportunity for NTL in chosen verticals (TTL, BFSI, Retail, Health Care etc.)

  3. Murali Narayanamurthy says:

    Why is Big Data viewed as Big Problem? In reality Big Data means Big Opportunity…All that data can be put to use to leverage and give you Knowledge discovery, Insight and Predictive Analytics capabilities for quick Decision Making and efficiency within your enterprise or organization.
    Consuming Big data, be it structured, un-structured or semi-structured in totality and to recognize all natural patterns within, would be a solution to surf over the issue of chaotic data and multiple disparate data sources. Our research labs at Xurmo have invented an “Intelligent Information Fabric” which precisely does this and creates a pattern store (ever learning) for enterprises. One can then slice and dice this fabric to leverage all that one wants to, be it KD, Insight or predictive Analytics. If anyone is interested to know more, we can connect offline in this regard.

    Udayan, we can discuss more offline…n_murali@lycos.com



