Information Explosion – So What

With the exponential growth of information, why are we not paralyzed yet?

IDC had predicted 40-60% year-over-year growth of information. This translates to roughly tenfold growth in about 6 years. Here are some interesting analogies on the information explosion from Infographic: The Information Explosion, Forecasts, and the Cloud.
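The tenfold figure follows from plain compound growth. Here is a minimal sketch of the arithmetic, assuming the growth rate stays constant from year to year; the function name is made up for illustration.

```python
import math

def years_to_multiply(annual_growth, factor=10.0):
    """Years of compound growth needed to grow by `factor`.

    annual_growth is a fraction: 0.5 means 50% year-over-year.
    Assumes the rate stays constant, which real forecasts do not promise.
    """
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    return math.log(factor) / math.log(1.0 + annual_growth)

# The 40-60% range quoted above, plus its midpoint.
for rate in (0.40, 0.50, 0.60):
    print(f"{rate:.0%} per year -> 10x in {years_to_multiply(rate):.1f} years")
```

Across the quoted range the answer falls between roughly 5 and 7 years, which is where the "about 6 years" comes from.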

In spite of such growth, we still manage to find the piece of information we are looking for, and we still have room to store data, pictures, music, and video. We still reply to the important emails and catch up on Facebook. We can still afford to store the content within our budget and space constraints.

Three things have saved us! No, cloud storage is not one of them.

Reduction in cost of storage

Storage cost has come down at about the same rate as data volume has gone up, so one seems to have compensated for the other. Matthew Komorowski has collected the cost data and arrived at a formula for the cost reduction (see his post A History of Storage Cost), and a post by Volkan Tunalı says that the price halves every 14 months.

The reduction works out to about one order of magnitude every 4 years. Here are a couple of other documents produced in 1992!
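The link between "half every 14 months" and "an order of magnitude every 4 years" can be checked in a few lines, along with how such a price decline offsets the data growth. This is only a sketch: it assumes a smooth exponential decline, and the function names are illustrative rather than taken from either post.

```python
import math

def months_per_factor(halving_months, factor=10.0):
    """Months for the price to fall by `factor`, given a fixed halving time."""
    return halving_months * math.log2(factor)

def annual_cost_factor(halving_months):
    """Fraction of the price that remains after 12 months."""
    return 0.5 ** (12.0 / halving_months)

def annual_spend_factor(annual_growth, halving_months):
    """Year-over-year change in total storage spend:
    data volume growth multiplied by the per-unit price decline."""
    return (1.0 + annual_growth) * annual_cost_factor(halving_months)

years = months_per_factor(14) / 12
print(f"Halving every 14 months -> 10x cheaper in {years:.1f} years")

for growth in (0.40, 0.50, 0.60):
    factor = annual_spend_factor(growth, 14)
    print(f"{growth:.0%} data growth -> spend changes by a factor of {factor:.2f} per year")
```

Under these assumptions the spend factor stays a little below 1 across the whole 40-60% range, which is consistent with the point that falling prices have roughly cancelled out growing volumes.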

Improvement in search technology

Getting information off the Internet is like taking a drink from a fire hydrant … Mitch Kapor

Thanks primarily to Google, we are able to find our needle in the haystack or we are able to drink comfortably from the fire hydrant!

Today we can move away from classifying data as structured vs. unstructured and instead classify it as:

  • Machine processable data: Such data is typically stored in a database or a data warehouse. Computers can process, transform, analyze and aggregate it. All transactional data falls into this category.
  • Machine searchable data: Machines can read this class of data and index it for people to search, but they cannot directly make sense of it. Documents (Word, Excel, PDF etc.) and web pages fall into this category. The semantic web or Web 3.0 movement aims to turn searchable data into processable data.
  • Machine opaque data: Most audio, image and video data falls into this category. For many years now the AI community has been trying to turn opaque data into searchable data. OCR (optical character recognition) and Google's similar-images search are results of such efforts. Opaque data can also be made partially searchable by adding metadata to it.

Interestingly, the volume of searchable data is an order of magnitude greater than that of processable data, and the volume of opaque data is an order of magnitude greater than that of searchable data. Gartner believes that over the next three years video will become a commonplace content type and interaction model for most users, and by 2013, more than 25 percent of the content that workers see in a day will be dominated by pictures, video or audio.

Multi-factor knowledge acquisition

We have entered an era in which we have learned to gather bits and pieces of information from multiple sources and use them to build our understanding or find an accurate answer to the question we are asking. The beauty of this process is that the individual pieces of information may be partial and may even be erroneous. Our brain is able to resolve the contradictions, make sense of this cacophony and learn from it. This is the best way to combine what computers do well (index and search) with what humans do well (make sense of multiple, incomplete and even contradictory sets of information).

This process is like multifactor authentication, where more than one piece of information about the user's identity is used to make authentication more secure.

Some of us follow this process and then contribute to the cacophony ourselves. We go back and publish our understanding. We curate the content and create new content. As per Business Insider – Content Is No Longer King: Curation Is King.

Coming back to cloud storage: for it to remain viable, the reduction in the cost of cloud storage needs to be visible.

That is, the storage cost should halve every 14 months. However, such a trend is not yet visible. I did an analysis a year ago – Cloud Economics – A Platform Comparison. You can use the data from that post to do your own calculation.
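One way to do that calculation is to take two price points for the same service and work out the halving time they imply, then compare it with the 14-month benchmark. The sketch below assumes a smooth exponential price decline between the two points. The function name and the sample prices are hypothetical placeholders, not figures from the post.

```python
import math

def implied_halving_months(old_price, new_price, months_elapsed):
    """Halving time implied by two price points, assuming exponential decline.

    Returns math.inf if the price did not fall at all.
    """
    if old_price <= 0 or new_price <= 0 or months_elapsed <= 0:
        raise ValueError("prices and elapsed months must be positive")
    if new_price >= old_price:
        return math.inf
    return months_elapsed * math.log(2) / math.log(old_price / new_price)

# Hypothetical prices per GB per month, 12 months apart (placeholders only).
old_price, new_price = 0.15, 0.14
halving = implied_halving_months(old_price, new_price, 12)
verdict = "keeps pace with" if halving <= 14 else "lags behind"
print(f"Implied halving time: {halving:.0f} months, which {verdict} the 14-month benchmark")
```

Replace the placeholder prices with the real numbers for whichever platform you want to check.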

