The Blind Men and the Elephant
If you are associated with IT in any capacity, you have probably heard the term Hadoop, a term that is still a mystery to many. I have found IT leaders wondering: is Hadoop appropriate for our environment, what are Big Data and Hadoop really all about, and do we even need them? The reason is that the limited material available defines the term in various ways, mostly based on each individual’s own interpretation and understanding. It reminds me of an ancient story from the Indian subcontinent, “the blind men and the elephant,” which illustrates how perception can be misleading and risky when people perceive only what they hear or see and base their decisions on that.
I was part of Yahoo when Hadoop was being developed, and I was one of the early internal adopters of the technology, moving Yahoo’s own internal systems to Hadoop while it was still hot from the oven. Experiencing so many variations firsthand, I came across explanations that helped me understand the meaning of, and the need for, the real technology behind these hottest of buzzwords, a technology that many industries have adopted today and that is making a difference.
Finding the right direction:
Things have been evolving to fit the needs of the time, all aimed at minimizing turnaround time and getting the most work done by consuming as much data as possible. Computers, too, have been evolving to compute complex logic by crunching lots of data. We were busy developing very powerful CPUs to solve very complex problems, concentrating only on improving instructions and commands while providing very limited data, and so solving only one half of the riddle. With this limited data we expected these machines to think and be smart, until we realized that it is not the CPU alone that makes these machines smart; it is in fact the data, a lot of data, that could make these machines think outside the box.
An information explosion:
These machines were still dumb and dependent on the commands of their masters; it is data that makes them intelligent. As the joke goes, “Data is like people – interrogate it hard enough and it will tell you everything you want to know.” The more information they have, the smarter these machines get. Voila! We found a direction. As we learned that intelligence is basically information driven, we started to collect a lot of information, and the need for Big Data was realized. We discovered that with enough information, even commodity machines can do complex jobs. We collected information from different sources such as social media, manual entries, emails, and ecommerce. Now we have the information, but how do we convert it into actionable intelligence? That is where Hadoop comes into the picture: an open source framework that is fault tolerant and scalable, and that processes large datasets across clusters of machines. It is a complete ecosystem for transforming this massive data into actionable intelligence by running thousands of commodity machines that carry out computation, in parallel, on the data distributed across them. That gives businesses a competitive edge in the market: more targeted content for customers, analysis that helps executives make informed business decisions, and insights that lead corporations to profitability.
Hadoop enables executives to see what the future holds rather than only finding out what happened in the past. This move from reacting to reports to acting proactively on predictive analysis results in improved performance, reduced cost, greater customer satisfaction, and increased revenue.
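To make the idea of computing in parallel on distributed data a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; it only imitates the map-and-reduce pattern on one machine, with threads standing in for cluster nodes. The function names (map_partition, reduce_counts, word_count) and the sample data are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(lines):
    """Map step: count the words in one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(partitions, workers=4):
    # Each partition stands in for a block of data stored on one machine.
    # The worker threads stand in for the machines processing their own block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_counts(partials)


# Hypothetical sample data, split into three "blocks".
partitions = [
    ["big data needs hadoop", "hadoop scales out"],
    ["data drives decisions"],
    ["hadoop processes data in parallel"],
]
result = word_count(partitions)
print(result.most_common(2))
```

The point of the sketch is the shape of the work: each block is processed independently where it lives, and only the small partial results are combined at the end. A real cluster adds what this toy leaves out, such as storing the blocks across machines and recovering from failures.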
“An IBM study found that more than four out of five CIOs (83 percent) see business intelligence and analytics as top priorities for their businesses as they seek ways to act upon the growing amounts of data that are now at their disposal”.
When something gets this powerful, it generates a lot of confusion around it. Misconceptions have caused many to make wrong decisions, sometimes under pressure to be part of an industry trend: rushing to replace existing conventional solutions with the latest “cool” technology without doing the proper homework to understand what they are getting into, even where the existing solution may be well suited for the job.
I have seen people confuse Hadoop with Big Data. They are not the same: Big Data refers to a large volume of structured and unstructured data, while the Hadoop framework uses that data, along with other technologies, to extract intelligence from it.
Hadoop is also not a database, as many have thought it to be. Another misconception that I encounter often is confusing Hadoop with NoSQL. NoSQL is not Hadoop either; it refers to highly scalable, distributed, schema-less databases that supplement Hadoop to create massive data stores and data warehouses.
Packaging of Hadoop:
Hadoop keeps growing and reaching new heights, and it has had many ups and downs along the way. In fact, Hadoop 2.x has come a long way and overcomes many of the shortcomings of the earlier version. As the technology started to pick up, companies found business opportunities in packaging it and offering it commercially as a turnkey solution, with their own flavors added to their distributions. I recommend going with one of these distributions rather than taking on the challenge of experimenting with the raw open source version. The vendors carefully package these distributions with stable, tested versions of all the required tools.
Cloudera, the earliest vendor to jump on the bandwagon, employed Doug Cutting, a founder of Hadoop, and has strong backing from Intel in both funding and technology. Intel is among the few that realized the value of the technology very early. Another widely adopted distribution comes from Hortonworks, a spin-off from Yahoo that includes many of the original authors of Hadoop; it was selected by Microsoft for use with Windows Azure and is also supported by IBM.
The Bottom Line:
A good understanding of the technology and proper due diligence can help executives take off the blindfold, understand the overall technology, and break out of limited, conventional transactional reporting into real-time analytics and monitoring, running statistical and machine learning algorithms to make predictions and forecasts that lead to well-informed decisions and increased profitability. At Innowi Inc., a Silicon Valley startup, our philosophy is to understand the requirements of the business well, adopt a technology that is highly scalable, architect it to be future ready, start small, slow, and simple, and then gradually scale up as the business grows.