YourStory.in’s ‘Big Data Big Questions’ event hosted the founder of one of the most celebrated companies in the Big Data industry today. Dr. Amr Awadallah has been associated with Big Data products since their early days. As VP of Product Intelligence Engineering at Yahoo!, he ran one of the first business units to use Hadoop for data analysis. Besides being the founder, Amr currently serves as the chief technical officer (CTO) at Cloudera. We caught up with him to discuss the Big Data industry and what it means to India.
YS: You’ve been associated with Big Data products since their early days. Tell us how you have seen the field grow from your time at Yahoo! to now at Cloudera.
Amr: The problem of Big Data is threefold – volume, variety and velocity. At Yahoo!, working on business intelligence involved working with data, and that gave me too many headaches! Not only was the amount of data very large, we were also getting very different kinds of data from different sources, like mobile, for example. And eight years ago there really weren’t many solutions in the market that could work around these three problems. However, within Yahoo! there was a technology called Hadoop, which was being used for web search. I instantly felt that it offered great value for what I was working on, so I employed it in the business intelligence team, and we reaped rich dividends. Since then, companies all over have used Hadoop, and the rest is history.
YS: Experts in the Big Data domain say there is still no Big Data product that can analyse exabytes of data. Is there a limit to the data that can be analysed by the current crop of Big Data products? If so, is it a technology problem or a business one?
Amr: Yes, you are right in saying so, but I don’t think the capabilities of Big Data products today are limited by technology. Hadoop itself is used by many companies handling data at the scale of hundreds of petabytes, which is a lot more data than what most companies in the world have today. I think it is a maturity problem, which essentially poses two challenges. Firstly, the number of people who can work on Big Data products is very small. This is a problem with any growing segment, and the gap will be filled over time. For example, when Java first came out, there were not many people who knew how to work with it, but look at it now. The second problem is getting the existing business intelligence tools in the market to work with Apache Hadoop. That is getting better every day. We’re seeing a lot of big technology partners, like Informatica, that are already working with Hadoop.
YS: How does India, as a market, look to you?
Amr: We look at India as an emerging market, which is growing at a brisk pace. Before this, we had our eyes set on larger markets, like the USA. Now we’re looking to expand more in the emerging markets. India especially has shown the willingness to adopt newer technology. However, this year, we will be looking to expand in Europe and Japan. The year after that, we will look to expand in India.
YS: What is your advice to Big Data startups?
Amr: My advice would be to identify problems that can be solved using powerful tools like the ones Cloudera has, and go solve them! Take advantage of these tools. I gave some examples of that in my talk, and we’ve seen people leverage these technologies to build very innovative applications that we could never have thought of. We have many powerful tools at our disposal today. The Cloudera distribution system with Hadoop, for example, is a very powerful tool which can handle very large amounts of dynamic data. That capability did not exist before. Now that you have it, make sure to use it in the best possible way.