#Spark on #Hadoop

October 22, 2014

I just watched a Hortonworks webinar on the company's work to integrate Spark with Hadoop 2.0 and YARN. One issue with Spark is locking and stalls on large multi-tenant Hadoop clusters, which stem from the way Spark uses resources compared with the rest of the Hadoop stack. Hortonworks aims to make Spark run fully within YARN and behave more like the rest of the stack.

There are some compelling reasons to like Spark. First, it provides a single consistent programming model across single instances, Spark clusters and Hadoop. Second, through its Python API it draws on the vast library of Python analytics packages like NumPy and SciPy. Of course, the downside is that you have to be able to program with NumPy and SciPy, which is pretty techy.

Still, if Hortonworks makes it work, next year could be the year of Spark.

Did Oracle miss the Big Data Boat?

October 6, 2014

Not just Oracle: did the whole RDBMS industry let the Hadoop thing get away from them? They didn’t take it seriously and left the market wide open for open source to march in. Now they can’t get rid of it.

Over the next five years or so, all the spending will be on “Big Data”. But very little will actually be spent on “Big Data” database software because Hadoop is free. Hortonworks, Cloudera and MapR will make scads of money at their level, but it will be a pittance by Oracle’s standards. Every customer will put their RDBMS systems in maintenance mode while they concentrate on “Big Data”, which means flat spending and no growth for Oracle et al. Eventually the “Big Data” thing may saturate and we’ll get back to spending on core RDBMS systems, but that’s years away.

The RDBMS vendors can try things, like Oracle with its Big Data Appliance, a sibling of Exadata, but really this is either silly or desperate.

If you are an Oracle DBA, look to a future of doing upgrades, applying patches, and maybe adding capacity to handle the load coming from Hadoop. Nobody is going to launch anything new and exciting on your platform.