#Spark on #Hadoop

October 22, 2014

I just watched a Hortonworks webinar about Hortonworks tackling the integration of Spark with Hadoop 2.0 and YARN. One issue with Spark in large multi-tenant Hadoop clusters is locking and stalls, stemming from the way Spark acquires and holds resources compared with the rest of the Hadoop stack. Hortonworks aims to make Spark run fully within YARN so that it behaves more like the rest of the stack.

There are some compelling reasons to like Spark. First, it provides a single, consistent programming model whether you run on a single machine, a standalone Spark cluster, or Hadoop. Second, through its Python API it can draw on Python's vast analytics libraries, such as NumPy and SciPy. The downside, of course, is that this means being able to program with NumPy and SciPy, which is pretty techy.

Still, if Hortonworks makes it work, next year could be the year of Spark.