#Spark on #Hadoop

I just watched a Hortonworks webinar about integrating Spark with Hadoop 2.0 and YARN. A known issue with Spark in large multi-tenant Hadoop clusters is locking and stalls, because Spark holds onto resources differently than the rest of the Hadoop stack. Hortonworks aims to make Spark run fully within YARN so it behaves more like the other components.

There are some compelling reasons to like Spark. First, it provides a single, consistent programming model across a single instance, a Spark cluster, and Hadoop. Second, through its Python interface it draws on Python's vast analytics libraries like NumPy and SciPy. Of course, the downside is that you need to be able to program with NumPy and SciPy, which is pretty techy.
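To make the "single programming model" point concrete, here is a minimal sketch of the map/reduce style Spark exposes. It uses only the Python standard library so it runs on a single machine, but the same pipeline translates almost line-for-line to a PySpark RDD job (shown in the comments, assuming `pyspark` is installed and `sc` is a SparkContext).

```python
from operator import add

# Sample data: count words, Spark/MapReduce style.
lines = ["spark runs on yarn", "spark runs on hadoop", "yarn schedules spark"]

# Local, single-instance version of the pipeline:
words = [w for line in lines for w in line.split()]  # flatMap
pairs = [(w, 1) for w in words]                      # map
counts = {}
for w, n in pairs:                                   # reduceByKey
    counts[w] = counts.get(w, 0) + n

print(counts["spark"])  # "spark" appears three times -> 3

# The same pipeline on a Spark cluster would read almost identically:
#   sc.parallelize(lines) \
#     .flatMap(str.split) \
#     .map(lambda w: (w, 1)) \
#     .reduceByKey(add) \
#     .collect()
```

The appeal is that the local and cluster versions share the same shape: you prototype on one machine, then swap plain lists for RDDs when the data outgrows it.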

Still, if Hortonworks makes it work, next year could be the year of Spark.
