MySQL Workshop

November 2, 2016

Last week I attended a MySQL workshop presented by Oracle.

The main purpose of the workshop was to introduce the development release of MySQL 8.0. This is the next major release; for several reasons, Oracle is skipping straight from 5.7 to 8.0.

There are a couple of changes that make MySQL look more like Oracle Database:

– A new table-based data dictionary replacing the file-based one

– A SYS schema that looks somewhat like the Oracle Database one, with management views

The other big item is a JSON-based document store, to compete with MongoDB. MySQL's JSON support retains MySQL's ACID transactions, so in that way it is a bit different from Mongo: it will be slower but more durable. It is implemented with an entirely new protocol called X rather than as extensions to the SQL protocol, because it just didn’t fit into the SQL protocol.

Some of these features, most notably JSON, have been backported to 5.7, since the production 8.0 release is perhaps a year away.

The 8.0 development release is available for download. It is still small, and it still installs and runs easily on a laptop.

#Ubuntu on SSD

November 1, 2016

I rescued an SSD from a dead laptop and repurposed it to install Ubuntu on one of my Win10 desktops. It certainly is fast: about 15 seconds from cold boot to ready to work. It easily outperforms Win10 on the same hardware. Even something as slow as Firefox just pops. It’s handy when I want to look something up quickly; it is even faster than struggling with my iPad.

#IRMAC Open House @PwC Dec 11. #DAMA Toronto

November 24, 2015

IRMAC (DAMA – Toronto) would like to invite you to attend our 2015 Open House, hosted by PwC (Bremner Blvd, near the Air Canada Centre), on Dec 11th, with the reception starting at 5:00pm.

IRMAC Letter to Universities #data #education

April 14, 2015

IRMAC, the DAMA Toronto chapter, has posted an open letter to the Computer Science and related departments of universities calling for more data-related content in their curricula. See it here.

#Spark on #Hadoop

October 22, 2014

I just watched a Hortonworks webinar on the company’s work integrating Spark with Hadoop 2.0 and YARN. One issue with Spark is locking and stalls in large multi-tenant Hadoop clusters, caused by the way Spark uses resources compared with the rest of the Hadoop stack. Hortonworks aims to make Spark run fully within YARN and behave more like the rest of the stack.

There are some compelling reasons to like Spark. First, it provides a single, consistent programming model across single instances, Spark clusters and Hadoop. Second, it draws on the vast library of Python analytics such as NumPy and SciPy. Of course, the downside is that you need to be able to program in Python with NumPy and SciPy, which is pretty techy.

Still, if Hortonworks makes it work, next year could be the year of Spark.

Did Oracle Miss the Big Data Boat?

October 6, 2014

Not just Oracle: did the whole RDBMS industry let the Hadoop thing get away from it? The vendors didn’t take it seriously and left the market wide open for open source to march in. Now they can’t get rid of it.

Over the next five years or so, all the spending will be on “Big Data”. But very little will actually be spent on “Big Data” database software, because Hadoop is free. Hortonworks, Cloudera and MapR will make scads of money at their level, but it will be a pittance by Oracle’s standards. Every customer will put their RDBMS systems in maintenance mode while they concentrate on “Big Data”, which means flat spending and no growth for Oracle et al. Eventually the “Big Data” thing may saturate and we’ll get back to spending on core RDBMS systems, but that’s years away.

The RDBMS vendors can try things like Oracle’s Big Data Appliance, its Exadata-style engineered system for Hadoop, but really this is either silly or desperate.

If you are an Oracle DBA, look to a future of doing upgrades, applying patches, and maybe upgrading capacity to handle the load coming from Hadoop. Nobody is going to launch anything new and exciting on your platform.

Is #Hadoop the Next #bigthing?

August 10, 2014

Yes, Hadoop is the #bigdata thing, but is it the next big thing in general? Is it time to stop talking about #bigdata and just say #data? Is it time to take these so-called big data technologies mainstream for the whole data centre?

There are some attractive ideas about having Hadoop at the centre of your data centre:

· The cost of saving data drops so low that you can just save everything and worry about it later, without all that expensive ETL, aggregation and dimensional modelling running up the cost of storing data you don’t even know you need yet.

· The cost moves to the extract side with “schema on read”. It may even be higher, but presumably by then you know the value of extracting the data, so it makes sense to incur the cost there.

· It is truly enterprise scale. How often do we talk about five-year plans to “boil the ocean” for some enterprise-wide initiative like MDM? This is an ocean that can be boiled in days or even hours.

· It is open source, so you can store your data without vendor lock-in.
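To make “schema on read” a bit more concrete, here is a minimal Python sketch. It assumes raw records are simply kept as JSON lines exactly as they arrived, and that each reader brings its own schema at query time. The sample data, `SALES_SCHEMA`, `PROMO_SCHEMA` and `read_with_schema` are all hypothetical, for illustration only; on a real cluster this work would be done by Hive, Spark or similar rather than a loop like this.

```python
import json
from datetime import date

# Hypothetical raw events, stored exactly as they arrived: no ETL, no upfront schema.
RAW_LINES = [
    '{"cust": "A17", "amount": "19.99", "ts": "2014-08-01"}',
    '{"cust": "B02", "amount": "5", "ts": "2014-08-02", "promo": "SUMMER"}',
    '{"cust": "A17", "ts": "2014-08-03"}',
]

# A "schema" here is just field name -> converter, chosen by whoever reads the data.
SALES_SCHEMA = {"cust": str, "amount": float, "ts": date.fromisoformat}
PROMO_SCHEMA = {"cust": str, "promo": str}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and apply `schema` only at read time.

    Fields outside the schema are ignored. Records missing a field the
    schema needs are skipped here, not rejected when they were written.
    """
    for line in lines:
        record = json.loads(line)
        if not all(field in record for field in schema):
            continue
        yield {field: convert(record[field]) for field, convert in schema.items()}


# Two readers get two different views of the same untouched raw data.
sales = list(read_with_schema(RAW_LINES, SALES_SCHEMA))
promos = list(read_with_schema(RAW_LINES, PROMO_SCHEMA))
total = sum(row["amount"] for row in sales)
print(len(sales), round(total, 2), len(promos))
```

The cost of typing, converting and filtering is paid only when somebody actually reads the data, which is the trade-off described in the list above.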

The big question, though, is what you actually do with it.

· I can’t imagine many business users writing Python scripts to run on Spark, for example. The users who can will have a big early advantage, though.

· So you still need those legacy data marts and report repositories hung off of it. Users already complain that their data warehouses and data marts aren’t real-time enough. Doesn’t adding Hadoop in the middle make this worse?

· For now, that is, until there are new tools that let users build and query datasets off the Hadoop cluster easily. Spark sort of does this now, though not easily. Once such tools are available, those data marts and report repositories will fade away.

The modern data architecture will be something like the Hortonworks data lake in the middle, with new Hadoop-enabled tools and all those legacy RDBMSs and data marts strung around the periphery. All of the interesting work in the next five years will centre on this data lake, the new tools, and the connections to it. Everything else will be just maintenance.