Is #Hadoop the Next #bigthing

August 10, 2014

Yes, Hadoop is the #bigdata thing, but is it the next big thing in general? Is it time to stop talking about #bigdata and just say #data? Is it time to take these so-called big data technologies mainstream for the whole data centre?

There are some attractive ideas about putting Hadoop at the centre of your data centre:

· The cost of saving data drops so low that you can just save everything and worry about it later, without all the expensive ETL, aggregation, and dimensional modelling that runs up the cost of storing data you don’t even know you need yet.

· With "schema on read", the cost moves to the extract side. It may even be higher, but presumably you know the value of extracting the data at that point, so it makes sense for the cost to land there.

· It is truly enterprise scale. How often do we talk about 5-year plans to "boil the ocean" for some enterprise-wide initiative like MDM? This is an ocean that can be boiled in days or even hours.

· It is open source, so you can store your data without vendor lock-in.
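To make the schema-on-read idea concrete, here is a minimal sketch in plain Python; it is not Hadoop or Spark code. Raw records are stored exactly as they arrive, and a schema is applied only when someone reads them. The record fields (`cust_id`, `amount`, `ts`, `promo`), `SALES_SCHEMA`, and the `read_with_schema` helper are hypothetical, for illustration only.

```python
import json
from datetime import datetime

# Write side: keep raw records as-is, one JSON document per line.
# No ETL or modelling happens here, so saving data stays cheap.
RAW_LINES = [
    '{"cust_id": "C1", "amount": "19.99", "ts": "2014-08-01T10:00:00"}',
    '{"cust_id": "C2", "amount": "5", "ts": "2014-08-02T11:30:00", "promo": "SUMMER"}',
    '{"cust_id": "C3", "ts": "2014-08-03T09:15:00"}',
]

# Read side: a (hypothetical) schema is just field name -> parser,
# chosen by whoever extracts the data for a particular purpose.
SALES_SCHEMA = {
    "cust_id": str,
    "amount": float,
    "ts": datetime.fromisoformat,
}


def read_with_schema(lines, schema):
    """Parse raw lines and project them onto `schema` at read time.

    Fields not named in the schema are ignored; missing fields become None.
    """
    for line in lines:
        raw = json.loads(line)
        yield {
            field: (parse(raw[field]) if field in raw else None)
            for field, parse in schema.items()
        }


rows = list(read_with_schema(RAW_LINES, SALES_SCHEMA))
```

The parsing and type conversion all happen at extract time, which is where the cost moves to. A different reader could apply a different schema to the same raw lines, for example one that includes `promo`, without anything being reloaded.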

The big question, though, is what you actually do with it.

· I can’t imagine many business users writing Python scripts to run on Spark, for example. The users who can will have a big early advantage, though.

· So you still need those legacy data marts and report repositories hung off it. Users already complain that their data warehouses and data marts aren’t real-time enough. Doesn’t adding Hadoop in the middle make this worse?

· That is true for now, at least until there are new tools that let users easily build and query datasets off the Hadoop cluster. Spark sort of does this now, though not easily. Once those tools are available, the data marts and report repositories will fade away.

The modern data architecture will be something like the Hortonworks Data Lake in the middle, with new Hadoop-enabled tools, and all those legacy RDBMSs and data marts strung around the periphery. All of the interesting work in the next 5 years will centre on this data lake, the new tools, and the connections to it. Everything else will just be maintenance.


NoSQL in Odd Places

July 5, 2013

What started me thinking that NoSQL will be where all the application action goes was finding NoSQL databases in unexpected places: not the usual “Big Data” settings, but embedded systems, for example.

Why would an embedded systems developer choose NoSQL over SQL for a small embedded database? On the surface you have to wonder, but the answer is likely the programming model. It is a lot easier for a Java developer to work with Java objects and JSON documents than to map those objects to SQL. Most of them don’t understand SQL very well and make a mess of it, so they just pick something they understand better.

I was shocked to find an RDM system that runs on MongoDB. Surely RDM is one place that would need the robustness of a relational database, but no, it doesn’t. Look a bit deeper and you see why. It is a closed system: nobody will be doing ad hoc queries and updates against an RDM; it is closed to everything but the app. While the database doesn’t provide much in the way of data consistency, the app has control and can enforce it. At the same time, the app has to deal with data in all sorts of formats and variations, and a schema-less database makes that easier.
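Here is a minimal sketch of that pattern, where the app rather than the database enforces consistency. A plain Python dict stands in for a schema-less document store; it is not MongoDB or its driver API. The `ReferenceDataApp` class, its `code_set`/`code`/`label` fields, and its validation rules are all hypothetical.

```python
class ReferenceDataApp:
    """Toy reference-data app over a schema-less document store.

    The store accepts any dict, so every integrity rule lives in the app.
    """

    REQUIRED = ("code_set", "code", "label")

    def __init__(self):
        # Stand-in for a document collection: key -> document of any shape.
        self._store = {}

    def upsert(self, doc):
        # App-level rule 1: required fields must be present and non-empty.
        missing = [f for f in self.REQUIRED if not doc.get(f)]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        # App-level rule 2: codes are normalized to upper case, so
        # "ca" and "CA" cannot become two different entries.
        key = (doc["code_set"], doc["code"].upper())
        # Extra, source-specific fields are kept as-is; the store does not
        # care that documents vary in shape.
        stored = dict(doc, code=key[1])
        self._store[key] = stored
        return stored

    def lookup(self, code_set, code):
        # Reads go through the same normalization as writes.
        return self._store.get((code_set, code.upper()))


app = ReferenceDataApp()
app.upsert({"code_set": "country", "code": "ca", "label": "Canada"})
app.upsert({"code_set": "country", "code": "US", "label": "United States", "iso3": "USA"})
```

Because only the app writes to this store, rules like these are enough to keep it consistent. Anything that bypassed the app and wrote directly to the collection would get none of these checks, which is why the closed-system point matters.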

I expect we’ll see more and more of this. Transaction control and object integrity move out of the datastore and into the app framework, where they are easier or faster to program. Time to market is everything today, and if NoSQL is good enough and faster to market, it will win most of the time.


SQL vs. NoSQL, OK, I Give Up

July 4, 2013

In the long-running relational (SQL) vs. NoSQL database debate, I’ve always been a big relational fan. I still believe it is the best technology for storing data without error, but I give in. It’s great technology, but it’s too inaccessible: too expensive, too fragile, too hard to manage, too hard to program, etc. All the action from now on will be with NoSQL.

I feel like the guy with a Sun workstation in 1995 telling the world that Windows 95 PCs are vastly inferior tech. They were, but it didn’t matter. Windows 95 was so accessible that it just took over everything. All the application action moved to that platform. Eventually Microsoft and Intel created a platform as good as any workstation, and nobody has talked about workstations for years now.

Relational DBs will survive, of course. Just as Sun servers now flourish, hidden in every datacenter in the world, behind all that NoSQL there will still be Oracle and DB2 holding the really important stuff. But only a few DBA geeks will ever see it. No interesting new apps will be written against it directly; it will be just legacy maintenance. Businesses will no longer put up with the high cost of running relational DBs; they will just run what they must, outsourced to the lowest bidder. Not a very interesting place to be.

I’ll grant an exception for MySQL. Since it was born into the same online web world that NoSQL serves, it has some of that same easy accessibility. The SQL database for the NoSQL world, perhaps.


MDM Interoperability

July 20, 2012

Can MDM hubs interoperate without a whole lot of custom code or arcane proprietary interfaces?

I attended Aaron Zornes’s presentation on the state of MDM today at the MDM Summit Toronto, and frankly I found it a bit depressing. First, no one product does everything well. So to do everything well, I might want to use more than one vendor, if that were possible. Second, I will end up with several vendors’ products together whether I want to or not, because big vendors like Microsoft and Oracle bundle MDM and RDM into their application stacks.

If I am going to have several MDM products together, can these things talk to each other? Now, I’m sure there are lots of SIs out there that would be happy to charge handsomely to build some custom interfaces. There may also be some proprietary interfaces out there, but these will always behave differently depending on the vendor, and each will have its own custom setup.

What I’d really like to see is an industry-standard protocol for MDM hubs, so that hubs working together would produce relatively predictable results without a whole lot of custom code or setup to support them.

I’ve never seen such a thing though, or even a discussion about it.


Data Modeling Jobs

December 19, 2011

Last week at IRMAC, I heard Karen Lopez talking about the severe shortage of experienced data modelers. Sadly, I drifted out of the field about 10 years ago when the jobs dried up. When I used ERwin, it was still owned by Logicworks. I still have a Logicworks shirt from one of their user conferences.

So, if I had stuck with it and been gainfully unemployed as a data modeler for the last 10 years, would I be in big demand now?


#UWMath Reunion 2011

September 25, 2011

On Saturday I attended the 2011 UWaterloo Math reunion, partly to take the tour I missed at my 25th anniversary reunion in 2009. On this tour I got to see the brand-new Math 3 building, which just opened. It has the stark, minimalist look of many new public buildings built on a very tight budget, but it does have a lot of natural light, which makes it a pleasant place.

Other things I discovered on the tour:

They still have computer labs. I would have thought personal laptops would make labs obsolete, but there are still lots of labs, and as far as I could tell they are still open 24×7.

MathSOC is still there in the same place. The CSC is still there too. Hopefully the CSC couch is NOT still there; the office wasn’t open, so I couldn’t check. Watsfic is gone, though; perhaps it moved somewhere else.

The Math third-floor lounge is now the “Comfy”, and the new furniture certainly is comfy; you could easily sleep in one of those chairs. The C+D is still there too, of course.

The Red Room is long gone; you can’t even tell where it was. But MFCF is still there, though instead of a Honeywell and a VAX there are a bunch of high-end SGI and Sun (Oracle) servers.

They still hand in assignments on paper, into that big mailbox with the slots. It looks like the same one that was there 25+ years ago. Unbelievable. I thought they would have switched to email and something like PDFs long ago.

And I forgot to look for the study table donated by our 25th anniversary reunion. The students do love the study tables set up around the Math building now. They have power and network connections, and many were in use even on a Saturday afternoon.


CDMP Data Governance beta exam #datagovernance

July 5, 2011

I wrote the DAMA CDMP Data Governance beta exam recently. I scored 68%, which actually surprised me. Due to time constraints, my only study materials were the data governance chapter of the DMBOK and the June MDM+DG conference in Toronto, which I attended. That may have been good timing, because I recognized some questions from material covered at the conference. There were also some questions where I didn’t understand what was being asked, which probably shows that I should have used a broader set of study materials.

This exam has a different feel from the other CDP exams: less technical, with more shades of gray perhaps. I would characterize it as picking the most correct answer rather than just picking the correct answer, as on the more technical exams. I put that down to the nature of data governance, which is more political than the other, more technical areas. It was an interesting exam to write because of that.

I already have my CDMP, so I didn’t really need to write it, but beta exams are free and DAMA needs beta testers to validate the exam. If the beta is still open, I encourage anyone interested in data governance to give it a try. It helps DAMA get the exam validated and helps you to “know what you know”. It is also an interesting set of questions to work with.

