
#BigData – All of the Hype – None of the Calories!

There has been a lot of hype in the IT industry around Big Data over the last year. It’s been quite breathtaking watching the polarization that has come with this hype, and I have found that everyone falls into one of three camps. When I am reading or listening to some rhetoric about big data, it helps me to classify the argument into one of these three camps. That way I can discern what I am hearing or reading more quickly.

Camp #1 – They are selling something else. So, company A has a mobile app that collects data from an MSSQL server in the cloud, handles maybe 500 GB of data, and they don’t really have a clue what big data is all about. But everyone is talking about big data, so they want to get into the game. They use phrases like “We’ve always been about big data”, “We’ve been doing big data for 20 years”, or “We go from big data to smart data”. I’m sure they have a great mobile app, or a really clever data model in Teradata, but I generally see these phrases as red flags for NoBigData. No big deal. Maybe the confusion they are creating around big data is generating some big sales for them. I don’t know. Big data sounds cool, right? Surely everyone does big data? If you find the conversation about big data quickly devolves to the cool user interface, or real-time data collection via a mobile app, chances are they are just selling something else. I don’t have a problem with non-big-data products. To be completely honest, not everything should be a big data project. Just don’t call your product big data when it isn’t. It’s ok, really. My car and my DVD player aren’t big data either, but I use them regularly.

Camp #2 – They’ve been ordered to implement big data. These are the teams that know they have a mobile cloud app or a clever DB2 data warehouse, but the CIO has given the mandate that “everyone is doing big data, and so shall we all”. These folks would rather have their arm caught in a wood chipper than feed the big data hype, and they will do everything in their power to kill the initiative without sounding like an obstruction. Generally, I see these folks as hardened IT folks who have made their careers on another database platform, are comfortable with it, and heck, may even love it. It’s like asking them to cheat on their significant others with someone they don’t like. If it ain’t broke, why replace it? You see these teams trying to install Hadoop on a TI-85 calculator, loading 5 TB of data, and then complaining that the query took an hour, so big data is busted. Maybe. Probably not. I bet they didn’t read the manual, because, you know, that’s against the code... wink wink.

Camp #3 – They get it. To be fair, some of these concepts require a completely new understanding of how data works. Schema on read, for instance. This just boggles the traditional RDBMS developer’s mind. Typically, you look at your data, create a table, set its column data types (these columns are integers, these other columns are strings), then load the data, and then output it to a report. This is called schema on write: I define the table structure up front, and the data has to fit it when it is written. With Hadoop, we load the data, then figure out which columns are integers or strings, and then output the data to a report. This is called schema on read. Why is this significant? I can change the entire data model in 5 minutes. That is not very easy in Oracle or MSSQL. If I built an entire reporting application, and realized a year down the road that what I thought was an integer was supposed to be a character, I can still fix that in 5 minutes on a Hadoop platform. It would take longer than that to put together a project plan to do that in Oracle. I can also add a column to the end of my table, and then figure out what to do with it 6 months later. These concepts are so far removed from traditional databases that they just do not compute. But when you see someone truly understand what these new super powers bring, you can see the excitement. Speed, scalability, agility, cost savings – every single one of those is worth consideration. Maybe when you put all four together in the same sentence, people just assume they are too good to be true?
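If schema on read still feels abstract, here is a toy sketch of the idea in plain Python. To be clear, this is not Hadoop or Hive code. The sample data, the column names, and the `read_with_schema` helper are all made up for illustration. The only point is that the raw data is stored once, untyped, and the column types are applied each time it is read.

```python
import csv
import io

# Raw data lands as-is. Nothing about column names or types is decided at
# load time. (Hypothetical sample data; the 4th field is one nobody has
# figured out a use for yet.)
RAW = "1001,42,widget,x9\n1002,7,gadget,k3\n"


def read_with_schema(raw, schema):
    """Apply a list of (column_name, cast) pairs to each raw row at read time.

    zip() stops at the shorter input, so trailing fields that the schema
    does not name yet are simply ignored.
    """
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        rows.append({name: cast(value)
                     for (name, cast), value in zip(schema, fields)})
    return rows


# First guess at the data model: "code" is an integer.
schema_v1 = [("order_id", int), ("code", int), ("item", str)]

# A year later: "code" should have been a string. Only the schema changes;
# the stored data is not rewritten or reloaded.
schema_v2 = [("order_id", int), ("code", str), ("item", str)]

# Six months after that: finally give the trailing column a name.
schema_v3 = schema_v2 + [("batch", str)]

v1 = read_with_schema(RAW, schema_v1)
v2 = read_with_schema(RAW, schema_v2)
v3 = read_with_schema(RAW, schema_v3)

print(v1[0])
print(v2[0])
print(v3[0])
```

Notice that `RAW` never changes between the three reads. In a schema-on-write database, the equivalent of moving from `schema_v1` to `schema_v2` would usually mean altering the table and converting the data already stored in it.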

 

#BigData