24 Surreal Facts about #BigData You Should Know

The term “Big Data” carries a lot of negative connotations. I watch DBAs cringe when you mention it. Whether or not you like the term, the fact of the matter is that we are now generating data at an explosive rate – faster than we have ever seen. Take a look at this list of facts about how much and how fast we are generating data today.

  1. Over 90% of the world’s data was created in the past 2 years. [Source]
  2. Every 2 days we generate as much data as was generated in all of history up until 2003 [Source]
  3. The total volume of data being generated by industry doubles every 1.2 years [Source]
  4. Experts think that by 2020 the total amount of data in the world will grow from 3.2 zettabytes today to 40 zettabytes. [Source]
  5. Retailers could increase their profit margins by more than 60% through the full exploitation of big data analytics. [Source]
  6. Every 60 seconds we fire off 204 million emails, click 1.8 million Facebook likes, Tweet 278 thousand times, and post 200 thousand photos to Facebook [Source]
  7. About 100 hours of video are posted to YouTube every minute, and it would take 15 years to watch every video uploaded in a single day. [Source]
  8. Google processes an average of 40 thousand searches per second, over 3.5 billion in a single day. [Source]
  9. If you burned all of the data created in one day onto DVDs, you could stack them up and reach the moon – and back. [Source]
  10. It is believed that AT&T has the world’s largest database – its phone records database is 312 terabytes in size, and almost 2 trillion rows. [Source]
  11. 1.9 million Big Data jobs will be created in the US by 2015, and each will be supported by 3 new jobs created outside of IT – nearly 6 million new jobs in total. [Source]
  12. Data transferred over mobile networks increased by 81% to 1.5 exabytes (1.5 billion gigabytes) per month from 2012 to 2014. Video makes up 53% of that total. [Source]
  13. The NSA is believed to analyze 1.6% of all global internet traffic – 30 petabytes (30 million gigabytes) daily. [Source]
  14. The Hadoop market is expected to grow from $2 billion in 2013 to $50 billion by 2020. [Source]
  15. Facebook users share 30 billion posts every day. [Source]
  16. 570 new websites are created every minute. [Source]
  17. The number of electronic bits of data in existence is believed to have exceeded the number of stars in the physical universe in 2007. [Source]
  18. This year, there will be over 1.2 billion smart phones in the world which are constantly collecting data from sensors, and the growth is predicted to continue. [Source]
  19. The Internet of Things will mean that the number of devices that connect to the Internet will rise from about 13 billion today to 50 billion by 2020. [Source]
  20. 12 million RFID tags were sold in 2011. By 2021, it is estimated that number will rise to 209 billion as the Internet of Things takes off. [Source]
  21. Data centers now occupy an area of land equal in size to almost 6,000 football fields. [Source]
  22. Almost as creepy as the movie Minority Report – big data has been used to predict crimes before they occur. A “predictive policing” trial in California identified areas where crime would occur three times more accurately than existing methods of forecasting. [Source]
  23. By better integrating big data analytics into healthcare, the industry could save $300 billion a year, according to a recent report – that’s the equivalent of reducing the healthcare costs of every man, woman and child by $1,000 a year. [Source]
  24. The big data industry is expected to grow from US$10.2 billion in 2013 to about US$54.3 billion by 2017. [Source]

So what is your Big Data strategy for your company? Still don’t think you need one?


Uncover #BigData Quality Issues With Your #Analytics Tool

As Business Intelligence grows in importance within many large and medium-sized organizations, those organizations must deal with a range of data issues in order to improve their decision-making processes. One of the most important is data quality, an issue that Business Intelligence itself frequently brings to light.

Comprehensive management of data quality is a crucial part of any Business Intelligence endeavor. It is important to address all types of data quality issues and come up with an all-in-one solution.

  • A Single (Trusted) Version of the Truth

    • Governing data quality ensures trust in your information by fixing data problems during the extraction, transformation and loading process and creating policies that flag outlier data.
    • VortiSieze software supports the consistent accuracy of complete data so you can focus on making more informed decisions and gain efficiencies in your business processes.
    • Supporting growth, innovation and compliance depends on your ability to make crucial business decisions – an ability that suffers when you lack credible information.
    • A carefully planned data quality initiative is essential to any successful data management initiative – be it a business intelligence (BI) or data warehousing (DW) project, a new implementation of a customer relationship management (CRM) system, or a data migration (DM) project.
    • You can be more confident in your business decisions by taking the necessary steps to provide complete and reliable data.
  • Data Cleansing Delivers Data You Can Trust

    • With VortiSieze, parsing, standardizing and cleansing data, from any domain, source or type, is functionality built into the solution.
    • Parsing identifies the individual elements of your data and breaks them into components. These components can then be rearranged into a single field, or elements from a single field can be split out into many unique fields.
    • Once parsed, your data is checked for consistency, preparing it for validation, correction, and accurate record matching.
    • Your data is standardized using business rules that define formatting, abbreviations, acronyms, punctuation, greetings, casing, order, and pattern matching – placing you in control according to your business needs.
    • Dirty data (data with incorrect elements) is cleansed by correcting or adding missing elements, and this cleansing can be applied to a wide variety of data types.
  • Enhancing Data Gives You Greater Insight and Opportunity

    • You can maximize the value of your data by enhancing data with internal or external sources, i.e. enriching your existing data set by appending additional data to it.
    • This provides a more complete view of your data that can help you, for example, more effectively target customers and prospects, take advantage of cross-selling opportunities, and gain deeper insights into your business.
    • With VortiSieze, enhancement options include:
      • Weather data to predict long term trends in agriculture.
      • Commodity prices to aid in negotiating with a valued distributor or retailer.
      • Planogram or modular data to enhance shelf display planning.
      • Geocoding that appends longitude and latitude to records for marketing initiatives that are geographically or demographically based.
      • Geospatial assignment of customer addresses for tax jurisdictions, insurance rating territories, and insurance hazards.
  • Uncover Real Issues with Data Input, Matching and Consolidation

    • Consolidate data to uncover hidden relationships and provide a single version of the truth.
    • Incorrect data creates problems that flow ‘downstream’, making it difficult to identify the correct entity to enter new information against and to verify even basic information such as how many customers you have, which products they own, and which products come from which suppliers.
    • Duplicate data presents a myriad of issues and it becomes difficult to:
      • Identify the correct data to key new information against
      • Verify even basic quantitative information on customers, products, or which products come from which suppliers.
    • Duplicate records can exist in more than one source system; data matching algorithms within VortiSieze can reduce or eliminate duplicate data.
  • Governing Data With Data Quality Measures

    • VortiSieze software helps you analyze and understand how trustworthy your enterprise information is.
    • You will also get continuous insight into the quality of your data.
  • You Make Better Decisions with Reliable, Trusted Data

    • VortiSieze empowers you to enhance data quality for effective decision making and business operations.
    • You can easily find data outliers, correct issues as they arise, and work proactively to prevent quality problems.
    • With VortiSieze, you can:
      • Define and implement aggressive data policies, continuously assess data quality and repair data problems.
      • Improve data by parsing, standardizing, and cleansing data from any source, domain, or type.
      • Enhance data with internal or external sources to maximize the value of your data.
      • Consolidate data to uncover hidden relationships and provide a single version of the truth.
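The parsing, standardization, and matching ideas above are implemented inside VortiSieze itself; purely as an illustration of the concepts, a minimal Python sketch of the same workflow might look like this (the address rules, field names, and similarity threshold here are all hypothetical, not VortiSieze's actual logic):

```python
import re
from difflib import SequenceMatcher

# Hypothetical business rules: street-type abbreviations to standardize on.
STREET_ABBREVIATIONS = {"street": "St", "avenue": "Ave", "road": "Rd", "boulevard": "Blvd"}

def parse_address(raw):
    """Parse a free-form address line into components (number, street, type)."""
    match = re.match(r"\s*(\d+)\s+(.*?)\s+(\w+)\.?\s*$", raw)
    if not match:
        return None
    number, street, street_type = match.groups()
    return {"number": number, "street": street.title(), "type": street_type}

def standardize(parsed):
    """Apply formatting rules: casing and abbreviations."""
    street_type = parsed["type"].lower().rstrip(".")
    parsed["type"] = STREET_ABBREVIATIONS.get(street_type, parsed["type"].title())
    return parsed

def is_duplicate(a, b, threshold=0.85):
    """Fuzzy-match two standardized records to flag likely duplicates."""
    key_a = f"{a['number']} {a['street']} {a['type']}"
    key_b = f"{b['number']} {b['street']} {b['type']}"
    return SequenceMatcher(None, key_a.lower(), key_b.lower()).ratio() >= threshold

records = ["123 main street", "123 Main St.", "456 oak avenue"]
cleaned = [standardize(parse_address(r)) for r in records]
print(cleaned[0])                            # {'number': '123', 'street': 'Main', 'type': 'St'}
print(is_duplicate(cleaned[0], cleaned[1]))  # True: same address, different formatting
print(is_duplicate(cleaned[0], cleaned[2]))  # False
```

Note how the first two records only match as duplicates *after* parsing and standardization – which is exactly why cleansing precedes matching in the workflow described above.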
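Likewise, the enrichment idea – appending external attributes such as geocodes to existing records – can be sketched generically. In this toy version the lookup table and field names are invented for illustration; a real implementation would call a geocoding service or licensed reference data:

```python
# Hypothetical in-memory geocode lookup standing in for an external
# geocoding service or purchased reference data set.
GEOCODE_LOOKUP = {
    "90210": (34.0901, -118.4065),
    "10001": (40.7506, -73.9972),
}

def enrich_with_geocode(customers, lookup):
    """Append latitude/longitude to each customer record by postal code."""
    for customer in customers:
        lat_lon = lookup.get(customer["postal_code"])
        enriched = dict(customer)  # copy so the source record stays untouched
        enriched["lat"], enriched["lon"] = lat_lon if lat_lon else (None, None)
        yield enriched

customers = [{"name": "Acme Corp", "postal_code": "10001"}]
print(list(enrich_with_geocode(customers, GEOCODE_LOOKUP)))
```

The enriched records now carry the coordinates needed for geographically targeted marketing or tax-jurisdiction assignment, without modifying the original source data.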




5 Steps to Plan for Successful #BusinessIntelligence

Business intelligence is gaining a lot of attention in successful organizations. And for good reason – there is a correlation between more advanced use of data and a positive impact on bottom-line earnings and overall business performance. It comes down to this – organizations which successfully leverage their data for insight and strategic advantage perform better and move with the market faster than those that do not.
Those groups which embrace technology that allows for data visualization and discovery – and use it correctly – achieve success more often than those who don’t, or who use it incorrectly. Developing a plan for using quality data to their advantage keeps these organizations from being left behind. Here are some tips on how to get started:

  1. Have a vision: BI technology is ubiquitous and new advances are made almost weekly. New technologies allow for data visualization, and data discovery allows for the exploration of data in intuitive ways. However, having darts as accurate as drone missiles means nothing if there is no dartboard. Fundamentally, there must be something to aim at – and in BI this comes down to figuring out what questions to ask and working out which data matters most. Leveraging BI requires insight into the big picture – a clear vision that receives support across all organizational functions and establishes how the organization can successfully evolve.
  2. Business outcomes must be defined: The Cheshire Cat said, “If you don’t know where you are going, any road will get you there.” For your BI project to succeed, it is important to set specific and measurable targets. Begin by leveraging a mix of top-down and bottom-up approaches to recognize potential business use cases: a top-down approach can spot KPIs (Key Performance Indicators), while a bottom-up approach can determine the data needed to improve those KPIs. It is advisable to achieve quick wins – use cases which can be improved in a short period of time – to lock in ongoing business support.
  3. Build the team structure: A lack of skills is generally one of the main obstacles to building a successful business intelligence team. When people with the right skillsets are missing – whether in the organization or in the marketplace – optimal use of data cannot be achieved. A best practice adopted by the companies leading the data revolution is to appoint a chief insight officer or a chief data officer to establish actionable insights.
  4. Create a governing group: When you consider implementing a business intelligence solution creating a Center of Excellence comprised of people who understand both – the company’s business and the IT environment – is highly recommended. This team can help build a BI system that is flexible and adaptable, two very important factors if the analytics solution is to stay relevant as the business evolves.
  5. Stress the technology: Many BI tools are architected as hierarchical, top-down data structures. This makes the data easy to organize behind the scenes, but it limits users to a predefined path to find the data they need. Other tools are associative in nature, allowing data to be ‘discovered’ intuitively, much the way information is uncovered using, for example, a Google search. Obtain trial versions of both types and have a pilot group (or groups) stress-test them to uncover which approach best suits your needs.

Big data analytics is a big trend in the business world today, and for good reason. However, as with everything in business, there is no one-size-fits-all approach; every business has unique needs. By starting with the end in mind and working backwards – instead of buying a system and then adapting your organizational culture to shoehorn it in – you are likely to discover the best way to grow your business with the help of your business intelligence system.



Ten Secrets to Win with #Analytics


Nearly every organization today uses analytics. But not every organization is getting as much out of its analytics as it could. So, how do you truly excel with analytics to deliver the best support for decisions?

  1. Don’t fail to plan:  Doesn’t sound like a secret at all, does it? Yet too few organizations have spent the time to begin with the end in mind. The most successful companies always begin their analytics projects with a clear vision of the target. The key stakeholders should be aligned by writing down and sharing:
    • What you’re trying to achieve
    • Who you’re trying to reach
    • Why it matters
    • How you’ll measure success
  2. Use your analytics tool to uncover data quality issues:  Don’t let the desire for perfect data be the barrier to very good data. Instead, use your analytics tools to spot abnormalities in your data and learn from them. Then work with the people who own the data and share your insights, helping them fix their processes. By forming partnerships, you can significantly improve your data quality over time.
  3. Use Good Design:  Most of your data consumers visualize data to understand it, so aesthetics play an important role. Like an interior decorator, a good designer can help you develop an intuitive and effective user experience and a great look and feel for dashboards and visualizations. However, data visualization best practices outrank aesthetic design – every time.
  4. Repetition, repetition & repetition – learn through play & through doing:  Your worst data model is your first one – nobody creates a perfect model for their data on the first try. And that’s OK. Truth is, looking at your data from different angles can teach you a lot about it. Let everyone connect with the data in their own ways — you’ll be amazed at what they discover. Use what they do to inform your strategy (back to #1).
  5. Be your loudest evangelist!  Some software projects are mandatory for users, however, adoption of analytics is voluntary in most organizations. So, if you want people to know you have built a better mouse trap, act like Guy Kawasaki and start promoting it. Recruit your marketing department and sell the value of analytics throughout your entire team and organization.
  6. You need a champion:  Find an influential person or team that has an unmet need and empower them with analytics. This can turn them into true believers by showing them what’s possible. Then turn the spotlight on their success to prove the value of analytics to the rest of your business.
  7. Build a Cross Functional Team:  Selling analytics is simple when it becomes easy to repeat successes and avoid failures. Bring together a cross-functional team and put them in charge of:
    • Deciding the role of analytics
    • Defining the standards and tools
    • Identifying best practices and gaps
    • Iterating and improving the solution over time
  8. Have dual processes:  Changing the method of measuring KPIs or profits requires taking your time and getting it right. However, sometimes you have a unique and urgent situation and must develop an app right now to analyze it. Put in place different processes for both scenarios — and accept the fact that it’s OK to build temporary throw-away apps for one‑off projects.
  9. Reports are so 90’s:  Don’t be like most BI deployments, which tend to focus on delivering the same out-of-date reports that have been around for decades. Simply describing the situation presented in the data does not provide analytic value for decision makers. You must answer the ‘why?’, not just the ‘what?’. So, shift your efforts to emphasize diagnostic discovery and exploration capabilities.
  10. What is your data worth?  Are you sitting on the proverbial goldmine with your information? Would other organizations (or other groups inside your company) pay good money to gain access to your proprietary data? Or, as some large retailers do, can you use it to add value for your customers or vendors? Take a step back and see the forest – think creatively about all the ways you could monetize the data you already own.
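On secret #2, a common first pass at spotting abnormalities is a simple statistical screen. As a minimal sketch of the idea (the order data and threshold are invented for illustration; a real analytics tool would surface this interactively):

```python
from statistics import mean, stdev

def find_outliers(values, z_threshold=2.5):
    """Flag values more than z_threshold standard deviations from the mean.

    Note: a single extreme value inflates the standard deviation itself,
    so a slightly looser threshold than the textbook 3.0 is used here."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) > z_threshold * sigma]

# Hypothetical daily order totals; 9999 looks like a data-entry error worth
# taking back to the data's owners rather than silently dropping.
daily_orders = [102, 98, 95, 110, 105, 99, 9999, 101, 97, 103]
print(find_outliers(daily_orders))  # [9999]
```

The point is not the statistics but the partnership: each flagged value is a conversation starter with the people who own the upstream process.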






What #BigData Is The Money Hiding Under?

So the big push I have seen in the last 5 years is that everything needs to be productized. In English, this means that all of the hard work has been done for you: all you need to do is sit back and run the 50 reports the vendor gives you, take their product configurations as gospel, and never need anything else. Basically, everyone is trying to tell you that they not only know your business better than you do, but also already know every question you are going to ask of the “product” to make your business successful. Who needs custom reports, or the ability to drive customizations? And what if I want to ask a question of the data/system/consultants that the “product” does not answer out of the gate? Well, then you pay through the nose for some software development to ask the “one question” you had at this given time.

As technology makes it easier and easier to grab data and do something with it (MicroStrategy, Tableau, QlikView, R), it is becoming harder and harder for companies that have outsourced a tool to a third-party vendor to ask the questions of today, much less tomorrow. As tools become “productized”, it becomes harder and harder to ask questions of your data that might lead to an increase in sales, help minimize out-of-stocks, or help identify your ever-changing customer. What about identifying when a product post goes viral – how do you capture that and run with it? What is a company to do? First, make sure you have easy access to your data. It might not be your tool, but it is your data. Ask how the vendor will provide data to run a scenario that helps identify product attributes or sales trends – data you can then mix with other channels like Twitter or YouTube and act on. Will the vendor help you with this for a reasonable cost?

Here is a question I posed to my seniors at my last staff meeting: do you want to ask the same question (that everyone else is asking) 600 times this year, or do you want to ask 12 really hard questions nobody else is asking, with a failure rate of 85%? Are the two really good answers that survive going to be worth it to help differentiate yourself from all of your peers in your space? My guess is yes. The problem is that the “products” I am seeing aren’t going to let you ask those 12 really hard questions – but that’s where the money is hiding these days.




What’s next for #CPG #CategoryManagement?

What do Walmart, Facebook, Yahoo, Twitter, IBM, Google, eBay, Teradata, LinkedIn, Hulu, The New York Times, MicroStrategy, and P&G have in common? They are all harnessing the power of Hadoop to store, serve, slice, and rationalize Big Data to advance their business – peering into volumes of data like never before, volumes that had been too big to handle this nimbly, and uncovering things about their customers they never dreamed possible – until today.

The biggest downside to standard data warehousing and BI tools today is that you have to know the questions you want to ask ahead of time. This creates a never ending search for patterns, outliers, and relationships in your data. If you dream up a question your existing architecture doesn’t support, you have to involve IT or software vendors and re-architect the whole data warehouse.

What if you could gaze into a magic 8-ball and it would tell you everything you needed to know about your retail category – all of the SKU changes to maximize sales, your out-of-stocks and phantom inventory, your sales by geography or store traits, plus patterns in your data that you did not even know to look for. Welcome to the next generation of BI data warehousing in retail category management – Hadoop!

Why Hadoop?

  • Hadoop is powering today’s Big Data initiatives and is gaining more and more acceptance across many different business units. Coupled with Hive, Pig, Sqoop, MapReduce, and numerous other tools, there are multiple robust ways to attack and slice your data.
  • Your original data formats are unchanged, so you can reuse them in their raw form at a later date. This guarantees no data loss in case you think of some way to explore your data in the future that you have not thought of today. It also does not lock you into a proprietary third party data format.
  • No ETL is required. Data is loaded into HDFS and you are done. Then use the coupled tools to unearth the data you are looking for, rather than churning it into a cookie-cutter format that you hope will yield insights.
  • Hadoop is scalable using inexpensive hardware. Add nodes to your cluster all day long, using junker PCs you have lying around in the closet. No longer do you need a $50K RAID SAN to house and protect your data. Running out of space after 5 years of category data? Just load up some more nodes and you will be good for another few years.
  • Hadoop couples with several analytics vendors – MicroStrategy, Pentaho, Zoomdata, SSRS, Tableau, SAS – as well as other open source products and numerous built-in packages.
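To make the MapReduce point above concrete: a Hadoop Streaming job is just a mapper and a reducer working over plain text. A toy Python sketch – with an invented store_id,sku,units line format (real Retail Link feeds differ) and Hadoop's shuffle phase simulated by sorted() – might look like this:

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit (store_id, units) for every raw sales line.
    The raw CSV text is parsed only here, at query time -- no upfront ETL."""
    for line in lines:
        store_id, _sku, units = line.strip().split(",")
        yield store_id, int(units)

def reducer(pairs):
    """Reduce phase: sum units per store. Pairs must arrive grouped by key,
    which Hadoop's shuffle guarantees (simulated below with sorted())."""
    for store_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield store_id, sum(units for _, units in group)

raw = ["001,SKU7,3", "002,SKU7,8", "001,SKU9,2"]
totals = dict(reducer(sorted(mapper(raw))))
print(totals)  # {'001': 5, '002': 8}
```

In an actual Hadoop Streaming job the mapper and reducer would read sys.stdin and print tab-separated key/value lines, but the logic is the same – and because the raw files in HDFS are never transformed, a different question tomorrow just means a different mapper.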

We are breaking new ground by implementing Retail Link or other demand signal data in a Hadoop cluster, and applying several analytics packages on top of it to let this new Big Data platform shine in the category management space like never before.