Getting big data right
The evolution of big data, data science and analytics was on full display at this year’s Strata Data Conference with one overarching message: We need to get big data right.
This article first appeared on CIO.com.
The gathered crowd looked on with amazement and let out an audible gasp. The volunteer thought he had stumped the magician with his chosen number, 83. The magician had written 16 numbers in a four-by-four matrix, and 83 was nowhere in the mix. But then the magician broke into a wide grin. “Check this out. If you add up all the numbers across each row, they add up to 83. The numbers in each column? That’s right, 83. In fact, every combination adds up to your number. Amazing, right? It’s just like what happens when you get data right: it’s like magic!”
The magician was pitching for Diwo, a new cognitive decision-making platform, at the company's booth at the Strata Data Conference held last week in New York. It was a fun way to introduce the new solution and, perhaps unwittingly, an allegory for the evolution of the big data market.
While we have been talking about big data, data science and analytics for quite some time, the market's evolution was on full display at this year’s event. It could be seen in the several interwoven themes that permeated both the keynote stage and the exhibit floor. The overarching message: It’s time to get big data right.
These themes all touched on the same broad idea: it is time to move beyond the exploration stage and apply big data, data science and analytics in real life and at scale, so that the power of data can transform business models and the customer experience, and perhaps make it all feel a bit like magic.
Making big data real
For most of its existence, big data has been a technically focused domain. While the business implications have (almost) always been clear, the focus of the market has predominantly been on experimentation and on figuring out how to solve the big, hairy technical problems that come with massive data sets. There were, of course, successful applications of big data that produced significant business results, but the primary driver of the market was technical development, not business application.
This year, however, there was a visible change. First, there was significant airtime given to the societal impact of data and the important role that data scientists and practitioners must play as the industry continues to evolve. “We haven’t established standards for what is good enough in data science,” cautioned Cathy O’Neil, mathematician and author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. “Bad algorithms can destroy people’s lives…and AI isn’t fair. It automates the status quo and all of its implicit biases. That needs to be acknowledged, particularly when we’re focused on those things that impact people’s lives.”
Manuel García-Herranz, chief scientist with UNICEF’s Office of Innovation, and Jer Thorp, innovator in residence at the Library of Congress, echoed similar sentiments about both the positive and the potentially negative impact that data can have on the world. García-Herranz shared how UNICEF is integrating data science and real-time systems to discover insights while it can still use them to take meaningful action and “apply data for humanity.” Thorp, on the other hand, cautioned attendees that their desire for objectivity could keep them from understanding the real power and impact of data. His advice was as simple as it was profound: “Get out of your chair.”
At the same time, the intersection of big data and artificial intelligence (AI) is increasing the urgency with which enterprises are addressing their big data initiatives. While big data's nature as the fuel that powers AI has made the two symbiotic, the rapid ascent of AI as a top-of-mind issue with business executives is making the need to deploy big data on an enterprise scale a strategic imperative.
Making big data real-time
Shifting the focus to how organizations apply big data in the real world has also led to a second significant trend: the shift to applying data and analytics in real time. There is increasing recognition that post-transaction analytics is not the only big data use case and, in fact, may not even be the best one. Many organizations now realize that they can harness the greatest value from their big data initiatives by applying the resulting analytics and insights at the point of the transaction. With this use case, organizations can go beyond merely using data retrospectively for analysis and planning and can instead use it to shape the customer experience, enable better decision-making and reduce risk in real time, before negative outcomes occur. This use of data at the point of transaction can take many forms and should be an essential element of any modern big data strategy.
Several technology companies have introduced tools and strategies to help enterprise organizations integrate the insights and analytics they derive from their big data initiatives in real time. These include (in alphabetical order):
Cambridge Semantics: An end-to-end, exploratory analytics solution built on a semantic relational data model that enables real-time analytics and reduces time to market by structuring data according to business context.
MapR: A converged data platform offering what it calls a "data fabric" that integrates traditional data lakes with streaming data in a single, location-independent, and context-aware platform.
Splice Machine: An application development platform that creates a new breed of "predictive applications" that merge transactional and analytical processing and inject analytics-derived insights into the application workflow.
Striim: A real-time data integration and streaming analytics platform that analyzes data at the time of ingestion to support decision-making with real-time insights.
VoltDB: An operational data platform that offers real-time event processing and analytics with millisecond response times.
While these technology providers are taking very different approaches to applying big data in real time, each of them is delivering the same broad message to enterprise organizations: The best time to use big data is right now.
Making big data work
Understanding the real-world implications of big data and moving its application to the point of a transaction won’t do any good, however, if an organization is unable to make big data work at enterprise scale and within the enterprise operating model. As organizations have attempted to move big data beyond the realm of experimentation and toward full-scale, enterprise-wide application, they have run into significant governance, management, and scale issues. It should be no surprise, therefore, that the third major theme on display at Strata was the need for organizations to make big data work at enterprise scale. Unquestionably, a significant part of that process requires cultural and organizational transformation. But it is equally clear that organizations must transform how they apply the technology itself to make it work at scale.
Several technology companies showed new technologies and technology-driven approaches that they believe will help enterprise organizations grapple with the various aspects of making big data enterprise-ready, including (in alphabetical order):
Dataguise: A data governance platform that discovers, detects, protects and monitors sensitive information, such as PII, PCI, and HIPAA, wherever it is within the organization’s data landscape.
Dataiku: A big data platform that creates a single data pipeline that breaks down silos and enables data scientists and data analysts to easily work together and speed the deployment of predictive solutions.
DriveScale: A software company changing the way organizations deploy and consume storage with what it calls "software composable infrastructure," which connects disaggregated components (compute and storage) in an intelligent and highly dynamic manner.
Pure Storage: A highly efficient storage platform purpose-built for scaled-out big data deployments, modern analytics needs, and AI.
Zaloni: A data lake management platform focused on helping enterprises operationalize their data lake and rapidly get to business value at scale.
Getting big data right
The technical challenges that have been at the heart of the big data industry persist and will endure for the foreseeable future. As data continues to grow exponentially, enterprise organizations and the technology companies that serve them will remain locked in a continuous battle to tame it and make it manageable. Nevertheless, it is clear that organizations are now beginning to earnestly address the challenges that come with applying big data across the enterprise so that they can accelerate digital transformation and feed their growing AI initiatives. And, it seems clear, both the industry and enterprise organizations are recognizing that for big data to fulfill its promise, they need to get big data right.
[Disclosure: The Strata Data Conference provided me a free pass for this event, a standard industry practice.]