

Leadership Means Sacrificing


What do Special Forces, Army Rangers, Navy SEALs, and Marines all have in common?

Teams like these go through what is considered by some to be the toughest military training in the world.

They also encounter obstacles that develop and test their stamina, leadership and ability to work as a team like no other.

I was talking recently to a colleague of mine about some of our own leadership at work. Emotions were strong. Deep sighs punctuated every other sentence.

We’re going through a business transformation, and as with most company turn-arounds, there is a strong conflict between the “old” and the “new”. This means old vs. new target markets, old vs. new business processes,  old vs. new people…and, at the core of most issues, old vs. new culture.

This colleague is part of the “new team”, chartered to help create change.

"I struggle with some of the leadership," he said, which reflected a general theme throughout the conversation.

This reminded me of the book, Fearless, the story of Adam Brown, a Navy SEAL, who sacrificed his life during the hunt for Osama Bin Laden.

Strange to think about the military when talking about business, since these two worlds couldn't be further apart…or could they?

What Kind of Leadership Would You Prefer?

When Navy SEAL Adam Brown woke up on March 17, 2010, he didn’t know that he would die that night in the Hindu Kush Mountains of Afghanistan.

Who risks their life for others so that they may survive? Heroes like Adam Brown do. Military personnel are trained to put their own lives at risk so that others may survive.

Would you want to be a part of a team with people who are willing to sacrifice themselves so that others like you may gain? Who wouldn’t?

In business, unfortunately, we give bonuses to employees who are willing to sacrifice others so that the business may gain.

I don't know about you, but most people I know want to work for an organization in which you have ABSOLUTE CONFIDENCE that others in the organization would sacrifice…so that YOU can gain…not them, not the business.

And guess what? The leadership and the business end up gaining in the end….because they have a workforce that doesn’t waste its time always looking over its shoulder, wondering what is going to happen next.

A Winning Culture

In my work to create high-performing teams, I look for the type of business colleagues who are more like Adam Brown…the ones who sacrifice for the good of the team, not themselves. We want people who value this. This isn't negotiable.

I want the team to know that I will GO OUT OF MY WAY to improve their well-being….that I care more about their success than my own. It’s not bullshit. Just ask anyone who has been part of a high-performing team….and you’ll probably hear the same.

"I care more about their success than my own."

Why? Because their success is our success. It’s that simple.

A winning culture is one where you have a team of people who are interested in improving each other…sacrificing their own interests in order to help the other.

In the end, you are NEVER looking over your shoulder…you are NEVER wasting energy trying to understand the mission. You’re focused, and you execute.

That’s a winning culture…a winning team…that’s leadership.

My colleague and I regained our enthusiasm as we reflected on our shared views. His last words are still echoing in my head…

“One team one fight. Unity is what brings the necessary efficiencies to fight effectively. Lack of unity creates unnecessary distractions from the objective at hand.”

Posted in Leadership.



Big Data Top Ten

What do you get when you combine Big Data technologies….like Pig and Hive? A flying pig?

No, you get a “Logical Data Warehouse”.

My general prediction is that Cloudera and Hortonworks are both aggressively moving to fulfill a vision that looks a lot like Gartner's "Logical Data Warehouse"…namely, "the next-generation data warehouse that improves agility, enables innovation and responds more efficiently to changing business requirements."

In 2012, Infochimps (now CSC) leveraged its early use of stream processing, NoSQLs, and Hadoop to create a design pattern which combined real-time, ad-hoc, and batch analytics. This concept of combining the best-in-breed Big Data technologies will continue to advance across the industry until the entire legacy (and proprietary) data infrastructure stack will be replaced with a new (and open) one.

As this is happening, I predict that the following 10 Big Data events will occur in 2014.

1. Consolidation of NoSQLs begins

A few projects have strong commercialization companies backing them. These are companies that have reached "critical mass", including Datastax with Cassandra, 10gen with MongoDB, and Couchbase with CouchDB. Leading open source projects like these will pull further and further away from the pack of 150+ other NoSQLs, which are either fighting for the same value propositions (with a lot less traction) or solving small niche use-cases (and markets).

2. The Hadoop Clone wars end

The industry will begin standardizing on two distributions. Everyone else will become less relevant (it's Intel vs. AMD; let's not forget the other x86 vendors like IBM, UMC, NEC, NexGen, National, Cyrix, IDT, Rise, and Transmeta). If you are a Hadoop vendor, you're either the Intel or the AMD. Otherwise, you'd better be acquired or get out of the business by the end of 2014.

3. Open source business model is acknowledged by Wall Street

Because the open source, scale-out, commodity approach to Big Data is fundamental to the new breed of Big Data technologies, open source now becomes a clear antithesis of the proprietary, scale-up, our-hardware-only, take-it-or-leave-it solutions. Unfortunately, the promises of international expansion, improved traction from sales force expansion, new products and alliances, will all fall on deaf ears of Wall Street analysts. Time to short the platform RDBMS and Enterprise Data Warehouse stocks.

4. Big Data and Cloud really means private cloud

Many claimed that 2013 was the "year of Big Data in the Cloud". However, what really happened is that the Global 2000 immediately began their bare-metal projects under tight control. Now that those projects are underway, 2014 will exhibit the next phase of Big Data on virtualized platforms. Open source projects like Serengeti for vSphere; Savanna for OpenStack; Ironfan for AWS, OpenStack, and VMware combined; or venture-backed and proprietary solutions like BlueData will enable virtualized Big Data private clouds.

5. 2014 starts the era of analytic applications

Enterprises become savvy to the new reference architecture of combined legacy and new-generation IT data infrastructure. Now it's time to develop a new generation of applications that take advantage of both to solve business problems. System Integrators will shift resources, hire data scientists, and guide enterprises in their development of data-driven applications. This, of course, realizes concepts like the 360-degree view, the Internet of Things, and one-to-one marketing.

6. Search-based business intelligence tools will become the norm with Big Data

Having a "Google-like" interface that allows users to explore structured and unstructured data with little formal training is where the new generation is going. Just look at Splunk for searching machine data. Imagine a marketer being able to simply "Google search" for insights about their customers.
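To make the "Google-like" idea concrete, here is a toy sketch of the inverted index that sits behind this kind of search (my own illustration in Python, not Splunk's or any vendor's API; the customer "documents" are made up):

```python
from collections import defaultdict

# Toy customer "documents" -- in practice these would come from CRM, web logs, email, etc.
customers = {
    1: "jane doe churn risk high gold tier complained about mobile app",
    2: "john smith upsell candidate premium card frequent traveler",
    3: "acme corp contract renewal q3 support tickets escalated",
}

# Build an inverted index: term -> set of customer ids containing it.
index = defaultdict(set)
for cid, text in customers.items():
    for term in text.lower().split():
        index[term].add(cid)

def search(query):
    """Return customer ids matching every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

print(search("churn risk"))   # -> {1}
print(search("renewal"))      # -> {3}
```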

7. Real-time in-memory analytics, complex event processing, and ETL combine

The days of ETL in its pure form are numbered. It's either 'E', then 'L', then 'T' with Hadoop, or it's EAL (extract, apply analytics, and load) with new real-time stream-processing frameworks. Now that high-speed social data streams are the norm, so are processing frameworks that combine streaming data with micro-batch and batch data, performing complex processing on that data and feeding applications with sub-second response times.
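A rough sketch of the EAL idea (my own illustration in Python, not any particular streaming framework): events are extracted from a stream, analytics are applied per micro-batch, and the results are loaded into whatever serves the application.

```python
import time
from collections import Counter

def event_source():
    """Stand-in for a high-speed social or market data stream."""
    sample = [
        {"user": "a", "symbol": "AAPL", "sentiment": 0.7},
        {"user": "b", "symbol": "MSFT", "sentiment": -0.2},
        {"user": "c", "symbol": "AAPL", "sentiment": 0.4},
    ]
    while True:
        for event in sample:
            yield event

def micro_batches(stream, batch_size=100, interval=0.5):
    """Group the stream into small batches (the 'E', plus pacing)."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
            time.sleep(interval)  # pacing stand-in for a real scheduler

def process(batch):
    """Apply analytics in-stream ('A'): mention counts and average sentiment per symbol."""
    counts = Counter(e["symbol"] for e in batch)
    avg = {}
    for sym in counts:
        vals = [e["sentiment"] for e in batch if e["symbol"] == sym]
        avg[sym] = sum(vals) / len(vals)
    return counts, avg

for i, batch in enumerate(micro_batches(event_source())):
    counts, avg = process(batch)
    print(counts, avg)      # 'L': load/feed the serving application here
    if i >= 2:              # stop the demo after a few batches
        break
```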

8. Prescriptive analytics become more mainstream

After descriptive and predictive comes prescriptive. Prescriptive analytics automatically synthesizes big data, multiple disciplines of mathematical and computational sciences, and business rules to make predictions and then suggest decision options that take advantage of those predictions. We will begin seeing powerful use-cases of this in 2014. Business users want to be recommended specific courses of action and to be shown the likely outcome of each decision.

9. MDM will provide the dimensions for big data facts

With Big Data, master data management will now cover both internal data that the organization has been managing over years (like customer, product and supplier data) as well as Big Data that is flowing into the organization from external sources (like social media, third party data, web-log data) and from internal data sources (such as unstructured content in documents and email). MDM will support polyglot persistence.

10. Security in Big Data won't be a big issue

Peter Sondergaard, Gartner's senior vice president of research, will say that when it comes to big data and security, "You should anticipate events and headlines that continuously raise public awareness and create fear." I'm not dismissing the fact that with MORE data come more responsibilities, and perhaps liabilities, for those that harbor the data. However, in terms of the infrastructure security itself, I believe 2014 will end with a clear understanding of how to apply the familiar best practices to your new Big Data platform, including trusted Kerberos, LDAP integration, Active Directory integration, encryption, and overall policy administration.

Posted in Big Data.



SAP & Big Data


SAP customers are confused about the positioning between SAP Sybase IQ and SAP Hana as it applies to data warehousing. Go figure, so is SAP. You want to learn about their data warehousing offering, and all you hear is “Hana this” and “Hana that”.

It reminds me of the time after I left Teradata when the BI appliances came on the scene. First Netezza, then Greenplum, then Vertica and Aster Data, then ParAccel. Everyone was confused about what the BI appliance was in relation to the EDW. Do I need an EDW, a BI appliance, an EDW + BI appliance?

With SAP, Sybase IQ is supposed to be the data warehouse and HANA is the BI or analytic appliance that sits off to its side. OK. SAP has a few customers on Sybase IQ, but are they the larger well-known brands? Let's face it…since its acquisition of Sybase in 2010, SAP has struggled with positioning it against incumbents like Teradata, IBM, and even Oracle.

SAP Roadmap


SAP's move from exploiting its leadership position in enterprise ERP to exploring the new BI appliance and Big Data markets has been impressive, IMHO. The acquisition of EDW and RDBMS company Sybase in 2010, following the earlier acquisition of BI leader Business Objects in 2007, was necessary for SAP to be relevant in the race to provide an end-to-end data infrastructure story. This was, however, a period of "catch-up", a late entry to the race.

Its true exploration began with SAP HANA and now a strategic partnership with Hadoop commercialization company Hortonworks. The ability to rise ahead of data warehouse and database management system leaders will require defining a new Gartner quadrant: the Big Data quadrant.

SAP Product Positioning

Let's look back in time at SAP's early positioning. We have the core ERP business, the new "business warehouse" business, and the soon-to-be-launched HANA business. The SAP data warehouse equation is essentially Business Objects + Sybase IQ + HANA. Positioning HANA, as with most data warehouse vendors, is a struggle, since it can be positioned as a data mart within larger footprints or as THE EDW database altogether in smaller accounts. One would think that with proper guidelines this positioning would be straightforward. But platform choice is driven by more than database size and query complexity; customer organizational requirements and politics are a very challenging variable as well. As shown above, you can tell that SAP struggled with simplifying its message for its sales teams early on.

SAP Hana – More than a BI Appliance

SAP released the first version of its in-memory platform, SAP HANA 1.0 SP02, to the market on June 21st, 2011. It was (and is) based on technology acquired from Transact In Memory, a company that had developed a memory-centric relational database positioned for "real-time acquisition and analysis of update-intensive stream workloads such as sensor data streams in manufacturing, intelligence and defense; market data streams in financial services; call detail record streams in Telco; and item-level RFID tracking." Sound familiar to our Big Data use-cases today?

As with most BI appliances back then, customers spent about $150K for a basic 1TB configuration (SAP partnered with Dell) for the hardware alone; add software and installation services and we were looking at $300K, minimally, as the entry point. SAP started off with either a BI appliance (HANA 1.0) or a BW Data Warehouse appliance (HANA 1.0 SP03), both of them using the SAP IMDB Database Technology (SAP HANA Database) as their underlying RDBMS.

BI Appliances come with analytics, of course


When SAP first started marketing HANA analytics, you were promised a suite of sophisticated analytics as part of the Predictive Analysis Library (PAL), which can be called directly in an "L wrapper" within a SQLScript. The inputs and outputs are all tables. PAL includes seven well-known predictive analysis algorithms in several data mining categories (a generic, non-SAP illustration of the cluster-analysis category follows the list below):

  • Cluster analysis (K-means)
  • Classification analysis (C4.5 Decision Tree, K-nearest Neighbor, Multiple Linear Regression, ABC Classification)
  • Association analysis (Apriori)
  • Time Series (Moving Average)
  • Other (Weighted Score Table Calculation)
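PAL itself is invoked through SQLScript inside HANA; purely to illustrate what the cluster-analysis category above does, here is a minimal K-means example in Python with scikit-learn (a generic sketch, not SAP's API; the customer features are made up):

```python
from sklearn.cluster import KMeans
import numpy as np

# Toy customer features: [annual spend in $K, number of transactions]
X = np.array([
    [12, 30], [14, 35], [13, 28],     # low spend / low activity
    [80, 210], [85, 190], [78, 205],  # high spend / high activity
])

# Cluster into two segments, analogous to PAL's K-means category.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)            # e.g. [0 0 0 1 1 1]
print(model.cluster_centers_)   # segment centroids
```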

HANA's main use case started with a focus on its installed base: a real-time, in-memory data mart for analyzing data from SAP ERP systems. For example, profitability analysis (CO-PA) is one of the most commonly used capabilities within SAP ERP. The CO-PA Accelerator allows significantly faster processing of complex allocations and essentially instantaneous ad hoc profitability queries. It belongs to the accelerator-type usage scenarios in which SAP HANA becomes a secondary database for SAP products such as SAP ERP: data is replicated from SAP ERP into SAP HANA in real time for secondary storage.

BI Appliances are only as good as the application suite

Other use-cases for Hana include:

  • Profitability reporting and forecasting,
  • Retail merchandizing and supply-chain optimization,
  • Security and fraud detection,
  • Energy use monitoring and optimization, and,
  • Telecommunications network monitoring and optimization.

Applications developed on the platform include:

  • SAP COPA Accelerator
  • SAP Smart Meter Analytics
  • SAP Business Objects Strategic Workforce Planning
  • SAP SCM Sales and Operations Planning
  • SAP SCM Demand Signal Management

Most opportunities were initially "accelerators", leveraging HANA's in-memory performance improvements.

Aggregate real-time data sources

There are two main mechanisms that HANA supports for near-real-time data loads. First is the Sybase Replication Server (SRS), which works with SAP or non-SAP source systems running on Microsoft, IBM or Oracle databases. This was expected to be the most common mechanism for SAP data sources. There used to be some license challenges around replicating data out of Microsoft and Oracle databases, depending on how you license the database layer of SAP. I’ve been out of touch on whether these have been fully addressed.

SAP has a second choice of replication mechanism called System Landscape Transformation (SLT). SLT is also near-real-time and works from a trigger from within the SAP Business Suite products. This is both database-independent and pretty clever, because it allows for application-layer transformations and therefore greater flexibility than the SRS model. Note that SLT may only work with SAP source systems.

High-performance in-memory performance

HANA stores information in electronic memory, which is 50x faster (depending on how you calculate) than disk. HANA stores a copy on magnetic disk, in case of power failure or the like. In addition, most SAP systems have the database on one system and a calculation engine on another, and they pass information between them. With HANA, this all happens within the same machine.

Why Hadoop?

SAP HANA is not a platform for loading, processing, and analyzing huge volumes – petabytes or more – of unstructured data, commonly referred to as big data. Therefore, HANA is not suited for social networking and social media data analytics. For such use cases, enterprises are better off looking to open-source big-data approaches such as Apache Hadoop, or even MPP-based next-generation data warehousing appliances like Pivotal Greenplum or similar.

SAP’s partnership with Hortonworks enables the ability to migrate data between HANA and Hadoop platforms. The basic idea is to treat Hadoop systems as an inexpensive repository of tier 2 and tier 3 data that can be, in turn, processed and analyzed at high speeds on the HANA platform. This is a typical design pattern between Hadoop and any BI appliance (SMP or MPP).
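A rough sketch of that design pattern, assuming a HiveServer2 endpoint reachable via the PyHive client and using SQLite purely as a stand-in for the in-memory serving layer (hosts and table names are hypothetical):

```python
from pyhive import hive   # assumes a HiveServer2 endpoint is available
import sqlite3

# 1. Summarize "cold" tier-2/3 data where it lives, in Hadoop.
hive_conn = hive.Connection(host="hadoop-edge.example.com", port=10000)  # hypothetical host
cursor = hive_conn.cursor()
cursor.execute("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM clickstream_orders_archive          -- hypothetical table
    WHERE order_date >= '2013-01-01'
    GROUP BY customer_id
""")
rows = cursor.fetchall()

# 2. Load the much smaller aggregate into the fast serving layer
#    (SQLite here purely as a stand-in for HANA or another appliance).
fast = sqlite3.connect(":memory:")
fast.execute("CREATE TABLE customer_spend (customer_id TEXT, total_spend REAL)")
fast.executemany("INSERT INTO customer_spend VALUES (?, ?)", rows)
fast.commit()

# 3. Interactive queries now hit the hot tier, not Hadoop.
top = fast.execute(
    "SELECT customer_id, total_spend FROM customer_spend "
    "ORDER BY total_spend DESC LIMIT 10"
).fetchall()
print(top)
```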


SAP “Big Data White Space”?

Where do SAP customers need support? Where is the "Big Data white space"? SAP seems to think that persuading customers to run core ERP applications on HANA is all that matters. Are customers responding? Answer: not really.

Customers are saying they're not planning to use it, with most of them citing high costs and a lack of clear benefit (aka use-case) behind their decision. Even analysts are advising against it; Forrester Research said the HANA strategy is "understandable but not appealing".

"If it's about speeding up reporting of what's just happened, I've got you, that's all cool, but it's not helping me process more widgets faster," said one SAP customer.

SAP is betting its future on HANA + SaaS. However, what is working in SAP's favor for the moment is the high level of commitment among existing (European) customers to on-premise software.

This is where the “white space” comes in. Bundling a core suite of well-designed business discovery services around the SAP solution-set will allow customers to feel like they are being listened to first, and sold technology second.

Understanding how to increase REVENUE with new greenfield applications around unstructured data that leverage the structured data from ERP systems can be a powerful opportunity. This means architecting a balance of historic "what happened", real-time "what is currently happening", and a combined "what will happen IF" all together into a single data symphony. HANA can be leveraged for more ad-hoc analytics on the combined historic and real-time data for business analysts to explore, rather than just being a report accelerator.

This will require:

  • Sophisticated business consulting services: to support uncovering the true revenue upside
  • Advanced data science services: to support building a new suite of algorithms on a combined real-time and historic analytics framework
  • Platform architecture services: to support the combination of open source ecosystem technologies with SAP legacy infrastructure

This isn't rocket science. It just takes focused tactical execution, leading with business cases first. The SAP-enabled Big Data system can then be further optimized with cloud delivery as a cost reducer and time-to-value enhancer, along with a further focus on application development. Therefore, other white space includes:

  • Cloud delivery
  • Big Data application development

SAP must keep its traditional customers and SI partners (like CSC) engaged with “add-ons” to its core business applications with incentives for investing in HANA, while at the same time evolving its offerings for line of business buyers.

Some think that SAP can change the game by reaching/selling to marketers with new analytics offerings (e.g. see SAP & KXEN), enhanced mobile capabilities, ecosystem of start-ups, and a potential to incorporate its social/collaboration and e-commerce capabilities into one integrated offering for digital marketers and merchandisers.

Is there a path to define a stronger CRM vision for marketers? SAP won't be able to do it without credible SI partners who have experience with new media, digital agencies, and specialty service providers who are defining the next wave of content- and data-driven campaigns and customer experiences.

Do you agree?

Posted in Big Data.



Infochimps, a CSC Company = Big Data Made Better


What’s a $15B powerhouse in information technology (IT) and professional services doing with an open source based Big Data startup?

It starts with “Generation-OS”. We’re not talking about Gen-Y or Gen-Z. We’re talking Generation ‘Open Source’.

Massive disruption is occurring in information technology as businesses are building upon and around recent advances in analytics, cloud computing and storage, and an omni-channel experience across all connected devices. However, traditional paradigms in software development are not supporting the accelerating rate of change in mobile, web, and social experiences. This is where open source is fueling the most disruptive period in information technology since the move from the mainframe to client-server – Generation Open Source.

Infochimps = Open Standards based Big Data

Infochimps delivers Big Data systems with unprecedented speed, scale and flexibility to enterprise companies. (And when we say “enterprise companies,” we mean the Global 2000 – a market in which CSC has proven their success.) By joining forces with CSC, we together will deliver one of the most powerful analytic platforms to the enterprise in an unprecedented amount of time.

At the core of Infochimps' DNA is our unique, open source-based Big Data and cloud expertise. Infochimps was founded by data science, cloud computing, and open source experts, who have built three critical analytic services required by virtually all next-generation enterprise applications: real-time data processing and analytics, batch analytics, and ad hoc analytics – all for actionable insights, and all powered by open standards.

CSC = IT Delivery and Professional Services

When CSC begins to insert the Infochimps DNA into its global staff of 90,000 employees, focused on bringing Big Data to a broad enterprise customer base, powerful things are bound to happen. Infochimps Inc., with offices in both Austin, TX and Silicon Valley, becomes a wholly-owned subsidiary, reporting into CSC's Big Data and Analytics business unit.

The Infochimps’ Big Data team and culture will remain intact, as CSC leverages our bold, nimble approach as a force multiplier in driving new client experiences and thought leadership. Infochimps will remain under its existing leadership, with a focus on continuous and collaborative innovation across CSC offerings.

I regularly coach F2K executives on the important topic of “splicing Big Data DNA” into their organizations. We now have the opportunity to practice what we’ve been preaching, by splicing the Infochimps DNA into the CSC organization, acting as a change agent, and ultimately accelerating CSC’s development of its data services platform.

Infochimps + CSC = Big Data Made Better

I laugh many times when we’re knocking on the doors of Fortune 100 CEOs.

“There’s a ‘monkey company’ at the door.”

The Big Data industry seems to be built on animal-based brands like the Hadoop Elephant. So to keep running with the animal theme, I’ve been asking C-levels the following question when they inquire about how to create their own Big Data expertise internally:

“If you want to create a creature that can breathe underwater and fly, would it be more feasible to insert the genes for gills into a seagull, or splice the genes for wings into a herring?”

In other words, do you insert Big Data DNA into the business-savvy side with simplified Big Data tools, or insert business DNA into your Big Data-savvy IT organization? In the case of CSC and Infochimps, I doubt that Mike Lawrie, CSC CEO, wants to be associated with either a seagull or a herring, but I do know he and his senior team are executing on a key strategy to become the thought leader in next-generation technology, starting with Big Data and cloud.

Regardless of your preference for animals (chimpanzees, elephants, birds, or fish), the CSC and Infochimps combination speaks very well to CSC’s strategy for future growth with Big Data, cloud, and open source. Infochimps can now leverage CSC’s enterprise client base, industrialized sales and marketing, solutions development and production resources to scale our value proposition in the marketplace.

“Infochimps, a CSC company, is at the door.”

 Jim Kaskade
CEO
Infochimps, a CSC Company

Posted in Big Data, Cloud Computing.



Real-time Big Data or Small Data?


Have you heard of products like IBM's InfoSphere Streams, TIBCO's event processing products, or Oracle's CEP product? All are good examples of commercially available stream-processing technologies that help you process events in real time.

I’ve been asked what I consider as “Big Data” versus “Small Data” in this domain. Here’s my view.

Real-Time Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: None
  • Big Data: None

Data Velocity
  • Small Data: 100K events/day (<<1K events/second)
  • Big Data: Billion+ events/day (>>1K events/second)

Data Variety
  • Small Data: 1–6 structured sources AND a single destination (an output file, a SQL database, a BI tool)
  • Big Data: 6+ structured and 6+ unstructured sources AND many destinations (a custom application, a BI tool, several SQL databases, NoSQL databases, Hadoop)

Data Models
  • Small Data: Used mainly for "transport". Little to no ETL, in-stream analytics, or complex event processing performed.
  • Big Data: Transport is the foundation; however, distributed ETL and linearly scalable in-memory and in-stream analytics are applied, and complex event processing is the norm.

Business Functions
  • Small Data: One line of business (e.g. financial trading)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: No queries are performed against the data in motion; this is simply a mechanism for transporting a transaction or event from the source to a database. Transport times are <1 second. Example: connect to desktop trading applications and transport trade events to an Oracle database.
  • Big Data: ETL, sophisticated algorithms, complex business logic, and even queries can be applied to the stream of events while they are in motion. Analytics span all data sources and, thus, all business functions. Transport and analytics occur in <1 second. Example: connect to desktop trading applications, market data feeds, and social media, and provide instantaneous trending reports; allow traders to subscribe to information pertinent to their trades and have analytics applied in real time for personalized reporting.
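To make the "analytics on data in motion" idea concrete, here is a small sliding-window sketch in Python (my own illustration, not tied to any of the products above):

```python
import time
from collections import deque, Counter

WINDOW_SECONDS = 60

events = deque()  # (timestamp, symbol) pairs inside the current window

def observe(symbol, now=None):
    """Add an event and evict anything older than the window."""
    now = now or time.time()
    events.append((now, symbol))
    while events and events[0][0] < now - WINDOW_SECONDS:
        events.popleft()

def trending(top_n=3):
    """Most-mentioned symbols in the last WINDOW_SECONDS."""
    return Counter(sym for _, sym in events).most_common(top_n)

# Simulated feed
for sym in ["AAPL", "AAPL", "MSFT", "AAPL", "GOOG", "MSFT"]:
    observe(sym)
print(trending())   # e.g. [('AAPL', 3), ('MSFT', 2), ('GOOG', 1)]
```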

Want to see my view of Batch Analytics? Go Here.

Want to see my view of Ad Hoc Analytics? Go Here.

Here are a few other products in this space:

Posted in Big Data.



Ad Hoc Queries with Big Data or Small Data?


Do you think that you're working with "Big Data"? Or is it "Small Data"? If you're asking ad hoc questions of your data, you'll probably need something that supports "query-response" performance or, in other words, "near real-time". We're not talking about batch analytics, but more interactive/iterative analytics. Think NoSQL, or "near real-time Hadoop" with technologies like Impala. Here's my view of Big versus Small with ad hoc analytics.

Ad Hoc Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: Megabytes – Gigabytes
  • Big Data: Terabytes (1–100TB)

Data Velocity
  • Small Data: Updated in near real-time (seconds)
  • Big Data: Updated in real-time (milliseconds)

Data Variety
  • Small Data: 1–6 structured data sources
  • Big Data: 6+ structured AND 6+ unstructured data sources

Data Models
  • Small Data: Aggregations with tens of tables
  • Big Data: Aggregations with up to 100s – 1000s of tables

Business Functions
  • Small Data: One line of business (e.g. sales)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: Queries are simple, covering basic transactional summaries/reports. Response times are in seconds across a handful of business analysts. Example: retrieve a customer's profile and summarize their overall standing based on current market values for all assets. This is representative of the work performed when a business asks "What is my customer worth today?" The transaction is read-only, and questions vary based on what the business analyst needs to know interactively.
  • Big Data: Queries can be as complex as with batch analytics, but are generally still read-only and processed against aggregates, and they span business functions. Response times are in seconds across large numbers of business analysts. Example: retrieve a customer profile and summarize activities across all customer touch points, calculating "Life-Time-Value" based on past and current activities. This is representative of the work performed when a business asks "Who are my most profitable customers?" Questions vary based on what the business analyst needs to know interactively.
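As an illustration of the Big Data column's example, here is the kind of ad hoc query an analyst might hand to an interactive SQL-on-Hadoop engine such as Impala or Presto (table and column names are hypothetical; shown as a plain SQL string from Python):

```python
# Hypothetical schema: orders, customers, and touch-point activity tables.
LIFETIME_VALUE_QUERY = """
SELECT c.customer_id,
       c.segment,
       SUM(o.revenue) - SUM(o.cost)            AS lifetime_profit,
       COUNT(DISTINCT t.channel)               AS channels_touched
FROM customers c
JOIN orders o       ON o.customer_id = c.customer_id
LEFT JOIN touches t ON t.customer_id = c.customer_id
GROUP BY c.customer_id, c.segment
ORDER BY lifetime_profit DESC
LIMIT 100
"""

# In practice this string would be handed to an Impala/Hive/Presto client;
# any DB-API-compatible cursor works the same way:
#   cursor.execute(LIFETIME_VALUE_QUERY)
#   for row in cursor.fetchall():
#       ...
print(LIFETIME_VALUE_QUERY)
```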

Want my view on Batch Analytics? Look here.

Want my view on Real-time analytics? Look here.

Here are a few products in this space:

Posted in Big Data.



Batch with Big Data versus Small Data


How do you know whether you are dealing with Big Data or Small Data? I’m constantly asked for my definition of “Big Data”. Well, here it is…for batch analytics, now addressed by technologies such as Hadoop.

Batch Analytics

Batch Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: Gigabytes
  • Big Data: Terabytes – Petabytes

Data Velocity
  • Small Data: Updated periodically at non-real-time intervals
  • Big Data: Updated both in real-time and through bulk, timed intervals

Data Variety
  • Small Data: 1–6 structured sources
  • Big Data: 6+ structured AND 6+ unstructured sources

Data Models
  • Small Data: Store data without cleaning, transforming, or normalizing.
  • Big Data: Store data without cleaning, transforming, or normalizing; then apply schemas based on application needs.

Business Functions
  • Small Data: One line of business (e.g. sales)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: Queries are complex, requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints, but they are applied to a simpler data structure. Response times are in minutes to hours, issued by one or maybe two experts. Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, by year.
  • Big Data: Queries are complex, requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints, and they span business functions. Response times are in minutes to hours, issued by a small group of experts. Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, by year; then determine which customers purchased the higher-profit parts, by geography, by year; profile those high-profit customers; and find which products purchased by high-profit customers were NOT purchased by other similar customers, in order to cross-sell / up-sell.
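A toy version of the Small Data example above, written as the map/reduce-style aggregation that Hadoop would perform at far larger scale (plain Python over made-up order lines, not an actual MapReduce job):

```python
from collections import defaultdict

# Toy order lines: (part_line, supplier, geography, year, profit)
orders = [
    ("brakes", "Acme", "EMEA", 2013, 120.0),
    ("brakes", "Acme", "APAC", 2013,  80.0),
    ("brakes", "Bolt", "EMEA", 2012,  65.0),
    ("wipers", "Acme", "EMEA", 2013,  12.5),
]

# "Map": emit ((supplier, geography, year), profit) for the part line of interest.
# "Reduce": sum profit per key.
profit = defaultdict(float)
for part, supplier, geo, year, p in orders:
    if part == "brakes":
        profit[(supplier, geo, year)] += p

for key, total in sorted(profit.items()):
    print(key, total)
# ('Acme', 'APAC', 2013) 80.0 ...
```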

Want to see my view on Ad Hoc and Interactive Analytics? Go here.

Want to see my view on Real-Time Analytics? Go here.

Here are a few other products in this space:

ICS Hadoop

Cloudera

MapR

Hortonworks

Pivotal

Intel

IBM

Wandisco

Posted in Big Data.



Splice data scientist DNA into your existing team

As organizations continue to grapple with big data demands, they may find that business managers who understand data may meet their "data scientist" needs better than hard-core data technologists.

There's little doubt that data-derived insight will be a key differentiator in business success, and even less doubt that those who produce such insight are going to be in very high demand. Harvard Business Review called "data scientist" the "sexiest" job of the 21st century, and McKinsey predicts a shortfall of about 140,000 such people by 2018. Yet most companies are still clueless as to how they're going to meet this shortfall.

Unfortunately, the job description for a data scientist has become quite lofty. Unless your company is Google-level cool, you’re going to struggle to hire your big data dream team (well, at least right now), and few firms out there could recruit them for you. Ultimately, most organizations will need to enlist the support of existing staff to achieve their data-driven goals, and train them to become data scientists. To accomplish this, you must determine the basic elements of data scientist “DNA” and strategically splice it into the right people.

READ MORE>>

Image credit: Thinkstock

Posted in Big Data.



Why the Pivotal Initiative’s Fate will Mirror VMware’s


An enterprise PaaS must truly be agnostic to the underlying elastic infrastructure and fully support open standards. So the big question is whether the Pivotal Initiative will be able to break away from its roots with EMC and VMware and the associated ties to vSphere.

Let's itemize just a few of the major components of the stack, from top to bottom:

  • Pivotal Labs: Besides being the source of Paul Maritz's new company name, this is an agile software development consulting firm focused on Ruby on Rails, pair programming, test-driven development and behavior-driven development. It is known for Pivotal Tracker, a project management and collaboration software package.
  • OpenChorus: Real-time social collaboration on predictive analytics projects, allowing businesses to iterate faster and more effectively.
  • Cetas: End-to-end analytics platform spanning data ingestion, data source connectors, data processing and analytics, and visualization through to recommendations.
  • vFabric SpringSource: Eclipse-based application development framework for building Java-based enterprise applications.
  • vFabric Data Director: Database provisioning, high availability, backup, and cloning. This product includes the ability to provision Hadoop on vSphere using the open source project Serengeti (powered by the open source orchestration project Ironfan).
  • vFabric GemFire: An in-memory stream processing technology that combines stream data processing capabilities with traditional database management. It supports 'Continuous Querying', which eliminates the need for application polling and supports the rich semantics of event-driven architectures.
  • vFabric RabbitMQ: Enterprise messaging middleware implementing AMQP and supporting a full range of Internet protocols for lightweight messaging, including HTTP, HTTPS and STOMP, enabling you to connect nearly any imaginable type of application, component, or service. (A minimal publish sketch using the open source pika client appears after this list.)
  • Greenplum: An ad hoc query and analytics database. The Greenplum database is based on PostgreSQL. It primarily functions as a data mart / analytic appliance and utilizes a shared-nothing, massively parallel processing (MPP) architecture. It has a parallel query optimizer that converts SQL and MapReduce into a physical execution plan.
  • Pivotal HD (Hadoop Distribution): The distribution is competitive with Cloudera. EMC (now Pivotal) created its own distribution so it could improve query response time (but this occurred before they were aware of the introduction of Impala). Many believe that Pivotal HD was created solely to boost struggling sales of the Greenplum software and appliances.
  • Cloud Foundry: An open source cloud computing Platform as a Service (PaaS) written in Ruby.
  • BOSH: An open source tool chain for release engineering, deployment, and lifecycle management of large-scale distributed services. It was initially developed to manage the Cloud Foundry PaaS, but since that is itself a large-scale distributed application, BOSH turned into a general-purpose orchestration tool chain that can handle any application. BOSH currently supports four different IaaS providers: OpenStack, AWS, vSphere, and vCloud.
  • IaaS (OpenStack, AWS, vSphere, and vCloud): Support starts with the vCloud and vCenter APIs, and extends with later additions of OpenStack and AWS (via the BOSH orchestration layer).
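For a concrete feel of the RabbitMQ piece, here is a minimal publish-and-consume round trip using the standard open source pika client (the local broker and queue name are assumptions):

```python
import pika  # standard Python client for RabbitMQ / AMQP

# Connect to a local broker (hypothetical host) and declare a queue.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="events")  # hypothetical queue name

# Publish a lightweight message.
channel.basic_publish(exchange="", routing_key="events", body="hello, pivotal stack")

# Pull one message back off the queue to show the round trip.
method, properties, body = channel.basic_get(queue="events", auto_ack=True)
print(body)  # b'hello, pivotal stack'

connection.close()
```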

So when you look at this sample of technologies (and I'm sure I'm leaving many off the list), you might see through the EMC/VMware veil…and find a collection of open source projects. We'll see how Paul Maritz pulls this all together – clearly a powerful collection of teams and technology.

So why do I refer to VMware's "fate"? Well, it's no secret that VMware's business has begun to plateau under pressure from open projects like OpenStack. Did Paul get out right in the nick of time? Can he create a long-term sustainable business on open source?

Posted in Big Data, Cloud Computing.



Big Data and Banking – More than Hadoop

Fraud is definitely top of mind for all banks. Steve Rosenbush at the Wall Street Journal recently wrote about Visa's new Big Data analytic engine, which has changed the way the company combats fraud. Visa estimates that its new Big Data fraud platform has identified $2 billion in potential annual incremental fraud savings. With Big Data, the new analytic engine can study as many as 500 aspects of a transaction at once. That's a sharp improvement from the company's previous analytic engine, which could study only 40 aspects at once. And instead of using just one analytic model, Visa now operates 16 models, covering different segments of its market, such as geographic regions.

Do you think Visa, or any bank for that matter, uses just batch analytics to provide fraud detection? Hadoop can play a significant role in building models. However, only a real-time solution will allow you to take those models and apply them in a timeframe that can make an impact.
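As a toy illustration (my own, not Visa's system), here is what "operationalizing" a batch-built model looks like: coefficients fit offline on historical data in Hadoop, applied to each transaction as it arrives.

```python
import math

# Coefficients for a simple logistic model, assumed to have been fit offline
# on historical (batch) data in Hadoop; the numbers here are made up.
MODEL = {"amount_zscore": 1.8, "foreign_country": 2.1, "night_time": 0.9, "bias": -4.0}

def fraud_score(txn):
    """Score a single transaction in-stream using the pre-built model."""
    z = MODEL["bias"]
    z += MODEL["amount_zscore"] * txn["amount_zscore"]
    z += MODEL["foreign_country"] * txn["foreign_country"]
    z += MODEL["night_time"] * txn["night_time"]
    return 1.0 / (1.0 + math.exp(-z))   # probability of fraud

txn = {"amount_zscore": 2.5, "foreign_country": 1, "night_time": 1}
score = fraud_score(txn)
if score > 0.8:
    print("hold transaction for review", round(score, 2))
else:
    print("approve", round(score, 2))
```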

The banking industry is based on data – the products and services in banking have no physical presence – and as a consequence, banks have to contend with ever-increasing volumes (and velocity, and variety) of data. Beyond the basic transactional data concerning debits/credits and payments, banks now:

  • Gather data from many external sources (including news) to gain insight into their risk position;
  • Chart their brand’s reputation in social media and other online forums.

This data is both structured and unstructured, as well as very time-critical. And, of course, in all cases financial data is highly sensitive and often subject to extensive regulation. By applying advanced analytics, the bank can turn this volume, velocity, and variety of data into actionable, real-time and secure intelligence with applications including:

  • Customer experience
  • Risk Management
  • Operations Optimization

It's important to note that applying new technologies like Hadoop is only a start (it addresses 20% of the solution). Turning your insights into real-time actions will require additional Big Data technologies that help you "operationalize" the output of your batch analytics.

Customer Experience

Banks are trying to become more focused on the specific needs of their customers and less on the products that they offer. They need to:

  • Engage customers in interactive/personalized conversations (real-time)
  • Provide a consistent, cross-channel experience including real-time touch points like web and mobile
  • Act at critical moments in the customer sales cycle (in the moment)
  • Market and sell based on customer real-time activities

Noticing a general theme here? Big Data can assist banks with this transformation and reduce the cost of customer acquisition, increase retention, increase customer acceptance of marketing offers, increase sales through targeted marketing activities, and increase brand loyalty and trust. Big Data presents a phenomenal opportunity. However, the definition of Big Data HAS to be broader than Hadoop.

Big Data promises the following technology solutions to help with this transformation:

  • Single View of Customer (all detailed data in one location)
  • Targeted Marketing with micro-segmentation (sophisticated analytics on ALL of the data)
  • Multichannel Customer Experience (operationalizing back out to all the customer touch points)

Risk Management

Risk management is also critically important to the bank. Risk management needs to be pervasive within the organizational culture and operating model of the bank in order to make risk-aware business decisions, allocate capital appropriately, and reduce the cost of compliance. Ultimately, this means making data analytics as accessible as it is at Yahoo! If the bank could provide a "data playground" where all data sources were readily available with tools that were easy to use…well, let's just say that new risk management products would be popping up left and right.

Big Data promises a way of providing the organization with integrated risk management solutions, covering:

  • Financial Risk (Risk Architecture, Data Architecture, Risk Analytics, Performance & reporting)
  • Operational Risk & Compliance
  • Financial Crimes (AML, Fraud, Case Management)
  • IT Risk (Security, Business Continuity and Resilience)

The key is to focus on one use-case first, and expand from there. But no matter which risk use-case you attack first, you will need batch, ad hoc, and real-time analytics.

Operations Optimization

Large banks often become unwieldy organizations through many acquisitions. Increasing flexibility and streamlining operations is therefore even more important in today's more competitive banking industry. A bank that is able to increase its flexibility and streamline operations by transforming its core functions will be able to drive higher growth and profits, develop more modular back-office systems, and respond quickly to changing business needs in a highly flexible environment.

This means that banks need new core infrastructure solutions. Examples might involve reducing loan origination times by standardizing loan processes across all entities using Big Data. Streamlining and automating these business processes will result in higher loan profitability while complying with new government mandates.

Operational leverage improves when banks can deliver global, regional and local transaction and payment services efficiently and also when they use transaction insights to deliver the right services at the right price to the right clients.

Many banks are seeking to innovate in the areas of processing, data management and supply chain optimization. For example, in the past, when new payment business needs would arise, the bank would often build a payments solution from scratch to address it, leading to a fragmented and complex payments infrastructure. With Big Data technologies, the bank can develop an enterprise payments hub solution that gives a better understanding of product and payments platform utilization and improved efficiency.

Are you a bank interested in new Big Data technologies like Hadoop, NoSQL datastores, and real-time stream processing? Interested in one integrated platform combining all three?

Posted in Big Data.



