Saturday, 23 March 2013

Endeca Information Discovery 3.x breaks cover

While just looking at the Endeca documentation I noticed that Endeca 3.x has appeared.

I've not had time to do any in-depth digging around, but there look to be some nice headlines:

  • New chart types in Studio (Bubble and Scatter look to be the biggest changes)
  • The documentation looks to have been tidied up and better structured
  • End users can add their own data from spreadsheets to analyse - the data provisioning service
  • The Content Acquisition System has become the Integrator Acquisition System 
The new chart types may not seem to be that significant but I've always considered them to be two of the most useful ways of presenting many types of data.  I know that in some use cases I've come across their absence was a major blocker.

The data provisioning service is a major enhancement for many use cases.  I could go on at length about the pros and cons of allowing end users or analysts to "bring your own data" but not even having it as an option was a major blocker.  

I always felt that previous versions of the Endeca documentation had something of a "thrown together" quality about them in some areas, so it looks like there has been some tidying up.

The old Content Acquisition System seems to have transformed into the Integrator Acquisition System.  I'm not sure what else has changed there.

I'll try to go through the new features in more detail as I get the chance to explore them.  But at first glance this looks to be a major step forward, making this tool much more applicable in the general "data discovery" space against the likes of Tableau and Spotfire.  Add in the unstructured power of the Salience engine and I think Endeca looks all set for lift off.

More information is available on the Oracle website.

Apologies for not being able to put in more detail at this time but I've got a half marathon to run in the morning.






Monday, 18 March 2013

So what is Endeca for?

I'm taking a slight diversion from my planned route with this post.  I was looking at some of the technical aspects of how to use Endeca, but talking with various people there still seems to be some confusion about what Endeca is or does or, more importantly, where it fits into an information management ecosystem.  My impression is that even Oracle are still not quite sure, and some of the industry watchers see it as just an enterprise search tool.  So let's try to get to the root of what Endeca is and what the right use cases for it are.

The term BI, like "Big Data", is becoming increasingly abused.  It used to be relatively simple to define what BI was: you had sources, ETL tools, warehouses, reporting, OLAP and dashboards.  Then the term "Enterprise Analytics" came along, so is that BI as well?  Usually this involved the implementation of some form of mathematical model to predict future performance based on past performance and a set of variables.  So now BI covers what did happen and what might happen.  But to enact business change and get the desired results of increased sales, improved retention, reduced costs, or whatever the objective might be, there is still human evaluation and partial speculation in the processing of the data.  There is still the gap of knowing why something happened.  While I'd freely admit it is possible to refine descriptive BI to provide some answers to the question "why did this happen?", it is a rather slow and laborious process.  The BI system architects, analysts and developers need to know where to get the answers and how to process them to get the insight into why events happened.

In the reality of a very competitive business world people need to make decisions much faster than data scientists can find the probable cause and developers can add it into the BI platform.  This is where tools like Qlikview appear on the scene.  By delivering the nirvana of self-service BI, they let business users follow up on a hunch and confirm it much more quickly than before.  Used correctly, products like Qlikview, Tibco Spotfire and Tableau add immense business value alongside the descriptive BI platforms such as OBIEE, Cognos and Business Objects.  So where does Endeca sit in this space?

My biggest concern about Endeca revolves around Oracle's definition of what it's for.  The official line would appear to be that it's a tool to "Unlock Insights from Any Source".  But I'm not convinced that they have yet figured out how to unlock the value pitch from this proposition.  It is an expensive product and in any business you must be able to put together a reasonable business case to demonstrate that you will get a return on your investment.  That leads straight away to "what insights?", "what are they worth to us?" and finally "does it have to be built by IT?", and I think it's here that the problem lies.  Possibly in the parts of the world where the economy is showing signs of recovery and business leaders are less risk averse the "let's take the risk" approach might come off.  Currently in the UK this is certainly not the case: money is tight in most businesses and management is historically more conservative.  Can end users use Endeca on their own without IT involvement?  Certainly not yet.  Clover ETL is a nice tool but it's still beyond the skill-set of most non-techies.  So Oracle still have some work to do here; there needs to be a real means of demonstrating value to the business if Oracle are to get their UK sales.

So let's see if I can help them out a little.  I rather like the concept of Endeca and technically it's very clever, so I'd like to see it succeed.  So first let's look at the business cases where we would use Endeca, or actually, as a diversion, where we would not use it.  If we're looking at just structured data of reasonable quality then Qlikview, Spotfire or Tableau come higher up the list of solutions.  Just from the perspective of product maturity and feature set all three are more capable than Endeca as it comes out of the box.  So do we need to be looking into the realms of "unstructured" or weakly typed data?  There are three aspects to this: where does this unstructured data come from, how do we get it into Endeca, and what value is in the data and how do we extract it?

From a business perspective generally unstructured data could appear in a few places: 
  • email 
  • call logs
  • online publications
  • forums
  • twitter
  • special interest websites 
  • review sites
So is there anything of value in that lot?  If we look at the top business applications of Text Analytics:

  • Product/Brand/Reputation Management
  • Customer Experience Management
  • Search or question answering
  • "other research"
  • Competitive intelligence
There are some potential areas for overlap here, but email communication can serve the top two business applications.  In emails from customers there may well be useful insight that could help identify, early on, problems with products and services that might adversely affect product or brand reputation.  Equally we might resolve issues around customer service experience by identifying problematic customer service processes or people.  This is tightly targeted data of known context so it's reasonably easy to see how it could be mined for value.  To extract this value we could just use a simple white-list tagger of product names and have a metric counting the number of occurrences; any sudden change indicates something worth investigating.  But what's the value of this data?  In high value consumer goods industries this would be about protecting a brand's reputation, almost certainly enough to justify the investment.  In service industries with high churn rates such as mobile telephony there is again probably value from brand protection, but there is probably a limit to the volume of email, with more service calls coming in via voice.  So this is rather simple "key word counting" analysis to produce an actionable metric, hardly rocket science, but there is probably value there.
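The white-list tagger idea above can be sketched in a few lines.  This is only an illustration in plain Python, not how Endeca or its tagger is actually configured.  The function names, the product names, and the `factor` and `min_count` thresholds are all made up for the example, and it only flags sudden rises, not drops.

```python
import re
from collections import Counter


def count_mentions(messages, whitelist):
    """Count occurrences of each white-listed product name across messages."""
    # One case-insensitive, whole-word pattern per product name.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in whitelist
    }
    counts = Counter({name: 0 for name in whitelist})
    for text in messages:
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(text))
    return counts


def flag_spikes(history, current, factor=2.0, min_count=5):
    """Return products whose current count is a sudden rise over past periods.

    `history` is a list of per-period count mappings; `factor` and
    `min_count` are illustrative thresholds, not recommended values.
    """
    flagged = []
    for name, count in current.items():
        past = [period.get(name, 0) for period in history]
        baseline = sum(past) / len(past) if past else 0.0
        # Floor the baseline at 1 so a product with no history is not
        # flagged on a single stray mention.
        if count >= min_count and count >= factor * max(baseline, 1.0):
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    emails = [
        "The Widget Pro stopped working after a week",
        "widget pro arrived broken, second Widget Pro I have returned",
        "Love my Gadget Mini",
    ]
    print(count_mentions(emails, ["Widget Pro", "Gadget Mini"]))
```

In practice the counts would be computed per day or per week, and the flagged products handed to a person to investigate, which is all the "actionable metric" really amounts to.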

So away from the product white-listing, where do we go?  Particularly in the rather more qualitative area of customer service "quality of engagement" this is not an easy problem to solve.  This is where the Lexalytics Salience engine steps in.  This is an extra licensed component in Endeca, but could be where a lot more value can be unlocked.  By scoring the overall sentiment of a piece of text, deriving metrics and picking out themes, actionable intelligence is created.

The email and call log analytics is a very good use case, and entirely inside the corporate firewall.  Looking outside the firewall we can begin to address some of the other problems by looking at forums, special interest websites and review sites.  There are plenty of off-the-shelf sentiment analysis feeds such as the Salesforce Marketing Cloud, or Radian6 as it was previously known.  This type of product is a feed of data for the sites and systems that the vendor chooses to monitor, monitored in the way the vendor chooses.  So if you need to answer more specific problems you either need to start getting highly skilled programmers, data scientists and analysts involved, or look at the Endeca content acquisition capabilities along with sentiment analysis.  This is now beginning to take you beyond conventional sentiment analysis as consumed by most marketing departments.  This sort of capability would have helped many companies get early warning of problems and issues that blindsided them, but is it possible to make the business case for this sort of investment ahead of time?  That is a more difficult case to make; building a business case on the "unknown unknowns" is never going to be easy.

Is there a market for Endeca?  Undoubtedly, but I don't think it looks quite like what Oracle are trying to sell into, certainly in the UK.  In the hands of a data scientist or data analyst Endeca can do incredible things, but what it cannot yet do is allow end users to do their own analysis on "Any Source".  So what else does Endeca need to really fly?  The methods of reading data need to improve: there needs to be end user level "loading of any data".  While I find Clover ETL easy to use, most end users will not.  There are two further significant omissions: statistical analysis in the form of R, and better visualisations.  While it is possible to do statistical analysis in EQL it is a rather painful business, requiring the developer to build the functions up from base mathematical functions.  The limited range of visualisations is also frustrating; the absence of bubble charts is a major omission.

Is there anything that can be done outside of the Endeca platform to address some of its shortcomings?  Possibly there is.  The system has many APIs and developer indexes so the problem of visualisation might be solvable.  The dynamic metadata capabilities of Clover could possibly give a way in for "any data", creating an end user data load capability.

I'm sure with time Endeca will find its place in the information management ecosystem.  I just hope it does not get stuck being branded as "an enterprise search tool", because it is much more than that, or get caught up in the big data hype and the rush towards Hadoop in some circles.