Real Time CTools Dashboards
If you have read any of my previous blogs, you will have noticed that I like the slightly unconventional and challenging. My last example was a real time dashboard showing the status of the London Underground system using Pentaho Business Analytics Enterprise Edition. In this post I'm repeating the "Real Time" theme but using CTools for the front end.
I've split the post into multiple sections as it's rather a lot to put in a single post.
- Part I covers the data integration and data sourcing (this post).
- Part II covers the front end.
CTools? What's that then?
Unless you have been living on another planet for the last 5 years, you will surely have come across the great work being produced by WebDetails and others. Over the last few months I've been fortunate enough to work with the two Pedros (Alves and Martins) and the rest of the WebDetails team. I've been inspired to see what I could put together using CTools and the other parts of the Pentaho toolset. This came along at the same time that we were running an internal competition in the sales engineering group to create a CTools dashboard, so with some assistance from Seb and Leo, I created something a little different.
Real time? No problem!
One of the really powerful features of CTools is the ability to use a Pentaho Data Integration transformation as a direct data source. As PDI can connect to practically anything at all, transform it and output it as a stream of data, you can put practically any data source behind a CTools dashboard: MongoDB, Hadoop, Solr or perhaps a RESTful API. Not only that, you can use several of these data sources in a single transformation and blend the results in real time. In effect it's using a tool that had its roots in ETL as an ETR tool, "Extract Transform Report", or to look at it another way, an ultra powerful visual query builder for big data (or any data for that matter).
The first bit, querying Twitter, is relatively easy: create a search string, get an authentication token, get the results, then clean up. To enrich the data feed a little more I've added a WEKA scoring model that enhances the data stream with sentiment analysis. At this point I've got a raw feed with the text, some details of the tweeters and some sentiment. To enliven the dashboard, a few aggregates and metrics are needed. One option is to add more steps to the transformation to create additional aggregations, but I'd rather just run the one Twitter query to create a data set and then work with that. There is a way to do that...
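For readers who don't have PDI open, here is a rough Python sketch of the same four stages: search string, authentication token, results, clean up. It is not the transformation itself. I'm assuming Twitter's v1.1 application-only authentication and search endpoints, and the helper names and output fields are made up for illustration. The functions only build the requests and tidy a response, so nothing here touches the network.

```python
import base64
import urllib.parse

# Assumed endpoints: Twitter's v1.1 application-only auth and search API.
TOKEN_URL = "https://api.twitter.com/oauth2/token"
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_token_request(consumer_key, consumer_secret):
    """Return (url, headers, body) for the bearer-token request."""
    credentials = f"{consumer_key}:{consumer_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
    }
    return TOKEN_URL, headers, b"grant_type=client_credentials"


def build_search_request(search_terms, bearer_token, count=100):
    """Return (url, headers) for a search on the given terms."""
    query = urllib.parse.urlencode({"q": " ".join(search_terms), "count": count})
    return f"{SEARCH_URL}?{query}", {"Authorization": f"Bearer {bearer_token}"}


def clean_results(payload):
    """Flatten the search response into one simple row per tweet."""
    rows = []
    for status in payload.get("statuses", []):
        user = status.get("user", {})
        rows.append({
            # Collapse newlines and repeated spaces in the tweet text.
            "text": " ".join(status.get("text", "").split()),
            "screen_name": user.get("screen_name", ""),
            "followers": user.get("followers_count", 0),
        })
    return rows
```

In the real thing each of these is a PDI step, and the WEKA scoring step would sit after the clean up, adding a sentiment column to each row.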
Querying the CDA cache
This is where we can use a novel approach to get at the results of the last query. In CTools the results of each query are held in the CDA cache, and it's possible to access those cached results directly through a URL. To get at the URL, open the CDA page associated with the dashboard, select the data access object, and look at the query URL.
This URL can then be used as a source in a transformation. In this case I put it in the HTTP Client step.
Then, using the JSON parser, the individual fields can be split out again.
I can then use a range of sorting, filtering and aggregation steps to create the data views that I want. Just branching the data flow and copying the data allows you to create multiple views in a single transformation, each of which can be picked separately by a new CDA data source.
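To make the idea concrete outside of PDI, here is a small Python sketch of the parse-then-branch pattern. I'm assuming the CDA query returns JSON with a "metadata" list of column descriptions and a "resultset" list of rows. The sample data, the column names and the two views are invented for illustration, not taken from my dashboard.

```python
import json
from collections import Counter

# Made-up sample in the shape assumed for a CDA query response.
SAMPLE = """
{"metadata": [{"colIndex": 0, "colName": "screen_name"},
              {"colIndex": 1, "colName": "followers"},
              {"colIndex": 2, "colName": "sentiment"}],
 "resultset": [["alice", 120, "positive"],
               ["bob", 45, "negative"],
               ["carol", 300, "positive"]]}
"""


def parse_cda(text):
    """Split the response into one dict per row, keyed by column name."""
    payload = json.loads(text)
    names = [col["colName"] for col in payload["metadata"]]
    return [dict(zip(names, row)) for row in payload["resultset"]]


def sentiment_counts(rows):
    """View 1: number of tweets per sentiment label."""
    return dict(Counter(row["sentiment"] for row in rows))


def top_tweeters(rows, limit=2):
    """View 2: screen names ordered by follower count, highest first."""
    ranked = sorted(rows, key=lambda row: row["followers"], reverse=True)
    return [row["screen_name"] for row in ranked[:limit]]


rows = parse_cda(SAMPLE)
print(sentiment_counts(rows))
print(top_tweeters(rows))
```

Both views are computed from the same parsed rows, which is the equivalent of branching and copying the data flow in the transformation: the expensive query runs once and every aggregate hangs off its result.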
That pretty much covers the data integration. At this point I've used PDI as a visual querying tool against a web API, and the same tool works to query the cached results of the first query. This is powerful stuff. In addition to this (with virtually no effort at all) I'm doing sentiment analysis on Twitter using a WEKA model. This process could be enhanced by using MongoDB as a repository for longer term storage of each of the search results, which would allow multiple "iterative" guided searches to build up a results "super-set" that could then be analyzed further.
In the next post I'll talk about building the CTools front end using CDE. I'll include some rather distinct styling, and some very flashy but ultra-simple CSS and JavaScript tricks.