Sunday, 3 February 2013

Agile BI with Endeca - Part 2

Yesterday I looked at possible use cases for Endeca in enterprise environments. Today I'm going to look at possible starting points for taking advantage of the capabilities it offers.

First I'll look at your licensing options if you want to evaluate Endeca, then at some options for hosting an evaluation, and finally at an outline of setting up a system and getting started.

First off, Oracle Endeca Information Discovery is a commercial package, which obviously means you need a license to use it. There are time-limited Oracle evaluation licenses available, and if you talk to your Oracle account manager nicely I'm sure trial licenses can be arranged. But I'll leave that as an exercise for the reader. In later posts I'm planning to look at the options for recreating some of the Endeca capabilities using community open source packages, but that's for the future.

So what do you need to install Endeca on? While we'd all love to have an Exalytics box sat under our desks, that's not really practical for most of us. Also, if you go and ask your infrastructure department for a server, there is a lot of sucking of teeth and a question of how many months' time you want it in. So it's time to be a little creative.

I'm sure that a lot of you are familiar with Amazon's cloud service; if you are not, I'll briefly explain (I'm no AWS expert, I know as much as I need to get by). Amazon Web Services is an offering from Amazon that enables you to create and use virtual computing services in a number of ways. EC2 allows you to create virtual servers and access them over the web. You pay for the server by the hour and shut it down when you do not need it, which makes it perfect for evaluations and demos. In addition, instances come in a range of sizes and prices, making it possible to start small at low cost and then move up to the more costly options.
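To put rough numbers on the pay-by-the-hour point, here is a minimal Python sketch. The hourly rate is a made-up placeholder, not a real AWS price, and the usage pattern is only an assumption; look up the current EC2 pricing for real figures.

```python
# Rough cost comparison for an evaluation server billed by the hour.
# HOURLY_RATE is a hypothetical placeholder, NOT a real AWS price.
HOURLY_RATE = 0.10  # assumed currency units per instance-hour


def running_cost(hours_per_day, days, hourly_rate=HOURLY_RATE):
    """Cost of running one instance for hours_per_day on each of `days` days."""
    return hours_per_day * days * hourly_rate


# Assumed two-week evaluation: 8 hours on each of 10 working days,
# compared with leaving the instance up around the clock for 14 days.
office_hours = running_cost(8, 10)
always_on = running_cost(24, 14)

print(f"office hours only: {office_hours:.2f}")
print(f"always on:         {always_on:.2f}")
```

This ignores EBS storage, which is still charged while an instance is stopped, so treat it as a lower bound rather than a bill.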

One word of warning here: there are two types of image, those backed by instance store volumes and those backed by EBS. Data on instance store volumes is lost when the instance shuts down; while that is useful in some cases, it is not really suitable for what we need here.

So for a start you need an AWS account. I'm not going to go through the details of how to get one, because enough has been written on that subject already.

To get started we need an EC2 instance. There are a number of different types available, but I started with a 64-bit, EBS-backed Red Hat image on an m1.medium instance, which gives 1 virtual core and 3.75GB of RAM. I also added a second 20GB EBS volume to use for application data storage. I went for Red Hat because it's a supported OS for installing Endeca (so hopefully no compatibility issues), and I'm a Linux person by preference.

Once you have your instance created there are a few tasks to do. I'm not going to give step-by-step instructions for the Linux admin tasks, as there are plenty of references out there already on how to do them.

The first thing to do after booting is to log in using your SSH client. This can be done using Actions->Connect from the instances page of the EC2 console.

The next task I performed was to create a mount point and edit the /etc/fstab file to mount the extra EBS storage at boot time. I created the mount point as /oracle, but you can put it where you like; /apps might be a more appropriate name.
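For reference, an /etc/fstab entry is a single line of six whitespace-separated fields: device, mount point, filesystem type, mount options, dump flag and fsck pass order. The sketch below just assembles such a line in Python. The device name /dev/xvdf, the ext4 filesystem and the nofail option are my assumptions for illustration, so check what your instance actually reports (for example with lsblk) before editing the file.

```python
def fstab_entry(device, mount_point, fs_type="ext4",
                options="defaults,nofail", dump=0, fsck_pass=2):
    """Build one /etc/fstab line from its six fields."""
    if not mount_point.startswith("/"):
        raise ValueError("mount point must be an absolute path")
    fields = [device, mount_point, fs_type, options, str(dump), str(fsck_pass)]
    return "\t".join(fields)


# /dev/xvdf is an assumed device name for the second EBS volume.
print(fstab_entry("/dev/xvdf", "/oracle"))
```

The printed line would be appended to /etc/fstab as root, once the mount point exists and the volume has a filesystem on it. The nofail option lets the boot carry on if the volume is not attached.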

You are also going to need the X Window System to be able to run the Integrator application locally on the server.  Instructions for getting X installed can be found here.

At this point I'll also stress that we only have the root user on the server, and running everything as root is not a good idea. You really need to create a dedicated user and group to run Endeca.

Oracle Endeca is available from the Oracle Software Delivery Cloud (you will have to register; remember Endeca is not free, but there is an evaluation trial license). There is a bewildering array of downloads, but as a minimum you need:

  • Endeca Server - the "database"
  • Information Discovery Studio - the web interface
  • Information Discovery Integrator - the ETL tool

Again, I'm not going to give you step-by-step instructions on installing Endeca because Oracle have already done that; their documentation is available here.

But as an outline, start with installing the server, then Integrator and Studio.

If you want any demo data in your system you will need to follow the quick start guide.  This helps you get out of the blocks and see something running.  I used VNC to access Integrator on X Windows to run the quick start data load.  This can be done by tunnelling port 5901 through your SSH session; information on how to do that is here.
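As a rough illustration of the tunnel, the Python sketch below builds the ssh command line for forwarding local port 5901 to the same port on the server (5901 is what VNC display :1 listens on). The host name, the endeca user and the key file path are hypothetical placeholders; -N, -L and -i are standard OpenSSH client options.

```python
import shlex


def vnc_tunnel_command(host, user, key_file=None, port=5901):
    """Return the ssh argument list that forwards local `port` to the server."""
    cmd = ["ssh", "-N", "-L", f"{port}:localhost:{port}"]
    if key_file:
        cmd += ["-i", key_file]  # private key for the EC2 key pair
    cmd.append(f"{user}@{host}")
    return cmd


# Hypothetical host, user and key file, shown only to illustrate the shape.
cmd = vnc_tunnel_command("ec2-host.example.com", "endeca", key_file="mykey.pem")
print(" ".join(shlex.quote(part) for part in cmd))
```

With the tunnel running, you point your VNC client at localhost:5901 on your own machine rather than at the server directly, so the VNC port never has to be opened to the internet.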

The only two "gotchas" that you might hit both concern port 8080. First, the Red Hat firewall is closed by default, so you need to open up port 8080 on the server.  This how-to should point you in the right direction.  Second, you will need to ensure that your Amazon security group has port 8080 open.
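If you are not sure whether either of those is still blocking you, a quick TCP connection test from your own machine can help. This is a minimal sketch using only the Python standard library; the function name is mine, and it only tells you whether something accepted the connection, not which of the two layers is at fault.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


# Replace "localhost" with your instance's public host name when testing
# from your own machine.
print(port_is_open("localhost", 8080))
```

Running it on the server itself against localhost, and then from your desktop against the public host name, narrows things down: if the first works and the second does not, something between you and the application (the Red Hat firewall or the security group) is still closed.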

If you start the server and the portal (see the installation guides) you should be able to reach your server at its public URL on port 8080.  You should then see the login page; the default login details are in the Studio installation guide.

From there I'd suggest that you go through the excellent YouTube videos on how to get started.

Next time I'll follow up with a few practical details on how to start up Endeca at boot time, the hosting ports and how to proxy through Apache.  Then I'll go through a few practical steps to build real apps a little more independently than the "just copy the example file" approach used in the getting started videos.
