Friday, April 18, 2014

Read Data from JDBC connection into endeca CAS

Reading Data using JDBC connection into CAS:
CAS (Content Acquisition System) is a component that ships as a separate installer with Oracle Endeca. As the name suggests, CAS is used to acquire data from a wide variety of data sources, which is then fed to the Endeca Forge. One such data source is JDBC.
To fetch data into CAS, we need to perform a couple of operations:
·         Create Data sources
·         Crawl through the data source
 
Creating a Data Source: Creating a data source is a simple task that can be accomplished using the CAS Console in Endeca Workbench.
Click on the Administrative tools dropdown.
Here you will see the various administrative options. Click on the Data sources option.
 
Here you will see the CAS Console. Click on the Add data source option to add a new data source.
 
Select the JDBC option to create a JDBC data source.

Provide all the details required to connect to the JDBC data source.


You will also need to provide the SQL query and the Key Column. This key column corresponds to the record spec (the unique record identifier).

This UI option, however, is not much help when you want to automate the process. To create a data source from the command line, we need an XML file that follows the required DTD. Wait wait wait!! Don't rush to Google for the DTDs and XML formats. We have a way out:
1)      Since you have already created a data source using the UI as shown above, you can use the following command to extract the XML of that data source from the Workbench:
cd /endeca/CAS/3.1.2/bin
-bash-3.2$ ./cas-cmd.sh getCrawl --id <existing-job-name> > <some-location>/temp/jobJdbcCrawl.xml
<existing-job-name> is the id of the crawl created through the UI.
We will use this XML as a template for the format of the crawl.
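The export step above can be wrapped in a small shell helper. This is only a sketch: the function name, paths, and crawl id are placeholders, not part of CAS itself.

```shell
# Sketch: export an existing crawl's XML definition via cas-cmd.sh.
# All paths and the crawl id below are assumptions -- substitute your own.
export_crawl_config() {
  cas_bin="$1"    # e.g. /endeca/CAS/3.1.2/bin
  crawl_id="$2"   # id of the crawl created through the Workbench UI
  out_file="$3"   # where to save the XML, e.g. /endeca/temp/jobJdbcCrawl.xml
  "$cas_bin/cas-cmd.sh" getCrawl --id "$crawl_id" > "$out_file"
}
```

Call it as, for example, `export_crawl_config /endeca/CAS/3.1.2/bin myJdbcCrawl /endeca/temp/jobJdbcCrawl.xml`.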
 
2)      Make a copy of this XML file and edit the copy as per your needs: the name of the crawl, the connection details, and so on. If the connection details you want to use are the same as those of the data source created in the CAS Console, you need not change anything.
For a JDBC connection, you will need to add the corresponding value for the 'password' key.
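The password entry you add will typically mirror the other property entries in the file you exported. The element names below are illustrative only, so match them to whatever structure your exported XML actually uses:

```xml
<!-- Illustrative sketch only: copy the structure of the other
     property entries in your own exported crawl XML. -->
<moduleProperty>
  <key>password</key>
  <value>your-db-password</value>
</moduleProperty>
```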
 
3)      Run the following command to create the crawl:
-bash-3.2$ ./cas-cmd.sh createCrawls -f /endeca/temp/jobJdbcCrawl.xml
 
Crawl through the data source: Crawling is even simpler. In the UI it is as simple as the click of a button: click the Start button under the 'Acquire data' header in the CAS Console, against the data source you want to crawl.

 
Or from the command line:
cd /endeca/CAS/3.1.2/bin
cas-cmd startCrawl -id CrawlName [-full] [-h HostName]  [-p PortNumber] [-l true|false]
./cas-cmd.sh startCrawl -id jobRecStore -full -h localhost -p 8500 -l false
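For automation, the start command can likewise be wrapped in a small helper. Again, a sketch only: the function name is made up, and the default host/port are just the values from the example above.

```shell
# Sketch: kick off a full crawl via cas-cmd.sh.
# Host, port, and crawl id are assumptions -- use your own values.
start_jdbc_crawl() {
  cas_bin="$1"       # e.g. /endeca/CAS/3.1.2/bin
  crawl_id="$2"      # e.g. jobRecStore
  host="${3:-localhost}"
  port="${4:-8500}"
  "$cas_bin/cas-cmd.sh" startCrawl -id "$crawl_id" -full -h "$host" -p "$port"
}
```

For example: `start_jdbc_crawl /endeca/CAS/3.1.2/bin jobRecStore`.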
Voila!! The data has moved from the DB to CAS and is ready to be ingested into the pipeline.
 
----------------------------------------------------------------------------------------------------------------------------------- 
Issue you might run into:
You might run into an error while saving the data store if the JDBC driver is not available to CAS.
Resolution:
The fix for this issue is to add the driver ojdbc6.jar (latest version) at <endeca-installation-folder>/CAS/<version-num>/lib/cas-server-plugins/cas-jdbc-datasource and restart the CAS server.
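The fix can be sketched as a short helper like the one below; the function name and paths are placeholders to adapt to your install, and restarting the service itself varies by CAS version.

```shell
# Sketch: copy the Oracle JDBC driver into the CAS JDBC plugin directory.
# Function name and paths are assumptions -- adjust to your installation.
install_jdbc_driver() {
  cas_root="$1"     # e.g. /endeca/CAS/3.1.2
  driver_jar="$2"   # e.g. /tmp/ojdbc6.jar
  plugin_dir="$cas_root/lib/cas-server-plugins/cas-jdbc-datasource"
  mkdir -p "$plugin_dir"
  cp "$driver_jar" "$plugin_dir/"
}
# After copying, restart the CAS service so it picks up the new driver
# (the exact restart script depends on your CAS version and install).
```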
 
 

1 comment:

  1. One question how to get all connectors in CAS console. What all configurations needs to be done. Please let me know.

    Sandeep
