Reading Data into CAS using a JDBC Connection:
CAS is a component that ships as a separate installer with
Oracle Endeca. It stands for Content Acquisition System. As the name suggests,
CAS is used to acquire data from a wide range of data sources and feed it to
the Endeca Forge. One such data source is JDBC.
To fetch data into CAS, we need to perform a couple of
operations:
· Create a data source
· Crawl through the data source
Creating a Data Source:
Creating a data source is a simple task that can be accomplished primarily using
the CAS Console in Endeca Workbench.
Click on the Administrative tools dropdown.
Here you will see the various administrative options. Click
on the Data Sources option.
This brings up the CAS Console. Click on the Add Data
Source option to add a new data source.
Select the JDBC option to create a JDBC data source.
You will need to provide a SQL query and a Key Column. This
key column corresponds to the record spec.
This UI option, however, is not very useful when trying to
automate the process. To create a data source from the command line, do the following:
Creating a data source from the command line requires an XML file
that follows the required DTD. Wait wait wait!! Don't rush to Google for the DTDs and XML formats. We have a
way out:
1)
Since you have already created a data source using
the UI as shown above, you can use the following command to export its XML:
cd /endeca/CAS/3.1.2/bin
-bash-3.2$ ./cas-cmd.sh getCrawl --id <existing-job-name>
> <some-location>/temp/jobJdbcCrawl.xml
Here <existing-job-name> is the same name you gave the data source
in the UI.
We will use this XML as the template for
the crawl configuration.
2)
Make a copy of this XML file and modify the copy
as per your needs: the name of the crawl, the connection details, etc. If the
connection details are to stay the same as those of the data source created in the
CAS Console, you need not change anything.
For a JDBC connection, however, you will need to add
the corresponding value for the 'password' key.
3)
Run the following command to create the crawl:
-bash-3.2$ ./cas-cmd.sh createCrawls -f
/endeca/temp/jobJdbcCrawl.xml
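To give a feel for what the exported file looks like, here is a minimal, illustrative JDBC crawl configuration. The element names and property keys below (sqlQuery, keyColumn, password) are assumptions for illustration only; take the real structure from your own getCrawl export.

```xml
<!-- Illustrative sketch only: copy the real element and key names
     from your own getCrawl export. -->
<crawlConfig>
  <crawlId>
    <id>jobJdbcCrawl</id>
  </crawlId>
  <sourceConfig>
    <moduleId>
      <id>JDBC Data Source</id>
    </moduleId>
    <moduleProperties>
      <moduleProperty>
        <key>sqlQuery</key>
        <value>SELECT * FROM PRODUCTS</value>
      </moduleProperty>
      <moduleProperty>
        <key>keyColumn</key>
        <value>PRODUCT_ID</value>
      </moduleProperty>
      <!-- the password value is blank in the export; fill it in yourself -->
      <moduleProperty>
        <key>password</key>
        <value>myDbPassword</value>
      </moduleProperty>
    </moduleProperties>
  </sourceConfig>
</crawlConfig>
```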
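Step 2 above can also be scripted. The sketch below uses a toy here-doc as a stand-in for the exported XML, then clones and edits it with sed; the file paths, crawl name, and XML shape are all hypothetical, so adapt them to your real getCrawl output.

```shell
# Sketch: clone an exported crawl config, rename the crawl, and set the
# JDBC password. The here-doc stands in for your real getCrawl export.
cat > /tmp/jobJdbcCrawl.xml <<'EOF'
<crawlConfig>
  <crawlId><id>existing-job-name</id></crawlId>
  <moduleProperty><key>password</key><value></value></moduleProperty>
</crawlConfig>
EOF

cp /tmp/jobJdbcCrawl.xml /tmp/newJdbcCrawl.xml

# Give the copy its own crawl id.
sed -i 's|<id>existing-job-name</id>|<id>newJdbcCrawl</id>|' /tmp/newJdbcCrawl.xml
# Fill in the JDBC password, which is blank in the export.
sed -i 's|<key>password</key><value></value>|<key>password</key><value>myDbPassword</value>|' /tmp/newJdbcCrawl.xml

cat /tmp/newJdbcCrawl.xml
```

The edited copy is then passed to ./cas-cmd.sh createCrawls -f /tmp/newJdbcCrawl.xml, as in step 3.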
Crawl through the data source: Crawling is even simpler. In
the UI it is as simple as the click of a
button: click the Start button under the Acquire Data header in the CAS Console
against the data source you want to crawl.
Or from the command line:
cd /endeca/CAS/3.1.2/bin
cas-cmd startCrawl -id CrawlName [-full] [-h HostName]
[-p PortNumber] [-l true|false]
./cas-cmd.sh startCrawl -id jobRecStore -full -h
localhost -p 8500 -l false
Voila!! The data has moved from the DB to CAS and is ready to be
ingested into the pipeline.
Resolution:
If the crawl fails because the JDBC driver is missing, the fix is to add the driver ojdbc6.jar (latest version) at <endeca-installation-folder>/CAS/<version-num>/lib/cas-server-plugins/cas-jdbc-datasource and restart the CAS server.