Tuesday, May 28, 2013

Oracle ATG – Endeca integration: Baseline Index process troubleshooting

This article contains troubleshooting steps for when an Endeca Baseline Index, as manually launched via the ATG Dyn Admin | ProductCatalogSimpleIndexingAdmin component, doesn't complete successfully.

All of this information comes from my own experience getting an ATG-Endeca integration wired up and working for a new ATG/Endeca application. (I did eventually get it working, but only after working through a fair number of unexpected issues, as should be apparent from the list below!)

So, disclaimer: The suggested resolutions may or may not be “best practice” or otherwise work well for your own application. I’m just posting this information in hopes that it may be helpful; while I was troubleshooting my way through each of these issues, I often wished there were more ATG-Endeca integration help available on the web.


Q: The ProductCatalogSimpleIndexingAdmin doesn't exist in Dyn Admin. Searching for it returns 0 results. Browsing for it shows that the /atg/endeca subtree doesn't exist.

1. The "ATG-Endeca integration" feature might not be installed in ATG. It can be installed with the atgdir/home/bin/cim.sh utility.

2. The application startup might not have included the required Nucleus modules for the Endeca indexing integration. In my situation, I needed to add the following modules to the application server startup. These may or may not be the right set of modules for your ATG application:

  • DAF.Endeca.Index
  • DCS.Endeca.Index

Q: In Dyn Admin, the ProductCatalogSimpleIndexingAdmin component does come up, but the special UI that normally appears at the top of its Dyn Admin page (the one that lets you launch a Baseline Index and shows the status of the process) doesn't appear at all.

Try inspecting the application startup log for errors and warnings.  I saw this happen when my product-sku-output-config.xml file contained an extra “-->” token; this was reported as a parse error in the application startup logs.
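One way to catch this kind of problem before restarting the server is to run the file through a generic XML parser. The following is a minimal Python sketch, not part of ATG; the function name is made up for illustration, and a plain well-formedness check may not flag everything that ATG's own parser rejects.

```python
import xml.etree.ElementTree as ET


def xml_parse_error(xml_data):
    """Return None if xml_data is well-formed XML, else the parser's error message.

    xml_data may be a str or bytes. For a file on disk, pass its contents,
    e.g. open("product-sku-output-config.xml", "rb").read() (hypothetical path).
    """
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as exc:
        # The message includes the line and column where parsing stopped.
        return str(exc)
    return None


if __name__ == "__main__":
    # A stray "-->" after the closing root tag, similar to the case described above.
    broken = "<config><item/></config>\n-->"
    print(xml_parse_error(broken))
```

A return value of None only means the file is well-formed; it says nothing about whether the contents are valid for ATG.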


Q: Errors about missing tables appear in the log, including table srch_site_content, both at server startup and when trying to run Baseline Index.

I ended up needing to manually run some DDL files in my local dev environment to resolve this. I ran:

On core:

  • search_site_ddl.sql

On cata and catb:

  • versioned_search_site_ddl.sql

These scripts are packaged with ATG. Look for them in one of the descendant directories of the ATG_HOME directory.


Q: Under JBoss, trying to run Baseline Index via ProductCatalogSimpleIndexingAdmin fails. Exception appears in the log: "atg.repository.search.indexing.IndexingException: java.lang.NoSuchMethodError: org.apache.cxf.transport.AbstractTransportFactory.<init>(Ljava/util/List;)"

This is apparently due to a version conflict between the version of CXF that comes packaged with JBoss, and the version of CXF that ATG assumes will be available to it.

I spent some time looking into how to disable or otherwise not use the CXF version bundled with JBoss (and use ATG's version of CXF instead), but never found a solution. I ended up switching my local environment to use WebLogic instead of JBoss as a workaround.

Here's a post on the Oracle forums with this same problem (and no solution, as of 5/2013): https://forums.oracle.com/forums/thread.jspa?messageID=10622176


Q: The second portion of the Baseline Index process (RepositoryExport) fails immediately.

Possible additional symptom in log: /atg/commerce/endeca/index/ProductCatalogSimpleIndexingAdmin ---java.lang.RuntimeException: org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 30000 ms

There might be a problem with the connectivity between ATG and Endeca CAS – possibly a configuration issue or networking issue. Check:

1. In the DimensionDocumentSubmitter, SchemaDocumentSubmitter, and DataDocumentSubmitter components, are the CASHostName and CASPort properties set properly? If not, make the changes in the corresponding .properties files for each of those components (in ATG 10.1.1), or in the /atg/endeca/index/IndexingApplicationConfiguration/ component (in ATG 10.1.2).

2. Try pinging the target Endeca CAS server from the command line on the ATG machine to verify that it is accessible on the network. 

3. Verify that DNS resolution isn’t misconfigured such that the CAS hostname is resolving to an IP address of some server other than the actual CAS server.
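Note that ping only shows that the machine answers; it says nothing about whether the CAS port is accepting connections. The checks above can be roughly sketched in Python as follows. This is an illustration under assumptions, not an ATG or Endeca tool: the function name is hypothetical, and the host and port you pass in should be whatever your CASHostName and CASPort properties contain.

```python
import socket


def check_cas_endpoint(host, port, timeout=5.0):
    """Resolve host, then try a TCP connection to (host, port).

    Returns (ip, reachable, detail). The ip value lets you confirm that
    DNS points at the server you expect.
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return None, False, f"DNS lookup failed: {exc}"
    try:
        # Open and immediately close a connection; no data is sent.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True, "connected"
    except OSError as exc:
        return ip, False, f"connect failed: {exc}"


if __name__ == "__main__":
    # Hypothetical values; substitute your own CASHostName and CASPort.
    print(check_cas_endpoint("cas.example.com", 8500))
```

Run it from the ATG machine, since that is the side of the connection that matters here.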


Q: The second portion of the Baseline Index process (RepositoryExport) fails immediately. The log includes "Response-Code: 500"

The "500" here appears to be an HTTP 500 ("Internal server error") being returned from Endeca CAS from the web service call. I've seen this happen several times when attempting to run a Baseline Update after a previous Baseline Update failed midway through, and when the indexing process is being run from the ATG CA server.

The only workaround I know of at this point is to bounce the CAS server. Quick instructions to bounce CAS (in my Endeca environment; specifics for your environment may vary):

  1. ssh to the Endeca indexing server.
  2. Change directory to endeca/CAS/<version number>/bin
  3. ./cas-service-shutdown.sh
  4. nohup ./cas-service.sh &

Q: The second portion of the Baseline Index process (RepositoryExport) hangs for a while, and then fails with no records successfully processed. An error related to network connectivity appears in the ATG server log.

1. Is CAS running? Log on to the Endeca indexing server and run:

ps -ef | grep cas

If there's no result, then start CAS.

2. Alternatively, this might indicate a DNS issue. Make sure that when you ping the Endeca indexing server from your ATG server, the correct IP address is being returned.


Q: java.io.IOException: http://[endeca host]/[endeca app name]en_en_dimvals/?wsdl returned response code 404

This appears to be caused by an incorrect defaultRecordStoreName and/or endecaBaseApplicationName entry in the DataDocumentSubmitter, DimensionDocumentSubmitter, and/or SchemaDocumentSubmitter components (ATG 10.1.1) or in the IndexingApplicationConfiguration component (ATG 10.1.2). Check on those values for those components in Dyn Admin and/or in the application’s .properties files.

One specific thing to check: If the language code (e.g. “en”) is included in the endecaBaseApplicationName property, the resulting defaultRecordStoreName value might contain the language code too many times, e.g. “MyAppenen_en_dimvals”.
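As a rough illustration of that check, here is a small Python heuristic that flags the two warning signs: a base application name that already ends with the language code, and a record store name with the code doubled back to back. The function name is hypothetical, I am not assuming anything about how ATG actually builds the record store name, and a legitimate name that happens to end in the same letters will produce a false positive.

```python
def language_code_warnings(base_application_name, record_store_name, language_code="en"):
    """Return a list of human-readable warnings (empty if nothing looks suspicious)."""
    warnings = []
    code = language_code.lower()
    # Sign 1: the language code seems to be baked into the base application name.
    if base_application_name.lower().endswith(code):
        warnings.append(
            f"base application name '{base_application_name}' ends with '{language_code}'"
        )
    # Sign 2: the language code appears twice in a row, e.g. "enen".
    if code * 2 in record_store_name.lower():
        warnings.append(
            f"record store name '{record_store_name}' contains '{language_code}' doubled"
        )
    return warnings


if __name__ == "__main__":
    # The bad value quoted above triggers both warnings.
    for warning in language_code_warnings("MyAppen", "MyAppenen_en_dimvals"):
        print(warning)
```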


Q: Exception in thread "index-/atg/endeca/index/commerce/ProductCatalogSimpleIndexingAdmin" java.lang.NullPointerException: Property value cannot be null (dimval.display_name)

This apparently indicates that one or more of the item-descriptor tags in customCatalog.xml (in the ATG project) are missing display-name attributes. (Note that some item-descriptor tags – particularly, the OOTB ones – inherit a display-name via the xml-combine="append" attribute value.)

Another symptom of this: In Dyn Admin, RepositoryTypeDimensionExporter component, under "show XML", one of the dimval.display_name entries in the XML shown has an empty PVAL.
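To find candidates quickly, the file can be scanned for item-descriptor tags that have no display-name attribute. The following Python sketch is an illustration only, with a made-up function name. It looks at a single file in isolation, so descriptors that inherit their display-name from another layer through XML combination will show up as false positives.

```python
import xml.etree.ElementTree as ET


def descriptors_missing_display_name(xml_data):
    """Return the names of item-descriptor elements without a display-name attribute.

    xml_data is the content of a repository definition file such as
    customCatalog.xml, as str or bytes.
    """
    root = ET.fromstring(xml_data)
    missing = []
    for descriptor in root.iter("item-descriptor"):
        # Treat an absent or empty attribute the same way.
        if not descriptor.get("display-name"):
            missing.append(descriptor.get("name", "<unnamed>"))
    return missing


if __name__ == "__main__":
    # Hypothetical, heavily simplified sample content.
    sample = """
    <gsa-template>
      <item-descriptor name="product" display-name="Product"/>
      <item-descriptor name="myCustomItem"/>
    </gsa-template>
    """
    print(descriptors_missing_display_name(sample))
```

Anything it reports still needs to be compared against the combined definition before you conclude that the display-name is really missing.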


Q: The ProductCatalogOutputConfig step reports success, but shows 0 records processed. The server log doesn't show any warnings.

Additional symptom: The Dyn Admin | Commerce Admin | Catalog Verification reports thousands of warnings like: "/atg/commerce/catalog/custom/CatalogVerificationService Product 434318 is not in any catalog. Cannot verify info objects."

Additional symptom: When looking at the properties for any particular product via the ProductCatalog component in Dyn Admin, several properties are missing: ancestorCategories, ancestorCategoryIds, siteIds, parentCategoriesForCatalog, catalogs, computedCatalogs.

To fix:

  • In Dyn Admin | Commerce Admin, run Catalog Update and then Basic Maintenance. Then, run Catalog Verification; it should report 0 errors, 0 warnings.
  • If an error occurs during Basic Maintenance, it might be caused by a database deadlock. (You can look at the application log file to verify whether or not this is the case.)

Q: ProductCatalogOutputConfig step fails after around 200 records processed. Application log shows: SQLException: Internal error: Cannot obtain XAConnection weblogic.common.resourcepool.ResourceLimitException: No resources currently available in pool

This apparently indicated that ATG was trying to use more connections than were available in the WebLogic connection pool. I’m not sure what caused this when I saw it. I ended up just retrying the baseline index (without changing any settings or bouncing the server), and it worked the second time.


Q: java.lang.OutOfMemoryError occurs after processing some, but not all, of the product records.

I worked around this by increasing the Java max heap size (for the ATG application) to 3G, and the MaxPermSize to 384m.


Q: Silent failure on the EndecaIndexing step (the last step). Nothing appears in the ATG application log after the RepositoryExport steps completed. The Dyn Admin GUI just reports "COMPLETE (Failed)".

1. This may indicate that the forge process failed on the Endeca side.  Try looking at the logs on the Endeca indexing server for details about the failure.

2. Verify that EAC is running on the Endeca indexing server.

3. Verify that in the EndecaScriptService (ATG 10.1.1) or ApplicationConfiguration (ATG 10.1.2) component, the endecaBaseApplicationName (10.1.1) or baseApplicationName (10.1.2) property has the correct value.


Q: Second portion of baseline index (RepositoryExport) fails midway through. Error in log: "/atg/search/repository/BulkLoader — atg.repository.search.indexing.IndexingException: com.endeca.itl.recordstore.ConcurrentWriteException: Write in progress with generation 12"

I found that a workaround for this was to restart CAS on the Endeca indexing server, then retry the baseline index from the ATG side.

Update 8/19/2013: Apparently this issue can be caused by an update lock not being released when a previous baseline index operation was manually canceled. See article "Cancelling the Baseline Indexing Job from SimpleIndexingAdmin does not Release Update Lock (Doc ID 1576472.1)" on the Oracle Support site for more information.
