Solr is not indexing assets

A query is run to find PDFs, but the assets do not appear in search results. The problem can also surface in the Asset Transfer logs, which show a 'Host: Unknown' response.

Asset Transfer Logs: \Program Files (x86)\Ektron\Search2.0\Asset Transfer Client

Alternatively, the host name is found, but the logs report an "Error during content extraction" for a PDF file.

Manifold may also log errors such as the following when crawling assets (PDFs):

ERROR - 2015-12-07 16:17:33.715; extractors.AssetExtractor; (Worker thread '1') - Could not retrieve Asset:

In search results, asset links may appear as /DownloadAsset.aspx?id=xxxxx, without the /workarea prefix. The correct form is /workarea/downloadasset.aspx?id=xxxxx.

Solr cannot transfer the assets completely. This can show up as any of the following symptoms:

  • Metadata and taxonomies do not show up in search results, although the PDFs they are associated with do.
  • PDFs do not appear in search results.
  • Assets are too large to transfer. The maximum is 524288000 bytes (500 MiB, roughly 524 MB).

Checking communication

  1. Ensure that the Ektron Asset Transfer Server service on the CMS server and the Ektron Asset Transfer Client service on the Solr server are both started.
  2. On the CMS server, run PowerShell as an administrator.
  3. Run the following command, replacing SolrMachineName with the machine name of the Solr server. This confirms that DNS resolves the name correctly.
    test-netconnection -computername SolrMachineName -port 80
    If this fails, edit the hosts file on the CMS server and set the correct IP address.

  4. In the same PowerShell window, run the following command to confirm that the CMS server can reach the Solr server's Asset Transfer Client service port (7605 by default).
    test-netconnection -computername SolrMachineName -port 7605
    If this fails, make sure the firewall is not blocking inbound traffic to the Solr machine on port 7605.

  5. Run the same command again on the CMS server to confirm that the issue is resolved.
    test-netconnection -computername SolrMachineName -port 7605
  6. Less commonly, the communication problem is in the other direction, from the Solr machine to the CMS machine. Run this command on the Solr machine and, if it fails, unblock inbound port 8732 on the CMS server.
    test-netconnection -computername CmsMachineName -port 8732
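
The port checks in the steps above can be grouped into one helper. This is a minimal sketch, not an Ektron-supplied tool: the function name Test-AssetTransferPort is hypothetical, and it only wraps the same Test-NetConnection command used in the steps, reporting the TcpTestSucceeded result for each port.

```powershell
# Sketch only: wraps the Test-NetConnection checks from the steps above.
# Test-AssetTransferPort is a hypothetical helper name, not part of Ektron.
function Test-AssetTransferPort {
    param(
        [Parameter(Mandatory)] [string] $ComputerName,  # machine to reach
        [Parameter(Mandatory)] [int[]]  $Port           # TCP ports to probe
    )
    foreach ($p in $Port) {
        # Suppress the warning text on failure; the result object is enough.
        $result = Test-NetConnection -ComputerName $ComputerName -Port $p -WarningAction SilentlyContinue
        [pscustomobject]@{
            ComputerName = $ComputerName
            Port         = $p
            Reachable    = $result.TcpTestSucceeded
        }
    }
}
```

On the CMS server, this could be called for the Solr ports, and on the Solr server for the CMS port. The service check uses a wildcard because the exact service display names are assumed from step 1:

    Get-Service -DisplayName 'Ektron Asset Transfer*'
    Test-AssetTransferPort -ComputerName SolrMachineName -Port 80, 7605
    Test-AssetTransferPort -ComputerName CmsMachineName -Port 8732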

Fixing the AssetServerTable

  1. If communication is working, check the AssetServerTable for erroneous entries and remove any extra entries.

If the asset size is more than 524 MB

  1. Add the key "MaxAssetSize" with the desired value to the appSettings section of Ektron.Cms.Search.Assets.Server.exe.config on the Ektron CMS server.

  2. Restart the Asset Transfer Server service on the same server. 
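
The config change in step 1 can also be scripted. The following is a minimal sketch under stated assumptions: the function name Set-MaxAssetSize is hypothetical, the value is assumed to be in bytes (matching the 524288000-byte default limit), and the config file is assumed to have an existing appSettings section directly under configuration. Back up the file before changing it.

```powershell
# Sketch only: adds or updates the MaxAssetSize key in appSettings.
# Assumption: the value is expressed in bytes, like the default limit.
function Set-MaxAssetSize {
    param(
        [Parameter(Mandatory)] [string] $ConfigPath,   # full path to the .exe.config file
        [Parameter(Mandatory)] [long]   $MaxAssetSize  # new limit (assumed bytes)
    )
    $config = New-Object System.Xml.XmlDocument
    $config.Load($ConfigPath)

    $appSettings = $config.SelectSingleNode('/configuration/appSettings')
    if ($null -eq $appSettings) {
        throw "No appSettings section found in $ConfigPath"
    }

    # Reuse the existing key if present; otherwise create it.
    $node = $appSettings.SelectSingleNode("add[@key='MaxAssetSize']")
    if ($null -eq $node) {
        $node = $config.CreateElement('add')
        $node.SetAttribute('key', 'MaxAssetSize')
        [void]$appSettings.AppendChild($node)
    }
    $node.SetAttribute('value', $MaxAssetSize.ToString())
    $config.Save($ConfigPath)
}
```

A possible usage, assuming the config file sits in the Asset Transfer Server folder and that the service display name matches the one used earlier in this article (both are assumptions to verify on your server):

    Set-MaxAssetSize -ConfigPath 'C:\Program Files (x86)\Ektron\Asset Transfer Server\Ektron.Cms.Search.Assets.Server.exe.config' -MaxAssetSize 1GB
    Restart-Service -DisplayName 'Ektron Asset Transfer Server'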

Checking for remaining issues and confirming resolution

  1. Publish a change to an asset (for instance, change the title of a DMS document or upload a new PDF).
  2. Review the file Ektron.Cms.Search.Assets.Server.log on the CMS server for any remaining errors and resolve them.
    C:\Program Files (x86)\Ektron\Asset Transfer Server\Ektron.Cms.Search.Assets.Server.log
  3. Review the file Ektron.Cms.Search.Assets.txt on the Solr server for any remaining errors and resolve them.
     C:\Program Files (x86)\Ektron\SearchX.0\Asset Transfer Client\Ektron.Cms.Search.Assets.txt
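
Reviewing the two log files can be sped up by filtering for error lines. This is a minimal sketch: the function name Get-AssetTransferLogError is hypothetical, and it assumes that error entries contain the uppercase text ERROR, as in the Manifold log line quoted earlier. The actual format of the Asset Transfer logs may differ.

```powershell
# Sketch only: lists the most recent lines containing "ERROR" in a log file.
# Assumption: error entries contain the uppercase text ERROR.
function Get-AssetTransferLogError {
    param(
        [Parameter(Mandatory)] [string[]] $Path,  # one or more log file paths
        [int] $Last = 20                          # how many recent matches to show
    )
    Select-String -Path $Path -Pattern 'ERROR' -CaseSensitive |
        Select-Object -Last $Last |
        ForEach-Object { '{0}:{1}: {2}' -f $_.Filename, $_.LineNumber, $_.Line }
}
```

On the CMS server this might be run as follows. On the Solr server, pass the client log path from step 3 instead:

    Get-AssetTransferLogError -Path 'C:\Program Files (x86)\Ektron\Asset Transfer Server\Ektron.Cms.Search.Assets.Server.log'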