Solr ManifoldCF Database Maintenance

Over time, the crawl history reports stored in the Search 1.0 Solr ManifoldCF crawl database can grow quite large. If they are not cleaned out periodically, the crawl database can consume a significant portion of the disk. Ektron recommends cleaning out the reports and compacting the database weekly. The steps below accomplish both tasks.

Note: Search 2.0 Solr ManifoldCF includes features that clean the crawl database automatically, so the process described in this KB article is not needed for Search 2.0.

The database can be cleaned up manually or through the Task Scheduler. The manual steps follow.

Prerequisites

Ensure the files listed below exist on the Solr server:

  • C:\Program Files (x86)\Ektron\Search1.0\ManifoldCF\core
    • cleanup-crawlreporthistory.cmd
    • cleanuphistoryreport.sql
    • compactdb.cmd
    • compactdb.sql
  • C:\Program Files (x86)\Ektron\Search1.0\ManifoldCF\core\processes
    • largeexecutecommand.bat
    • largeoptions.env

A zip file containing these files can be downloaded here.
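
The prerequisite check can also be scripted. The following is a minimal Python sketch, not part of the Ektron tooling; it assumes Python is available on the Solr server and that Search 1.0 is installed at the default path shown above. The names CORE_DIR, REQUIRED_FILES, and missing_files are illustrative.

```python
from pathlib import Path

# Default Search 1.0 location named in this article; adjust if your install differs.
CORE_DIR = Path(r"C:\Program Files (x86)\Ektron\Search1.0\ManifoldCF\core")

# Files listed in the prerequisites, relative to the core folder.
REQUIRED_FILES = [
    "cleanup-crawlreporthistory.cmd",
    "cleanuphistoryreport.sql",
    "compactdb.cmd",
    "compactdb.sql",
    "processes/largeexecutecommand.bat",
    "processes/largeoptions.env",
]


def missing_files(core_dir):
    """Return the required files that are not present under core_dir."""
    core = Path(core_dir)
    return [name for name in REQUIRED_FILES if not (core / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(CORE_DIR)
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All required files are present.")
```

If the script reports missing files, extract the zip file into the core folder before continuing.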

Clean Up the Crawl History

The following steps execute the command to truncate the crawl history table:

  1. On the Solr Search server, extract the zip file contents into the core folder, typically at C:\Program Files (x86)\Ektron\Search1.0\ManifoldCF\core.
  2. Open a Command Prompt window (Run as administrator).
  3. Change the directory in the command window to the ManifoldCF core folder: C:\Program Files (x86)\Ektron\Search1.0\ManifoldCF\core
  4. Enter the command: cleanup-crawlreporthistory.cmd
  5. Click OK in the message box. The command produces a large amount of output; the last line should be zero (0), which is the number of rows remaining in the crawl history table.
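
The steps above can be sketched in Python for administrators who prefer to script the run. This is a hedged illustration only: it assumes the default install path, that the script is started from an elevated session, and that the last non-blank line of output is just the row count, as described in step 5. The function names are hypothetical. Because the .cmd file displays a message box, the call does not return until someone clicks OK.

```python
import subprocess
from pathlib import Path

# Default Search 1.0 location named in this article; adjust if your install differs.
CORE_DIR = Path(r"C:\Program Files (x86)\Ektron\Search1.0\ManifoldCF\core")


def build_command(script_name):
    """Build the argument list that runs a .cmd script through cmd.exe."""
    return ["cmd.exe", "/c", script_name]


def last_nonblank_line(output):
    """Return the last non-blank line of the captured output, stripped."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def history_is_empty(output):
    """True if the final output line is 0 (assumed to be the remaining row count)."""
    return last_nonblank_line(output) == "0"


def run_script(script_name, core_dir=CORE_DIR):
    """Run script_name with core_dir as the working directory and return its output."""
    result = subprocess.run(
        build_command(script_name),
        cwd=str(core_dir),
        capture_output=True,
        text=True,
    )
    return result.stdout


# Example use on the Solr server (blocks until the message box is dismissed):
#   output = run_script("cleanup-crawlreporthistory.cmd")
#   print("history table empty:", history_is_empty(output))
```

The same run_script helper could be pointed at compactdb.cmd for the compacting step that follows.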

Compact the Crawl Database

The next steps recover the space that was freed up by cleaning up the crawl history.

  1. In the same command window, enter the command: compactdb.cmd
  2. Click OK in the message box.

Note: During the compacting operation, a new copy of the database is created from the original. Ensure there is enough free disk space for that copy. If there is not enough room and free space cannot be increased, it may be necessary to delete the crawl database and re-create it. See the KB article Clearing the Solr Crawl Database.
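
The free-space check in the note can be approximated before running compactdb.cmd. The Python sketch below makes several assumptions: this article does not state where the crawl database files live, so db_dir is a hypothetical parameter that you must point at the actual database folder; it also assumes the copy is written to the same drive and is no larger than the original. The safety_factor parameter is illustrative.

```python
import os
import shutil


def directory_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def has_room_to_compact(db_dir, safety_factor=1.0):
    """True if the drive holding db_dir has free space for a full copy of it."""
    needed = directory_size(db_dir) * safety_factor
    free = shutil.disk_usage(db_dir).free
    return free >= needed
```

Calling has_room_to_compact with the crawl database folder, and a safety_factor somewhat above 1.0 if you want headroom, gives a quick yes or no answer before compacting starts.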