Solr Not Indexing Smart Form Content with Large Fields

Solr throws the following error when it crawls a smart form field whose content is larger than 32766 bytes in UTF-8 encoding:

"Document contains at least one immense term in field="exact_estjournalarticlecontent" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[65, 108, 116, 104, 111, 117, 103, 104, 32, 97, 112, 112, 114, 111, 120, 105, 109, 97, 116, 101, 108, 121, 32, 53, -30, -128, -109, 56, 37, 32]...', original message: bytes can be at most 32766 in length; got 87278"
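To find out which content triggered the error, the byte prefix in the message can be decoded back into text. The sketch below assumes the prefix is a list of signed byte values (negative numbers stand for bytes above 127) in UTF-8; the helper name `decode_signed_bytes` is hypothetical and is not part of Solr or Ektron.

```python
# Prefix copied from the error message above (signed byte values).
prefix = [65, 108, 116, 104, 111, 117, 103, 104, 32, 97, 112, 112, 114, 111,
          120, 105, 109, 97, 116, 101, 108, 121, 32, 53, -30, -128, -109, 56,
          37, 32]


def decode_signed_bytes(values):
    """Decode signed byte values (-128..127) as UTF-8 text (hypothetical helper)."""
    # Map each signed value to its unsigned equivalent (0..255).
    raw = bytes(v & 0xFF for v in values)
    # The prefix may be cut off mid-character, so replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


print(decode_signed_bytes(prefix))  # prints: Although approximately 5–8%
```

Searching the smart form content for the decoded text should help confirm which field holds the oversized value.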

This is fixed in 9.2 SP1 (EKTR-52). Until that release is available, use the following workaround: mark the field as not indexed, which should allow the rest of the fields to be indexed.

  1. Go to Settings > Configuration > Smart Form Configurations, click the appropriate smart form type, click the Data Design button, select the field causing the error, click view field properties, and uncheck the Indexed checkbox. (Solr already includes the content of this rich text field in the content field when indexing, so this field does not need to be marked as Indexed.)
  2. Click Update.
  3. The next screen shows the list of indexed smart form properties. Click Update again.
  4. Delete the crawl database. (Please note that this will make search results unavailable.)
  5. Make sure the old index data is deleted: on the machine where Solr is installed, go to C:\Program Files (x86)\Ektron\Search2.0\Solr\server\solr\cores{core-name} and delete all existing files.
  6. Register the site with the start over option.
  7. Wait until the crawl process is completed.

Alternatively, reduce the amount of content in the field until its UTF-8 encoding is no longer than 32766 bytes (just under 32 KB), then run an incremental crawl.
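When trimming content, note that the limit applies to encoded bytes, not characters, so text with non-ASCII characters reaches it sooner. The following is a minimal sketch for checking a field value before recrawling; the names `MAX_TERM_BYTES`, `utf8_size`, and `fits_term_limit` are hypothetical, and the limit is taken from the error message above.

```python
MAX_TERM_BYTES = 32766  # maximum length quoted in the error message


def utf8_size(text: str) -> int:
    """Return the size of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_term_limit(text: str) -> bool:
    """Return True if the text is within the byte limit from the error."""
    return utf8_size(text) <= MAX_TERM_BYTES


ascii_text = "a" * 32766         # 32766 characters, 32766 bytes
accented = "\u00e9" * 20000      # 20000 characters, 40000 bytes

print(fits_term_limit(ascii_text))  # True
print(fits_term_limit(accented))    # False
```

The second case shows that a field can exceed the limit even though its character count is well below 32766.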