Input Document Format

Search! is flexible about the input document format. There are only two requirements:

Documents to be indexed must be well-formed XML.
Documents to be indexed must use the UTF-8 encoding.
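
Both requirements can be checked before a document is submitted for indexing. The following is a minimal sketch in Python using only the standard library; the function name check_input_document is hypothetical and is not part of Search!, and the indexing service may apply its own, stricter checks.

```python
import xml.etree.ElementTree as ET


def check_input_document(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical pre-flight helper, not part of Search!.
    """
    problems = []

    # Requirement: the document uses the UTF-8 encoding.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc}")
        return problems  # parsing undecodable bytes would only add noise

    # Requirement: the document is well-formed XML.
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")

    return problems


if __name__ == "__main__":
    print(check_input_document(b"<doc><title>Report</title></doc>"))
    print(check_input_document(b"<doc><title>Report</doc>"))
```

Note that this sketch only verifies that the bytes decode as UTF-8; it does not inspect the encoding named in an XML declaration, so a document that declares a different encoding but contains only ASCII characters would still pass.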

There are no further requirements on the XML schema of input documents. However, for field information to be indexed, documents must contain it within the XML elements specified in the configuration for your environment. The document format specification distinguishes two types of elements:

Sections, marked as such in the Search! configuration, allow full-text queries.
Metadata fields allow queries to be refined, can be used as facets (including facets of type "cloud"), and can be sent along with the document information in the query results. Metadata fields are configured to be interpreted as text, numbers, dates, or locations.

While sections can contain arbitrarily long texts, metadata fields should be used for short text segments only. Sections can be contained in other sections in the input document. The indexing service ensures that no text from embedded sections is duplicated and that section-constrained queries are restricted to the proper scope.

NOTE: Metadata fields should not be nested inside sections or other metadata fields in the input document, because that would duplicate the content of the fields and inadvertently boost their scores.
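
The nesting rule in the note can be checked mechanically before indexing. The sketch below is an illustration only: the element names in SECTIONS and METADATA_FIELDS are hypothetical placeholders for whatever your Search! configuration defines, and find_nested_metadata is not a Search! function. It walks the document and reports every metadata element that sits inside a section or inside another metadata field. Nested sections are allowed and are not reported.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real ones come from your Search! configuration.
SECTIONS = {"body", "abstract"}
METADATA_FIELDS = {"author", "date"}


def find_nested_metadata(xml_text: str) -> list[str]:
    """Return the paths of metadata elements nested in a section or metadata field."""
    root = ET.fromstring(xml_text)
    offenders = []

    def walk(elem, path, inside_configured):
        here = f"{path}/{elem.tag}"
        is_meta = elem.tag in METADATA_FIELDS
        # A metadata field is misplaced if any ancestor is a section or metadata field.
        if is_meta and inside_configured:
            offenders.append(here)
        inside = inside_configured or is_meta or elem.tag in SECTIONS
        for child in elem:
            walk(child, here, inside)

    walk(root, "", False)
    return offenders


if __name__ == "__main__":
    ok = "<doc><author>A</author><body>text<abstract>inner</abstract></body></doc>"
    bad = "<doc><body>text<author>A</author></body></doc>"
    print(find_nested_metadata(ok))
    print(find_nested_metadata(bad))
```

An empty result means that no metadata field is nested; each reported path identifies an element that should be moved outside the enclosing section or field.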