Search! & Match! API
Indexing Service

Common Error Handling

Error Code Description
EMPTY_ARGUMENT One or more mandatory arguments are missing or have an empty value.
INVALID_PASSWORD The password is incorrect.
ENVIRONMENT_NOT_AVAILABLE The environment is not available (see the log file for possible errors).
INDEXING_ERROR There is an error in the search index.
DOCUMENT_STORAGE_ERROR The document store produced an error.

Method Add

Method call

add(environment, password, documentID, documentDate, document, accessRoles) : void

Description

The add method adds a document with the given ID and date to the specified internal repository, or updates the existing document if one with the same identifier is already present. All metadata fields are overwritten. In addition to the configured fields, Search adds the reserved tk_lastmodified field to the document, with the current time as its value.

Parameters

Parameter Name Type Description
environment string identifier of a search environment
password string password for the search environment
documentID string unique document identifier within the search environment. May contain only letters (upper and lower case), digits, dash (-), and underscore (_). Must not exceed 512 bytes; keeping it much shorter is recommended for performance reasons.
documentDate (optional) date-string document date used for ranking newer documents higher.
Supported syntax is YYYY-MM-DD['T'HH:MM:SS[.sss]], e.g. 2016-12-29T16:25:50.093 or 2016-12-29. If the date-string is not supplied in this format it will be regarded as null.
If documentDate is not provided as an API parameter, it is extracted from the document (using the selector defined in the metadata config for the special field called documentdate). If that does not yield a valid date, today's date is used.
document XML (TrXml) document XML document sent as a string. Ensure proper XML encoding when constructing a SOAP message: use entity references for special characters or enclose the whole document in a CDATA section, and encode non-ASCII characters in the SOAP character set (UTF-8 by default). For the XML structure, see Input Document Format.
accessRoles list of strings list of access roles that are allowed to retrieve the document. The accessRoles element should be repeated for each individual role.
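Because a documentDate in any other format is silently treated as null (triggering the fallback described above), it can be worth checking parameters on the client before building a request. The following Python sketch encodes the documentID and documentDate rules from the table. The function names are hypothetical, it assumes the optional fractional part has exactly three digits (as in the example), and the server's own validation remains authoritative.

```python
import re
from datetime import datetime

# documentID: letters, digits, dash and underscore only; at most 512 bytes.
_DOC_ID_RE = re.compile(r"[A-Za-z0-9_-]+")
MAX_DOC_ID_BYTES = 512

# documentDate: YYYY-MM-DD['T'HH:MM:SS[.sss]]
_DATE_RE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]{3})?)?"
)


def is_valid_document_id(doc_id: str) -> bool:
    """True if doc_id uses only the allowed characters and fits in 512 bytes."""
    return (
        _DOC_ID_RE.fullmatch(doc_id) is not None
        and len(doc_id.encode("utf-8")) <= MAX_DOC_ID_BYTES
    )


def is_valid_document_date(value: str) -> bool:
    """True if value follows the documented syntax and is a real date/time."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        return False
    fmt = "%Y-%m-%d"
    if match.group(1):  # optional time part
        fmt += "T%H:%M:%S"
        if match.group(2):  # optional milliseconds
            fmt += ".%f"
    try:
        datetime.strptime(value, fmt)
    except ValueError:  # e.g. month 13 or day 32
        return False
    return True
```

For example, is_valid_document_date("2016-12-29T16:25:50.093") accepts the sample value from the table, while "29-12-2016" is rejected instead of silently becoming null on the server.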

Example

Note that the actual document XML is omitted.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:sear="http://home.textkernel.nl/search">
    <soapenv:Header/>
    <soapenv:Body>
        <sear:add>
            <environment>example</environment>
            <password>demo</password>
            <documentID>12</documentID>
            <documentDate>2016-10-10</documentDate>
            <document><![CDATA[ (...) ]]></document>
            <accessRoles>all</accessRoles>
        </sear:add>
    </soapenv:Body>
</soapenv:Envelope>
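The CDATA approach mentioned for the document parameter has one pitfall: a CDATA section cannot contain the sequence ]]>. The Python sketch below builds an add request like the one above, escaping the scalar fields with xml.sax.saxutils.escape and splitting any ]]> across two adjacent CDATA sections. The helper names are hypothetical, and sending the envelope (endpoint URL, HTTP headers) is outside the scope of the sketch.

```python
from typing import Optional, Sequence
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SEARCH_NS = "http://home.textkernel.nl/search"


def cdata(text: str) -> str:
    """Wrap text in CDATA; a literal ']]>' is split across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_add_envelope(
    environment: str,
    password: str,
    document_id: str,
    document_xml: str,
    access_roles: Sequence[str],
    document_date: Optional[str] = None,
) -> str:
    """Return a SOAP envelope for the add method as a string."""
    parts = [
        f"<environment>{escape(environment)}</environment>",
        f"<password>{escape(password)}</password>",
        f"<documentID>{escape(document_id)}</documentID>",
    ]
    if document_date is not None:  # documentDate is optional
        parts.append(f"<documentDate>{escape(document_date)}</documentDate>")
    parts.append(f"<document>{cdata(document_xml)}</document>")
    # accessRoles is repeated once per role, as in the example above.
    parts.extend(f"<accessRoles>{escape(role)}</accessRoles>" for role in access_roles)
    return (
        f'<soapenv:Envelope xmlns:soapenv="{SOAP_NS}" xmlns:sear="{SEARCH_NS}">'
        "<soapenv:Header/><soapenv:Body><sear:add>"
        + "".join(parts)
        + "</sear:add></soapenv:Body></soapenv:Envelope>"
    )
```

The result is a Unicode string; encoding it as UTF-8 when sending matches the default SOAP character set mentioned in the parameter table.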

Returns

Result Name Type Description
none    

Pre-Condition

The environment is configured with a repository for indexing.

Post-Condition

The document is added to the index or, if a document with the same ID already exists, it is updated.

Error Handling

See the description of common errors. Additional errors:

Error Code Description
INVALID_DOCUMENT_FORMAT The document format is invalid.

Method AddBulk

Method call

addBulk(environment, password, documents) : void

Description

The addBulk method adds a list of documents to the specified internal repository, or updates each document that already exists with the same identifier. All metadata fields are overwritten. Calling this method is usually faster than calling add for every document, because all documents are processed in one call.

In addition to the configured fields, Search adds the reserved tk_lastmodified field to each document, with the current time as its value.

A single call can process at most 100 documents (hard limit).
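Larger document sets therefore have to be split into several addBulk calls. A minimal Python sketch of order-preserving batching under the 100-document limit; the function name is hypothetical and the actual addBulk call is left to the caller.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

ADD_BULK_LIMIT = 100  # documented hard limit per addBulk call


def batches(items: Sequence[T], size: int = ADD_BULK_LIMIT) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if not 1 <= size <= ADD_BULK_LIMIT:
        raise ValueError(f"batch size must be between 1 and {ADD_BULK_LIMIT}")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded list can then be passed to whatever function sends the addBulk request. Sending batches one at a time keeps any SearchBoxMultiErrorException attributable to a single, known batch.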

Parameters

Parameter Name Type Description
environment string identifier of a search environment
password string password for the search environment
document Document array of Document objects (see chapter Object Structures); the document element is repeated for each document.

Example

Note that the actual document XML is omitted.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:sear="http://home.textkernel.nl/search">
    <soapenv:Header/>
    <soapenv:Body>
        <sear:addBulk>
            <environment>example</environment>
            <password>demo</password>
            <document>
                <id>1</id>
                <date>2019-06-22</date>
                <content><![CDATA[ (...) ]]></content>
                <accessRoles>all</accessRoles>
            </document>
            <document>
                <id>2</id>
                <date>2019-06-23</date>
                <content><![CDATA[ (...) ]]></content>
                <accessRoles>po</accessRoles>
            </document>
        </sear:addBulk>
    </soapenv:Body>
</soapenv:Envelope>

Returns

Result Name Type Description
none    

Pre-Condition

The environment is configured with a repository for indexing.

Post-Condition

The documents are added to the index or, if a document with the same ID already exists, it is updated.

Error Handling

In addition to the common errors, the method throws SearchBoxMultiErrorException when one or more documents could not be added to the index. The exception contains a list of errors objects with the following fields:

Field Name Type Description
docId string identifier of a failed document
errorCode string INVALID_DOCUMENT_FORMAT, INDEXING_ERROR, or DOCUMENT_STORAGE_ERROR (the last two as described under common errors)
Example
<ns2:SearchBoxMultiErrorException xmlns:ns2="http://home.textkernel.nl/search">
    <errors>
        <errorCode>INVALID_DOCUMENT_FORMAT</errorCode>
        <docId>1</docId>
    </errors>
    <errors>
        <errorCode>INDEXING_ERROR</errorCode>
        <docId>2</docId>
    </errors>
    <errors>
        <errorCode>DOCUMENT_STORAGE_ERROR</errorCode>
        <docId>3</docId>
    </errors>
</ns2:SearchBoxMultiErrorException>
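A client usually needs to know which documents of a batch to resend. The Python sketch below turns a SearchBoxMultiErrorException element like the one above into a mapping from docId to errorCode, using the standard library's xml.etree.ElementTree. It assumes the exception element has already been extracted from the SOAP fault (typically from its detail element, depending on the SOAP client), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SEARCH_NS = "http://home.textkernel.nl/search"
EXCEPTION_TAG = f"{{{SEARCH_NS}}}SearchBoxMultiErrorException"


def failed_documents(exception_xml: str) -> Dict[str, str]:
    """Map each failed docId to its errorCode."""
    root = ET.fromstring(exception_xml)
    if root.tag != EXCEPTION_TAG:
        raise ValueError(f"unexpected element: {root.tag}")
    failures: Dict[str, str] = {}
    # The <errors> children are not namespace-qualified in the example.
    for error in root.findall("errors"):
        failures[error.findtext("docId", "")] = error.findtext("errorCode", "")
    return failures
```

One possible policy, not prescribed by the service: correct and resend documents reported with INVALID_DOCUMENT_FORMAT, and retry those reported with INDEXING_ERROR or DOCUMENT_STORAGE_ERROR once the underlying problem has been resolved.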

Method Delete

Method call

delete(environment, password, documentIDs) : DeleteResult

Description

The delete method removes the documents with the given IDs from the search index, if they exist. The method also removes the documents from the document store, if the repository is backed by one, and removes saved results related to these documents from the database. It returns a DeleteResult object (see chapter Object Structures) containing the IDs of the documents that were deleted successfully and of those it failed to delete.

Parameters

Parameter Name Type Description
environment string identifier of a search environment
password string password for the search environment
documentIDs list of strings document identifiers indicating which documents to delete. One document ID per element. Element can be repeated to delete multiple documents in one call.

Example

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:sear="http://home.textkernel.nl/search">
    <soapenv:Header/>
    <soapenv:Body>
        <sear:delete>
            <environment>example</environment>
            <password>demo</password>
            <documentIDs>123</documentIDs>
            <documentIDs>456</documentIDs>
            <documentIDs>999</documentIDs>
        </sear:delete>
    </soapenv:Body>
</soapenv:Envelope>

Returns

Result Name Type Description
deleteResult DeleteResult object See description of DeleteResult in chapter Object Structures

Pre-Condition

In order for a document to be deleted, it must have been added to the index beforehand (for example with add or addBulk).

Post-Condition

Documents having any of the document IDs are removed.

Error Handling

Trying to delete a document ID that does not exist does not cause an error; the ID is simply ignored.

The delete operation is blocking: when the response is received, the operation has already completed, and the response itself shows whether it succeeded or failed. If an exception occurs, it mentions the document ID on which it failed. Note that the delete operation is neither atomic nor transactional and is not (partially) rolled back when an exception is encountered. It can be assumed that all document IDs before the failing one in the list have been deleted successfully and all later document IDs have been skipped. See also the description of common errors.

After correcting the source of the exception, the same delete operation can safely be executed again; already deleted documents are simply ignored.
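The ordering guarantee above can be used for reporting which deletions are known to have happened. The hypothetical helper below splits the requested IDs at the failing one. It assumes the failing document ID has already been read from the exception and that the list contains no duplicates (with duplicates, the first occurrence is used). Since rerunning the whole call is safe, resending all IDs is equally valid; the split is only informational.

```python
from typing import List, Sequence, Tuple


def split_at_failure(
    document_ids: Sequence[str], failed_id: str
) -> Tuple[List[str], List[str]]:
    """Split the IDs of a failed delete call into (assumed deleted, not processed).

    IDs before the failing one are assumed deleted; the failing ID and all
    later IDs are treated as not processed.
    """
    ids = list(document_ids)
    position = ids.index(failed_id)  # raises ValueError if the ID is unknown
    return ids[:position], ids[position:]
```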

Method Truncate

Method call

truncate(environment, password) : void

Description

The truncate method removes all the documents from the repository. It empties the search index and deletes all documents from the corresponding document store, if one exists. It also removes all saved results related to the deleted documents from the database.

NOTE: concurrent operations on the index may fail while this operation is in progress.

Parameters

Parameter Name Type Description
environment string identifier of a search environment
password string password for the search environment

Returns

Result Name Type Description
none

Pre-Condition

The environment must exist.

Post-Condition

  • The index is empty (after being deleted and recreated).
  • The document store is empty (if enabled).

Error Handling

See the description of common errors.

Method Update

Method call

update(environment, password, documentID, field, value, fieldValues) : void

Description

The update method updates a single metadata field of the document with the given ID. The value is a list; if the field is not multivalued, the list should contain a single element. The new list replaces the old one.

This method also sets the reserved tk_lastmodified field to the current time. Modifying tk_lastmodified itself is forbidden: Search returns an error when the reserved field is passed to this method.

Parameters

Parameter Name Type Description
environment string identifier of a search environment
password string password for the search environment
documentID string identifier of the document being updated.
field string The name of the metadata field. It must be defined in the metadata definition of the environment or be one of 'roles', 'documentdate', or 'projectid'.
value list of strings The list of values of this field. May be empty. Multiple values are only allowed on metadata fields of type multivalued. When the docstore is enabled, a value cannot be longer than 65535 bytes. For multiple values, the element should be repeated for each individual value.
fieldValues list of FieldValues Optional parameter. As an alternative to using the value argument, this argument accepts a list of FieldValue objects. Using this argument is required when updating object fields. Multiple values are only allowed on object fields of type multivalued.
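The constraints on update can be checked before a request is sent. In this Python sketch the caller supplies whether the field is multivalued and whether the document store is enabled (both come from the environment's configuration, which the sketch does not read). It assumes the 65535-byte limit applies to each value separately and is measured in UTF-8; the function name is hypothetical.

```python
from typing import List, Sequence

RESERVED_FIELD = "tk_lastmodified"
MAX_VALUE_BYTES = 65535  # limit when the document store is enabled


def check_update_args(
    field: str,
    values: Sequence[str],
    multivalued: bool,
    docstore_enabled: bool,
) -> List[str]:
    """Return human-readable problems with an update call; empty if none found."""
    problems = []
    if field == RESERVED_FIELD:
        problems.append(f"{RESERVED_FIELD} is reserved and cannot be updated")
    if not multivalued and len(values) > 1:
        problems.append(f"{field} is not multivalued; pass at most one value")
    if docstore_enabled:
        for value in values:
            if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
                problems.append(f"a value of {field} exceeds {MAX_VALUE_BYTES} bytes")
    return problems
```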

Example

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:sear="http://home.textkernel.nl/search">
    <soapenv:Header/>
    <soapenv:Body>
        <sear:update>
            <environment>example</environment>
            <password>demo</password>
            <documentID>123</documentID>
            <field>city</field>
            <value>London</value>
        </sear:update>
    </soapenv:Body>
</soapenv:Envelope>
Update multivalued object field example

NOTE: It is only possible to update a multivalued field completely, i.e. all its values get overwritten.

<x:Envelope xmlns:x="http://schemas.xmlsoap.org/soap/envelope/" xmlns:sear="http://home.textkernel.nl/search">
    <x:Header/>
    <x:Body>
        <sear:update>
            <environment>example</environment>
            <password>demo</password>
            <documentID>1</documentID>
            <field>jobexperience</field>
            <fieldValues>
                <subValues>
                    <entry>
                        <key>jobtitle</key>
                        <value>
                            <item>Accounts Administrator</item>
                        </value>
                    </entry>
                    <entry>
                        <key>startdate</key>
                        <value>
                            <item>2018-01-01</item>
                        </value>
                    </entry>
                    <entry>
                        <key>years</key>
                        <value>
                            <item>10</item>
                        </value>
                    </entry>
                </subValues>
            </fieldValues>
            <fieldValues>
                <subValues>
                    <entry>
                        <key>jobtitle</key>
                        <value>
                            <item>Accounts Assistant</item>
                        </value>
                    </entry>
                    <entry>
                        <key>years</key>
                        <value>
                            <item>5</item>
                        </value>
                    </entry>
                    <entry>
                        <key>startdate</key>
                        <value>
                            <item>2010-03-03</item>
                        </value>
                    </entry>
                    <entry>
                        <key>description</key>
                        <value>
                            <item>Very fun job experience</item>
                        </value>
                    </entry>
                </subValues>
            </fieldValues>
        </sear:update>
    </x:Body>
</x:Envelope>
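Writing the nested fieldValues structure by hand is verbose. The Python sketch below generates it with xml.etree.ElementTree from a list of dictionaries, one per object value, each mapping a sub-field name to its list of items. The function name is hypothetical; the element names (fieldValues, subValues, entry, key, value, item) are copied from the example above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Sequence


def field_values_elements(objects: Sequence[Dict[str, List[str]]]) -> List[ET.Element]:
    """Build one <fieldValues> element per object value of a multivalued field."""
    elements = []
    for obj in objects:
        field_values = ET.Element("fieldValues")
        sub_values = ET.SubElement(field_values, "subValues")
        for key, items in obj.items():  # one <entry> per sub-field
            entry = ET.SubElement(sub_values, "entry")
            ET.SubElement(entry, "key").text = key
            value = ET.SubElement(entry, "value")
            for item in items:
                ET.SubElement(value, "item").text = item
        elements.append(field_values)
    return elements
```

ET.tostring(element, encoding="unicode") turns each element into a string that can be placed inside sear:update. Because the whole multivalued field is overwritten, the list passed in must contain every value the field should keep.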

Returns

Result Name Type Description
none

Pre-Condition

In order for a document to be updated, it must have been added to the index beforehand (for example with add or addBulk); otherwise the update is discarded.

Post-Condition

The field value is updated.

Error Handling

See the description of common errors. Additional errors:

Error Code Description
INVALID_DOCUMENT_FORMAT The field is invalid or the field value contains invalid syntax (for dates, numbers, etc.).

Method Recreate Tags

Method call

recreateTags(environment, password, projectID) : recreateTagsResult

Description

The Recreate Tags method reads all tags from the environment's database and applies them as partial updates to the projectID field. This is useful for restoring the searchability of tags after a full reindexing from an external source. If projectID is provided, tags are processed only for that project. Tags cannot be recreated for the default project alone; this is only possible as an operation on the whole environment. As a synchronisation tool, the method also removes orphaned saved results from the database, i.e. saved results whose related document no longer exists in the index.

Parameters

Parameter Name Type Description
environment string identifier of a search environment
password string password for the search environment
projectID string (optional) ID of a project for which to recreate tags

Returns

Result Name Type Description
recreateTagsResult RecreateTagsResult object IDs of the documents whose tags could not be updated

Pre-Condition

  • The provided environment has a repository (i.e. it is indexing).
  • The provided environment has a database configured.
  • In order for documents to be tagged, they must already exist in the search repository, otherwise the tags will be discarded.

Post-Condition

The documents that are found in the database as tagged are updated in the index so they can be searched by project tag.

Error Handling

See the description of common errors.

Method deleteByQuery

Method call

deleteByQuery(environment, password, user, accessRoles, query, queryParts) : deleteByQueryResponse

Description

The deleteByQuery method deletes all documents that match the search criteria in the given environment and search engine (only searchers of type elasticsearch are supported). It dispatches a long-running job and returns immediately. The response therefore contains no actual information about the delete run: the fields deleted and total are not present, and the status is SUBMITTED. The actual status can be retrieved with the getDeleteRun method, passing the id taken from the deleteByQueryResponse.
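Because deleteByQuery returns immediately with status SUBMITTED, a client typically polls getDeleteRun with the returned id. The Python sketch below shows a simple polling loop. fetch_run is a caller-supplied, hypothetical function that performs the getDeleteRun call and returns the DeleteRun fields as a mapping. Only SUBMITTED is documented in this section, so the terminal status values must be taken from the DeleteRun description in chapter Object Structures and passed in as terminal_statuses; the sketch assumes no particular names.

```python
import time
from typing import Any, Callable, Collection, Mapping


def wait_for_delete_run(
    fetch_run: Callable[[], Mapping[str, Any]],
    terminal_statuses: Collection[str],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll a delete run until its status is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run()  # e.g. wraps a getDeleteRun call
        if run.get("status") in terminal_statuses:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"delete run still {run.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

A fixed interval keeps the sketch simple; an increasing interval would reduce load for long-running deletions.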

NOTE: The query behavior is consistent with the Search service, i.e. the same result set that the Search service would return for the given query and accessRoles parameters is deleted by this method.

This implies that an empty query, as well as an invalid query that cannot be parsed, returns all results. Also, hiddenQuery, if configured, is taken into account.

WARNING: It is not possible to delete more than 95% of the documents in the index; use the truncate method for that instead. Requests that involve more documents than this threshold are rejected.
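Because deleteByQuery matches the same result set as the Search service, a client can estimate beforehand whether a request would be rejected. The Python sketch below applies the 95% rule with integer arithmetic. It assumes the caller obtains matching from a Search call with the same query and accessRoles and total from the size of the index, and it reads "more than 95%" as a strict comparison. The function name is hypothetical, the server's check is authoritative, and the counts may change between the check and the delete.

```python
DELETE_LIMIT_PERCENT = 95


def exceeds_delete_limit(matching: int, total: int) -> bool:
    """True if deleting `matching` of `total` indexed documents exceeds 95%.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    if matching < 0 or total < 0:
        raise ValueError("counts must be non-negative")
    return matching * 100 > total * DELETE_LIMIT_PERCENT
```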

Parameters

Parameter Name Type Description
environment string identifier of the search environment.
password string password for the search environment.
accessRoles list of strings list of access roles that are allowed to retrieve the document. The accessRoles element should be repeated for each individual role.
searchEngine string name of the search engine to use. Only searchers of type elasticsearch are supported. If not specified, the environment's default is used.
query string the keyword query as typed by the user. It may be joined with the queryParts if it is a single OR-combination part that can be joined. Either query or at least one queryPart is required.
queryParts list of QueryPart list of query parts coming from parsing a previous query. Either query or at least one queryPart is required.

Example

Delete documents containing the text manager, for access role all, using the default searcher:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:sear="http://home.textkernel.nl/search">
    <soapenv:Header/>
    <soapenv:Body>
        <sear:deleteByQuery>
            <environment>example</environment>
            <password>demo</password>
            <accessRoles>all</accessRoles>
            <query>manager</query>
        </sear:deleteByQuery>
    </soapenv:Body>
</soapenv:Envelope>

Delete documents older than 3 months:

<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:sear="http://home.textkernel.nl/search">
    <soapenv:Header/>
    <soapenv:Body>
        <sear:deleteByQuery>
            <environment>example</environment>
            <password>demo</password>
            <accessRoles>all</accessRoles>
            <queryParts>
                <field>documentdate</field>
                <condition>REQUIRED</condition>
                <weight>1.0</weight>
                <items>
                    <value>&lt;today-91</value>
                    <label>older than 3 months</label>
                </items>
            </queryParts>
        </sear:deleteByQuery>
    </soapenv:Body>
</soapenv:Envelope>

Delete documents containing the text director, for access role role2, using searcher internal2:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:sear="http://home.textkernel.nl/search">
    <soapenv:Header/>
    <soapenv:Body>
        <sear:deleteByQuery>
            <environment>example</environment>
            <password>demo</password>
            <accessRoles>role2</accessRoles>
            <query>director</query>
            <searcher>internal2</searcher>
        </sear:deleteByQuery>
    </soapenv:Body>
</soapenv:Envelope>

Delete all documents of role agency1 (a non-empty query that matches all documents is combined with an accessRoles filter):

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:sear="http://home.textkernel.nl/search">
    <soapenv:Header/>
    <soapenv:Body>
        <sear:deleteByQuery>
            <environment>example</environment>
            <password>demo</password>
            <accessRoles>agency1</accessRoles>
            <query>documentdate:>1900-01-01</query>
        </sear:deleteByQuery>
    </soapenv:Body>
</soapenv:Envelope>

To delete all documents for all roles, use the truncate method instead; it is faster and synchronous.

Returns

Result Name Type Description
deleteByQueryResponse DeleteRun object representing the status of the delete run

Pre-Condition

  • The document store is enabled.
  • The provided environment has a repository (i.e. it is indexing).
  • The provided query (search request) does not return more than 95% of the total documents in the environment index.

Post-Condition

The documents matching the query are deleted.

Error Handling

All errors listed under common errors can occur. The service only starts a job and returns before any documents are actually deleted, so errors that occur during deletion are reported only in the application log. Additional errors:

Error Code Description
METHOD_NOT_AVAILABLE The provided environment is not indexing (it has no repository), or the Cassandra document store is not configured at the application level.
DELETE_RUN_REJECTED The delete run was rejected because the provided query returns more than 95% of the total documents in the environment index.

Method getDeleteRun

Method call

getDeleteRun(environment, password, id) : deleteRunResponse

Description

Returns the status of a deleteByQuery job.

Parameters

Parameter Name Type Description
environment string identifier of a search environment.
password string password for the search environment.
id string unique ID of the delete run, as returned in the deleteByQueryResponse.

Returns

Result Name Type Description
deleteRunResponse DeleteRun object status of the delete run

Error Handling

All errors listed under common errors, plus the following:

Error Code Description
METHOD_NOT_AVAILABLE The document store is not enabled.
INVALID_REQUEST The id is not in the correct format.
DELETE_RUN_NOT_FOUND No delete run with the given id was found.

Method Reindex

Method call

reindex(environment, password, deltaTime) : reindexResponse

Description

The Reindex method reads all documents and their partial updates from the document store and adds them to the search index.

NOTE: Only the documents that were indexed after the document store became enabled will be reindexed.

Reindexing does not change the value of the tk_lastmodified field; the value is taken from the document store.

If a reindexing operation is already in progress on the given environment then the reindexing process is restarted. In any case this method returns an object of type ReindexingStatus containing the details of the new reindex run.

Parameters

Parameter Name Type Description
environment string identifier of a search environment
password string password for the search environment
deltaTime date (optional) only documents added or updated after this time are reindexed (based on the last add/update time, not documentdate).
Supported syntax is YYYY-MM-DD['T'HH:MM:SS[.sss]], e.g. 2016-12-29T16:25:50.093 or 2016-12-29. If the date-string is not supplied in this format it will be regarded as null.
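To reindex, for example, only the documents added or updated during the last day, deltaTime has to be sent in the supported syntax. A minimal Python sketch of the formatting step; the function name is hypothetical. The time zone in which the server interprets deltaTime is not specified here, so the sketch assumes it matches the client's and simply drops any offset rather than converting it.

```python
from datetime import datetime


def format_delta_time(moment: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:MM:SS.sss (milliseconds, no offset)."""
    # isoformat with timespec="milliseconds" truncates microseconds to 3 digits;
    # tzinfo is dropped because the documented syntax has no offset part.
    return moment.replace(tzinfo=None).isoformat(timespec="milliseconds")
```

For the last day, the value would be format_delta_time(datetime.now() - timedelta(days=1)), with timedelta imported from datetime.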

Returns

Result Name Type Description
reindexResponse ReindexingStatus object See format of the ReindexingStatus object

Pre-Condition

  • The document store is enabled on the application level.
  • The provided environment has a repository (i.e. it is indexing).
  • The document store is not disabled on the repository.

Post-Condition

A reindex job has started that will reindex all documents in the document store for this environment.

Error Handling

All errors listed under common errors can occur, except INDEXING_ERROR. The service only starts a job and returns before any documents are sent to the index, so indexing errors are reported only in the application log. Additional errors:

Error Code Description
METHOD_NOT_AVAILABLE The document store is not enabled.

Method getReindexingStatus

Method call

getReindexingStatus(environment, password) : getReindexingStatusResponse

Description

This method returns a ReindexingStatus object describing the current status of the reindexing operation for the given environment.

If the status is READY, the total number of documents in the document store is calculated and included in the response.

Parameters

Parameter Name Type Description
environment string identifier of a search environment
password string password for the search environment

Returns

Result Name Type Description
getReindexingStatusResponse ReindexingStatus object See format of the ReindexingStatus object

Pre-Condition

  • The document store is enabled on the application level.
  • The document store is not disabled on the environment level.
  • The provided environment has a repository (i.e. it is indexing).

Post-Condition

None.

Error Handling

See the description of common errors. Additional errors:

Error Code Description
METHOD_NOT_AVAILABLE The document store is not enabled.