Indexing Service
Common Error Handling
Error Code | Description |
---|---|
EMPTY_ARGUMENT | One or more mandatory arguments is missing or has an empty value. |
INVALID_PASSWORD | The password is incorrect. |
ENVIRONMENT_NOT_AVAILABLE | The environment is not available (see log-file for possible errors). |
INDEXING_ERROR | There is an error in the search index. |
DOCUMENT_STORAGE_ERROR | The document store produced an error. |
Method Add
Method call
add(environment, password, documentID, documentDate, document, accessRoles) : void
Description
The add method adds a document with the given ID and date to the specified internal repository, or updates the document
within the repository if a document with the same identifier already exists. It overwrites all metadata fields.
In addition to the configured fields, Search adds the reserved tk_lastmodified field to the document, with the current time as its value.
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment |
password | string | password for the search environment |
documentID | string | unique document identifier within the search environment. Allows only letters (upper and lower case), numbers, dash (-), and underscore (_). Cannot be larger than 512 bytes, but it is recommended to keep this much smaller for performance reasons. |
documentDate (optional) | date-string | document date used for ranking newer documents higher. Supported syntax is YYYY-MM-DD['T'HH:MM:SS[.sss]] , e.g. 2016-12-29T16:25:50.093 or 2016-12-29 . If the date-string is not supplied in this format it will be regarded as null . If the documentDate is not provided as an API parameter, it will be extracted from the document (using the selector defined in the metadata config for the special field called documentdate ). If that does not yield a valid date, today is used. |
document | XML (TrXml) document | XML document sent as a string. Ensure proper XML encoding when constructing a SOAP message (use entity references for special characters or enclose the whole document in a CDATA section; encode non-ASCII characters with the SOAP character set: UTF-8 by default). For the structure of the XML, see Input Document Format. |
accessRoles | list of strings | list of access roles that are allowed to retrieve the document. The accessRoles element should be repeated for each individual role. |
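The documentID constraints in the table above can be checked on the client before a call is made. The following is a minimal sketch, not part of the service: the function name is hypothetical, and it assumes that "letters" means ASCII letters and that the 512-byte limit applies to the UTF-8 encoding of the ID.

```python
import re

# Allowed characters per the parameter table: letters, digits, dash, underscore.
# Assumption: "letters" means ASCII letters only.
_DOCUMENT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
MAX_DOCUMENT_ID_BYTES = 512


def is_valid_document_id(document_id: str) -> bool:
    """Return True if document_id satisfies the documented constraints."""
    # fullmatch also rejects the empty string, which the service would
    # report as EMPTY_ARGUMENT.
    if not _DOCUMENT_ID_PATTERN.fullmatch(document_id):
        return False
    # Assumption: the size limit is measured on the UTF-8 encoded ID.
    return len(document_id.encode("utf-8")) <= MAX_DOCUMENT_ID_BYTES
```

For example, `is_valid_document_id("cv_2016-12")` is accepted, while an ID containing a space or a slash is rejected.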
Example
Note that the actual document XML is omitted.
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sear="http://home.textkernel.nl/search">
<soapenv:Header/>
<soapenv:Body>
<sear:add>
<environment>example</environment>
<password>demo</password>
<documentID>12</documentID>
<documentDate>2016-10-10</documentDate>
<document><![CDATA[ (...) ]]></document>
<accessRoles>all</accessRoles>
</sear:add>
</soapenv:Body>
</soapenv:Envelope>
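A request like the one above can also be assembled programmatically. The sketch below uses Python's standard xml.etree.ElementTree, which escapes special characters as entity references when text is assigned, so the embedded document does not need a CDATA section. The function name is hypothetical; the element names and namespaces are taken from the example, and sending the envelope over HTTP is left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SEARCH_NS = "http://home.textkernel.nl/search"


def build_add_envelope(environment, password, document_id, document_xml,
                       access_roles, document_date=None):
    """Build the SOAP envelope for an add call and return it as a string."""
    ET.register_namespace("soapenv", SOAP_NS)
    ET.register_namespace("sear", SEARCH_NS)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    add = ET.SubElement(body, f"{{{SEARCH_NS}}}add")

    ET.SubElement(add, "environment").text = environment
    ET.SubElement(add, "password").text = password
    ET.SubElement(add, "documentID").text = document_id
    if document_date is not None:  # documentDate is optional
        ET.SubElement(add, "documentDate").text = document_date
    # Assigning to .text makes ElementTree escape <, > and & on output.
    ET.SubElement(add, "document").text = document_xml
    # The accessRoles element is repeated once per role.
    for role in access_roles:
        ET.SubElement(add, "accessRoles").text = role

    return ET.tostring(envelope, encoding="unicode")
```

The returned string is the request body; how it is posted to the service endpoint depends on the deployment and is not shown here.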
Returns
Result Name | Type | Description |
---|---|---|
none |
Pre-Condition
The environment is configured with a repository for indexing.
Post-Condition
The document is added to the index, or if a document with the same ID already exists it is updated.
Error Handling
See the description of common errors. Additional errors:
Error Code | Description |
---|---|
INVALID_DOCUMENT_FORMAT | The document format is invalid. |
Method AddBulk
Method call
addBulk(environment, password, documents) : void
Description
The addBulk method adds a list of documents to the specified internal repository, or updates a document within the repository if a document with the same identifier already exists. It overwrites all metadata fields. Calling this method is usually faster than calling add for every document, because all documents are processed in one call.
In addition to the configured fields, Search adds the reserved tk_lastmodified field to each document, with the current time as its value.
A single call can process at most 100 documents; this is a hard limit.
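Because of the 100-document limit, a larger set of documents has to be split over several addBulk calls. A minimal sketch of such batching follows; the helper name is hypothetical and the documents can be any objects the caller later serializes into document elements.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BULK_SIZE = 100  # hard limit per addBulk call, as stated above


def chunk_documents(documents: Sequence[T],
                    size: int = MAX_BULK_SIZE) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` documents."""
    if not 1 <= size <= MAX_BULK_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BULK_SIZE}")
    for start in range(0, len(documents), size):
        yield list(documents[start:start + size])
```

Each yielded batch can then be sent in its own addBulk request; a failure in one batch does not affect batches that were already processed.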
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment |
password | string | password for the search environment |
document | Document | array of Document objects; see chapter Object Structures |
Example
Note that the actual document XML is omitted.
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sear="http://home.textkernel.nl/search">
<soapenv:Header/>
<soapenv:Body>
<sear:addBulk>
<environment>example</environment>
<password>demo</password>
<document>
<id>1</id>
<date>2019-06-22</date>
<content><![CDATA[ (...) ]]></content>
<accessRoles>all</accessRoles>
</document>
<document>
<id>2</id>
<date>2019-06-23</date>
<content><![CDATA[ (...) ]]></content>
<accessRoles>po</accessRoles>
</document>
</sear:addBulk>
</soapenv:Body>
</soapenv:Envelope>
Returns
Result Name | Type | Description |
---|---|---|
none |
Pre-Condition
The environment is configured with a repository for indexing.
Post-Condition
The documents are added to the index, or if a document with the same ID already exists it is updated.
Error Handling
This method throws the common errors and, in addition, SearchBoxMultiErrorException. SearchBoxMultiErrorException is thrown when one or more documents are not added to the index. It contains a list of errors objects with the following fields:
Field Name | Type | Description |
---|---|---|
docId | string | identifier of a failed document |
errorCode | string | one of INVALID_DOCUMENT_FORMAT, INDEXING_ERROR, or DOCUMENT_STORAGE_ERROR (see common errors) |
Example
<ns2:SearchBoxMultiErrorException xmlns:ns2="http://home.textkernel.nl/search">
<errors>
<errorCode>INVALID_DOCUMENT_FORMAT</errorCode>
<docId>1</docId>
</errors>
<errors>
<errorCode>INDEXING_ERROR</errorCode>
<docId>2</docId>
</errors>
<errors>
<errorCode>DOCUMENT_STORAGE_ERROR</errorCode>
<docId>3</docId>
</errors>
</ns2:SearchBoxMultiErrorException>
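A client can turn such an exception into a mapping from document ID to error code, for instance to decide which documents to resubmit. The sketch below is an illustration only: the function name is hypothetical, and it assumes the client has the exception fragment (or the SOAP fault containing it) available as an XML string.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def failed_documents(fault_xml: str) -> Dict[str, str]:
    """Map each failed docId to its errorCode."""
    root = ET.fromstring(fault_xml)
    failures = {}
    # iter() searches at any depth, so this also works when the exception
    # element is nested inside a SOAP fault.
    for error in root.iter("errors"):
        doc_id = error.findtext("docId")
        code = error.findtext("errorCode")
        if doc_id is not None and code is not None:
            failures[doc_id] = code
    return failures
```

Applied to the example above, the function returns a dictionary with the three entries for documents 1, 2 and 3.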
Method Delete
Method call
delete(environment, password, documentIDs) : DeleteResult
Description
The delete method removes the documents with the given IDs from the search index, if they exist. It also removes each document from the document store if the repository is backed by one, and removes saved results related to the document from the database. It returns a DeleteResult, which contains the IDs of the documents that were successfully deleted and the IDs of the documents it failed to delete.
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment |
password | string | password for the search environment |
documentIDs | list of strings | document identifiers indicating which documents to delete. One document ID per element. Element can be repeated to delete multiple documents in one call. |
Example
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sear="http://home.textkernel.nl/search">
<soapenv:Header/>
<soapenv:Body>
<sear:delete>
<environment>example</environment>
<password>demo</password>
<documentIDs>123</documentIDs>
<documentIDs>456</documentIDs>
<documentIDs>999</documentIDs>
</sear:delete>
</soapenv:Body>
</soapenv:Envelope>
Returns
Result Name | Type | Description |
---|---|---|
deleteResult | DeleteResult object | See description of DeleteResult in chapter Object Structures |
Pre-Condition
In order for a document to be deleted, it must have been added by a call to an indexing method (add or addBulk).
Post-Condition
Documents having any of the given document IDs are removed.
Error Handling
Trying to delete a document ID that does not exist does not lead to an error; it is simply ignored. The delete operation is blocking, so when the response is received the operation has already completed. Whether it succeeded or failed can be seen in the response itself. If an exception occurs, it mentions the particular document ID on which it failed.
Note that the delete operation is neither atomic nor transactional and will not be (partially) rolled back when an exception is encountered. It can be assumed that all earlier document IDs in the list have been deleted successfully and all later document IDs have been skipped. After correcting the source of the exception, the same delete operation can safely be executed again; documents that were already deleted will simply be ignored. See also the description of common errors.
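Since earlier IDs in the list can be assumed deleted and later ones skipped, a retry only needs to cover the failed ID and everything after it. The following sketch shows that bookkeeping. The function name is hypothetical, and how the failed ID is extracted from the exception is not specified here, so it is passed in by the caller.

```python
from typing import List, Sequence


def ids_to_retry(document_ids: Sequence[str], failed_id: str) -> List[str]:
    """Return the failed ID and every ID after it in the original request."""
    # Raises ValueError if failed_id was not part of the original request.
    position = list(document_ids).index(failed_id)
    return list(document_ids[position:])
```

Resending the complete original list is equally safe, because already deleted documents are ignored; trimming the list only avoids redundant work.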
Method Truncate
Method call
truncate(environment, password) : void
Description
The truncate method removes all documents from the repository. It empties the search index and deletes all documents from the corresponding document store, if one exists. It also removes all saved results related to the deleted documents from the database.
NOTE: Concurrent operations on the index may result in errors while this operation is in progress.
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment |
password | string | password for the search environment |
Returns
Result Name | Type | Description |
---|---|---|
none |
Pre-Condition
The environment must exist.
Post-Condition
- The index is empty (after being deleted and recreated).
- The document store is empty (if enabled).
Error Handling
See the description of common errors.
Method Update
Method call
update(environment, password, documentID, field, value, fieldValues) : void
Description
The update method updates a single metadata field of the document with the given ID. The value is a list; if the field is not multi-valued, it should contain a single element. The value replaces the old list.
This endpoint also sets the reserved tk_lastmodified field to the current time. Requests to modify the tk_lastmodified field are forbidden: Search returns an error when the reserved field is used with this endpoint.
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment |
password | string | password for the search environment |
documentID | string | identifier of the document being updated. |
field | string | The name of the metadata field, must be defined in the metadata definition of the environment or be either 'roles', 'documentdate', or 'projectid'. |
value | list of strings | The list of values of this field. May be empty. Multiple values are only allowed on metadata fields of type multivalued. When the docstore is enabled, value cannot be longer than 65535 Bytes. For multiple values, the element should be repeated for each individual value. |
fieldValues | list of FieldValues | Optional parameter. As an alternative to using the value argument, this argument accepts a list of FieldValue objects. Using this argument is required when updating object fields. Multiple values are only allowed on object fields of type multivalued. |
Example
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sear="http://home.textkernel.nl/search">
<soapenv:Header/>
<soapenv:Body>
<sear:update>
<environment>example</environment>
<password>demo</password>
<documentID>123</documentID>
<field>city</field>
<value>London</value>
</sear:update>
</soapenv:Body>
</soapenv:Envelope>
Update multivalued object field example
NOTE: It is only possible to update a multivalued field completely, i.e. all its values get overwritten.
<x:Envelope xmlns:x="http://schemas.xmlsoap.org/soap/envelope/" xmlns:sear="http://home.textkernel.nl/search">
<x:Header/>
<x:Body>
<sear:update>
<environment>example</environment>
<password>demo</password>
<documentID>1</documentID>
<field>jobexperience</field>
<fieldValues>
<subValues>
<entry>
<key>jobtitle</key>
<value>
<item>Accounts Administrator</item>
</value>
</entry>
<entry>
<key>startdate</key>
<value>
<item>2018-01-01</item>
</value>
</entry>
<entry>
<key>years</key>
<value>
<item>10</item>
</value>
</entry>
</subValues>
</fieldValues>
<fieldValues>
<subValues>
<entry>
<key>jobtitle</key>
<value>
<item>Accounts Assistant</item>
</value>
</entry>
<entry>
<key>years</key>
<value>
<item>5</item>
</value>
</entry>
<entry>
<key>startdate</key>
<value>
<item>2010-03-03</item>
</value>
</entry>
<entry>
<key>description</key>
<value>
<item>Very fun job experience</item>
</value>
</entry>
</subValues>
</fieldValues>
</sear:update>
</x:Body>
</x:Envelope>
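The nested fieldValues structure in the example above is repetitive to write by hand. The sketch below generates it from plain Python data with xml.etree.ElementTree. The function name and the input shape (one dictionary per object value, mapping sub-field names to lists of strings) are assumptions made for illustration; the element names follow the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_field_values(entries: List[Dict[str, List[str]]]) -> List[ET.Element]:
    """Create one fieldValues element per object value."""
    elements = []
    for sub_fields in entries:
        field_values = ET.Element("fieldValues")
        sub_values = ET.SubElement(field_values, "subValues")
        for key, items in sub_fields.items():
            entry = ET.SubElement(sub_values, "entry")
            ET.SubElement(entry, "key").text = key
            value = ET.SubElement(entry, "value")
            # One item element per value of the sub-field.
            for item in items:
                ET.SubElement(value, "item").text = item
        elements.append(field_values)
    return elements
```

The returned elements can be appended to the update element of a request after the field element. Because a multivalued field is always overwritten completely, the list should contain every object value the field is supposed to hold afterwards.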
Returns
Result Name | Type | Description |
---|---|---|
none |
Pre-Condition
In order for a document to be updated, it must have been added by a call to an indexing method (add or addBulk); otherwise the update is discarded.
Post-Condition
The field value is updated.
Error Handling
See the description of common errors. Additional errors:
Error Code | Description |
---|---|
INVALID_DOCUMENT_FORMAT | The field is invalid or the field value contains invalid syntax (for dates, numbers, etc.). |
Method Recreate Tags
Method call
recreateTags(environment, password, projectID) : recreateTagsResult
Description
The Recreate Tags method reads all tags from the environment's database and processes them as partial updates on the projectID field. This method is useful to restore the searchability of tags after a full reindexing action from an external source. If projectID is provided, the tags are processed only for the specified project. It is not possible to recreate tags for the default project alone; that can only be done as an operation on the whole environment. As a synchronisation tool, the method also removes orphaned saved results from the database, i.e. saved results whose related document is no longer in the index.
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment |
password | string | password for the search environment |
projectID | string | (optional) ID of a project for which to recreate tags |
Returns
Result Name | Type | Description |
---|---|---|
recreateTagsResult | RecreateTagsResult object | IDs of the documents whose tags could not be updated |
Pre-Condition
- The provided environment has a repository (i.e. it is indexing).
- The provided environment has a database configured.
- In order for documents to be tagged, they must already exist in the search repository, otherwise the tags will be discarded.
Post-Condition
The documents that are found in the database as tagged are updated in the index so they can be searched by project tag.
Error Handling
See the description of common errors.
Method deleteByQuery
Method call
deleteByQuery(environment, password, user, accessRoles, query, queryParts) : deleteByQueryResponse
Description
The deleteByQuery method deletes all documents that satisfy the search criteria in the given environment and search engine (only searchers of type elasticsearch are supported). This method dispatches a long-running job and returns immediately. The response does not contain actual information about the delete run: the fields deleted and total are not present, and status is SUBMITTED. The actual status can be retrieved with the getDeleteRun method, whose input field id has to be taken from the deleteByQueryResponse.
NOTE: The query behavior is consistent with the Search service, i.e. the same result set that the Search service would return for the given query and accessRoles parameters is deleted by this method.
This implies that an empty query, as well as an invalid query that cannot be parsed, matches all documents. Also, hiddenQuery, if configured, is taken into account.
WARNING: It is not possible to delete more than 95% of the documents in the index. For that use case, use the truncate method instead. Requests that match more documents than this threshold are rejected.
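A client that already knows how many documents a query matches and how many the index holds can anticipate a rejection before submitting the job. The check below is a sketch under assumptions: the function name is hypothetical, both counts are supplied by the caller (for example from a prior search), and exactly 95% is treated as allowed because the text says "more than 95%". How Search itself counts or rounds is not documented here.

```python
DELETE_THRESHOLD_PERCENT = 95


def exceeds_delete_threshold(matching: int, total: int) -> bool:
    """Return True if `matching` is more than 95% of `total`."""
    if total <= 0:
        return False  # nothing indexed, nothing to reject
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return matching * 100 > total * DELETE_THRESHOLD_PERCENT
```

When the check returns True, the request would be expected to fail with DELETE_RUN_REJECTED, and truncate is the documented alternative if the whole index should be emptied.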
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of the search environment. |
password | string | password for the search environment. |
accessRoles | list of strings | list of access roles that are allowed to retrieve the document. The accessRoles element should be repeated for each individual role. |
searchEngine | string | name of the search engine to use. Only searchers of type elasticsearch are supported. If not specified, the default for the environment is used. |
query | string | the user typed keyword query. It may be joined with the queryParts if it is a single OR-combination part that can be joined. Query or at least 1 queryPart is required. |
queryParts | list of QueryPart | list of query parts coming from parsing a previous query. Query or at least 1 queryPart is required. |
Example
Delete documents with text manager for all roles using default searcher:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sear="http://home.textkernel.nl/search">
<soapenv:Header/>
<soapenv:Body>
<sear:deleteByQuery>
<environment>example</environment>
<password>demo</password>
<accessRoles>all</accessRoles>
<query>manager</query>
</sear:deleteByQuery>
</soapenv:Body>
</soapenv:Envelope>
Delete documents older than 3 months:
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sear="http://home.textkernel.nl/search">
<soapenv:Header/>
<soapenv:Body>
<sear:deleteByQuery>
<environment>example</environment>
<password>demo</password>
<accessRoles>all</accessRoles>
<queryParts>
<field>documentdate</field>
<condition>REQUIRED</condition>
<weight>1.0</weight>
<items>
<value>&lt;today-91</value>
<label>older than 3 months</label>
</items>
</queryParts>
</sear:deleteByQuery>
</soapenv:Body>
</soapenv:Envelope>
Delete documents with text director and role role2 using searcher internal2:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sear="http://home.textkernel.nl/search">
<soapenv:Header/>
<soapenv:Body>
<sear:deleteByQuery>
<environment>example</environment>
<password>demo</password>
<accessRoles>role2</accessRoles>
<query>director</query>
<searcher>internal2</searcher>
</sear:deleteByQuery>
</soapenv:Body>
</soapenv:Envelope>
Delete all documents for role agency1 (we specify a non-empty query that returns all documents and filter them by accessRoles):
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sear="http://home.textkernel.nl/search">
<soapenv:Header/>
<soapenv:Body>
<sear:deleteByQuery>
<environment>example</environment>
<password>demo</password>
<accessRoles>agency1</accessRoles>
<query>documentdate:>1900-01-01</query>
</sear:deleteByQuery>
</soapenv:Body>
</soapenv:Envelope>
To delete all documents for all roles, use the truncate method, which is faster and synchronous.
Returns
Result Name | Type | Description |
---|---|---|
deleteByQueryResponse | DeleteRun object | represents status of Delete Run |
Pre-Condition
- The document store is enabled.
- The provided environment has a repository (i.e. it is indexing).
- The provided query (search request) does not return more than 95% of the total documents in the environment index.
Post-Condition
The documents matching the query are deleted.
Error Handling
All the errors listed in common errors. The service only starts a job and returns before any documents are actually deleted. Actual deletion errors end up only in the application log. Additional errors:
Error Code | Description |
---|---|
METHOD_NOT_AVAILABLE | The provided environment is not indexed (it has no repository), or the Cassandra document store is not configured on the application level. |
DELETE_RUN_REJECTED | Delete run rejected because the provided query returns more than 95% of the total documents in the environment index. |
Method getDeleteRun
Method call
getDeleteRun(environment, password, id) : deleteRunResponse
Description
Gets the status of a delete-by-query job.
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment. |
password | string | password for the search environment. |
id | string | unique id of the delete run. |
Returns
Result Name | Type | Description |
---|---|---|
deleteByQueryResponse | DeleteRun object | represents the status of the delete run |
Error Handling
All the errors listed in common errors, plus the following:
Error Code | Description |
---|---|
METHOD_NOT_AVAILABLE | The document store is not enabled. |
INVALID_REQUEST | The id is not in the correct format. |
DELETE_RUN_NOT_FOUND | No delete run with the given id was found. |
Method Reindex
Method call
reindex(environment, password, deltaTime) : reindexResponse
Description
The Reindex method reads all documents and their partial updates from the document store and adds them to the search index.
NOTE: Only documents that were indexed after the document store was enabled are reindexed.
Reindex does not change the value of the tk_lastmodified field; the date is taken from the document store.
If a reindexing operation is already in progress on the given environment, the reindexing process is restarted. In either case, this method returns an object of type ReindexingStatus containing the details of the new reindex run.
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment |
password | string | password for the search environment |
deltaTime | date | optional delta time to reindex documents newer than this time (based on last add/update time, not documentdate). Supported syntax is YYYY-MM-DD['T'HH:MM:SS[.sss]] , e.g. 2016-12-29T16:25:50.093 or 2016-12-29 . If the date-string is not supplied in this format it will be regarded as null . |
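Because a date-string in the wrong format is silently regarded as null, it can help to check deltaTime (or documentDate) on the client before sending it. The sketch below accepts exactly the documented syntax YYYY-MM-DD['T'HH:MM:SS[.sss]]. The function name is hypothetical, and the check reflects only the format described in the table, not the service's actual parser.

```python
import re
from datetime import datetime
from typing import Optional

# Date, optionally followed by 'T' and a time with optional milliseconds.
_DATE_PATTERN = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
    r"(T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]{3})?)?"
)


def parse_search_date(value: str) -> Optional[datetime]:
    """Return a datetime for a well-formed date-string, else None."""
    if not _DATE_PATTERN.fullmatch(value):
        return None
    try:
        # fromisoformat also rejects impossible dates such as month 13.
        return datetime.fromisoformat(value)
    except ValueError:
        return None
```

A return value of None signals that the string would most likely be treated as null by the service, so the caller can correct it instead of unintentionally reindexing everything.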
Returns
Result Name | Type | Description |
---|---|---|
reindexResponse | ReindexingStatus object | See format of the ReindexingStatus object |
Pre-Condition
- The document store is enabled on the application level.
- The provided environment has a repository (i.e. it is indexing).
- The document store is not disabled on the repository.
Post-Condition
A reindex job has started that will reindex all documents in the document store for this environment.
Error Handling
All the errors listed in common errors, except for INDEXING_ERROR. The service only starts a job and returns before any documents are actually sent to the index, so actual indexing errors end up only in the application log. Additional errors:
Error Code | Description |
---|---|
METHOD_NOT_AVAILABLE | The document store is not enabled. |
Method getReindexingStatus
Method call
getReindexingStatus(environment, password) : getReindexingStatusResponse
Description
This method returns an object of type ReindexingStatus, which describes the current status of a reindexing operation for the given environment.
If the status is READY, the total number of documents in the document store is calculated and included in the response.
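Since reindex only starts a job, a client typically polls getReindexingStatus to find out when it has finished. The loop below is a sketch under stated assumptions: READY is taken to mean that no reindexing run is in progress (the other status values are not listed in this section), the function name is hypothetical, and fetch_status stands for whatever client code performs the SOAP call and extracts the status string.

```python
import time
from typing import Callable


def wait_until_ready(fetch_status: Callable[[], str],
                     poll_seconds: float = 5.0,
                     timeout_seconds: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the status is READY; return False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Assumption: READY means no reindexing run is in progress.
        if fetch_status() == "READY":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

The sleep function is injectable so the loop can be exercised in tests without waiting. A polling interval of a few seconds is a guess; a suitable value depends on the size of the document store.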
Parameters
Parameter Name | Type | Description |
---|---|---|
environment | string | identifier of a search environment |
password | string | password for the search environment |
Returns
Result Name | Type | Description |
---|---|---|
getReindexingStatusResponse | ReindexingStatus object | See format of the ReindexingStatus object |
Pre-Condition
- The document store is enabled on the application level.
- The document store is not disabled on the environment level.
- The provided environment has a repository (i.e. it is indexing).
Post-Condition
None.
Error Handling
See the description of common errors. Additional errors:
Error Code | Description |
---|---|
METHOD_NOT_AVAILABLE | The document store is not enabled. |