Sourcebox API Reference

Manage Document get

The get action retrieves a document by ID. This is only available for persisting accounts.

Manage Document get parameters

| Parameter | Description | Content type |
|---|---|---|
| account | the Sourcebox account name | text |
| username | the username of the user | text |
| password | the password associated with the user | text |
| trxmlid | the trxml ID of the document to be retrieved | number |
| externalid | the external ID of the document to be retrieved, as an alternative to trxmlid | text |
| format | the requested output format | template or original |

Note

trxmlid is a unique ID that Sourcebox automatically assigns to a document when the document is sent for parsing or for indexing in Search. To link a document to an ID from an external system, such as an ATS, use externalid.

The input document selector is either:

  • trxmlid or
  • externalid

The service accepts only one of these arguments. When externalid is supplied and multiple documents in Sourcebox share that externalid, the latest document is returned, that is, the one with the highest trxmlid.
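
The selector rule above can also be checked on the client before a request is sent. The following is a minimal sketch, not part of the Sourcebox API: the helper name `build_selector` is hypothetical, and only the parameter names trxmlid and externalid come from the parameter table.

```python
from typing import Dict, Optional


def build_selector(trxmlid: Optional[int] = None,
                   externalid: Optional[str] = None) -> Dict[str, str]:
    """Return the single document selector as form parameters.

    Hypothetical client-side helper: raises ValueError unless exactly one
    of trxmlid or externalid is given, mirroring the rule that the service
    accepts only one of these arguments.
    """
    # True when both are missing or both are present.
    if (trxmlid is None) == (externalid is None):
        raise ValueError("supply exactly one of trxmlid or externalid")
    if trxmlid is not None:
        return {"trxmlid": str(trxmlid)}
    return {"externalid": externalid}


print(build_selector(trxmlid=5))
```

Rejecting an ambiguous selector locally avoids a round trip, but the service remains the authority on which documents exist.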

The possible output formats are:

  • template: The templated output (as defined by the Textractor configured for templating and the template name in Sourcebox account settings).
  • original: The uploaded CV in the original format (Word, PDF, etc.), including metadata. See below for encoding.

Example using curl

POST request with application/x-www-form-urlencoded parameters

curl "https://home.textkernel.nl/sourcebox/manageDocument" \
    --data account=test --data username=test --data password=xyz --data trxmlid=5 \
    --data action=get --data format=template
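
For callers that do not use curl, the same request can be sketched in Python with the standard library. This is an illustration under assumptions: the function name `build_get_request` is hypothetical, the credentials are the placeholder values from the curl example, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://home.textkernel.nl/sourcebox/manageDocument"


def build_get_request(account, username, password, trxmlid, fmt="template"):
    """Build (but do not send) a form-encoded POST for the get action."""
    fields = {
        "account": account,
        "username": username,
        "password": password,
        "trxmlid": str(trxmlid),
        "action": "get",
        "format": fmt,
    }
    body = urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST; the header matches curl --data.
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_get_request("test", "test", "xyz", 5)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch can be run without credentials.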

Original Output Format

The output of format original contains both the encoded binary and the file metadata.

<document>
   <bytes>e1xydGYxXGFuc2lcZGVmZjBcYWBhc...</bytes>
   <trxmlID>320</trxmlID>
   <fileName>textkerneldemo--44-1378213610745.rtf</fileName>
   <contentType>text/richtext</contentType>
</document>

The document root element contains the following elements:

  • bytes: the Base64-encoded binary content of the document.
  • trxmlID: the trxml ID.
  • fileName: the file name as stored on the Sourcebox filesystem.
  • contentType: the MIME type of the document.
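
A client has to Base64-decode the bytes element to recover the file. The sketch below shows one way to do that with the Python standard library. The function name `parse_original` and the sample payload are made up for illustration; only the element names come from the response shown above.

```python
import base64
import xml.etree.ElementTree as ET


def parse_original(xml_text):
    """Decode a format=original response into (metadata, raw_bytes)."""
    root = ET.fromstring(xml_text)
    if root.tag != "document":
        raise ValueError("unexpected root element: " + root.tag)
    metadata = {
        "trxmlID": int(root.findtext("trxmlID")),
        "fileName": root.findtext("fileName"),
        "contentType": root.findtext("contentType"),
    }
    raw = base64.b64decode(root.findtext("bytes"))
    return metadata, raw


# Made-up payload for illustration; a real response carries the uploaded file.
payload = base64.b64encode(b"{\\rtf1 demo}").decode("ascii")
sample = (
    "<document>"
    "<bytes>" + payload + "</bytes>"
    "<trxmlID>320</trxmlID>"
    "<fileName>demo.rtf</fileName>"
    "<contentType>text/richtext</contentType>"
    "</document>"
)
metadata, raw = parse_original(sample)
print(metadata["fileName"], metadata["contentType"], len(raw))
```

The decoded bytes can then be written to disk in binary mode, for example under the returned fileName.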

Download Service

As an alternative to the get action there is a separate REST download service that returns the original document as a binary. The REST service is located here:

https://home.textkernel.nl/sourcebox/downloadDocument

The service behaves like the get action, with two differences: the format is always the original document, and the response is the raw binary file rather than a Base64-encoded string with metadata.

Example curl POST request with application/x-www-form-urlencoded parameters:

curl "https://home.textkernel.nl/sourcebox/downloadDocument" \
    --data account=test --data username=test --data password=xyz --data trxmlid=5

Error codes

The table below lists the possible error codes. For the REST service, the servlet returns a different HTTP status code depending on the error. For SOAP, the HTTP status code on error is always 500.

Manage Document get Errors

| Error Code | Description | Servlet status code |
|---|---|---|
| INVALID_ACTION | Invalid HTTP method for this action | 400 |
| INVALID_ARGUMENT | Only for the REST service: action must be get, getAttachments or delete. | 400 |
| INVALID_CREDENTIALS | Combination of account, username and password is invalid. | 401 |
| DOCUMENT_NOT_FOUND | No document found with the given trxml ID or external ID. | 404 |
| DOCUMENT_ACCESS_DENIED | Access to the document is prohibited to the account/username. | 403 |
| FILE_NOT_FOUND | The original document cannot be found on the file system. | 404 |
| TEMPLATING_ERROR | The Textractor templator encountered an error. | 500 |
| OTHER | A system fault. | 500 |
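
Client code may want the table above as a lookup. The following is a small sketch rather than an official client: the names `REST_STATUS` and `expected_status` are hypothetical, while the codes and status values are transcribed from the table, together with the rule that SOAP errors always use HTTP 500.

```python
# Error code -> REST servlet status code, transcribed from the table above.
REST_STATUS = {
    "INVALID_ACTION": 400,
    "INVALID_ARGUMENT": 400,
    "INVALID_CREDENTIALS": 401,
    "DOCUMENT_NOT_FOUND": 404,
    "DOCUMENT_ACCESS_DENIED": 403,
    "FILE_NOT_FOUND": 404,
    "TEMPLATING_ERROR": 500,
    "OTHER": 500,
}


def expected_status(error_code, protocol="rest"):
    """Return the HTTP status associated with an error code."""
    if error_code not in REST_STATUS:
        raise KeyError("unknown error code: " + error_code)
    if protocol == "soap":
        return 500  # SOAP errors always use HTTP 500
    if protocol == "rest":
        return REST_STATUS[error_code]
    raise ValueError("protocol must be 'rest' or 'soap'")


print(expected_status("DOCUMENT_NOT_FOUND"))
print(expected_status("DOCUMENT_NOT_FOUND", protocol="soap"))
```

Because several codes share a status (400, 404 and 500 each appear twice), the HTTP status alone does not identify the error on the REST service.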