Manage Document get
The get action retrieves a document by ID. This is only available for persisting accounts.
Manage Document get parameters
Parameter | Description | Content type |
---|---|---|
action | the action to perform; must be get to retrieve a document | text |
account | the Sourcebox account name | text |
username | the username of the user | text |
password | the password associated with the user | text |
trxmlid | the trxml ID of the document to retrieve | number |
externalid | the external ID of the document to retrieve, as an alternative to trxmlid | text |
format | the requested output format | template or original |
Note
trxmlid is a unique ID that Sourcebox assigns automatically to a document when it is sent for parsing or for indexing to Search. To link a document to an ID from an external system, such as an ATS, use externalid.
The input document selector is either:
- trxmlid or
- externalid
The service accepts only one of these parameters per request. If externalid is supplied and several documents in Sourcebox have that external ID, the latest one is returned, that is, the document with the highest trxmlid.
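The parameter table and the selector rule translate directly into a form body. The following is a minimal Python sketch of a client-side helper, assuming you build the request yourself with the standard library; `build_get_form` is a hypothetical name, not part of any Sourcebox client. It rejects requests that supply both or neither of trxmlid and externalid, and any format other than template or original.

```python
from urllib.parse import urlencode

# Output formats listed for the get action.
VALID_FORMATS = {"template", "original"}


def build_get_form(account, username, password, *, trxmlid=None,
                   externalid=None, fmt="template"):
    """Return an application/x-www-form-urlencoded body for action=get.

    Hypothetical client helper: exactly one of trxmlid / externalid must be
    given, mirroring the documented selector rule.
    """
    # True == True (neither given) or False == False (both given) is an error.
    if (trxmlid is None) == (externalid is None):
        raise ValueError("supply exactly one of trxmlid or externalid")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}")
    form = {
        "account": account,
        "username": username,
        "password": password,
        "action": "get",
        "format": fmt,
    }
    if trxmlid is not None:
        form["trxmlid"] = str(int(trxmlid))  # trxmlid is numeric
    else:
        form["externalid"] = externalid
    # urlencode applies quote_plus, so spaces become "+".
    return urlencode(form)
```

For example, `build_get_form("test", "test", "xyz", trxmlid=5)` yields the same fields as the curl example in this section.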
The possible output formats are:
- template: The templated output, as defined by the Textractor configured for templating and by the template name in the Sourcebox account settings.
- original: The uploaded CV in its original format (Word, PDF, etc.), including metadata. See below for the encoding.
Example using curl
A POST request with application/x-www-form-urlencoded parameters:
curl "https://home.textkernel.nl/sourcebox/manageDocument" \
--data account=test --data username=test --data password=xyz --data trxmlid=5 \
--data action=get --data format=template
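The same request can be issued from Python with only the standard library. This sketch builds, but does not send, the request object that corresponds to the curl example; `make_get_request` is a hypothetical helper name, and the account, username and password values are the placeholders from the example.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the curl example.
MANAGE_DOCUMENT_URL = "https://home.textkernel.nl/sourcebox/manageDocument"


def make_get_request(account, username, password, trxmlid, fmt="template"):
    """Build (but do not send) the POST request the curl example issues."""
    form = {
        "account": account,
        "username": username,
        "password": password,
        "trxmlid": str(trxmlid),
        "action": "get",
        "format": fmt,
    }
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        MANAGE_DOCUMENT_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` inside a `with` block and read the response body. The character encoding of the templated output is not specified here, so check the response headers before decoding it.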
Original Output Format
With format=original, the output contains both the base64-encoded file and the file metadata:
<document>
<bytes>e1xydGYxXGFuc2lcZGVmZjBcYWBhc...</bytes>
<trxmlID>320</trxmlID>
<fileName>textkerneldemo--44-1378213610745.rtf</fileName>
<contentType>text/richtext</contentType>
</document>
The document root element contains the following elements:
- bytes: the base64-encoded file content.
- trxmlID: the trxml ID.
- fileName: the file name as stored on the Sourcebox filesystem.
- contentType: the MIME type of the document.
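Because the original file arrives base64-encoded inside the document element, a client has to parse the XML and decode the bytes element before it can use the file. Below is a minimal Python sketch, assuming the response body has already been read into a string or bytes and that all four elements are present. `decode_original` and `save_original` are hypothetical helper names.

```python
import base64
import xml.etree.ElementTree as ET
from pathlib import Path


def decode_original(xml_text):
    """Parse a format=original response into (trxml_id, file_name, content_type, data)."""
    root = ET.fromstring(xml_text)
    if root.tag != "document":
        raise ValueError(f"unexpected root element: {root.tag}")
    # b64decode discards non-alphabet characters such as line breaks by default.
    data = base64.b64decode(root.findtext("bytes", default=""))
    return (
        int(root.findtext("trxmlID")),
        root.findtext("fileName"),
        root.findtext("contentType"),
        data,
    )


def save_original(xml_text, directory="."):
    """Write the decoded file under its stored file name and return the path."""
    _, file_name, _, data = decode_original(xml_text)
    # Keep only the final path component, in case fileName contains separators.
    target = Path(directory) / Path(file_name).name
    target.write_bytes(data)
    return target
```

`save_original` keeps only the last component of fileName so that a stored name containing path separators cannot write outside the target directory. That safeguard is a client-side design choice, not documented service behavior.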
Download Service
As an alternative to the get action, a separate REST download service returns the original document as a binary file. The REST service is located at:
https://home.textkernel.nl/sourcebox/downloadDocument
The service is identical to the get action, except that the format is always original and the response is the raw binary file rather than a base64-encoded string with metadata.
Example curl POST request with application/x-www-form-urlencoded parameters:
curl "https://home.textkernel.nl/sourcebox/downloadDocument" \
--data account=test --data username=test --data password=xyz --data trxmlid=5
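Because the download service returns raw bytes rather than XML, a client can stream the response straight to disk without any base64 decoding. Below is a minimal Python sketch using only the standard library. `make_download_request` and `save_stream` are hypothetical helper names, and the parameters mirror the curl example.

```python
import shutil
import urllib.parse
import urllib.request

# URL of the download service.
DOWNLOAD_URL = "https://home.textkernel.nl/sourcebox/downloadDocument"


def make_download_request(account, username, password, trxmlid):
    """Build (but do not send) the form-encoded POST used by the curl example."""
    body = urllib.parse.urlencode({
        "account": account,
        "username": username,
        "password": password,
        "trxmlid": str(trxmlid),
    }).encode("ascii")
    # urllib adds Content-Type: application/x-www-form-urlencoded when sending
    # a request that has data and no explicit Content-Type header.
    return urllib.request.Request(DOWNLOAD_URL, data=body, method="POST")


def save_stream(response, path):
    """Copy a binary response body to disk in chunks and return the byte count."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)
        return out.tell()
```

For example, inside a `with urllib.request.urlopen(make_download_request("test", "test", "xyz", 5)) as resp:` block, call `save_stream(resp, "document.bin")`. The output name is a placeholder, since the binary response does not include the XML metadata with the stored file name.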
Error codes
The table below lists the possible error codes. The REST servlet signals each error with the HTTP status code given in the last column; with SOAP, the HTTP status code on error is always 500.
Manage Document get Errors
Error Code | Description | Servlet status code |
---|---|---|
INVALID_ACTION | The HTTP method is not valid for this action. | 400 |
INVALID_ARGUMENT | Only for the REST service: action must be get, getAttachments or delete. | 400 |
INVALID_CREDENTIALS | Combination of account, username and password is invalid. | 401 |
DOCUMENT_NOT_FOUND | No document found with the given trxml ID or external ID. | 404 |
DOCUMENT_ACCESS_DENIED | The account/username is not permitted to access the document. | 403 |
FILE_NOT_FOUND | The original document cannot be found on the file system. | 404 |
TEMPLATING_ERROR | The Textractor templator encountered an error. | 500 |
OTHER | A system fault. | 500 |
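A client can use the table to decide how to react to a failure. The Python sketch below encodes the table as a dictionary and returns the HTTP status to expect for a given error code and transport. It assumes the error code string has already been extracted from the response, since the error body format is not described in this section. `expected_status` is a hypothetical helper, and treating unknown codes like OTHER is a design choice of the sketch.

```python
# REST status codes per error code, as listed in the error table.
REST_STATUS = {
    "INVALID_ACTION": 400,
    "INVALID_ARGUMENT": 400,
    "INVALID_CREDENTIALS": 401,
    "DOCUMENT_NOT_FOUND": 404,
    "DOCUMENT_ACCESS_DENIED": 403,
    "FILE_NOT_FOUND": 404,
    "TEMPLATING_ERROR": 500,
    "OTHER": 500,
}


def expected_status(code, transport="rest"):
    """Return the HTTP status a client should expect for an error code.

    SOAP always answers 500 on error; REST uses the per-code status.
    Unknown codes are treated like OTHER (an assumption of this sketch).
    """
    if transport == "soap":
        return 500
    if transport != "rest":
        raise ValueError("transport must be 'rest' or 'soap'")
    return REST_STATUS.get(code, REST_STATUS["OTHER"])
```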