HTTP Form POST
The document to be processed must be passed, together with the login information and any other data, to the Sourcebox Extract! service as an HTTP POST multipart request with content type multipart/form-data, at the following URL:
https://home.textkernel.nl/match/extract.do
Note: The URL prefix (https://home.textkernel.nl/match in the example) may be different for you. Please check with your Textkernel consultant.
Extract service
The request must contain at least the following form parameters:
Parameter | Description | Parameter MIME type |
---|---|---|
uploaded_file | the binary file data of the document to be processed | see note below |
account | the account name of the user | text/plain |
username | the username of the user | text/plain |
password | the password of the user | text/plain |
Required Extract service HTTP POST parameters
Note: The content of 'uploaded_file' must be the raw document. Extract detects the Content-Type (MIME type) of the file automatically, so the file can simply be sent as 'application/octet-stream'.
On success, the request returns the templated result, in XML or JSON depending on the pipeline configuration. The section Error Handling below explains how Sourcebox behaves in case of an error.
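Because the output format depends on the pipeline configuration, a client that has to cope with both formats can branch on the first non-whitespace character of the response body. The following is a minimal Python sketch under that assumption. `parse_result` is a hypothetical helper, not part of Sourcebox, and it does not recognise the default HTML error page described under Error Handling.

```python
import json
import xml.etree.ElementTree as ET


def parse_result(body):
    """Parse a templated Extract result that may be either JSON or XML.

    Assumption: a JSON result starts with '{' or '[' and anything else is XML.
    Returns a dict/list for JSON and an ElementTree Element for XML.
    """
    text = body.lstrip()
    if text.startswith(("{", "[")):
        return json.loads(text)
    return ET.fromstring(text)
```

Element and key names in the result are defined by your pipeline's template, so the caller still has to know which ones to read.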
The following simple curl example processes the file 'petra.doc':
```shell
curl https://home.textkernel.nl/match/extract.do \
  --form account=exampleaccount \
  --form username=exampleuser \
  --form password=examplepassword \
  --form uploaded_file=@/home/user/petra.doc
```
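For clients that do not call curl, the same multipart/form-data request can be assembled with the Python standard library. This is a sketch, not an official client: `build_multipart` and `extract` are hypothetical helper names, the boundary is random by default, the URL is the example one from above, and decoding the response as UTF-8 is an assumption.

```python
import os
import urllib.request
import uuid

# Example URL from this page; your URL prefix may differ.
EXTRACT_URL = "https://home.textkernel.nl/match/extract.do"


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode plain-text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n'.encode()
        body += b"Content-Type: text/plain; charset=utf-8\r\n\r\n"
        body += value.encode("utf-8") + b"\r\n"
    body += f"--{boundary}\r\n".encode()
    body += f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'.encode()
    # Extract detects the real MIME type itself, so a generic type is sufficient.
    body += b"Content-Type: application/octet-stream\r\n\r\n"
    body += file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def extract(path, account, username, password, url=EXTRACT_URL):
    """POST one raw document to Extract and return the response body as text."""
    with open(path, "rb") as f:
        file_bytes = f.read()
    body, content_type = build_multipart(
        {"account": account, "username": username, "password": password},
        "uploaded_file", os.path.basename(path), file_bytes)
    request = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `extract("/home/user/petra.doc", "exampleaccount", "exampleuser", "examplepassword")` mirrors the curl command. Building the body by hand keeps the sketch dependency-free. A third-party HTTP library with multipart support would do the same job in fewer lines.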
Error Handling (extract.do)
By default, in case of an error, the Extract service returns an HTML error page with HTTP status 200. The error code and its description are included in the head section of the HTML, in the meta tags error-code and error-desc.
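Because the default mode answers with status 200 even on failure, a client has to inspect the body to tell an error page from a result. The following is a minimal sketch using Python's `html.parser`. It assumes the tags take the common `<meta name="error-code" content="...">` form; the exact markup is not specified on this page. `parse_error_page` is a hypothetical helper.

```python
from html.parser import HTMLParser


class ErrorMetaParser(HTMLParser):
    """Collect the error-code and error-desc meta tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Assumption: the tag is identified by 'name' and its value is in 'content'.
        name = attrs.get("name")
        if name in ("error-code", "error-desc"):
            self.meta[name] = attrs.get("content")


def parse_error_page(html):
    """Return (error_code, error_desc) for an error page, or None if no error-code tag is found."""
    parser = ErrorMetaParser()
    parser.feed(html)
    parser.close()
    if "error-code" not in parser.meta:
        return None
    return parser.meta["error-code"], parser.meta.get("error-desc")
```

A `None` result means the body is not an error page in this format, so it can be handed on as the templated result.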
Alternatively, you can enable HTTP protocol compliance, so that errors are signalled through the HTTP status code, by adding useHttpErrorCodes=true to the POST URL. You can also have the error information returned as a JSON string by adding the parameter useJsonErrorMsg=true to the POST URL. Note that useJsonErrorMsg=true implies useHttpErrorCodes=true.
To enable both JSON error messages and HTTP status compliance, POST to:
.../match/extract.do?useJsonErrorMsg=true
To only enable HTTP status compliance, POST to:
.../match/extract.do?useHttpErrorCodes=true
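With either parameter enabled, the HTTP status alone distinguishes success from failure. In Python's `urllib`, a non-2xx status surfaces as `urllib.error.HTTPError`, whose body can still be read. The sketch below relies on that behaviour. `with_error_mode`, `ExtractError`, `handle_response` and `send` are hypothetical names. Because the field names inside the JSON error message are not documented in this section, the decoded object is kept as-is.

```python
import json
import urllib.error
import urllib.request


def with_error_mode(url, json_errors=True):
    """Append useJsonErrorMsg=true (implies useHttpErrorCodes=true) or useHttpErrorCodes=true."""
    param = "useJsonErrorMsg" if json_errors else "useHttpErrorCodes"
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{param}=true"


class ExtractError(Exception):
    """Raised when Extract answers with a non-2xx HTTP status."""

    def __init__(self, status, detail):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def handle_response(status, body):
    """Return the body on success; otherwise raise ExtractError with the decoded error."""
    if 200 <= status < 300:
        return body
    try:
        detail = json.loads(body)  # useJsonErrorMsg=true: the error is a JSON string
    except json.JSONDecodeError:
        detail = body  # useHttpErrorCodes=true alone: keep the raw body
    raise ExtractError(status, detail)


def send(request):
    """Send a prepared urllib Request and pass the status and body to handle_response."""
    try:
        with urllib.request.urlopen(request) as response:
            return handle_response(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return handle_response(err.code, err.read().decode("utf-8"))
```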
Sourcebox Error Codes
General Errors
Error Code | Description |
---|---|
DEFAULT_EXCEPTION | An unexpected exception occurred. |
UNSUPPORTED_OPERATION_ERROR | The requested method is not supported by Sourcebox. |
Access Errors
Error Code | Description |
---|---|
INVALID_CREDENTIALS_ERROR | The login credentials are invalid. |
LIMIT_EXCEEDED | The allotted amount of usable processing units for the account has been exceeded. |
NO_ACCESS | The resource is not visible to the user with the given credentials. |
SESSION_INVALID | The session associated with the user's cookie has already been invalidated. |
TRXML_LOCKED | The document cannot be accessed because it is currently being edited by another user or process. |
Configuration Errors
Error Code | Description |
---|---|
CODETABLE_ERROR | Sourcebox encountered a malformed code table or could not upload the code table to the normalizer. |
CONFIGURATION_ERROR | The configuration for the account or user is invalid. |
MODEL_FILE_INVALID | The model file contains invalid configuration options or the XML is not well formed. |
Email Errors
Error Code | Description |
---|---|
EMAIL_PARSING_ERROR | An error occurred processing an email. |
EMAIL_SENDING_ERROR | An error occurred when Sourcebox attempted to send an email. |
TEMPLATE_FORMAT_ERROR | An email template could not be processed. |
TEMPLATE_READING_ERROR | An email template could not be found in the templater configured for the account. |
Product Errors
Error Code | Description |
---|---|
AGENT_ERROR | Sourcebox could not download or process a URL. |
EXTERNAL_SYSTEM_ERROR | A request issued from Sourcebox to an external system could not be completed. |
HTTP_ERROR | An HTTP error occurred while communicating with an external system. |
OAUTH_ERROR | An error occurred when trying to retrieve a resource protected by OAuth. |
STORE_ERROR | The templating result could not be stored with the configured exportmethod (product). |
Sourcebox Internal Errors
Error Code | Description |
---|---|
DATABASE_ERROR | An error occurred when accessing the Sourcebox database. |
TRXML_INVALID | The document cannot be edited because it has become corrupted. |
TRXML_RETRIEVAL_ERROR | The current trxml cannot be edited because it is corrupted. |
IO_ERROR | Sourcebox could not access a resource on the file system. |
OCR Errors
Error Code | Description |
---|---|
OCR_ERROR | An error occurred when processing an image with OCR functionality. |
INVALID_HIGHLIGHT_COORDINATES | The highlights for an OCRed document, as returned by textractor, cannot be parsed. |
Processing Errors
Error Code | Description |
---|---|
ATOMIC_POST_ERROR | An unexpected error occurred when processing the HTTP POST request. |
FILE_UPLOAD_ERROR | An error occurred processing a file upload request. |
TEXTRACTOR_ERROR | A textractor error occurred when processing a document. |
ENCODING_ERROR | A code table or a trxml contains a byte sequence that is not UTF-8 encoded. |
MIME_TYPE_MISSING | The MIME type of the uploaded document could not be determined from the document or was missing from the request. |
INPUT_MISSING | A required parameter has been omitted from a request. |
INVALID_INPUT | A required parameter contains an invalid value. |
PROCESSING_INPUT_MISSING | No data were received from textractor. |
REPROCESSING_ERROR | An error occurred when reprocessing an already processed document. |
VALIDATION_ERROR | The document is invalid according to the validation rules configured in the model file. |
REQUEST_LIMIT_EXCEEDED | Too many processing requests are being executed simultaneously by this account. |
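REQUEST_LIMIT_EXCEEDED describes a condition that may clear once the account's other requests finish, unlike, for example, INVALID_INPUT. One common client-side response, not prescribed by Sourcebox, is to retry with exponential backoff. The sketch below assumes a hypothetical `send` callable that returns `(error_code, body)` with `error_code` set to `None` on success. Treating only REQUEST_LIMIT_EXCEEDED as transient is also an assumption.

```python
import random
import time

# Assumption: only the concurrency limit is treated as transient. LIMIT_EXCEEDED
# (processing units used up) would not clear by waiting a few seconds.
TRANSIENT_CODES = {"REQUEST_LIMIT_EXCEEDED"}


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry transient Sourcebox errors with exponential backoff.

    send is a hypothetical callable returning (error_code, body), where
    error_code is None on success.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        code, body = send()
        if code is None:
            return body
        if code not in TRANSIENT_CODES or attempt == max_attempts - 1:
            raise RuntimeError(f"Extract request failed with {code}")
        # Wait base_delay, 2*base_delay, 4*base_delay, ... plus up to 10% random jitter.
        delay = base_delay * 2 ** attempt
        sleep(delay + random.uniform(0, 0.1 * delay))
```

The `sleep` parameter is injectable so the backoff can be exercised without real waiting. Limiting the number of concurrent requests on the client side avoids the error in the first place.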
TMF Errors
Error Code | Description |
---|---|
TMF_MERGE_ERROR | The supplied TMF could not be merged into the document. |
TMF_VALIDATION_ERROR | The supplied TMF does not conform to the TMF XSD. |
UI Errors
Error Code | Description |
---|---|
AJAX_ERROR | An Ajax request from the CV editing UI caused an error. |
EDITING_ERROR | An error occurred when editing a trxml document in the Sourcebox user interface. |
Troubleshooting
This section explains some common HTML error messages returned for requests to the HTTP POST Extract service.
Authentication error
Message: Page requires authentication (401). Access denied at login.
Solution: Make sure you are using the correct Sourcebox URL and provide a valid account/user/password combination.
Empty document
Message: com.textkernel.txtor_ui.actions.ActionException: Error occurred while processing atomic post: No suitable processing strategy exists for this request!
Solution: Submit a document that is not empty.
Unsupported document type
Message: com.textkernel.txtor_ui.actions.ActionException: Error occurred while processing atomic post: Preprocessing failed; ERROR: 500 Failed to convert document of type ...
Solution: An unsupported document type was submitted. Submit the document in a supported format.
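The three messages above can be recognised by substrings that appear in them. The following is a small sketch that maps a raw message to its documented cause and rejects zero-byte files before upload. `diagnose`, `check_not_empty` and the `KNOWN_MESSAGES` list are hypothetical helpers. Matching on message text is fragile if the wording changes, so the error codes are preferable where available.

```python
import os

# Substrings of the troubleshooting messages above, paired with the documented cause.
KNOWN_MESSAGES = [
    ("Page requires authentication (401)",
     "authentication error: check the Sourcebox URL and the account/user/password combination"),
    ("No suitable processing strategy exists for this request",
     "empty document: submit a document that is not empty"),
    ("Failed to convert document of type",
     "unsupported document type"),
]


def diagnose(message):
    """Return the documented cause for a known error message, or None if it is not recognised."""
    for needle, cause in KNOWN_MESSAGES:
        if needle in message:
            return cause
    return None


def check_not_empty(path):
    """Refuse a zero-byte file before upload, avoiding the empty-document error."""
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; Extract cannot process an empty document")
```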