Sourcebox API Reference

HTTP Form POST

The document to be processed must be sent to the Sourcebox Extract! service, together with login information and any other data, as an HTTP POST multipart request with content type multipart/form-data to the following URL:

https://home.textkernel.nl/match/extract.do

Note: The URL prefix (https://home.textkernel.nl/match in the example) may be different for you. Please check with your Textkernel consultant.

Extract service

The request contains at least the following form parameters:

| Parameter | Description | Parameter MIME type |
| --- | --- | --- |
| uploaded_file | the binary file data of the document to be processed | see note below |
| account | the account name of the user | text/plain |
| username | the username of the user | text/plain |
| password | the password of the user | text/plain |

Required Extract service HTTP POST parameters

Note: The content of 'uploaded_file' must be the raw document. Extract detects the Content-Type (MIME type) of the file automatically, so it can simply be set to 'application/octet-stream'.

On success, the request returns the templated result. The output is XML or JSON, depending on the pipeline configuration. The Error Handling section below explains how Sourcebox behaves when an error occurs.

See below for a simple curl example for processing the file 'petra.doc':

curl https://home.textkernel.nl/match/extract.do \
      --form account=exampleaccount \
      --form username=exampleuser \
      --form password=examplepassword \
      --form uploaded_file=@/home/user/petra.doc  
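
For clients that cannot shell out to curl, the same multipart request can be assembled by hand. The following is a minimal sketch using only the Python standard library. The function name build_extract_request is hypothetical, and the URL prefix is the one from the example above, which may differ for your installation.

```python
import uuid
import urllib.request

# Assumption: the URL prefix from the curl example; yours may be different.
EXTRACT_URL = "https://home.textkernel.nl/match/extract.do"


def build_extract_request(url, account, username, password, filename, file_bytes):
    """Build (but do not send) a multipart/form-data POST for extract.do."""
    boundary = uuid.uuid4().hex
    parts = []
    # The three credential fields are sent as text/plain form parts.
    for name, value in (("account", account),
                        ("username", username),
                        ("password", password)):
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n'
                "Content-Type: text/plain\r\n\r\n"
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The raw document bytes; the MIME type is detected by the service,
    # so application/octet-stream is sufficient.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="uploaded_file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=b"".join(parts), headers=headers, method="POST")


# Usage (performs a real network call, so it is left commented out):
# with open("/home/user/petra.doc", "rb") as fh:
#     request = build_extract_request(EXTRACT_URL, "exampleaccount", "exampleuser",
#                                     "examplepassword", "petra.doc", fh.read())
# with urllib.request.urlopen(request) as response:
#     result = response.read()
```

Building the request separately from sending it makes it possible to inspect the body in tests without contacting the service.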

Error Handling (extract.do)

By default, when an error occurs, the Extract service returns an HTML error page with HTTP status 200. The error code and its description are included in the title section of the HTML, in the meta tags error-code and error-desc.
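
Because the default error page arrives with status 200, a client has to inspect the response body to detect a failure. The sketch below shows one way to do that with Python's standard html.parser module. It assumes the tags have the form `<meta name="error-code" content="...">`; the exact attribute layout is not specified here, so verify it against a real error page. The names ErrorMetaParser and extract_error are hypothetical.

```python
from html.parser import HTMLParser


class ErrorMetaParser(HTMLParser):
    """Collect the content of <meta> tags named error-code and error-desc."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Assumption: the tag is identified by its name attribute and
        # carries its value in the content attribute.
        name = attributes.get("name")
        if name in ("error-code", "error-desc"):
            self.found[name] = attributes.get("content")


def extract_error(html_text):
    """Return (code, description) if error meta tags are present, else None."""
    parser = ErrorMetaParser()
    parser.feed(html_text)
    if "error-code" not in parser.found:
        return None
    return parser.found["error-code"], parser.found.get("error-desc")
```

A return value of None means no error meta tags were found, which is what a successful XML or JSON result should produce.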

Alternatively, you can enable HTTP protocol compliance by adding useHttpErrorCodes=true to the POST URL. You can also have error information returned as a JSON string by adding the parameter useJsonErrorMsg=true to the POST URL. Note that useJsonErrorMsg=true implies useHttpErrorCodes=true.

To enable both JSON error messages and HTTP status compliance, POST to:

.../match/extract.do?useJsonErrorMsg=true

To only enable HTTP status compliance, POST to:

.../match/extract.do?useHttpErrorCodes=true
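
The two URL variants above can be derived from a base URL programmatically. This is a small sketch using Python's urllib.parse; the helper name with_error_mode is hypothetical, and the base URL used in the comment is the example prefix from this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ERROR_FLAGS = ("useJsonErrorMsg", "useHttpErrorCodes")


def with_error_mode(url, json_errors=False):
    """Return url with exactly one of the two error-handling flags set to true."""
    parts = urlsplit(url)
    # Keep unrelated query parameters, drop any error flag already present.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key not in ERROR_FLAGS]
    # useJsonErrorMsg=true implies useHttpErrorCodes=true, so one flag suffices.
    query.append(("useJsonErrorMsg" if json_errors else "useHttpErrorCodes", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example:
# with_error_mode("https://home.textkernel.nl/match/extract.do", json_errors=True)
```

With HTTP status compliance enabled, urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so failures can be handled in an except block instead of by inspecting the body.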

Sourcebox Error Codes

General Errors

| Error Code | Description |
| --- | --- |
| DEFAULT_EXCEPTION | An unexpected exception occurred. |
| UNSUPPORTED_OPERATION_ERROR | The requested method is not supported by Sourcebox. |

Access Errors

| Error Code | Description |
| --- | --- |
| INVALID_CREDENTIALS_ERROR | The login credentials are invalid. |
| LIMIT_EXCEEDED | The allotted amount of usable processing units for the account has been exceeded. |
| NO_ACCESS | The resource is not visible to the user with the given credentials. |
| SESSION_INVALID | The session related to the user's cookie has already been invalidated. |
| TRXML_LOCKED | The document cannot be accessed because it is currently being edited by another user/process. |

Configuration Errors

| Error Code | Description |
| --- | --- |
| CODETABLE_ERROR | Sourcebox encountered a malformed code table or could not upload the code table to the normalizer. |
| CONFIGURATION_ERROR | The configuration for the account or user is invalid. |
| MODEL_FILE_INVALID | The model file contains invalid configuration options or the XML is not well formed. |

Email Errors

| Error Code | Description |
| --- | --- |
| EMAIL_PARSING_ERROR | An error occurred processing an email. |
| EMAIL_SENDING_ERROR | An error occurred when Sourcebox attempted to send an email. |
| TEMPLATE_FORMAT_ERROR | An email template could not be processed. |
| TEMPLATE_READING_ERROR | An email template could not be found in the templater configured for the account. |

Product Errors

| Error Code | Description |
| --- | --- |
| AGENT_ERROR | Sourcebox could not download or process a URL. |
| EXTERNAL_SYSTEM_ERROR | A request issued from Sourcebox to an external system could not be completed. |
| HTTP_ERROR | An HTTP error occurred when communicating with an external system via HTTP. |
| OAUTH_ERROR | An error occurred when trying to retrieve a resource protected by OAuth. |
| STORE_ERROR | The templating result could not be stored with the configured exportmethod (product). |

Sourcebox Internal Errors

| Error Code | Description |
| --- | --- |
| DATABASE_ERROR | An error occurred when accessing the Sourcebox database. |
| TRXML_INVALID | The document cannot be edited because it has become corrupted. |
| TRXML_RETRIEVAL_ERROR | The current trxml cannot be edited because it is corrupted. |
| IO_ERROR | Sourcebox could not access a resource on the file system. |

OCR Errors

| Error Code | Description |
| --- | --- |
| OCR_ERROR | An error occurred when processing an image with OCR functionality. |
| INVALID_HIGHLIGHT_COORDINATES | Highlights for an OCRed document, as returned from textractor, cannot be parsed. |

Processing Errors

| Error Code | Description |
| --- | --- |
| ATOMIC_POST_ERROR | An unexpected error occurred when processing the HTTP POST request. |
| FILE_UPLOAD_ERROR | An error occurred processing a file upload request. |
| TEXTRACTOR_ERROR | A textractor error occurred when processing a document. |
| ENCODING_ERROR | A code table or a trxml contains a byte sequence that is not UTF-8 encoded. |
| MIME_TYPE_MISSING | The MIME type of the uploaded document could not be determined from the document or was missing from the request. |
| INPUT_MISSING | A required parameter has been omitted from a request. |
| INVALID_INPUT | A required parameter contains an invalid value. |
| PROCESSING_INPUT_MISSING | No data were received from textractor. |
| REPROCESSING_ERROR | An error occurred when reprocessing an already processed document. |
| VALIDATION_ERROR | The document is invalid according to the validation rules configured in the model file. |
| REQUEST_LIMIT_EXCEEDED | Too many processing requests are being executed simultaneously by this account. |

TMF Errors

| Error Code | Description |
| --- | --- |
| TMF_MERGE_ERROR | The supplied TMF could not be merged into the document. |
| TMF_VALIDATION_ERROR | The supplied TMF does not conform to the TMF XSD. |

UI Errors

| Error Code | Description |
| --- | --- |
| AJAX_ERROR | An Ajax request from the CV editing UI caused an error. |
| EDITING_ERROR | An error occurred when editing a trxml document in the Sourcebox user interface. |
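
A client that logs or routes errors may find it convenient to look up the category of a returned code. The sketch below transcribes the groupings listed in this section into a lookup table. The names ERROR_CATEGORIES and category_of are hypothetical, and the "Unknown" fallback is an assumption for codes not listed on this page.

```python
# Error codes grouped by the categories documented in this section.
ERROR_CATEGORIES = {
    "General": ("DEFAULT_EXCEPTION", "UNSUPPORTED_OPERATION_ERROR"),
    "Access": ("INVALID_CREDENTIALS_ERROR", "LIMIT_EXCEEDED", "NO_ACCESS",
               "SESSION_INVALID", "TRXML_LOCKED"),
    "Configuration": ("CODETABLE_ERROR", "CONFIGURATION_ERROR", "MODEL_FILE_INVALID"),
    "Email": ("EMAIL_PARSING_ERROR", "EMAIL_SENDING_ERROR",
              "TEMPLATE_FORMAT_ERROR", "TEMPLATE_READING_ERROR"),
    "Product": ("AGENT_ERROR", "EXTERNAL_SYSTEM_ERROR", "HTTP_ERROR",
                "OAUTH_ERROR", "STORE_ERROR"),
    "Sourcebox Internal": ("DATABASE_ERROR", "TRXML_INVALID",
                           "TRXML_RETRIEVAL_ERROR", "IO_ERROR"),
    "OCR": ("OCR_ERROR", "INVALID_HIGHLIGHT_COORDINATES"),
    "Processing": ("ATOMIC_POST_ERROR", "FILE_UPLOAD_ERROR", "TEXTRACTOR_ERROR",
                   "ENCODING_ERROR", "MIME_TYPE_MISSING", "INPUT_MISSING",
                   "INVALID_INPUT", "PROCESSING_INPUT_MISSING",
                   "REPROCESSING_ERROR", "VALIDATION_ERROR",
                   "REQUEST_LIMIT_EXCEEDED"),
    "TMF": ("TMF_MERGE_ERROR", "TMF_VALIDATION_ERROR"),
    "UI": ("AJAX_ERROR", "EDITING_ERROR"),
}

# Inverted index: error code -> category name.
CODE_TO_CATEGORY = {
    code: category
    for category, codes in ERROR_CATEGORIES.items()
    for code in codes
}


def category_of(code):
    """Return the documented category for an error code, or "Unknown"."""
    return CODE_TO_CATEGORY.get(code, "Unknown")
```

Note that LIMIT_EXCEEDED (an account quota) and REQUEST_LIMIT_EXCEEDED (too many simultaneous requests) fall into different categories, so a client may want to treat them differently.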

Troubleshooting

This section explains some common HTML error messages returned by requests to the HTTP POST Extract service.

Authentication error

Message: Page requires authentication (401). Access denied at login.

Solution: Make sure you are using the correct Sourcebox URL and provide a valid account/user/password combination.

Empty document

Message: com.textkernel.txtor_ui.actions.ActionException: Error occurred while processing atomic post: No suitable processing strategy exists for this request!

Solution: Submit a document that is not empty.

Unsupported document type

Message: com.textkernel.txtor_ui.actions.ActionException: Error occurred while processing atomic post: Preprocessing failed; ERROR: 500 Failed to convert document of type ...

Solution: The submitted document type is not supported. Submit the document in a supported format.