Sourcebox API Reference

SOAP Web Interface

Sourcebox also provides its Extract CV and job processing service as a SOAP 1.2 web service. The SOAP interface is an alternative way to access the HTTP POST functionality described in the previous section.

The API namespace of this service is:

https://home.textkernel.nl/sourcebox/soap/extract

Note: The URL prefix (https://home.textkernel.nl/sourcebox in the example) may be different for you. Please check with your Textkernel consultant.

The location of the WSDL file is:

https://home.textkernel.nl/sourcebox/soap/extract?wsdl

The service provides two alternative methods:

  1. extract

  2. extractAdvanced

The extract method expects the following parameters:

  1. account (string): The Sourcebox account, required

  2. username (string): The username under that account, required

  3. password (string): The password for the username, required

  4. fileName (string): The name of the file to be processed, required

  5. fileContent (base64binary): The file’s binary, required

  6. tmfFileContent (base64binary): The TMF file binary (additional fields in Textkernel's internal format), optional

  7. apimap (base64binary): Customer-specific vacancy XML, optional

  8. options (key-value pair): Processing parameters such as skipStore and doValidation

An example of how to specify two options:

<soapenv:Envelope 
...
   <options>
       <key>skipStore</key>
       <value>true</value>
   </options>
   <options>
       <key>doValidation</key>
       <value>false</value>
   </options>
...
</soapenv:Envelope>
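
The parameter list and the repeated options elements above can be assembled programmatically. The following is a minimal Python sketch, not an official client. It assumes the SOAP 1.2 envelope namespace, the service namespace quoted in this section, and unqualified child elements; the function name build_extract_request is hypothetical. Check all of these against the WSDL of your deployment before relying on them.

```python
import base64
import xml.etree.ElementTree as ET

# Assumptions: SOAP 1.2 envelope namespace and the service namespace quoted
# in this section. Verify both against the WSDL of your deployment.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
SERVICE_NS = "https://home.textkernel.nl/sourcebox/soap/extract"


def build_extract_request(account, username, password, file_name,
                          file_bytes, options=None):
    """Return a serialized SOAP envelope (bytes) for the extract method."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}extract")

    # Required parameters, in the order listed in this section.
    ET.SubElement(call, "account").text = account
    ET.SubElement(call, "username").text = username
    ET.SubElement(call, "password").text = password
    ET.SubElement(call, "fileName").text = file_name
    # fileContent is declared as base64binary, so encode the raw bytes.
    ET.SubElement(call, "fileContent").text = (
        base64.b64encode(file_bytes).decode("ascii")
    )

    # One <options> element per key/value pair.
    for key, value in (options or {}).items():
        option = ET.SubElement(call, "options")
        ET.SubElement(option, "key").text = key
        ET.SubElement(option, "value").text = value

    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
```

Calling build_extract_request("myaccount", "myuser", "secret", "cv.doc", data, {"skipStore": "true", "doValidation": "false"}) yields an envelope with two options elements, matching the example above.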

The service supports MTOM optimization, which can be enabled by setting the following request header:

Content-Type: application/xop+xml;charset=UTF-8;type="text/xml"

In this case, fileContent, tmfFileContent and apimap are sent as MIME messages rather than as base64 binaries.

The extractAdvanced method is an extension of the extract method. It allows the caller to specify clientSpecificArguments, which can be injected into the processing result.

<soapenv:Envelope 
...
   <clientSpecificArguments>
       <key>myParameter</key>
       <value>true</value>
   </clientSpecificArguments>
...
</soapenv:Envelope>

These additional parameters are placed in the extraInfo section of the result. For the example above, that section will contain a field with key myParameter and value true. How injected values are used depends on the configuration of the account.

To specify multiple arguments, repeat the clientSpecificArguments element:

<soapenv:Envelope 
...
   <clientSpecificArguments>
       <key>myParameter</key>
       <value>true</value>
   </clientSpecificArguments>
   <clientSpecificArguments>
       <key>mySecondParameter</key>
       <value>false</value>
   </clientSpecificArguments>
...
</soapenv:Envelope>    

If the processing is successful, the method synchronously returns the templated result as CDATA in the body of the SOAP envelope. The data is contained in the <return> element, which is embedded in the <extractResponse> element:

<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
   <S:Body>
      <ns2:extractResponse 
          xmlns:ns2="http://home.textkernel.nl/sourcebox/soap/extract">
            <return>
               <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
                          <TextractorResult lang="dutch" user="1">
                             <Document account="test"  
                                filename="afile.doc" iscv="yes" lang="dutch"  
                                useoutputenc="utf-8-strict" username="user">
                                {{ MORE DATA }}
                              </Document>
                              <DocumentStructure>
                                {{ MORE DATA }}                                                              
                              </DocumentStructure>
                           </TextractorResult>
               ]]>
            </return>
      </ns2:extractResponse>
   </S:Body>
</S:Envelope>
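
Because the result arrives as a CDATA string inside the <return> element, a client has to parse XML twice: once for the SOAP envelope and once for the embedded result document. The sketch below illustrates this with the Python standard library. The function name extract_result is hypothetical, and the code assumes the response has the shape shown in the example above.

```python
import xml.etree.ElementTree as ET


def extract_result(soap_response):
    """Return the parsed result document carried in the <return> element."""
    envelope = ET.fromstring(soap_response)
    for element in envelope.iter():
        # Compare local names only, so the lookup works whether or not
        # the element is namespace-qualified.
        if element.tag.rsplit("}", 1)[-1] == "return":
            # Strip surrounding whitespace: an XML declaration must be
            # the very first thing in the document being parsed.
            payload = (element.text or "").strip()
            # Parse as bytes because the payload declares its own encoding.
            return ET.fromstring(payload.encode("utf-8"))
    raise ValueError("no <return> element found in SOAP response")
```

The returned element is the root of the embedded document (TextractorResult in the example above), so its Document child and attributes such as filename can be read with the usual ElementTree calls.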

Should the parsing mechanism detect an error, it will throw an exception of type ExtractException, the interface of which provides further detailed information about the error in question. The table below presents the information wrapped by ExtractException.

| Exception field | Description | Example |
| --- | --- | --- |
| description | The description of the problem encountered | The credentials given at login are not correct. Please make sure that the account, user and password are correct. If the problem repeats please contact your service vendor. For faster service please supply this output XML, the document which caused the error and your credentials. |
| tkURL | Context URL | https://home.textkernel.nl/sourcebox/ |
| severity | Problem severity | Integer range 0-3, 0 means no error. See details about retrying strategies below |
| id | Problem ID | Integer id related to the error description. |

Retrying strategies

| Severity | Error type | Advice |
| --- | --- | --- |
| 1 | Temporary error | Retry later, no human intervention required. Retry can be attempted following these guidelines: retry 10 times with incremental time between requests. After 10 requests stop the attempts. |
| 2 | Recoverable error | It can be fixed with human intervention on configuration (e.g. wrong credentials, wrong endpoint URL). Retrying will not solve the issue. |
| 3 | Permanent error | The document cannot be parsed, invalid input, no access, or requires development or new release either on customer or the Textkernel side (e.g. TMF issues). Retrying will not solve the issue. |
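
The retry advice above can be sketched as a small wrapper. This is an illustration under stated assumptions, not a prescribed implementation: ExtractError is a hypothetical client-side exception carrying the severity and id fields of ExtractException, the limit is read as 10 requests in total, the wait grows linearly, and the 5-second base delay is an arbitrary choice.

```python
import time


class ExtractError(Exception):
    """Hypothetical client-side wrapper for the ExtractException fields."""

    def __init__(self, severity, error_id, description):
        super().__init__(description)
        self.severity = severity
        self.error_id = error_id


def call_with_retry(call, max_attempts=10, base_delay=5.0, sleep=time.sleep):
    """Run call(); retry severity-1 failures with a growing delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ExtractError as error:
            # Severity 2 and 3 need human intervention, so retrying is
            # pointless. Also give up once the attempt budget is spent.
            if error.severity != 1 or attempt == max_attempts:
                raise
            # Incremental wait: base_delay, 2 * base_delay, 3 * base_delay, ...
            sleep(base_delay * attempt)
```

Keying the decision on severity rather than on the exception ID keeps the wrapper independent of the individual error types. The sleep parameter is injectable so the schedule can be checked in tests without real waiting.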

SOAP exception types

| Exception ID | Description | Severity |
| --- | --- | --- |
| 0 | successful extraction | 0 |
| 2 | invalid credentials error | 2 |
| 3 | database error | 1 |
| 4 | preprocessor not running error | 3 |
| 5 | processor not running error | 3 |
| 6 | normalizer not running error | 3 |
| 7 | pre-merge normalizer not running error | 3 |
| 10 | preprocessor engine error | 1 |
| 11 | exception error | 3 |
| 11 | request limit exceeded | 1 |
| 11 | processing input missing | 1 |
| 11 | textractor error | 1 |
| 11 | file upload error | 1 |
| 11 | atomic post error | 1 |
| 11 | io error | 1 |
| 11 | trxml retrieval error | 1 |
| 11 | http error | 1 |
| 11 | external system error | 1 |
| 11 | agent error | 1 |
| 11 | trxml locked | 1 |
| 11 | session invalid | 1 |
| 12 | template reading error | 3 |
| 13 | unidentified output | 0 |
| 14 | tmf validation error | 3 |
| 15 | templator not running error | 1 |
| 15 | .eml file cannot be parsed | 1 |
| 16 | validation failed | 3 |
| 17 | unsupported operation error | 3 |
| 18 | limit exceeded | 2 |
| 19 | no access | 3 |
| 20 | codetable error | 3 |
| 21 | configuration error | 2 |
| 22 | invalid model file | 2 |
| 23 | email parsing error | 3 |
| 24 | email sending error | 3 |
| 25 | template format error | 3 |
| 26 | template reading error | 3 |
| 27 | oauth error | 2 |
| 28 | store error | 3 |
| 29 | invalid trxml | 3 |
| 30 | ocr error | 3 |
| 31 | invalid highlight coordinates | 3 |
| 32 | encoding error | 3 |
| 33 | mimetype missing | 3 |
| 34 | input missing | 3 |
| 35 | invalid input | 3 |
| 36 | reprocessing error | 3 |
| 37 | validation error | 2 |
| 38 | tmf merge error | 3 |

Note: The alternative SOAP names for extract, processDocument and documentProcessor, are deprecated and kept only for backwards compatibility. Using these names is not advised.