Parse a Job🔗︎

HTTP Verb Path
POST /v9/parser/joborder

Parse a single Job.

Info

  • You can try this endpoint out at our Swagger page ( US Data Center | EU Data Center | AU Data Center )
  • This service is designed to parse jobs. It assumes that all files passed to it are jobs. It does not attempt to detect whether a document is a job or not. It should not be used to try to extract information from other types of documents.
  • Always send the original file, not the result of copy/paste, not a conversion by some other software, not a scanned image, and not a version marked up with recruiter notes or other non-job information. Be aware that if you pass garbage into the service, then you are likely to get garbage out. The best results are always obtained by parsing the original job file.
  • In order to provide parsing for a wide range of languages, the parser does not provide the full data model for some languages.
  • If you are running batch transactions (e.g. iterating through files in a folder), do not reparse a file after the service returns an exception. You will get the same result each time, and credits will be deducted from your account.
  • Batch transactions must adhere to our Acceptable Use Policy.
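The batch guidance above can be sketched as a loop that submits each file exactly once and records failures instead of retrying them. This is only an illustration: `parse_folder_once` and the `parse_file` callable are hypothetical names, and `parse_file` stands in for whatever code actually calls the endpoint.

```python
from pathlib import Path
from typing import Callable, Dict


def parse_folder_once(folder: Path, parse_file: Callable[[Path], dict]) -> Dict[str, object]:
    """Submit every file in `folder` once; keep the exception for failures."""
    results: Dict[str, object] = {}
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        try:
            results[path.name] = parse_file(path)
        except Exception as exc:
            # No retry: resubmitting the same file would give the same
            # result and deduct credits again.
            results[path.name] = exc
    return results
```

Failed files end up in the result map as exception objects, so they can be reviewed by hand rather than resubmitted automatically.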

Request Body🔗︎

DocumentAsBase64String 🔗︎ string required

DocumentAsBase64String🔗︎

A Base64 encoded string of the job file bytes. This should use the standard 'base64' encoding as defined in RFC 4648 Section 4 (not the 'base64url' variant). .NET users can use the Convert.ToBase64String(byte[]) method.
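For readers not on .NET, a minimal Python sketch of the same encoding follows. The helper name `encode_document` is hypothetical; `base64.b64encode` from the standard library uses the standard RFC 4648 Section 4 alphabet (with `+` and `/`), not the URL-safe variant.

```python
import base64


def encode_document(file_bytes: bytes) -> str:
    """Encode raw file bytes with the standard Base64 alphabet."""
    return base64.b64encode(file_bytes).decode("ascii")


# The bytes 0xFB 0xFF encode to "+/8=" in standard Base64;
# the base64url variant would produce "-_8=" instead.
print(encode_document(b"\xfb\xff"))
```

In practice the input would be the original job file read in binary mode, for example `Path("job.docx").read_bytes()`.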

RevisionDate 🔗︎ string required

RevisionDate🔗︎

Mandatory date, in YYYY-MM-DD format, representing the "current" or "as of" date used during parsing. This is useful when parsing older documents. Read more about this here.
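As a small illustration of the required format, the sketch below turns a Python `date` into a YYYY-MM-DD string. The helper name `revision_date` is hypothetical, and which date to pass (today, or a date associated with an older document) is left to the caller.

```python
from datetime import date


def revision_date(as_of: date) -> str:
    """Format the "as of" date as YYYY-MM-DD."""
    return as_of.isoformat()


print(revision_date(date(2024, 1, 5)))  # prints 2024-01-05
```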

OutputHtml 🔗︎ boolean

OutputHtml🔗︎

When true, the original file is converted to HTML and stored in the Html property.

OutputRtf 🔗︎ boolean

OutputRtf🔗︎

When true, the original file is converted to RTF and stored in the Rtf property.

OutputPdf 🔗︎ boolean

OutputPdf🔗︎

When true, the original file is converted to PDF and stored in the Pdf property as a byte array.

Configuration 🔗︎ object Deprecated

Configuration🔗︎

Deprecated

SkillsData 🔗︎ string[] Deprecated

SkillsData🔗︎

This feature is not recommended and only available as an add-on. Please reach out to sales@textkernel.com.

String[] of your custom skills list names and the Textkernel "builtin" skills list. If no list is provided, the Textkernel builtin skills list will be used. The parser automatically detects language and looks for a corresponding skills list in that language; if no match is found, this list is ignored.

NormalizerData 🔗︎ string

NormalizerData🔗︎

Will be used in a future release.

GeocodeOptions 🔗︎ object

GeocodeOptions🔗︎

Get or insert geocode coordinate values (latitude/longitude) during the parse transaction.


GeocodeOptions properties

IncludeGeocoding 🔗︎ bool

IncludeGeocoding🔗︎

When set to true, we will automatically geocode the parsed address through an API call to our /geocode endpoint, and you will be charged accordingly. This parameter defaults to false.

Provider 🔗︎ string

Provider🔗︎

The Provider you wish to use to geocode the postal address (current options are "Google", "Bing", or "None"). If not specified, we will default to Google. If you are just trying to update the postal address in the document, please set this to "None". If passing "Google" or "Bing", ProviderKey is required.

ProviderKey 🔗︎ string

ProviderKey🔗︎

The Provider Key for the specified Provider. If using Bing you must specify your own provider key.

PostalAddress 🔗︎ object

PostalAddress🔗︎

The postal address you wish to geocode. For best results, specify as many of the PostalAddress fields as possible. If provided, this address will be used to get the geocode coordinates instead of the address included in the ParsedDocument (if present). However, the address in the ParsedDocument will not be modified.


PostalAddress properties

CountryCode 🔗︎ string

CountryCode🔗︎

The ISO 3166-1 alpha-2 code indicating the country for the postal address.

PostalCode 🔗︎ string

PostalCode🔗︎

The postal code (or zip code) for the postal address.

Region 🔗︎ string

Region🔗︎

The region (e.g. state for U.S. addresses) for the postal address.

Municipality 🔗︎ string

Municipality🔗︎

The municipality (e.g. city for U.S. addresses) for the postal address.

AddressLine 🔗︎ string

AddressLine🔗︎

The address line (e.g. street address for U.S. addresses) for the postal address.

GeoCoordinates 🔗︎ object

GeoCoordinates🔗︎

The geographic coordinates (latitude/longitude) for your postal address. Use this if you already have latitude/longitude coordinates and simply wish to add them to your parsed document. If provided, these values will be inserted into your ParsedDocument, and the address included in the ParsedDocument (if present) will not be modified.


GeoCoordinates properties

Latitude 🔗︎ float

Latitude🔗︎

The latitude coordinate value.

Longitude 🔗︎ float

Longitude🔗︎

The longitude coordinate value.
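The GeocodeOptions rules described above can be sketched as a small builder that checks the Provider value and the ProviderKey requirement before a request is sent. The function name `build_geocode_options` and its parameters are hypothetical; the check follows the Provider description, which states that a key is required for "Google" or "Bing". Dropping empty address fields is a choice made for this sketch, not documented behavior.

```python
from typing import Dict, Optional

ALLOWED_PROVIDERS = {"Google", "Bing", "None"}


def build_geocode_options(
    provider: str = "Google",
    provider_key: Optional[str] = None,
    postal_address: Optional[Dict[str, str]] = None,
    include_geocoding: bool = True,
) -> dict:
    """Assemble a GeocodeOptions object, validating Provider and ProviderKey."""
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"Unknown Provider: {provider!r}")
    if provider in {"Google", "Bing"} and not provider_key:
        raise ValueError("ProviderKey is required for Google or Bing")

    options: dict = {"IncludeGeocoding": include_geocoding, "Provider": provider}
    if provider_key:
        options["ProviderKey"] = provider_key
    if postal_address:
        # Send only the address fields that actually have a value.
        options["PostalAddress"] = {k: v for k, v in postal_address.items() if v}
    return options
```

A call such as `build_geocode_options("Bing", "my-key", {"CountryCode": "US", "Municipality": "Dallas"})` yields a dictionary that can be placed under the GeocodeOptions key of the request body.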

IndexingOptions 🔗︎ object

IndexingOptions🔗︎

When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions.

Skills Normalization must be included to index documents using V2 Skills Taxonomy. These algorithms ignore raw skills and only consider the normalized skill concepts for skills category scoring. This leads to improved scoring and ranking because normalization produces fewer false negatives than simple exact keyword matching.


IndexingOptions properties

IndexId 🔗︎ string

IndexId🔗︎

When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions. This determines what index to place the parsed document in. This is case-insensitive.

DocumentId 🔗︎ string

DocumentId🔗︎

When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions. This determines what id to give to the parsed document. This is restricted to alphanumeric with dashes and underscores. All values will be converted to lower-case.

CustomIds 🔗︎ string[]

CustomIds🔗︎

The custom ids you want the document to have.
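Since DocumentId is restricted to alphanumeric characters, dashes, and underscores, and is lower-cased by the service, it can help to check and normalize ids on the client before sending them. The sketch below is one way to do that. The helper name `normalize_document_id` is hypothetical, and treating "alphanumeric" as ASCII letters and digits only is an assumption.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
_DOCUMENT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def normalize_document_id(document_id: str) -> str:
    """Reject ids with disallowed characters and return the lower-cased form."""
    if not _DOCUMENT_ID_PATTERN.fullmatch(document_id):
        raise ValueError(f"Invalid DocumentId: {document_id!r}")
    return document_id.lower()


print(normalize_document_id("Job-42_A"))  # prints job-42_a
```

Lower-casing locally makes it easier to spot two ids that would collide after the service converts them.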

SkillsSettings 🔗︎ object

SkillsSettings🔗︎

Enable skills normalization and enhanced candidate summarization, and specify the version of the skills taxonomy for this parsing transaction.


SkillsSettings properties

Normalize 🔗︎ bool

Normalize🔗︎

When true:

  • Raw skills will be normalized. These will be output under Value.ResumeData.Skills.Normalized. Read more about the benefits of using a skills taxonomy.
  • An enhanced candidate summary is generated, leveraging the taxonomy structure to relate skills to profession groups.

When using TaxonomyVersion V2 (see below), additional charges apply when normalization is enabled.

When you have access to TaxonomyVersion V1, and did not set the taxonomy to V2 explicitly (see below), normalization is enabled by default and the candidate summary is generated using the V1 taxonomy structure.

TaxonomyVersion 🔗︎ string

TaxonomyVersion🔗︎

Specifies the version of the skills taxonomy to use. One of:

  • V1 (Deprecated) - This is the default for old accounts. Will be removed in a future release.
  • V2 - This is the default for new accounts, and must be explicitly set if you have access to V1 and V2.

Benefits of V2 include:

  • 2x larger skills taxonomy, updated frequently based on real-world data
  • 15-40% higher accuracy of extracted skills
  • Better clustering of skill synonyms
  • Distinguish skill types (IT / Professional / Soft)
  • Improved candidate summary
  • Compatibility with the taxonomy used in Textkernel's Skills Intelligence APIs and Jobfeed, enabling standardization of taxonomies across all of your data and benchmarking against jobs posted online.

ProfessionsSettings 🔗︎ object

ProfessionsSettings🔗︎

Enable normalization of job titles using our proprietary taxonomy and international standards.


ProfessionsSettings properties

Normalize 🔗︎ bool

Normalize🔗︎

When true, the most recent 3 job titles will be normalized. This includes a proprietary value from our profession taxonomy, plus ONET and ISCO mappings. Read more about the benefits of using a professions taxonomy.

When enabling professions normalization, additional charges apply.

The following languages are supported: English, Chinese (Simplified), Dutch, French, German, Italian, Polish, Portuguese, and Spanish. For documents in other languages, no normalized values will be returned.

For Textkernel Search & Match, normalized professions are automatically indexed and used when profession normalization is enabled during parsing (through IndexingOptions). To leverage profession normalization for user-created searches, enable profession normalization at query time.

The profession taxonomy and the mappings are compatible with the taxonomies used in Textkernel's Skills Intelligence APIs and Jobfeed, enabling standardization of taxonomies across all of your data and benchmarking against jobs posted online.

Version 🔗︎ object

Version🔗︎

Specifies the versions to use when normalizing professions if more than one is available for a taxonomy.


Version properties

ONET 🔗︎ string

ONET🔗︎

The ONET Version to use when normalizing professions. One of:

  • 2010
  • 2019

This parameter defaults to "2010".

The 2010 version is deprecated, and the default will change to 2019 as of January 2025.
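Because the documented default is the deprecated 2010 version, a client may prefer to set the ONET version explicitly. The following sketch builds a ProfessionsSettings object that does so. The function name `build_professions_settings` is hypothetical; the allowed values come from the list above.

```python
ALLOWED_ONET_VERSIONS = {"2010", "2019"}


def build_professions_settings(normalize: bool, onet_version: str = "2019") -> dict:
    """Assemble ProfessionsSettings with an explicit ONET version."""
    if onet_version not in ALLOWED_ONET_VERSIONS:
        raise ValueError(f"Unsupported ONET version: {onet_version!r}")
    return {"Normalize": normalize, "Version": {"ONET": onet_version}}
```

The returned dictionary matches the shape of the ProfessionsSettings entry in the sample JSON.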

Sample JSON
{
  "DocumentAsBase64String": "",
  "RevisionDate": "",
  "OutputHtml": false,
  "OutputRtf": false,
  "OutputPdf": false,
  "Configuration": {
    "CountryCode": "",
    "Language": "",
    "KnownType": "",
    "IncludeRecruitingTerms": false,
    "IncludeSupplementalText": false,
    "PreferShorterJobTitles": false
  },
  "SkillsData": [
    ""
  ],
  "NormalizerData": "",
  "GeocodeOptions": {
    "IncludeGeocoding": false,
    "Provider": "",
    "ProviderKey": "",
    "PostalAddress": {
      "CountryCode": "",
      "PostalCode": "",
      "Region": "",
      "Municipality": "",
      "AddressLine": ""
    },
    "GeoCoordinates": {
      "Latitude": 0,
      "Longitude": 0
    }
  },
  "IndexingOptions": {
    "IndexId": "",
    "DocumentId": "",
    "CustomIds": [
      ""
    ]
  },
  "SkillsSettings": {
    "Normalize": false,
    "TaxonomyVersion": ""
  },
  "ProfessionsSettings": {
    "Normalize": false,
    "Version": {
      "ONET": "2019"
    }
  }
}
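A minimal request needs only the two required fields. The sketch below builds, but does not send, a POST request with the Python standard library. The function name `build_parse_request` and its parameters are hypothetical: the base URL and the authentication headers are not specified in this section, so both are passed in by the caller rather than hard-coded.

```python
import json
import urllib.request
from typing import Dict


def build_parse_request(
    base_url: str,
    auth_headers: Dict[str, str],
    document_b64: str,
    revision_date: str,
) -> urllib.request.Request:
    """Build a POST request for /v9/parser/joborder with the required fields."""
    body = {
        "DocumentAsBase64String": document_b64,
        "RevisionDate": revision_date,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v9/parser/joborder",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **auth_headers},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; optional objects such as GeocodeOptions or IndexingOptions can be added to `body` in the same way.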

Response Body🔗︎

Info 🔗︎ object

Info🔗︎

Information explaining the outcome of the transaction.


Info properties

Code 🔗︎ string

Code🔗︎

Code                           Description
Success                        Successful transaction
PossibleTruncationFromTimeout  The timeout occurred before the document was finished parsing, which can result in truncation
ConversionException            There was an issue converting the document
MissingParameter               A required parameter wasn't provided
InvalidParameter               A parameter was incorrectly specified
AuthenticationError            An error occurred with the credentials provided

Message 🔗︎ string

Message🔗︎

This message further describes the code providing additional detail.
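One way to act on Info.Code is sketched below. The names `ParseFailed` and `interpret_info` are hypothetical, and treating PossibleTruncationFromTimeout as a usable but incomplete result (rather than a failure) is an assumption of this sketch.

```python
class ParseFailed(Exception):
    """Raised when Info.Code reports that the transaction did not succeed."""


def interpret_info(response: dict) -> bool:
    """Return True for a complete result, False for a possibly truncated one.

    Any other code raises ParseFailed with the code and message.
    """
    info = response.get("Info", {})
    code = info.get("Code")
    if code == "Success":
        return True
    if code == "PossibleTruncationFromTimeout":
        # Assumption: the data is still usable but may be incomplete.
        return False
    raise ParseFailed(f"{code}: {info.get('Message', '')}")
```

Raising on the remaining codes keeps failed transactions from being resubmitted unchanged by a surrounding loop.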

Value 🔗︎ object

Value🔗︎

Contains response data for the transaction.


Value properties

ParsedDocument 🔗︎ string

ParsedDocument🔗︎

The parser results in JSON string format.
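Because ParsedDocument is described as a JSON string inside the JSON response, it needs a second decoding step. The sketch below shows that step. The helper name `extract_parsed_document` is hypothetical, and the `"Example"` key in the usage lines is a placeholder, not a real field of the parsed job.

```python
import json


def extract_parsed_document(response_text: str) -> dict:
    """Decode the response, then decode the embedded ParsedDocument string."""
    response = json.loads(response_text)
    return json.loads(response["Value"]["ParsedDocument"])


# Build a stand-in response whose ParsedDocument is itself a JSON string.
inner = json.dumps({"Example": 1})
outer = json.dumps({"Value": {"ParsedDocument": inner}})
print(extract_parsed_document(outer))  # prints {'Example': 1}
```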

FileType 🔗︎ string

FileType🔗︎

The input file type that was submitted through the ParseJobOrderRequest.

Text 🔗︎ string

Text🔗︎

The plain text of the parsed job order.

TextCode 🔗︎ string

TextCode🔗︎

A response code indicating the status of the conversion to Text.

Html 🔗︎ string

Html🔗︎

HTML version of the input file, if OutputHtml was set to true.

HtmlCode 🔗︎ string

HtmlCode🔗︎

A response code indicating the status of the conversion to HTML.

Rtf 🔗︎ string

Rtf🔗︎

RTF version of the input file, if OutputRtf was set to true.

RtfCode 🔗︎ string

RtfCode🔗︎

A response code indicating the status of the conversion to RTF.

Pdf 🔗︎ string

Pdf🔗︎

The PDF version of the input file, if OutputPdf was set to true. The file bytes are in a Base64-encoded string.

PdfCode 🔗︎ string

PdfCode🔗︎

A response code indicating the status of the conversion to PDF.
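The Pdf property arrives as a Base64-encoded string, so it has to be decoded before it can be written to disk. A minimal sketch, with the hypothetical helper name `decode_pdf`, which returns None when no PDF was requested or returned:

```python
import base64
from typing import Optional


def decode_pdf(value: dict) -> Optional[bytes]:
    """Return the PDF bytes from the Value object, or None if Pdf is empty."""
    encoded = value.get("Pdf")
    if not encoded:
        return None
    # validate=True rejects characters outside the standard Base64 alphabet.
    return base64.b64decode(encoded, validate=True)
```

The resulting bytes can be saved with `Path("job.pdf").write_bytes(...)`.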

FileExtension 🔗︎ string

FileExtension🔗︎

The recommended file extension of the input file.

CreditsRemaining 🔗︎ decimal

CreditsRemaining🔗︎

The number of remaining credits is returned with every response. Please ensure that you set up monitoring of this value to ensure that you don't experience an outage by letting your credits reach 0.
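The monitoring suggested above can start as a simple threshold check on each response. In this sketch the function name `credits_are_low` and the threshold are hypothetical; a suitable threshold depends on your own usage.

```python
def credits_are_low(response: dict, threshold: float) -> bool:
    """Return True when CreditsRemaining is at or below the threshold."""
    return response["Value"]["CreditsRemaining"] <= threshold
```

A caller might log a warning or notify an administrator whenever this returns True, well before the balance reaches 0.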

GeocodeResponse 🔗︎ object

GeocodeResponse🔗︎

If Request.GeocodeOptions.IncludeGeocoding is set to true (thus geocoding is executed), this object will be populated with a response.


GeocodeResponse properties

Code 🔗︎ string

Code🔗︎

Maps to the Response.Code parameter of a Geocode Transaction.

Message 🔗︎ string

Message🔗︎

Maps to the Response.Message parameter of a Geocode Transaction.

IndexingResponse 🔗︎ object

IndexingResponse🔗︎

If Request.IndexingOptions contains any specified parameters, this object will be populated with a response.


IndexingResponse properties

Code 🔗︎ string

Code🔗︎

Maps to the Response.Code parameter of an Index a Document Transaction.

Message 🔗︎ string

Message🔗︎

Maps to the Response.Message parameter of an Index a Document Transaction.

ProfessionNormalizationResponse 🔗︎ object

ProfessionNormalizationResponse🔗︎

If profession normalization was requested through ProfessionsSettings.Normalize, the status of the profession normalization transaction will be output here.


ProfessionNormalizationResponse properties

Code 🔗︎ string

Code🔗︎

Code                 Description
Success              Successful transaction
Unhandled Exception  Unhandled Exception

Message 🔗︎ string

Message🔗︎

A short human-readable description explaining the Code value.

Sample JSON
{
  "Info": {
    "Code": "",
    "Message": ""
  },
  "Value": {
    "ParsedDocument": "",
    "FileType": "",
    "Text": "",
    "TextCode": "",
    "Html": "",
    "HtmlCode": "",
    "Rtf": "",
    "RtfCode": "",
    "Pdf": "",
    "PdfCode": "",
    "FileExtension": "",
    "CreditsRemaining": 0,
    "GeocodeResponse": {
      "Code": "",
      "Message": ""
    },
    "IndexingResponse": {
      "Code": "",
      "Message": ""
    },
    "ProfessionNormalizationResponse": {
      "Code": "Success",
      "Message": "string"
    }
  }
}