Parse a Job🔗︎
HTTP Verb | Path |
---|---|
POST | /v10/parser/joborder |
Parse a single Job.
Info
- You can try this endpoint out at our Swagger page ( US Data Center | EU Data Center | AU Data Center )
- This service is designed to parse jobs. It assumes that all files passed to it are jobs. It does not attempt to detect whether a document is a job or not. It should not be used to try to extract information from other types of documents.
- Always send the original file, not the result of copy/paste, not a conversion by some other software, not a scanned image, and not a version marked up with recruiter notes or other non-job information. Be aware that if you pass garbage into the service, then you are likely to get garbage out. The best results are always obtained by parsing the original job file.
- In order to provide parsing for a wide range of languages, the parser does not provide the full data model for some languages.
- If you are running batch transactions (i.e. iterating through files in a folder), do not try to reparse a file when you get an exception back from the service: you will get the same result each time, and credits will be deducted from your account.
- Batch transactions must adhere to our Acceptable Use Policy.
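The batch guidance above can be sketched as a loop that sends each file once and records failures instead of retrying them. This is a minimal sketch: `parse_job` is a hypothetical callable standing in for whatever client code posts a file to the endpoint, and it is not part of the API.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def parse_folder(
    folder: Path,
    parse_job: Callable[[Path], dict],  # hypothetical: sends one file to the parser
) -> Tuple[Dict[str, dict], List[Tuple[str, str]]]:
    """Parse every file in a folder exactly once; never retry a failed file."""
    results: Dict[str, dict] = {}
    failures: List[Tuple[str, str]] = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        try:
            results[path.name] = parse_job(path)
        except Exception as exc:
            # The same input yields the same error, so log it and move on.
            failures.append((path.name, str(exc)))
    return results, failures
```

Failed files end up in the second return value, where they can be reviewed manually rather than resubmitted automatically.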
Request Body🔗︎
DocumentAsBase64String 🔗︎ string
required
DocumentAsBase64String🔗︎
A Base64-encoded string of the job file bytes. This should use the standard 'base64' encoding as defined in RFC 4648 Section 4 (not the 'base64url' variant). .NET users can use the Convert.ToBase64String(byte[]) method.
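For readers not on .NET, the same encoding can be produced in Python with the standard library. This is a small sketch; the helper name `encode_document` is illustrative only.

```python
import base64
from pathlib import Path


def encode_document(path: str) -> str:
    """Return the file's bytes as a standard Base64 string (RFC 4648 Section 4)."""
    raw_bytes = Path(path).read_bytes()
    # b64encode uses the standard alphabet ('+' and '/'), not the URL-safe one.
    return base64.b64encode(raw_bytes).decode("ascii")
```

Reading the file in binary form keeps the original bytes intact, which matches the advice to always send the original file.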
SkillsSettings 🔗︎ object
SkillsSettings🔗︎
Enable skills normalization and specify the version of the skills taxonomy for this parsing transaction.
SkillsSettings properties
Normalize 🔗︎ bool
Normalize🔗︎
When true:
- Raw skills will be normalized. These will be output under Value.JobData.Skills.Normalized. Read more about the benefits of using a skills taxonomy.
- When TaxonomyVersion (see below) is set to (or defaults to) V2, additional charges apply.
This setting has no effect when TaxonomyVersion is set to (or defaults to) V1.
TaxonomyVersion 🔗︎ string
TaxonomyVersion🔗︎
Specifies the version of the skills taxonomy to use. One of:
- V1 (Deprecated) - This is the default for old accounts. Will be removed in a future release.
- V2 - This is the default for new accounts, and must be explicitly set if you have access to V1 and V2.
Benefits of V2 include:
- 2x larger skills taxonomy, updated frequently based on real-world data.
- 15-40% higher accuracy of extracted skills.
- Better clustering of skill synonyms.
- Distinguish skill types (IT / Professional / Soft).
- Compatibility with the taxonomy used in Textkernel's Skills Intelligence APIs and Jobfeed, enabling standardization of taxonomies across all of your data and benchmarking against jobs posted online.
ProfessionsSettings 🔗︎ object
ProfessionsSettings🔗︎
Enable normalization of job titles using our proprietary taxonomy and international standards.
ProfessionsSettings properties
Normalize 🔗︎ bool
Normalize🔗︎
When true, the job title will be normalized. This includes a proprietary value from our profession taxonomy, plus ONET and ISCO mappings. Read more about the benefits of using a professions taxonomy.
When enabling professions normalization, additional charges apply.
The following languages are supported: English, Chinese (Simplified), Dutch, French, German, Italian, Polish, Portuguese, and Spanish. For documents in other languages, no normalized values will be returned.
For Textkernel Search & Match, normalized professions are automatically indexed and used when profession normalization is enabled during parsing (through IndexingOptions). To leverage profession normalization for user-created searches, enable profession normalization at query time.
The profession taxonomy and the mappings are compatible with the taxonomies used in Textkernel's Skills Intelligence APIs and Jobfeed, enabling standardization of taxonomies across all of your data and benchmarking against jobs posted online.
DocumentLastModified 🔗︎ string
required
DocumentLastModified🔗︎
Mandatory date, in YYYY-MM-DD format, representing the "current" or "as of" date used during parsing. This is useful when parsing older documents. Read more about this here.
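One possible source for this value is the file's modification timestamp on disk. The sketch below assumes that timestamp is trustworthy (files that were copied or downloaded may have lost their original timestamp) and interprets it in UTC; both choices are assumptions, not requirements of the API.

```python
import os
from datetime import datetime, timezone


def document_last_modified(path: str) -> str:
    """Format a file's modification time as YYYY-MM-DD (interpreted in UTC)."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    return datetime.fromtimestamp(mtime, tz=timezone.utc).date().isoformat()
```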
OutputHtml 🔗︎ boolean
OutputHtml🔗︎
When true, the original file is converted to HTML and stored in the Html property.
OutputRtf 🔗︎ boolean
OutputRtf🔗︎
When true, the original file is converted to RTF and stored in the Rtf property.
OutputPdf 🔗︎ boolean
OutputPdf🔗︎
When true, the original file is converted to PDF and stored in the Pdf property as a byte array.
SkillsData 🔗︎ string[]
Deprecated
SkillsData🔗︎
This feature is not recommended and only available as an add-on. Please reach out to sales@textkernel.com .
String[] of your custom skills list names and the Textkernel "builtin" skills list. If no list is provided, the Textkernel builtin skills list will be used. The parser automatically detects the document language and looks for a corresponding skills list in that language; if no match is found, this list is ignored.
GeocodeOptions 🔗︎ object
GeocodeOptions🔗︎
Get or insert geocode coordinate values (latitude/longitude) during the parse transaction.
GeocodeOptions properties
IncludeGeocoding 🔗︎ bool
IncludeGeocoding🔗︎
When set to true, the parsed address is automatically geocoded through a call to our /geocode endpoint, which is charged accordingly. This parameter defaults to false.
Provider 🔗︎ string
Provider🔗︎
The provider you wish to use to geocode the postal address (current options are "Google", "Bing", or "None"). If not specified, we will default to Google. If you are just trying to update the postal address in the document, please set this to "None". If passing "Google" or "Bing", ProviderKey is required.
ProviderKey 🔗︎ string
ProviderKey🔗︎
The Provider Key for the specified Provider. If using Bing you must specify your own provider key.
PostalAddress 🔗︎ object
PostalAddress🔗︎
The postal address you wish to geocode. For best results, specify as many of the PostalAddress fields as possible. If provided, this address will be used to get the geocode coordinates instead of the address included in the ParsedDocument (if present), however, the address in the ParsedDocument will not be modified.
PostalAddress properties
CountryCode 🔗︎ string
CountryCode🔗︎
The ISO 3166-1 alpha-2 code indicating the country for the postal address.
GeoCoordinates 🔗︎ object
GeoCoordinates🔗︎
The geographic coordinates (latitude/longitude) for your postal address. Use this if you already have latitude/longitude coordinates and simply wish to add them to your parsed document. If provided, these values will be inserted into your ParsedDocument and the address included in the ParsedDocument (if present), will not be modified.
GeoCoordinates properties
IndexingOptions 🔗︎ object
IndexingOptions🔗︎
When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions.
Skills Normalization must be included to index documents using V2 Skills Taxonomy. These algorithms ignore raw skills and only consider the normalized skill concepts for skills category scoring. This leads to improved scoring and ranking because normalization produces fewer false negatives than simple exact keyword matching.
IndexingOptions properties
IndexId 🔗︎ string
IndexId🔗︎
When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions. This determines what index to place the parsed document in. This is case-insensitive.
DocumentId 🔗︎ string
DocumentId🔗︎
When your account is enabled for Matching/Searching you can automatically index documents during the parse transactions. This determines what id to give to the parsed document. This is restricted to alphanumeric with dashes and underscores. All values will be converted to lower-case.
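The DocumentId rules above can be checked on the client before sending a request. The sketch below assumes "alphanumeric" means ASCII letters and digits, which the text does not state explicitly; `normalize_document_id` is an illustrative helper name.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits.
_ALLOWED_ID = re.compile(r"[A-Za-z0-9_-]+")


def normalize_document_id(document_id: str) -> str:
    """Validate a DocumentId and return it lower-cased, as the service stores it."""
    if not _ALLOWED_ID.fullmatch(document_id):
        raise ValueError(
            "DocumentId may only contain letters, digits, dashes and underscores"
        )
    return document_id.lower()
```

Lower-casing locally makes it easier to match ids in your own records against the ids stored in the index.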
Sample JSON
{
  "DocumentAsBase64String": "",
  "SkillsSettings": {
    "Normalize": false,
    "TaxonomyVersion": ""
  },
  "ProfessionsSettings": {
    "Normalize": false,
    "Version": {
      "ONET": "2019"
    }
  },
  "DocumentLastModified": "",
  "OutputHtml": false,
  "OutputRtf": false,
  "OutputPdf": false,
  "Configuration": {
    "CountryCode": "",
    "Language": "",
    "KnownType": "",
    "IncludeRecruitingTerms": false,
    "IncludeSupplementalText": false,
    "PreferShorterJobTitles": false
  },
  "GeocodeOptions": {
    "IncludeGeocoding": false,
    "Provider": "",
    "ProviderKey": "",
    "PostalAddress": {
      "CountryCode": "",
      "PostalCode": "",
      "Region": "",
      "Municipality": "",
      "AddressLine": ""
    },
    "GeoCoordinates": {
      "Latitude": 0,
      "Longitude": 0
    }
  },
  "IndexingOptions": {
    "IndexId": "",
    "DocumentId": "",
    "UserDefinedTags": [
      ""
    ]
  },
  "SkillsData": [
    ""
  ]
}
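A request with only the two required fields plus the normalization settings might be assembled as below. This is a sketch under stated assumptions: `BASE_URL` is a placeholder (use the host for your data center), and the authentication headers your account requires are not covered in this section and are therefore omitted.

```python
import base64
import json
import urllib.request

# Hypothetical base URL; substitute the host for your data center.
BASE_URL = "https://api.example.com"


def build_parse_request(file_bytes: bytes, last_modified: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v10/parser/joborder."""
    body = {
        "DocumentAsBase64String": base64.b64encode(file_bytes).decode("ascii"),
        "DocumentLastModified": last_modified,  # YYYY-MM-DD
        # Both normalization options below incur additional charges.
        "SkillsSettings": {"Normalize": True, "TaxonomyVersion": "V2"},
        "ProfessionsSettings": {"Normalize": True},
    }
    return urllib.request.Request(
        url=BASE_URL + "/v10/parser/joborder",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # auth headers omitted
        method="POST",
    )
```

Once the credentials for your account have been added to the headers, the request can be sent with `urllib.request.urlopen`.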
Response Body🔗︎
Info 🔗︎ object
Info🔗︎
Information explaining the outcome of the transaction.
Info properties
Code 🔗︎ string
Code🔗︎
Code | Description |
---|---|
Success | Successful transaction |
PossibleTruncationFromTimeout | The timeout occurred before the document was finished parsing, which can result in truncation |
ConversionException | There was an issue converting the document |
MissingParameter | A required parameter wasn't provided |
InvalidParameter | A parameter was incorrectly specified |
AuthenticationError | An error occurred with the credentials provided |
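A client can branch on these codes after decoding the response. The sketch below treats Success as usable, PossibleTruncationFromTimeout as usable but possibly incomplete, and every other code as a failure; that grouping is an assumption for illustration, and `ParseError` is a hypothetical exception type.

```python
class ParseError(Exception):
    """Raised when Info.Code reports a failed transaction (illustrative type)."""


def classify_info_code(response: dict) -> str:
    """Return 'ok' or 'truncated', or raise ParseError for any other code."""
    info = response.get("Info") or {}
    code = info.get("Code")
    if code == "Success":
        return "ok"
    if code == "PossibleTruncationFromTimeout":
        # Data was returned, but the document may not have been fully parsed.
        return "truncated"
    raise ParseError(f"{code}: {info.get('Message', '')}")
```

Including the TransactionId from the same Info object in any logged error makes it easier to follow up with support.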
TransactionId 🔗︎ string
TransactionId🔗︎
The (GUID) id for a specific API transaction. Use this when contacting support@textkernel.com about issues.
EngineVersion 🔗︎ string
EngineVersion🔗︎
The version of the parsing/matching engine running under-the-hood.
TotalElapsedMilliseconds 🔗︎ integer
TotalElapsedMilliseconds🔗︎
How long the transaction took on Textkernel's server, in milliseconds. If the transaction takes longer to complete on the client side, that extra duration is solely network latency.
TransactionCost 🔗︎ decimal
TransactionCost🔗︎
How many credits the transaction costs.
CustomerDetails 🔗︎ object
CustomerDetails🔗︎
Information about the customer who made the API call.
CustomerDetails properties
CreditsRemaining 🔗︎ decimal
CreditsRemaining🔗︎
The number of credits remaining to be used by the account.
Value 🔗︎ object
Value🔗︎
Contains response data for the transaction.
Value properties
ParsingResponse 🔗︎ object
ParsingResponse🔗︎
The status of the parse transaction.
ParsingResponse properties
GeocodeResponse 🔗︎ object
GeocodeResponse🔗︎
If geocoding was requested in ParseOptions.GeocodeOptions, the status of the geocode transaction will be output here.
GeocodeResponse properties
IndexingResponse 🔗︎ object
IndexingResponse🔗︎
If indexing was requested in ParseOptions.IndexingOptions, the status of the index transaction will be output here.
IndexingResponse properties
ProfessionNormalizationResponse 🔗︎ object
ProfessionNormalizationResponse🔗︎
If profession normalization was requested via ProfessionsSettings.Normalize, the status of the profession normalization transaction will be output here.
ProfessionNormalizationResponse properties
JobData 🔗︎ object
JobData🔗︎
The main output from the Textkernel Job Parser.
JobData properties
CurrentJobIsManagement 🔗︎ bool
CurrentJobIsManagement🔗︎
Whether or not the job is a management position. Used by Textkernel for Search & Match.
HighestManagementScore 🔗︎ object
HighestManagementScore🔗︎
The management score, or null. Used by Textkernel for Search & Match.
HighestManagementScore properties
ManagementLevel 🔗︎ string
ManagementLevel🔗︎
The management level. Used by Textkernel for Search & Match. One of:
- None
- Low
- Mid
- High
ExecutiveType 🔗︎ string
ExecutiveType🔗︎
What kind of executive position the job is, if any. Used by Textkernel for Search & Match. One of:
- NONE
- ADMIN
- ACCOUNTING
- BUSINESS_DEV
- EXECUTIVE
- FINANCIAL
- GENERAL
- IT
- LEARNING
- MARKETING
- OPERATIONS
MinimumYears 🔗︎ object
MinimumYears🔗︎
The minimum years experience for the job, or null. Used by Textkernel for Search & Match.
MinimumYears properties
MaximumYears 🔗︎ object
MaximumYears🔗︎
The maximum years experience for the job, or null. Used by Textkernel for Search & Match.
MaximumYears properties
MinimumYearsManagement 🔗︎ object
MinimumYearsManagement🔗︎
The minimum years of management experience, or null. Used by Textkernel for Search & Match.
MinimumYearsManagement properties
MaximumYearsManagement 🔗︎ object
MaximumYearsManagement🔗︎
The maximum years of management experience, or null. Used by Textkernel for Search & Match.
MaximumYearsManagement properties
RequiredDegree 🔗︎ string
RequiredDegree🔗︎
The required educational degree, if listed. Used by Textkernel for Search & Match.
JobDescription 🔗︎ string
JobDescription🔗︎
Section containing information about the job. Job description strictly includes duties, tasks, and responsibilities for the role with as little irrelevant text as possible.
EmployerDescription 🔗︎ string
EmployerDescription🔗︎
Full text of any employer description listed by the job.
JobTitles 🔗︎ object
JobTitles🔗︎
The job titles found in the job. Used by Textkernel for Search & Match.
JobTitles properties
NormalizedProfession 🔗︎ object
NormalizedProfession🔗︎
If ProfessionsSettings.Normalize was set to true, this will be populated for the most recent 3 positions.
NormalizedProfession properties
Profession 🔗︎ object
Profession🔗︎
Object containing the details of the profession concept.
Profession properties
Group 🔗︎ object
Group🔗︎
The object of the group to which the profession concept belongs.
Group properties
Class 🔗︎ object
Class🔗︎
The object of the class to which the profession concept belongs.
Class properties
EmployerNames 🔗︎ object
EmployerNames🔗︎
The employer names found in the job.
EmployerNames properties
Degrees 🔗︎ object[]
Degrees🔗︎
The educational degrees found listed in the job. Used by Textkernel for Search & Match.
Degrees properties
CertificationsAndLicenses 🔗︎ string[]
CertificationsAndLicenses🔗︎
Any certifications/licenses listed in the job. Used by Textkernel for Search & Match.
Skills 🔗︎ object
Skills🔗︎
Skills output when version 2 of the taxonomy is utilized.
Skills properties
Raw 🔗︎ object[]
Raw🔗︎
Array of skills exactly as found in the plain text of the document.
Raw properties
Normalized 🔗︎ object
Normalized🔗︎
Normalized skills output when version 2 of the taxonomy is utilized and SkillsSettings.Normalize is set to true.
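As an illustration of reading this output, the sketch below collects the names of normalized skills flagged as required. It assumes the shape shown in the response Sample JSON, where Normalized is an array of objects with Name and Required fields; the helper name is illustrative.

```python
from typing import List


def required_normalized_skills(response: dict) -> List[str]:
    """List the names of normalized skills flagged as required."""
    job_data = (response.get("Value") or {}).get("JobData") or {}
    skills = job_data.get("Skills") or {}
    # Normalized is absent unless V2 is in use and SkillsSettings.Normalize is true.
    normalized = skills.get("Normalized") or []
    return [skill["Name"] for skill in normalized if skill.get("Required")]
```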
Normalized properties
RelatedProfessionClasses 🔗︎ object
RelatedProfessionClasses🔗︎
Professions most related to the document. Only output if version 2 of the skills taxonomy is utilized and SkillsSettings.Normalize is set to true.
RelatedProfessionClasses properties
LanguageCodes 🔗︎ string[]
LanguageCodes🔗︎
Any languages listed in the job. Used by Textkernel for Search & Match.
CurrentLocation 🔗︎ object
CurrentLocation🔗︎
The location of the job, if listed. If no job location is found, this is the location of the company, if listed.
CurrentLocation properties
ApplicationDetails 🔗︎ object
ApplicationDetails🔗︎
Information about the application process.
ApplicationDetails properties
ApplicationDescription 🔗︎ string
ApplicationDescription🔗︎
Full text description of the application process.
ContactPhone 🔗︎ string
ContactPhone🔗︎
Normalized phone of the organization with international calling prefix. Can contain multiple values (concatenated by comma).
ContactEmail 🔗︎ string
ContactEmail🔗︎
Displayable email of the organization. Can contain multiple values (concatenated by comma).
Salary 🔗︎ object
Salary🔗︎
The salary found for the position. If no lexical cues are available from the vacancy, the time scale is guessed based on predefined salary ranges. Here are some rough salary ranges (note: country-specific conditions may apply):
- 1 or 2 digits salary (9, 12): hourly
- 3 or 4 digits salary (3800, 5000): monthly
- 5 digit salary (38000, 50000): yearly
If a monthly salary is extracted, to get the annual salary it is multiplied by 14 (if country = AT) or 12 (all other countries).
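The rough ranges and the monthly-to-annual rule above can be written out as follows. This only restates the listed heuristics for illustration; it is not the parser's actual logic, which also uses lexical cues and country-specific conditions. Amounts with six or more digits are not covered by the listed ranges, so the sketch returns None for them.

```python
from typing import Optional


def guess_time_scale(amount: int) -> Optional[str]:
    """Guess the salary time scale from the number of digits in the amount."""
    digits = len(str(abs(int(amount))))
    if digits <= 2:
        return "hourly"
    if digits <= 4:
        return "monthly"
    if digits == 5:
        return "yearly"
    return None  # not covered by the ranges listed above


def annualize_monthly(monthly: float, country_code: str) -> float:
    """Convert a monthly salary to annual: 14 months for AT, 12 otherwise."""
    months = 14 if country_code.upper() == "AT" else 12
    return monthly * months
```

For example, a monthly salary of 3800 annualizes to 53200 for AT and to 45600 for any other country.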
Salary properties
RawMinimum 🔗︎ string
RawMinimum🔗︎
The raw, un-normalized, minimum value. This is returned as is in the text, so there is no guarantee that it will evaluate to a valid number and not a string.
MinimumWorkingHours 🔗︎ object
MinimumWorkingHours🔗︎
The minimum number of working hours per week, or null.
MinimumWorkingHours properties
MaximumWorkingHours 🔗︎ object
MaximumWorkingHours🔗︎
The maximum number of working hours per week, or null.
MaximumWorkingHours properties
IsRemote 🔗︎ bool
IsRemote🔗︎
Whether or not the position is remote. Includes fulltime, partial and temporary remote working opportunities.
EmploymentType 🔗︎ string
EmploymentType🔗︎
The type of employment. One of:
- unspecified
- fulltime
- parttime
- fulltime/parttime
ContractType 🔗︎ string
ContractType🔗︎
The contract type. One of:
- unspecified
- permanent
- temporary
- possibly_permanent
- interim
- franchise
- side
- internship
- voluntary
- freelance
- apprenticeship
- assisted
SkillsData 🔗︎ object[]
Deprecated
SkillsData🔗︎
Deprecated. Use v2 skills taxonomy and its associated Skills output.
JobMetadata 🔗︎ object
JobMetadata🔗︎
Metadata about the parsed job
JobMetadata properties
DocumentLanguage 🔗︎ string
DocumentLanguage🔗︎
The two-letter ISO 639-1 code for the language the document was written in.
UserDefinedTags 🔗︎ string[]
UserDefinedTags🔗︎
A list of User-Defined Tags that are assigned to this job. These are used to filter search/match queries in the Search & Match Engine.
NOTE: you may add/remove these prior to indexing. This is the only property you may modify prior to indexing.
ConversionMetadata 🔗︎ object
ConversionMetadata🔗︎
Information about converting the document to plain text
ConversionMetadata properties
SuggestedFileExtension 🔗︎ string
SuggestedFileExtension🔗︎
The suggested extension based on the DetectedType.
OutputValidityCode 🔗︎ string
OutputValidityCode🔗︎
The computed validity based on the source text. This will indicate whether a document looks like a legitimate job or not. See here for more details.
Conversions 🔗︎ object
Conversions🔗︎
Any additional conversions you requested will be here (eg: PDF or HTML).
Conversions properties
PDF 🔗︎ string
PDF🔗︎
If requested by ParseOptions.OutputPdf, this is the document converted to a PDF. This is a byte[] as a Base64-encoded string.
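Because the PDF arrives as a Base64-encoded string, it has to be decoded before it can be written to disk. This is a minimal sketch that assumes the response has already been decoded from JSON into a dict; `save_converted_pdf` is an illustrative helper name.

```python
import base64
from pathlib import Path


def save_converted_pdf(response: dict, destination: str) -> bool:
    """Decode Value.Conversions.PDF and write it to a file, if it is present."""
    conversions = (response.get("Value") or {}).get("Conversions") or {}
    pdf_b64 = conversions.get("PDF")
    if not pdf_b64:
        return False  # no PDF conversion was requested or returned
    Path(destination).write_bytes(base64.b64decode(pdf_b64))
    return True
```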
HTML 🔗︎ string
HTML🔗︎
If requested by ParseOptions.OutputHtml, this is the document converted to HTML.
ParsingMetadata 🔗︎ object
ParsingMetadata🔗︎
Information about the parsing transaction.
ParsingMetadata properties
ElapsedMilliseconds 🔗︎ integer
ElapsedMilliseconds🔗︎
How long it took to parse the document, in milliseconds.
TimedOut 🔗︎ bool
TimedOut🔗︎
Whether or not the transaction timed out. If this is true, the returned data may be incomplete.
Sample JSON
{
  "Info": {
    "Code": "string",
    "Message": "string",
    "TransactionId": "string",
    "EngineVersion": "string",
    "ApiVersion": "string",
    "TotalElapsedMilliseconds": 0,
    "TransactionCost": 0,
    "CustomerDetails": {
      "AccountId": "string",
      "Name": "string",
      "IPAddress": "string",
      "Region": "string",
      "CreditsRemaining": 0,
      "CreditsUsed": 0,
      "ExpirationDate": "2021-12-31",
      "MaximumConcurrentRequests": 0
    }
  },
  "Value": {
    "JobData": {
      "CurrentJobIsManagement": true,
      "HighestManagementScore": {
        "Value": 0
      },
      "ManagementLevel": "string",
      "ExecutiveType": "string",
      "MinimumYears": {
        "Value": 0
      },
      "MaximumYears": {
        "Value": 0
      },
      "MinimumYearsManagement": {
        "Value": 0
      },
      "MaximumYearsManagement": {
        "Value": 0
      },
      "RequiredDegree": "string",
      "JobDescription": "string",
      "JobRequirements": "string",
      "Benefits": "string",
      "EmployerDescription": "string",
      "StartDate": {
        "Value": "2020-11-02"
      },
      "JobTitles": {
        "MainJobTitle": "string",
        "JobTitle": [
          "string"
        ],
        "NormalizedProfession": {
          "Profession": {
            "CodeId": 0,
            "Description": ""
          },
          "Group": {
            "CodeId": 0,
            "Description": ""
          },
          "Class": {
            "CodeId": 0,
            "Description": ""
          },
          "ISCO": {
            "Version": "",
            "CodeId": 0,
            "Description": ""
          },
          "ONET": {
            "Version": "",
            "CodeId": "",
            "Description": ""
          },
          "Confidence": 0.0
        }
      },
      "EmployerNames": {
        "MainEmployerName": "string",
        "EmployerName": [
          "string"
        ]
      },
      "Degrees": [
        {
          "Name": "string",
          "Type": "string",
          "LocalEducationLevel": "string",
          "InternationalEducationLevel": "string"
        }
      ],
      "SchoolNames": [
        "string"
      ],
      "CertificationsAndLicenses": [
        "string"
      ],
      "Skills": {
        "Raw": [
          {
            "Name": "",
            "Required": false
          }
        ],
        "Normalized": [
          {
            "Name": "",
            "Type": "",
            "Id": "",
            "Required": false,
            "RawSkills": [
              ""
            ]
          }
        ],
        "RelatedProfessionClasses": [
          {
            "Name": "",
            "Id": "",
            "Percent": 0,
            "Groups": [
              {
                "Name": "",
                "Id": "",
                "Percent": 0,
                "NormalizedSkills": [
                  ""
                ]
              }
            ]
          }
        ]
      },
      "LanguageCodes": [
        "string"
      ],
      "CurrentLocation": {
        "CountryCode": "string",
        "PostalCode": "string",
        "Regions": [
          "string"
        ],
        "Municipality": "string",
        "StreetAddressLines": [
          "string"
        ],
        "GeoCoordinates": {
          "Latitude": 0,
          "Longitude": 0,
          "Source": "string"
        }
      },
      "ApplicationDetails": {
        "ApplicationDescription": "string",
        "ContactPerson": "string",
        "ContactPhone": "string",
        "ContactEmail": "string",
        "Website": "string",
        "ApplicationDeadline": {
          "Value": "2020-11-02"
        },
        "PostedDate": {
          "Value": "2020-11-02"
        },
        "ReferenceNumber": "string"
      },
      "Salary": {
        "Minimum": {
          "Value": 0
        },
        "Maximum": {
          "Value": 0
        },
        "RawMinimum": "string",
        "RawMaximum": "string",
        "Currency": "string"
      },
      "MinimumWorkingHours": {
        "Value": 0
      },
      "MaximumWorkingHours": {
        "Value": 0
      },
      "WorkingHours": "string",
      "IsRemote": true,
      "DriversLicenses": [
        "string"
      ],
      "EmploymentType": "string",
      "ContractType": "string",
      "TermsOfInterest": [
        "string"
      ],
      "Owners": [
        "string"
      ],
      "SkillsData": [
        {
          "Root": "string",
          "Taxonomies": [
            {
              "Id": "string",
              "Name": "string",
              "PercentOfOverall": 0,
              "SubTaxonomies": [
                {
                  "PercentOfOverall": 0,
                  "PercentOfParent": 0,
                  "SubTaxonomyId": "string",
                  "SubTaxonomyName": "string",
                  "Skills": [
                    {
                      "Id": "string",
                      "Name": "string",
                      "ExistsInText": true,
                      "Required": true,
                      "Variations": [
                        {
                          "Id": "string",
                          "Name": "string",
                          "ExistsInText": true,
                          "Required": true
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ],
      "JobMetadata": {
        "PlainText": "string",
        "DocumentLanguage": "string",
        "DocumentCulture": "string",
        "ParserSettings": "string",
        "DocumentLastModified": "2020-11-02"
      },
      "UserDefinedTags": [
        "string"
      ]
    },
    "ParsingResponse": {
      "Code": "Success",
      "Message": "string"
    },
    "GeocodeResponse": {
      "Code": "Success",
      "Message": "string"
    },
    "IndexingResponse": {
      "Code": "Success",
      "Message": "string"
    },
    "ProfessionNormalizationResponse": {
      "Code": "Success",
      "Message": "string"
    },
    "ConversionMetadata": {
      "DetectedType": "string",
      "SuggestedFileExtension": "string",
      "OutputValidityCode": "string",
      "ElapsedMilliseconds": 0,
      "DocumentHash": "string"
    },
    "Conversions": {
      "PDF": "string",
      "HTML": "string",
      "RTF": "string",
      "CandidateImage": "string",
      "CandidateImageExtension": "string"
    },
    "ParsingMetadata": {
      "ElapsedMilliseconds": 0,
      "TimedOut": true,
      "TimedOutAtMilliseconds": {
        "Value": 0
      }
    }
  }
}