Extract skills

The /extract endpoint is a REST service that accepts POST requests.
The service expects a JSON body with up to four fields: text (the input text), language (the language of the text), threshold (the skill validation threshold) and output_language (the language of the normalized skills).
Only text and language are mandatory.
The input text must be UTF-8 encoded.

This endpoint will

  • extract the known skills (defined in the skill taxonomy) from the input text
  • validate the extracted skills in context
  • return the validated skills (with their position, confidence and normalized description)

Remember to send your authentication token with each request (see the Authentication page).

Endpoint

| Method | Media type | URL | Description |
|---|---|---|---|
| POST | application/json | {{domain}}/extract | Extract skills from input text |
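The request described above can be sketched in Python with only the standard library. This is a minimal, hedged example rather than an official client: the base URL is copied from the curl example on this page, the token value is a placeholder, and build_extract_request is a hypothetical helper name. It only builds the request; sending it is left to the caller.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page; adjust to your {{domain}}.
BASE_URL = "https://api.textkernel.nl/skills/v2"


def build_extract_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /extract endpoint."""
    # Serialize to JSON and encode as UTF-8, as the endpoint requires.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/extract",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # see the Authentication page
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


# "my-token" is a placeholder, not a real credential.
req = build_extract_request("my-token", {"text": "I am a Java/J2EE developer.", "language": "en"})
# To send it: urllib.request.urlopen(req)
```

Note that urllib normalizes header names, so Content-Type is stored as Content-type on the Request object; HTTP header names are case-insensitive, so this does not affect the call.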

Input parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | None (required) | The text to extract skills from |
| language | str | None (required) | The language of the input text, as an ISO 639-1 code |
| threshold | float | 0.5 | The minimum confidence a skill needs to be included in the response |
| output_language | str | same as language | The language (ISO 639-1 code) or locale (ISO639-1_ISO3166-1 code) of the normalized skill descriptions |

Set the language field to one of the supported languages, using its ISO 639-1 code.

The input text should be written in one of the supported languages.
The input text is limited to 50,000 characters; any text beyond the limit is truncated, and the truncated field of the response is set to true.
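A minimal sketch of assembling the request body under these rules, assuming it is acceptable to omit optional fields you do not set so that the service applies its own defaults. The helper names are hypothetical, and the truncation check assumes the 50,000-character limit counts Unicode code points the way Python's len does, which this page does not specify.

```python
from typing import Optional

MAX_TEXT_LENGTH = 50_000  # characters; the service truncates longer input


def build_extract_payload(
    text: str,
    language: str,
    threshold: Optional[float] = None,
    output_language: Optional[str] = None,
) -> dict:
    """Assemble the JSON body for /extract, omitting optional fields left unset."""
    if not text or not language:
        raise ValueError("text and language are mandatory")
    payload = {"text": text, "language": language}
    if threshold is not None:
        payload["threshold"] = threshold  # service default: 0.5
    if output_language is not None:
        payload["output_language"] = output_language  # default: same as language
    return payload


def will_be_truncated(text: str) -> bool:
    """Predict the response's truncated flag (assumes the limit counts code points)."""
    return len(text) > MAX_TEXT_LENGTH


print(build_extract_payload("I am a Java/J2EE developer.", "en", threshold=0.5))
```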

Setting the threshold below the default yields more extracted skills per document, but also increases the chance of ambiguity-related errors (e.g., erroneously normalizing the word "access" to Microsoft Access). A higher threshold has the opposite effect: fewer skills are extracted, but false positives are less likely. The recommended value is 0.5 (the default) for CVs and vacancies, and 0.3 for other HR documents. If the input is a single skill or a list of skills, use the Normalize endpoint instead.
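One way to apply this recommendation in client code is a small lookup from document type to threshold. The document-type labels below are hypothetical; the service never receives a document type, only the resulting threshold value.

```python
# Hypothetical document-type labels mapped to the thresholds recommended above.
RECOMMENDED_THRESHOLDS = {"cv": 0.5, "vacancy": 0.5}
OTHER_HR_DOCUMENT_THRESHOLD = 0.3


def recommended_threshold(document_type: str) -> float:
    """Return the recommended threshold: 0.5 for CVs and vacancies, 0.3 otherwise."""
    return RECOMMENDED_THRESHOLDS.get(document_type.lower(), OTHER_HR_DOCUMENT_THRESHOLD)


print(recommended_threshold("CV"), recommended_threshold("job_description"))
```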

Set the output_language field only if you want the normalized skill descriptions in a language other than the document language.
In that case, set it to one of the supported languages (ISO 639-1 code format) or locales (ISO639-1_ISO3166-1 code format).
If a skill cannot be normalized in the requested language, its description falls back to English.

See the Overview for the list of supported languages.
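A client-side format check for language and output_language values can catch typos before a request is sent. This sketch assumes lowercase ISO 639-1 codes and uppercase ISO 3166-1 alpha-2 country codes joined by an underscore (a value shaped like en_GB); it checks only the format, not whether a language or locale is actually supported, which the Overview lists.

```python
import re

# Two lowercase letters, optionally followed by "_" and two uppercase letters.
# Assumption: ISO 3166-1 alpha-2 country codes in uppercase.
_LANG_OR_LOCALE = re.compile(r"[a-z]{2}(?:_[A-Z]{2})?")


def is_valid_language_or_locale(value: str) -> bool:
    """Check the format of a language or locale code (not whether it is supported)."""
    return _LANG_OR_LOCALE.fullmatch(value) is not None


def language_part(value: str) -> str:
    """Return the ISO 639-1 part of a language or locale code."""
    return value.split("_", 1)[0]


print(is_valid_language_or_locale("en_GB"), language_part("en_GB"))
```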

Response

| Status | Content type | Content description |
|---|---|---|
| 200 (OK) | application/json | A JSON object with the fields listed below |
| 400 (Bad Request) | | The input request body is incorrect |
| 404 (Not Found) | | The language is not supported |

A successful (200) response contains:
  • skills: an array of skill objects
  • truncated: boolean value indicating whether the input text has been truncated
  • version: the API version
  • meta: an object including the taxonomy version (release date)
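A hedged sketch of handling these status codes in Python. ExtractError and the helper names are hypothetical; the messages paraphrase this page, and 429 is included because of the rate limits described in the Rate limits section. The body argument is assumed to be the raw response text.

```python
import json


class ExtractError(Exception):
    """Hypothetical client-side error for non-200 responses from /extract."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


# Meanings paraphrased from the response table and the rate-limit section.
_ERROR_MESSAGES = {
    400: "the input request body is incorrect",
    404: "the language is not supported",
    429: "too many requests; the account's rate limit was exceeded",
}


def describe_error(status: int) -> str:
    """Human-readable meaning of an error status, with a generic fallback."""
    return _ERROR_MESSAGES.get(status, "unexpected response status")


def handle_extract_response(status: int, body: str) -> dict:
    """Return the decoded JSON object on 200 OK; raise ExtractError otherwise."""
    if status == 200:
        return json.loads(body)
    raise ExtractError(status, describe_error(status))
```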

Example

$ curl -X POST https://api.textkernel.nl/skills/v2/extract \
    -H "Authorization: Bearer $TOKEN" \
    -H "accept: application/json" -H "Content-Type: application/json" \
    -d '{ "text":"I am a Java/J2EE developer.", "language": "en", "threshold": 0.5 }'

{
  "meta": {
    "taxonomy_version": "2020-12-04T16:54:07.021406"
  },
  "skills": [
    {
      "category": "IT Skill",
      "code_id": "KS123KG6DL8N3D5ZW036",
      "confidence": 1.0,
      "description": "Java Platform Enterprise Edition (J2EE)",
      "matches": [
        {
          "begin_span": 7,
          "end_span": 15,
          "likelihood": 1.0,
          "surface_form": "Java/J2EE"
        }
      ]
    }
  ],
  "truncated": false,
  "version": "1.15.1"
}
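To work with the response in typed form, the skill objects can be mapped onto Python dataclasses. This is a sketch, not an official client: the class and function names are hypothetical, and iso_code is treated as optional because it is only present for language skills. The usage lines parse the example response shown above.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    begin_span: int
    end_span: int
    likelihood: float
    surface_form: str


@dataclass
class Skill:
    code_id: str
    description: str
    category: str
    confidence: float
    matches: List[Match]
    iso_code: Optional[str] = None  # only present for language skills


def parse_skills(response_body: str) -> List[Skill]:
    """Convert a /extract JSON response body into Skill objects."""
    data = json.loads(response_body)
    skills = []
    for s in data["skills"]:
        # Pick known keys explicitly so extra fields in the response are ignored.
        matches = [
            Match(
                begin_span=m["begin_span"],
                end_span=m["end_span"],
                likelihood=m["likelihood"],
                surface_form=m["surface_form"],
            )
            for m in s["matches"]
        ]
        skills.append(
            Skill(
                code_id=s["code_id"],
                description=s["description"],
                category=s["category"],
                confidence=s["confidence"],
                matches=matches,
                iso_code=s.get("iso_code"),
            )
        )
    return skills


# The example response from this page, with whitespace compacted.
EXAMPLE_RESPONSE = """
{"meta": {"taxonomy_version": "2020-12-04T16:54:07.021406"},
 "skills": [{"category": "IT Skill", "code_id": "KS123KG6DL8N3D5ZW036",
             "confidence": 1.0,
             "description": "Java Platform Enterprise Edition (J2EE)",
             "matches": [{"begin_span": 7, "end_span": 15, "likelihood": 1.0,
                          "surface_form": "Java/J2EE"}]}],
 "truncated": false, "version": "1.15.1"}
"""

for skill in parse_skills(EXAMPLE_RESPONSE):
    print(skill.code_id, skill.description, skill.confidence)
```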

Response fields

Fields on each extracted skill:

| Field | Type | Value |
|---|---|---|
| code_id | str | The code ID of the normalized skill in the taxonomy (unique across all languages) |
| description | str | The description of the normalized skill concept in the taxonomy |
| category | str | The category of the extracted skill. See the Overview for the list of supported categories. |
| confidence | float | Overall confidence that the extracted term actually refers to a skill in the context of the text, computed as the average of the likelihood values of the individual matches |
| iso_code | str | The language's ISO 639-1 code (language skills only) |
| matches | array | The occurrences of the skill in the input text |
| matches[].begin_span | int | Start position of the match in the input text |
| matches[].end_span | int | End position of the match in the input text |
| matches[].surface_form | str | The skill as it appears in the input text; the evidence for the normalized skill |
| matches[].likelihood | float | Confidence that this match actually refers to a skill in the context of the text |
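The confidence definition and the span fields can be checked against a response with two small helpers. The averaging follows the confidence description in the table above. The slicing is an assumption inferred from the example response, where "Java/J2EE" (9 characters) has begin_span 7 and end_span 15, which suggests 0-based character offsets with an inclusive end; verify this against your own responses before relying on it.

```python
from statistics import mean


def average_likelihood(skill: dict) -> float:
    """Mean of the likelihood values of a skill's matches (the documented confidence)."""
    return mean(m["likelihood"] for m in skill["matches"])


def span_text(text: str, match: dict) -> str:
    """Return the slice of the input text covered by one match.

    Assumption: begin_span and end_span are 0-based character offsets and
    end_span is inclusive, as the Java/J2EE example suggests.
    """
    return text[match["begin_span"] : match["end_span"] + 1]


text = "I am a Java/J2EE developer."
skill = {
    "confidence": 1.0,
    "matches": [{"begin_span": 7, "end_span": 15, "likelihood": 1.0, "surface_form": "Java/J2EE"}],
}
for m in skill["matches"]:
    print(span_text(text, m), average_likelihood(skill))
```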

Rate limits

Each account has a request rate limit. Requests beyond the limit receive an HTTP 429 (Too Many Requests) response.

| Plan | Limit | Time window |
|---|---|---|
| Standard | 500 requests | 1 minute |
| Demo | 30 requests | 1 minute |
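To stay under these limits, a client can throttle itself. The sliding-window limiter below is one common technique, sketched in Python; it is not part of the API, and because this page does not say how the server measures its one-minute window, treat it as an approximation and still handle 429 responses. For a Standard plan you would construct it with limit=500, for Demo with limit=30.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side guard allowing at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of requests inside the current window

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Register one request that was actually sent."""
        self._sent.append(self.clock())

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a request is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            sleep(delay)
            delay = self.wait_time()
        self.record()


demo_limiter = SlidingWindowLimiter(limit=30)  # Demo plan: 30 requests per minute
```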