Extract skills
The `/extract` endpoint is a REST service offered via a POST method.
The service expects JSON as input, with up to four input fields: the input `text`, the text's `language`, the skill validation `threshold` and the `output_language`.
The input `text` and `language` are mandatory fields.
The input `text` must be UTF-8 encoded.
This endpoint will:
- extract the known skills (defined in the skill taxonomy) from the input text
- validate the extracted skills in context
- return the validated skills (with their position, confidence and normalized description)
Remember to send your authentication token with each request (see Authentication page).
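As a minimal sketch of how such a request could be assembled in Python with only the standard library: the helper name `build_extract_request`, the placeholder domain and the placeholder token are assumptions made for illustration, while the field names and headers follow the description on this page. The request is built but not sent.

```python
import json
import urllib.request


def build_extract_request(domain, token, text, language,
                          threshold=None, output_language=None):
    """Build (but do not send) a POST request for the /extract endpoint.

    `domain` and `token` are placeholders supplied by the caller.
    """
    payload = {"text": text, "language": language}  # mandatory fields
    if threshold is not None:
        payload["threshold"] = threshold
    if output_language is not None:
        payload["output_language"] = output_language

    body = json.dumps(payload).encode("utf-8")  # the input must be UTF-8 encoded
    return urllib.request.Request(
        url=f"{domain}/extract",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Hypothetical domain and token, used only to show the call.
request = build_extract_request(
    "https://example.invalid", "MY_TOKEN",
    "I am a Java/J2EE developer.", "en", threshold=0.5,
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; optional fields are left out of the body entirely when they are not set, so the service defaults apply.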
Endpoint
Method | Media | URL | Description |
---|---|---|---|
POST | application/json | `{{domain}}/extract` | Extract skills from input text |
Input parameters
Parameter | Type | Default | Description |
---|---|---|---|
`text` | `str` | None | The text to extract skills from |
`language` | `str` | None | The language of the input text as an ISO 639-1 code |
`threshold` | `float` | 0.5 | The minimum confidence threshold for including a skill in the response |
`output_language` | `str` | same as `language` | The language (ISO639-1 code) or locale (ISO639-1_ISO3166-1 code) of the normalized skills |
In the `language` field, specify one of the supported languages (ISO 639-1 code format).
The input `text` should be written in one of the supported languages.
Setting the `threshold` to a value lower than the default results in more skill extractions per document, but also increases the chance that the results contain ambiguity-related errors (e.g., erroneously normalizing the word "access" to Microsoft Access).
A higher `threshold` has the opposite effect: fewer skills are extracted, but the chance of false positives is also lower. The recommended value is `0.5` (default) for CVs and vacancies, and `0.3` for other HR documents.
If the input is a single skill or list of skills, use the Normalize Endpoint instead.
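The threshold recommendation could be captured in a small helper such as the sketch below. The document-type labels (`"cv"`, `"vacancy"`) and the function name are hypothetical and chosen only for this example; the two values are the ones recommended above.

```python
# Recommended values taken from the text; the labels are illustrative only.
CV_AND_VACANCY_THRESHOLD = 0.5  # also the service default
OTHER_HR_DOCUMENT_THRESHOLD = 0.3


def recommended_threshold(document_type):
    """Return the recommended threshold for a (hypothetical) document type label."""
    if document_type in ("cv", "vacancy"):
        return CV_AND_VACANCY_THRESHOLD
    return OTHER_HR_DOCUMENT_THRESHOLD


print(recommended_threshold("cv"))
print(recommended_threshold("performance_review"))
```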
Set the `output_language` field only if you want to get normalized skill descriptions in a different language than the document language.
If so, set it to one of the supported languages (`ISO639-1` code format) or locales (`ISO639-1_ISO3166-1` code format).
Whenever a skill can't be normalized in the requested language, it is normalized in English by default.
See the Overview for the list of supported languages.
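A rough client-side check of the `output_language` value might look like the sketch below. It only tests the shape of the code (two letters, optionally followed by an underscore and two more letters); the letter case used here is an assumption, and the check does not tell you whether the language is actually supported. Both function names are hypothetical.

```python
import re

# Assumed shapes: "en" for a language, "en_GB" for a locale.
_LANGUAGE_OR_LOCALE = re.compile(r"[a-z]{2}(_[A-Z]{2})?")


def looks_like_language_or_locale(code):
    """Return True if `code` has the shape of a language or locale code."""
    return _LANGUAGE_OR_LOCALE.fullmatch(code) is not None


def add_output_language(payload, output_language):
    """Return a copy of `payload`, adding output_language only when it differs
    from the document language."""
    result = dict(payload)
    if output_language and output_language != payload.get("language"):
        if not looks_like_language_or_locale(output_language):
            raise ValueError(f"unexpected code format: {output_language!r}")
        result["output_language"] = output_language
    return result


print(add_output_language({"text": "Java developer", "language": "en"}, "fr"))
```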
Note that there is a limit of 50 000 characters on the full request body size. If you exceed the limit, you will receive a `403 Forbidden Request` HTTP response.
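To avoid hitting the size limit, the body can be measured before sending. The sketch below assumes the limit applies to the serialized JSON body and counts characters, as the text states; if the service counts bytes instead, text with multi-byte characters would need a byte count. The function name is hypothetical.

```python
import json

MAX_BODY_CHARACTERS = 50_000  # limit stated above


def body_within_limit(payload):
    """Return True if the serialized request body stays within the limit.

    Assumption: the limit is measured on the full JSON body, in characters.
    """
    body = json.dumps(payload, ensure_ascii=False)
    return len(body) <= MAX_BODY_CHARACTERS


print(body_within_limit({"text": "I am a Java/J2EE developer.", "language": "en"}))
```

Because the limit covers the whole body, a text of exactly 50 000 characters already exceeds it once the field names and the language are added.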
Response
Status | Content type | Content description |
---|---|---|
200 (OK) | application/json | A JSON object containing the extracted skills (see Response fields) |
400 (Bad request) | | The input request body is incorrect |
404 (Not Found) | | The language is not supported |
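A client could translate the status codes into messages along these lines. This is a sketch only: the codes 403 and 429 are the ones mentioned elsewhere on this page for the body size limit and the rate limit, and the function name is hypothetical.

```python
# Status codes documented on this page, with a short meaning for each.
STATUS_MEANINGS = {
    200: "OK: the body is a JSON object with the extraction result",
    400: "Bad request: the input request body is incorrect",
    403: "Forbidden: the request body exceeds the size limit",
    404: "Not found: the language is not supported",
    429: "Too many requests: the rate limit was exceeded",
}


def explain_status(status_code):
    """Return a readable explanation for a status code."""
    return STATUS_MEANINGS.get(status_code, f"Unexpected status {status_code}")


print(explain_status(404))
```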
Example
$ curl -X POST https://api.textkernel.nl/skills/v2/extract \
    -H "Authorization: Bearer $TOKEN" \
    -H "accept: application/json" -H "Content-Type: application/json" \
    -d '{ "text": "I am a Java/J2EE developer.", "language": "en", "threshold": 0.5 }'
{
  "meta": {
    "taxonomy_version": "2020-12-04T16:54:07.021406"
  },
  "skills": [
    {
      "category": "IT Skill",
      "code_id": "KS123KG6DL8N3D5ZW036",
      "confidence": 1.0,
      "description": "Java Platform Enterprise Edition (J2EE)",
      "matches": [
        {
          "begin_span": 7,
          "end_span": 15,
          "likelihood": 1.0,
          "surface_form": "Java/J2EE"
        }
      ]
    }
  ],
  "truncated": false,
  "version": "1.15.1"
}
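Once the response body has been received, it can be read with any JSON parser. The sketch below parses a copy of the example response and lists each skill with its confidence and the surface forms found in the text; the function name `summarize_skills` is hypothetical.

```python
import json

# Response body copied from the example on this page.
RESPONSE_BODY = """
{
  "meta": {"taxonomy_version": "2020-12-04T16:54:07.021406"},
  "skills": [
    {
      "category": "IT Skill",
      "code_id": "KS123KG6DL8N3D5ZW036",
      "confidence": 1.0,
      "description": "Java Platform Enterprise Edition (J2EE)",
      "matches": [
        {
          "begin_span": 7,
          "end_span": 15,
          "likelihood": 1.0,
          "surface_form": "Java/J2EE"
        }
      ]
    }
  ],
  "truncated": false,
  "version": "1.15.1"
}
"""


def summarize_skills(response):
    """Return (description, confidence, surface forms) for each skill."""
    summary = []
    for skill in response.get("skills", []):
        surface_forms = [match["surface_form"] for match in skill.get("matches", [])]
        summary.append((skill["description"], skill["confidence"], surface_forms))
    return summary


response = json.loads(RESPONSE_BODY)
for description, confidence, surface_forms in summarize_skills(response):
    print(f"{description} ({confidence:.2f}): {', '.join(surface_forms)}")
```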
Response fields
Fields on each extracted skill:
Field | Type | Value |
---|---|---|
`code_id` | `str` | The code id of the normalized skill from the taxonomy (unique across all languages) |
`description` | `str` | The description of the normalized skill concept from the taxonomy |
`category` | `str` | The category of the extracted skill. See the Overview for the list of supported categories. |
`confidence` | `float` | Overall confidence that the extracted term actually refers to a skill in the context of the text (the average of the "likelihood" values of the individual matches) |
`iso_code` | `str` | The language ISO 639-1 code (only for language skills) |
`match.begin_span` | `int` | Start position |
`match.end_span` | `int` | End position |
`match.surface_form` | `str` | The skill description as found in the input text. The evidence of the normalized skill. |
`match.likelihood` | `float` | Confidence that the extracted term actually refers to a skill in the context of the text. |
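The relation between `confidence` and the per-match `likelihood` values can be illustrated with a short sketch. The skill below uses invented numbers purely to show the arithmetic; it is not real output from the service, and the function name is hypothetical.

```python
import math
from statistics import mean


def average_likelihood(skill):
    """Average the likelihood values of a skill's matches."""
    return mean(match["likelihood"] for match in skill["matches"])


# Invented values for illustration only.
skill = {
    "confidence": 0.75,
    "matches": [{"likelihood": 1.0}, {"likelihood": 0.5}],
}

print(average_likelihood(skill))
print(math.isclose(average_likelihood(skill), skill["confidence"]))
```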
Rate limits
Accounts have a limited request rate. If you exceed the limit, you will receive a `429 Too Many Requests` HTTP response.
Plan | Limit | Units |
---|---|---|
Standard | 500 | Minute |
Demo | 30 | Minute |
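Reading the table as requests per minute, a client can either space its requests or retry after a 429. The sketch below shows both ideas under stated assumptions: `send` is a hypothetical caller-supplied function that performs one request and returns the HTTP status code, and the doubling delay is a common backoff choice rather than something the API prescribes.

```python
import time

REQUESTS_PER_MINUTE = {"Standard": 500, "Demo": 30}  # from the table above


def min_interval_seconds(plan):
    """Smallest even spacing between requests that stays within the plan limit."""
    return 60.0 / REQUESTS_PER_MINUTE[plan]


def send_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a status other than 429.

    Waits base_delay, then twice that, and so on between attempts, and
    returns the last status seen once the attempts are used up.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status


print(min_interval_seconds("Demo"))
```

Injecting `sleep` keeps the helper easy to test without real waiting.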