Querying
Textkernel's Search & Match Engine supports two types of queries, searching and matching.
Searching🔗︎
Searching is where a human specifies the criteria and tells the engine what to find. It's great for situations such as a recruiter who needs to find experienced Java programmers but doesn't have a specific job requisition with other criteria. The recruiter can run a quick search for candidates who currently have Java as a skill, select a low, medium, or high level of experience, and get back all of the candidates that match these few data points.
For more details on how to make a searching API call, refer to the API Documentation.
Sort Order🔗︎
Results in searching are sorted in descending order based on a relevancy score. This score is calculated using a traditional TF-IDF statistical measure. We take the query a step further and look for individual terms and groups of terms in specific contexts on the resume or job. For example, when searching resumes for "Software Developer", we want to ensure that a resume with a current position of Software Developer is at the top of the list. We use a series of boosted terms to prefer candidates with recent Software Development experience, or even variations of that job title. The same is done for skills. This allows users to just add keywords, skip more complicated boolean syntax, and still be presented with the most relevant candidates at the top of the result set. This term expansion happens behind the scenes and keeps intact the original meaning of the full-text boolean query.
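As a rough illustration of the TF-IDF idea mentioned above, the sketch below ranks a few toy documents against a keyword query. It is not Textkernel's scoring code: the whitespace tokenization, the smoothed IDF formula, and the function name `tf_idf_scores` are assumptions made for this example, and none of the contextual boosting described above is modeled.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, documents):
    """Return one TF-IDF relevancy score per document (toy version)."""
    tokenized = [doc.lower().split() for doc in documents]
    n = len(tokenized)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)           # term frequency
            idf = math.log((1 + n) / (1 + df)) + 1    # smoothed inverse document frequency
            score += tf * idf
        scores.append(score)
    return scores

docs = [
    "software developer software developer java",
    "project manager",
    "java developer",
]
scores = tf_idf_scores(["software", "developer"], docs)
# Indexes of the documents, most relevant first.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The document that mentions both query terms most densely ranks first, and the document with neither term scores zero.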
Matching🔗︎
Matching is where a document is provided and Textkernel determines the criteria and returns the best candidates. Matching allows for humans to tell the engine what types of data are important using category weights while still letting Textkernel do the heavy lifting of generating the query.
Ideally, you'd have so many great matching resumes or jobs that those are all you'd see, but anyone who has done recruiting knows that there are at best a small number of great matches followed by a huge number of partial or weak matches. In some cases, depending on the actual contents of your index, the best match may be a weak match. The job of the Search & Match Engine is to show you the best matches, ranked best to worst by absolute score, so that your users can spend their time more efficiently than by wading through huge volumes of bad matches.
Since our matching engine excels at sorting and scoring large document sets, it's not recommended to ever return more than 100 documents from a transaction. This is more results than a human has time to look through, and by going that deep in the dataset the results will be significantly worse. If you need to narrow the focus of a transaction, use the filtering layer to restrict the document set to a smaller subset of the index.
There are two endpoints you can use to perform a match.
- Specify the id (case-insensitive) of a document that's already indexed (API Documentation)
- Specify the parsed document as a string (API Documentation)
Scores🔗︎
Scores in matching are absolute and aren't influenced by filters or other data in the index. If the best fit you have scored a 1 (very low score) then it will be reported as a 1. Our engine doesn't use flawed density calculations or other common full-text scoring algorithms. We developed our own scoring calculation that evaluates two documents as a human would. Our matching returns those scores broken down by category with suggested weights for each category. This allows us to give suggested scores without removing users' ability to influence the calculation.
SovScore🔗︎
SovScore is the overall score that represents the bidirectional fit between the source and target documents. This score is calculated from the Weighted Score and Reverse Compatibility Score using a proprietary algorithm. This considers both directions of the match.
Weighted Score🔗︎
An integer score from 0-100 representing how well the target document matched the source document. It is calculated by multiplying each unweighted category score by its suggested weight and summing the results.
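The weighted sum described above can be sketched as follows. The category scores and weights are made-up values, and treating the weights as fractions that add up to 1 is an assumption for this example; the function name `weighted_score` is hypothetical.

```python
def weighted_score(categories):
    """categories maps a category name to (unweighted score 0-100, suggested weight)."""
    total = sum(score * weight for score, weight in categories.values())
    return round(total)

# Illustrative numbers only; weights are assumed to sum to 1.
example = {
    "Skills": (80, 0.5),
    "Job Titles": (60, 0.3),
    "Education": (100, 0.2),
}
print(weighted_score(example))  # 40 + 18 + 20
```

Changing a weight shifts how much its category contributes without touching the unweighted category scores themselves.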
Reverse Compatibility Score (RCS)🔗︎
An integer score from 0-100 which represents how well the source document matched the target document. This isn't the same as the Weighted Score because, when doing the reverse calculation, we check whether all of the data from the target document can be found in the source document. For example, a position for a software developer that needs C#, Web API, and SQL Server would have a high weighted score when matched to a full-stack engineer who knows C#, Web API, SQL Server, HTML, CSS, JavaScript, TypeScript, and many other skills; however, the RCS would be much lower, because from the candidate's perspective, this job only requires a very small subset of the breadth of skills the candidate has.
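To make the asymmetry in that example concrete, here is a deliberately simplified sketch that only counts skill overlap in each direction. The real scores come from Textkernel's own calculation, which is not described here; the `coverage` helper and the plain set-overlap formula are assumptions for illustration.

```python
def coverage(required, available):
    """Percentage of `required` items that also appear in `available` (case-insensitive)."""
    required = {item.lower() for item in required}
    available = {item.lower() for item in available}
    if not required:
        return 0
    return round(100 * len(required & available) / len(required))

job_skills = ["C#", "Web API", "SQL Server"]
candidate_skills = ["C#", "Web API", "SQL Server", "HTML", "CSS", "JavaScript", "TypeScript"]

forward = coverage(job_skills, candidate_skills)   # the job's needs found in the candidate
reverse = coverage(candidate_skills, job_skills)   # the candidate's skills found in the job
print(forward, reverse)
```

The forward direction is fully covered, while the reverse direction covers only three of the candidate's seven skills.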
Categories🔗︎
Category | Explanation |
---|---|
Certifications | List of certifications required or attained. |
Education | Highest degree level required or attained. |
Executive Type | If an executive, then the type of executive (such as Executive, Operations, Financial, IT). |
Job Titles | Exact and partial match position titles. |
Languages | Foreign languages required or attained. |
Management Level | The level of management required or attained, from low-level leadership and supervisory positions, to mid-level managers and directors, to high-level VPs, to C-level executives. |
Skills | List of skills required or attained. |
Industries | Best Fit Taxonomy (industry) as calculated by an analysis of skills to fit a resume or job into categories such as "Information Technology → Programming" or "Finance → Accounting". |
Filtering🔗︎
There are many cases where you need to restrict the result set by some mandatory criteria. For example, you need to find a Java developer for a position in Dallas, Texas. You only want to see candidates that are a logistical fit for this position, so you only consider candidates within 30 miles of Dallas, Texas. This can be accomplished with a filter. For matching, filters are executed like this: (filter) AND (scoredMatchQuery). For the match to return a document as a result, it must satisfy criteria from both the filter and the scoredMatchQuery. If you expect your query to return results but you are not receiving any, try making the filter less restrictive or removing it altogether. Filters restrict the document set that matching evaluates, but they don't affect the Bimetric Scoring calculation. Filters can be specified on both search and match API calls.
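The (filter) AND (scoredMatchQuery) behaviour can be pictured with a small in-memory sketch. Everything here is hypothetical (the document shape, the `score` formula, and `run_match`); it only shows that a filter removes documents from consideration without changing the score of the documents that remain.

```python
docs = [
    {"id": "a", "city": "Dallas", "skills": {"java", "sql"}},
    {"id": "b", "city": "Austin", "skills": {"java"}},
    {"id": "c", "city": "Dallas", "skills": {"python"}},
]

def score(doc, wanted):
    """Toy score: percentage of wanted skills the document has."""
    return round(100 * len(doc["skills"] & wanted) / len(wanted))

def run_match(documents, wanted, doc_filter=None):
    hits = []
    for doc in documents:
        # The filter only decides whether a document is considered at all.
        if doc_filter is not None and not doc_filter(doc):
            continue
        s = score(doc, wanted)
        if s > 0:  # assumed here: a document must also satisfy the scored query
            hits.append((doc["id"], s))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

unfiltered = run_match(docs, {"java", "sql"})
filtered = run_match(docs, {"java", "sql"}, lambda d: d["city"] == "Dallas")
print(unfiltered, filtered)
```

Document "a" keeps the same score in both runs; the filter just drops "b" from the result set.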
Semantic Filter🔗︎
Filter a result set using an object-based representation of your query. These are the easiest to build, but don't have the same level of boolean flexibility as a full-text filter. All of the property queries are joined together by AND, but the terms within each property are joined together by OR by default. Some examples are provided below, but for a full explanation of the FilterCriteria object and all of the available properties, refer to the Search and Match endpoints. For example, to filter the result set to documents with (document id 1 OR document id 2) AND employer Google AND (skill java OR skill c#), use:
{
  "FilterCriteria": {
    "DocumentId": [ "1", "2" ],
    "Employers": ["Google"],
    "Skills": [
      {
        "SkillName": "java"
      },
      {
        "SkillName": "c#"
      }
    ]
  }
}
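If the filter is assembled in code rather than written by hand, serializing a dictionary avoids JSON syntax slips such as trailing commas. A minimal sketch, using only the property names shown in the example above; the helper name `build_filter` is hypothetical.

```python
import json

def build_filter(document_ids, employers, skill_names):
    """Build the FilterCriteria body from plain Python lists."""
    return {
        "FilterCriteria": {
            "DocumentId": list(document_ids),
            "Employers": list(employers),
            "Skills": [{"SkillName": name} for name in skill_names],
        }
    }

payload = build_filter(["1", "2"], ["Google"], ["java", "c#"])
print(json.dumps(payload, indent=2))
```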
Location Filtering🔗︎
Our engine allows you to filter searches and matches based on an exact location, or a radius from that location. For example, we can filter to an exact match on Dallas, Texas, or to within, say, 25 miles of Dallas, Texas. When using a distance, our API will call out to Google to get the geo coordinates of the address you specify. Just like the Geocode endpoint, you can specify your own Google or Bing account, or you can pass the latitude and longitude directly into the search or match call. Geocoding in a search or match follows the same cost structure as the Geocode endpoint and is documented here.
Filter an exact address
{
  "FilterCriteria": {
    "LocationCriteria": {
      "Locations": [
        {
          "CountryCode": "US",
          "Region": "Texas",
          "Municipality": "Dallas"
        }
      ]
    }
  }
}
Filter 25 miles from an exact address using the built-in Google account
{
  "FilterCriteria": {
    "LocationCriteria": {
      "Locations": [
        {
          "CountryCode": "US",
          "Region": "Texas",
          "Municipality": "Dallas"
        }
      ],
      "Distance": 25,
      "DistanceUnit": "Miles"
    }
  }
}
Filter 25 miles from an exact address using predefined latitude and longitude
{
  "FilterCriteria": {
    "LocationCriteria": {
      "Locations": [
        {
          "GeoPoint": {
            "Latitude": 32.780115,
            "Longitude": -96.7851757
          }
        }
      ],
      "Distance": 25,
      "DistanceUnit": "Miles"
    }
  }
}
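The three location variants above differ only in whether an address or a GeoPoint is supplied and whether a distance is attached. A small helper along these lines can build any of them; the function name `location_filter` and its keyword arguments are hypothetical, while the JSON property names are taken from the examples above.

```python
import json

def location_filter(country=None, region=None, municipality=None,
                    latitude=None, longitude=None, distance=None, unit="Miles"):
    """Build a LocationCriteria filter from an address or from coordinates."""
    if latitude is not None and longitude is not None:
        # Coordinates supplied directly, as in the predefined latitude/longitude example.
        location = {"GeoPoint": {"Latitude": latitude, "Longitude": longitude}}
    else:
        fields = {"CountryCode": country, "Region": region, "Municipality": municipality}
        location = {key: value for key, value in fields.items() if value is not None}
    criteria = {"Locations": [location]}
    if distance is not None:
        criteria["Distance"] = distance
        criteria["DistanceUnit"] = unit
    return {"FilterCriteria": {"LocationCriteria": criteria}}

exact = location_filter(country="US", region="Texas", municipality="Dallas")
radius = location_filter(latitude=32.780115, longitude=-96.7851757, distance=25)
print(json.dumps(radius, indent=2))
```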
Full-text Filter🔗︎
Filter a result set using a custom query expression. For example, to filter the result set to documents that contain the phrase "Entity Framework" and currently have the skill c#, use:
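One way to write such an expression, pieced together from the Boolean and semantic clause syntax described later on this page, is sketched below. The request property that carries the expression is not shown in this section, so only the expression string itself is built; the variable names are illustrative.

```python
# A quoted phrase for the full-text part of the filter.
phrase = '"Entity Framework"'

# monthsAgo=0 follows the skill clause examples on this page for "currently has the skill".
current_csharp = "skill:(c#;monthsAgo=0)"

expression = f"{phrase} AND {current_csharp}"
print(expression)
```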
This is just a simple example of what full-text filtering can accomplish. This field supports standard full-text searches, semantic searches, or a combination of both types of searches.
Boolean Syntax🔗︎
A Boolean search request consists of a group of words or phrases linked by connectors such as AND, OR, NOT that indicate the relationship between them.
Expression | Explanation |
---|---|
apple AND pear | Both words must be present |
apple OR pear | Either word can be present |
apple AND NOT pear | apple must be present and pear must not be present |
If you use more than one connector, you should use parentheses to indicate precisely what you want to search for. For example, apple AND pear OR orange could mean (apple AND pear) OR orange, or it could mean apple AND (pear OR orange).
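The two readings really do give different answers. The snippet below evaluates both groupings with ordinary Python booleans for a document that contains only the word orange; it illustrates the logic, not how the engine parses queries.

```python
words = {"orange"}  # a document that mentions only "orange"

apple = "apple" in words
pear = "pear" in words
orange = "orange" in words

first_reading = (apple and pear) or orange    # (apple AND pear) OR orange
second_reading = apple and (pear or orange)   # apple AND (pear OR orange)
print(first_reading, second_reading)
```

The first reading matches the document and the second does not, which is why explicit parentheses are worth the extra typing.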
Special Operators🔗︎
Search terms may include the following special characters:
Character | Explanation |
---|---|
( ) | Parentheses for precedence and grouping |
? | Matches any single character. Example: appl? matches apply or apple. |
* | Matches any number of characters. Example: appl* matches application |
~ | Fuzzy search. Example: managang~ matches managing. |
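The ? and * wildcards behave like the familiar shell-style patterns. As a quick way to check a pattern before sending it, Python's standard `fnmatch` module can stand in for them; this is only an analogy, since the engine's own matching is not described here and `fnmatch` also understands bracket patterns that are not listed above.

```python
from fnmatch import fnmatchcase

# "?" stands for exactly one character.
single = [word for word in ["apply", "apple", "application"] if fnmatchcase(word, "appl?")]

# "*" stands for any number of characters.
many = [word for word in ["apply", "apple", "application"] if fnmatchcase(word, "appl*")]

print(single)
print(many)
```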
Semantic Expressions🔗︎
Search expressions support Semantic Clauses, which are Textkernel extensions to the underlying search engine syntax that can be placed anywhere within the Boolean expression. Semantic Clauses take the following form: type:(term; parameter1=value; parameter2=value; ...)
Each parameter is separated by semicolons. If a term or parameter value contains an equal sign, semicolon or parentheses, then it must be surrounded by double-quotes or those characters must be escaped by a backslash character. When inside double-quotes, only double-quote characters must be escaped by a backslash. If the values come from user input, then make sure that you escape those values to prevent search syntax errors or unexpected results.
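The quoting rule above can be applied mechanically when values come from user input. The sketch below always chooses the double-quote option rather than backslash-escaping individual characters, and it also quotes values that contain a double-quote, which is a choice made for this example. The helper names `quote_value` and `clause` are hypothetical.

```python
SPECIAL_CHARACTERS = set("=;()")

def quote_value(value):
    """Wrap a term or parameter value in double-quotes when it needs protecting."""
    if '"' in value or any(ch in SPECIAL_CHARACTERS for ch in value):
        # Inside double-quotes, only double-quote characters are escaped.
        return '"' + value.replace('"', '\\"') + '"'
    return value

def clause(kind, term, **parameters):
    """Assemble type:(term;parameter=value;...) from already-trusted names and raw values."""
    parts = [quote_value(term)]
    parts += [f"{name}={quote_value(str(value))}" for name, value in parameters.items()]
    return f"{kind}:({';'.join(parts)})"

print(clause("skill", "java", experienceLevel="low", monthsAgo=0))
print(clause("employer", "Smith (UK); Ltd"))
```

The first call needs no quoting, while the second wraps the employer name in double-quotes because it contains parentheses and a semicolon.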
The following semantic clauses are supported:
Document Id🔗︎
A DocumentId value. This is often used when scoring a single document or a small collection of documents. Syntax docid:(term)
User-Defined Tag🔗︎
An alphanumeric token in one of the user-defined tags injected into Resumes or Jobs. These are often used for filtering or partitioning the data within an index by statuses or other custom fields. Syntax id:(term); for a range use [id:(term) TO id:(term)].
Examples🔗︎
Term | Explanation |
---|---|
id:(CFZAvailable20190116) | Filter the user-defined tag CFZAvailable20190116 |
[id:(CFZAvailable20190101) TO id:(CFZAvailable20190131)] | Filter user-defined tags in the range from CFZAvailable20190101 to CFZAvailable20190131 |
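The tags in these examples look like a fixed prefix followed by a date in yyyymmdd form. Assuming that convention (it is a reading of the examples, not a documented rule), a range clause can be generated from two dates; the helper names are hypothetical.

```python
from datetime import date

def tag(prefix, day):
    """Build a tag such as CFZAvailable20190101 from a prefix and a date."""
    return f"{prefix}{day:%Y%m%d}"

def tag_range(prefix, start, end):
    """Build a range clause covering tags from start to end."""
    return f"[id:({tag(prefix, start)}) TO id:({tag(prefix, end)})]"

january = tag_range("CFZAvailable", date(2019, 1, 1), date(2019, 1, 31))
print(january)
```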
Taxonomy/Industry🔗︎
There is an implicit AND between each parameter and an implicit OR between each comma-delimited value. If you need more control over the Boolean logic, then use multiple taxonomy clauses. Syntax taxonomy:(parameters)
Parameters🔗︎
Term | Explanation |
---|---|
bestFit | Comma-delimited list of ids of the best fit top-level taxonomy. |
bestFitSub | Comma-delimited list of ids of the best fit sub-taxonomy within the best fit top-level taxonomy. |
secondBestFit | Comma-delimited list of ids of the second best fit top-level taxonomy. |
secondBestFitSub | Comma-delimited list of ids of the best fit sub-taxonomy within the second best fit top-level taxonomy. |
Skill🔗︎
A skill term. Syntax: skill:(term;parameters). Supports the wildcard characters (*, ?) after the third character in the term, as defined in Special Operators.
Optional Parameters🔗︎
Term | Explanation |
---|---|
experienceLevel | Level of experience with this skill. Supported values: low , mid , and high . |
monthsAgo | Limit results to skills held within this number of months before the RevisionDate |
Examples🔗︎
Term | Explanation |
---|---|
skill:(java) | Filter the skill java |
skill:(java;experienceLevel=low) | Filter the skill java with low experience |
skill:(java;monthsAgo=0) | Filter documents that have the skill java currently |
skill:(java;experienceLevel=low;monthsAgo=0) | Filter documents that have the skill java currently and low experience |
Certification/License🔗︎
A certification or license term. Syntax certification:(term) or license:(term). Supports the wildcard characters (*, ?) after the third character in the term, as defined in Special Operators.
Job Title🔗︎
A position title in the candidate's employment history or describing a job. Syntax title:(term;parameters). Supports the wildcard characters (*, ?) after the third character in the term, as defined in Special Operators.
Optional Parameters🔗︎
Term | Explanation |
---|---|
monthsAgo | Limit results to job titles held within this number of months before the RevisionDate |
includeVariations | Whether to include variations of the original job title (for example, Developer instead of Web Developer). Defaults to true. |
Examples🔗︎
Term | Explanation |
---|---|
title:(Web Developer) | Filter the job title Web Developer |
title:(Web Developer;monthsAgo=0) | Filter documents that have the job title Web Developer currently |
title:(Web Developer;includeVariations=false) | Filter documents that have the exact job title Web Developer |
Assistant To🔗︎
A position to which the candidate was an assistant, such as assistant to the "CEO". Syntax assistantTo:(term)
Employer🔗︎
An employer/organization name in the candidate's employment history or describing a job. Syntax employer:(term;parameters). Supports the wildcard characters (*, ?) after the third character in the term, as defined in Special Operators.
Optional Parameters🔗︎
Term | Explanation |
---|---|
monthsAgo | Limit results to employers the candidate worked for within this number of months before the RevisionDate |
Examples🔗︎
Term | Explanation |
---|---|
employer:(Google) | Filter the employer Google |
employer:(Google;monthsAgo=0) | Filter for current employer Google |
Executive Type🔗︎
The type of executive experience the candidate must have. Term must be one of the following supported executive types: NONE, EXECUTIVE, ADMIN, ACCOUNTING, OPERATIONS, FINANCIAL, MARKETING, BUSINESS_DEV, IT, GENERAL, LEARNING. Syntax executiveType:(term)
Current Management Level🔗︎
The management level that a job requires or that a candidate has in the most recent position. Term must be one of the following supported management levels: None, Low, Mid, High. Syntax currentManagementLevel:(term)
Author🔗︎
When true, resume must have at least one publication. When false, resume must not have any publications. This clause is only valid when resumes were parsed with PublicationHistory coverage enabled. Syntax isAuthor:(term)
Public Speaker🔗︎
When true, resume must have at least one speaking event. When false, resume must not have any speaking events. This clause is only valid when resumes were parsed with SpeakingEventsHistory coverage enabled. Syntax isPublicSpeaker:(term)
Has Military History🔗︎
When true, resume must have some military history. When false, resume must not have any military history. This clause is only valid when resumes were parsed with MilitaryHistory coverage enabled. Syntax isMilitary:(term)
Has Been Self Employed🔗︎
When true, resume must have some self-employment history. When false, resume must not have any self-employment history. Syntax hasBeenSelfEmployed:(term)
Has Patents🔗︎
When true, resume must have at least one patent. When false, resume must not have any patents. This clause is only valid when resumes were parsed with PatentHistory coverage enabled. Syntax hasPatents:(term)
Has Security Credentials🔗︎
When true, resume must have at least one security credential. When false, resume must not have any security credentials. This clause is only valid when resumes were parsed with SecurityCredentials coverage enabled. Syntax hasSecurityCredentials:(term)
Security Credential🔗︎
A security credential name. This clause is only valid when resumes were parsed with SecurityCredentials coverage enabled. Syntax securityCredential:(term). Supports the wildcard characters (*, ?) after the third character in the term, as defined in Special Operators.
School Name🔗︎
Either an exact match or a normalized version of a school name. For example, a search for "Purdue" will match "Purdue University" and vice versa. Syntax schoolName:(term). Supports the wildcard characters (*, ?) after the third character in the term, as defined in Special Operators.
Degree Name🔗︎
An exact match of a degree name. Syntax degreeName:(term). Supports the wildcard characters (*, ?) after the third character in the term, as defined in Special Operators.
Degree Major🔗︎
An exact match of a degree major. Syntax degreeMajor:(term). Supports the wildcard characters (*, ?) after the third character in the term, as defined in Special Operators.
Degree Type🔗︎
A specific type of degree that a resume has or that a job requires. The term can be a common name for a degree like "Bachelor of Science" or one of the following supported degree types defined by the HR-XML standard: ged, secondary, highSchoolOrEquivalent, certification, vocational, someCollege, HND_HNC_OrEquivalent, associates, international, bachelors, somePostgraduate, masters, intermediategraduate, professional, postprofessional, doctorate, postdoctorate. Syntax degreeType:(term)
Minimum Degree Level🔗︎
A specific type of degree that is the minimum that a resume must have or that a job requires. The term can be a common name for a degree like "Bachelor of Science" or one of the following supported degree types defined by the HR-XML standard: ged, secondary, highSchoolOrEquivalent, certification, vocational, someCollege, HND_HNC_OrEquivalent, associates, international, bachelors, somePostgraduate, masters, intermediategraduate, professional, postprofessional, doctorate, postdoctorate. Syntax minimumDegreeLevel:(term)
Minimum GPA🔗︎
A normalized GPA value from 0.0 to 1.0, with 1.0 being the top mark. For example, 3.5 on a scale of 4.0 would have a value of 0.875. Syntax minimumGPA:(term)
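The normalization in that example is a simple division by the top mark of the grading scale. A short sketch, assuming the scale starts at 0 and is linear (which matches the 3.5 out of 4.0 example); the helper name `normalize_gpa` is hypothetical.

```python
def normalize_gpa(gpa, scale):
    """Convert a GPA on a 0..scale grading scale to the 0.0-1.0 range."""
    if scale <= 0 or not 0 <= gpa <= scale:
        raise ValueError("gpa must be between 0 and the top of the scale")
    return gpa / scale

minimum = normalize_gpa(3.5, 4.0)
gpa_clause = f"minimumGPA:({minimum})"
print(gpa_clause)
```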
Location🔗︎
Location that takes multiple parameters AND'd together. Syntax location:(parameters)
Parameters🔗︎
Term | Explanation |
---|---|
country | ISO 3166 2-letter country code. |
postalCode | Postal Code. |
region | Name or abbreviation of a state or province or region. |
municipality | Name of a city. |
Postal Code🔗︎
A postal code value. Syntax postalcode:(term)
Municipality🔗︎
A city name. Syntax municipality:(term)
Region🔗︎
The name or standardized postal abbreviation for a state or province. Syntax region:(term)
Country🔗︎
ISO 3166 2-letter alpha code for the country. For example, "US" for United States, and "CA" for Canada. Syntax country:(term)
Language🔗︎
A language known by the candidate or required by a job. The value is an ISO 639-1 two letter alpha code. For example, "en" for English, and "fr" for French. Syntax language:(term)
Document Language🔗︎
The language of the document (resume or job). The value is an ISO 639-1 two letter alpha code. For example, "en" for English, and "fr" for French. Syntax docLanguage:(term)