Parser Output
Output Overview
Default Sections
By default, the resume parser will output the following sections:
Section Type | Description |
---|---|
Contact Info | The Contact Info section represents all contact related information such as name, phone number, address, etc. |
Objective | Job Objective that was found in the resume |
Position History | A list of all of the positions held by the candidate including employer, dates, descriptions, and a user area with metadata |
Education | A list of all education related information including school type, school name, degree type, etc. |
Licenses & Certifications | A list of all certifications and licenses found in the resume |
Skills | A list of all of the skills found in the enabled sections of the resume. Output includes skill name, total months of use, last used date, where it was found in the document, and information about the taxonomy. |
Languages | Includes information about the language the document was written in as well as the languages that a candidate can write/speak/read |
Personal Information | This section includes date of birth, gender, mother tongue, nationality, visa, etc. |
Training | A list of trainings specified in the resume |
Achievements | A list of the achievements specified by the candidate |
Associations | A list of the associations specified by the candidate along with their role |
References | A list of references specified in the resume including contact info if specified |
Hobbies | Outputs the text found pertaining to hobbies |
Optional Sections
- There is very rarely a reason to parse for this data. If you don't have a specific use case for it, don't enable these sections.
- Don't expect this data to be accurate. At best it indicates whether a candidate has many speaking engagements or patents, a few, or none at all. It is not reliable enough to, for example, look up a publication in the Library of Congress.
By default, the resume parser won't output the following sections, but they can be enabled with a configuration setting that is documented in the configuration options:
Section Type | Description |
---|---|
Patents | A list of patents specified in the document |
Publications | A list of publications specified in the document |
Speaking Engagements | A list of speaking engagements specified in the document |
Security Credentials | A list of security credentials specified in the document |
Military History | A list of military history specified in the document |
Contact Info
The Parser does not standardize addresses. Third-party address standardization services, such as the Google Maps API, can take the Parser's contact info fields and standardize/geocode the data.
Contact Methods
Each ContactMethod element allows one of each of the following sub-elements:
- Use
- Location
- WhenAvailable
- Telephone
- Mobile
- Fax
- Pager
- TTYTDD
- InternetEmailAddress
- InternetWebAddress
- PostalAddress
If a resume contains more than one item of the same type, such as two Telephone numbers, each one is reported in a separate ContactMethod object. For example:
"ContactMethod": [
{
"Use": "personal",
"Location": "onPerson",
"WhenAvailable": "anytime",
"Mobile": {
"FormattedNumber": "(858) 353-6553"
}
},
{
"Use": "business",
"Location": "office",
"Telephone": {
"FormattedNumber": "(858) 678-8765"
}
},
{
"Use": "personal",
"Location": "onPerson",
"WhenAvailable": "anytime",
"InternetEmailAddress": "missmadams@yahoo.com"
},
{
"Use": "personal",
"Location": "onPerson",
"WhenAvailable": "anytime",
"InternetEmailAddress": "missmadams@tdiff.com"
},
{
"Use": "twitterHandle",
"Location": "onPerson",
"WhenAvailable": "anytime",
"InternetWebAddress": "@twitQueen"
}
]
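Because values of the same type are spread across several ContactMethod objects, consuming code usually has to walk the whole list. The following is a minimal sketch of that idea, assuming the JSON has already been loaded into Python dictionaries; the function name `collect_contact_values` and the trimmed sample are illustrative and not part of the Parser.

```python
from collections import defaultdict

# Sub-elements that carry a contact value (taken from the list above).
CONTACT_KEYS = ("Telephone", "Mobile", "Fax", "Pager", "TTYTDD",
                "InternetEmailAddress", "InternetWebAddress")

def collect_contact_values(contact_methods):
    """Group contact values by sub-element name across all ContactMethod objects."""
    grouped = defaultdict(list)
    for method in contact_methods:
        for key in CONTACT_KEYS:
            if key not in method:
                continue
            value = method[key]
            # Phone-like elements are objects; keep FormattedNumber when present.
            if isinstance(value, dict):
                value = value.get("FormattedNumber", value)
            grouped[key].append(value)
    return dict(grouped)

# Trimmed version of the example above (hypothetical test data).
sample = [
    {"Use": "personal", "Mobile": {"FormattedNumber": "(858) 353-6553"}},
    {"Use": "business", "Telephone": {"FormattedNumber": "(858) 678-8765"}},
    {"Use": "personal", "InternetEmailAddress": "missmadams@yahoo.com"},
    {"Use": "personal", "InternetEmailAddress": "missmadams@tdiff.com"},
]
contacts = collect_contact_values(sample)
```

Here `contacts["InternetEmailAddress"]` holds both email addresses even though each came from its own ContactMethod object.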
Phone Numbers
The Parser outputs phone numbers in one of two forms: Formatted or Structured. Unfortunately, a single number cannot be represented in both forms in the current schema, so you must choose which to use. By default, the Parser only outputs FormattedNumber elements.
Textkernel provides the config string setting OutputFormat.TelcomNumber.Style to control the phone number output format. This setting accepts the following values:
Raw (default)
Output the number in a FormattedNumber element exactly as it appeared in the original document.
Formatted
Output the number in a FormattedNumber element in a normalized format, if possible; otherwise fall back to Raw. US/Canadian phone numbers are normalized to this format: (NNN) NNN-NNNN, or (NNN) NNN-NNNN x NNN when an extension is included.
Structured
Output in the multi-element structured format, if possible; otherwise fall back to Formatted.
"Telephone": {
"InternationalCountryCode": "1",
"AreaCityCode": "858",
"SubscriberNumber": "678-8765"
}
The Formatted and Structured settings currently only apply to US/Canadian numbers. Because colloquial phone number formats vary so widely in other countries, we have been unable to reliably normalize the number parts. As a consequence, even if you set the style to Structured, you will still get some FormattedNumber elements in the output, so your code needs to handle both cases.
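A minimal sketch of code that accepts either form is shown here. It assumes the phone element has been loaded as a Python dictionary; the helper name `phone_to_text` and the "+1 858 ..." display format are choices made for this illustration only.

```python
def phone_to_text(phone):
    """Return a display string for a phone element in either output form."""
    # Raw and Formatted styles (and non-US/Canadian numbers) use FormattedNumber.
    if "FormattedNumber" in phone:
        return phone["FormattedNumber"]
    # Structured style: join whichever parts are present.
    country = phone.get("InternationalCountryCode")
    parts = [
        "+" + country if country else "",
        phone.get("AreaCityCode", ""),
        phone.get("SubscriberNumber", ""),
    ]
    return " ".join(part for part in parts if part)

structured = {
    "InternationalCountryCode": "1",
    "AreaCityCode": "858",
    "SubscriberNumber": "678-8765",
}
formatted = {"FormattedNumber": "(858) 678-8765"}
```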
Normalize Region
By default, the Parser reports the Region as it was detected in the document. When this setting is turned on (OutputFormat.NormalizeRegions = True), the Parser normalizes Region values to the standard postal abbreviations, for example 'Texas' to 'TX'. This setting currently only applies to US states and Canadian provinces.
Position History
Job Categories
The following type of output is always generated for each PositionHistory element:
"JobCategory": [
{
"TaxonomyName": "Skills taxonomy",
"CategoryCode": "Information Technology → Internet",
"Comments": "Information Technology describes 79% of this job"
},
{
"TaxonomyName": "Job Level",
"CategoryCode": "Executive (VP, Dept Head)"
}
]
For Job Level, the CategoryCode is one of the following values, based on the length of experience and job titles:
- Low Level
- Entry Level
- Experienced (non-manager)
- Senior (more than 5 years experience)
- Manager
- Senior Manager (more than 5 years management experience)
- Executive (VP, Dept. Head)
- Senior Executive (President, C-level)
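As a sketch of how this output might be consumed, the helper below picks the Job Level entry out of a JobCategory list. The function names are hypothetical. Note that the sample output above shows "Dept Head" without a period while the list shows "Dept. Head", so the comparison here deliberately ignores periods.

```python
def job_level(job_categories):
    """Return the Job Level CategoryCode from a JobCategory list, or None."""
    for category in job_categories:
        if category.get("TaxonomyName") == "Job Level":
            return category.get("CategoryCode")
    return None

def is_executive(level):
    """True for the two executive levels, ignoring periods in the label."""
    if level is None:
        return False
    return level.replace(".", "").startswith(("Executive", "Senior Executive"))

# Trimmed test data based on the example above.
categories = [
    {"TaxonomyName": "Skills taxonomy",
     "Comments": "Information Technology describes 79% of this job"},
    {"TaxonomyName": "Job Level", "CategoryCode": "Executive (VP, Dept Head)"},
]
level = job_level(categories)
```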
Stripping Out Reported Data from Jobs
By default, the PositionHistory/Description element contains the descriptive text for a position, excluding the portion that holds the title, company, location, and date. If you want the Description element to contain all of the text associated with a position, including the parsed data points, set OutputFormat.StripParsedDataFromPositionHistoryDescription to false.
While the default works well for most resumes, it can cause problems when the data points are not grouped together. Some data may be buried far away from other data, or at the end of the description, and in such cases more text is stripped out than expected, leaving an incomplete Description.
The two examples below show the Description element with the parsed data stripped (the default) and with it included:
Strip Parsed Data
OutputFormat.StripParsedDataFromPositionHistoryDescription = true (default value)
"PositionHistory": [
{
"@positionType": "directHire",
"@currentEmployer": "true",
"Title": "Director of Web Applications Development",
"OrgName": {
"OrganizationName": "Technical Difference"
},
...,
"Description": "• Add new technology to website to manage leads, increase response time and provide pertinent information..."
}
]
Include Parsed Data
OutputFormat.StripParsedDataFromPositionHistoryDescription = false
"PositionHistory": [
{
"@positionType": "directHire",
"@currentEmployer": "true",
"Title": "Director of Web Applications Development",
"OrgName": {
"OrganizationName": "Technical Difference"
},
...,
"Description": "Technical Difference Solana Beach, California\tOctober 2004 - Current Director of Web Applications Development • Add new technology to website to manage leads, increase response time and provide pertinent information..."
}
]
Reformat PositionHistory Description
By default, the PositionHistory/Description element retains as much of the original formatting as possible. For example:
• Add new technology to website to manage leads, increase response time and provide pertinent information to new customers.
• Convert current HRIS from VB to ASP to create complete web based solution.
• Added custom encryption coding to SQL and ASP web applications.
• Designed custom applicant tracking ASP program for large client.
• Designed customer support application to receive requests/files from clients, divert to appropriate support staff, and track issue from open to resolve.
When this setting is enabled (OutputFormat.ReformatPositionHistoryDescription = True), the Parser removes blank lines, splits long paragraphs into separate lines, and applies other reformatting techniques intended to place each achievement on a separate line. Example:
Add new technology to website to manage leads, increase response time and provide pertinent information to new customers.
Convert current HRIS from VB to ASP to create complete web based solution.
Added custom encryption coding to SQL and ASP web applications.
Designed custom applicant tracking ASP program for large client.
Designed customer support application to receive requests/files from clients, divert to appropriate support staff, and track issue from open to resolve.
Prefer Shorter Position Titles
By default, this setting is turned off and the Parser reports position titles exactly as they are found in the document. When it is turned on (OutputFormat.PreferShorterPositionTitles = True), titles may be truncated if the additional phrase does not include job words. For example, VICE PRESIDENT, INFORMATION SYSTEMS would be reported as just VICE PRESIDENT.
Position History User Area
UserArea elements throughout the schema are populated with Textkernel-generated metadata. These sections are documented in this document and defined in the SovrenResumeExtensions.xsd file.
The UserArea content for PositionHistory elements is located at Resume.StructuredXMLResume.EmployerOrg.PositionHistory.UserArea.sov:PositionHistoryUserArea. This is what a typical PositionHistoryUserArea element looks like:
"sov:PositionHistoryUserArea": {
"sov:Id": "POS-1",
"sov:CompanyNameProbabilityInterpretation": {
"@internalUseOnly": "SP",
"#text": "Confident"
},
"sov:PositionTitleProbabilityInterpretation": {
"@internalUseOnly": "TT",
"#text": "Confident"
},
"sov:NormalizedOrganizationName": "Technical Difference",
"sov:NormalizedTitle": "Director of Web Applications Development",
"sov:Subtitles": {
"sov:Subtitle": [
"Director"
]
}
}
Id
Id is a unique identifier assigned to each PositionHistory. Competency elements list the identifier of each PositionHistory element they were found within. The format of the identifier is POS-#, where # is a number that starts at 1 for the first PositionHistory and increments by 1 for each subsequent PositionHistory.
CompanyNameProbabilityInterpretation
CompanyNameProbabilityInterpretation represents the degree of certainty that the OrganizationName element value is accurate. The following scale is used:
Value | Recommended Actions |
---|---|
VeryUnlikely | Recommend Discarding |
Unlikely | Recommend Discarding |
Probable | Recommend Review |
Confident | No Action Needed |
The Parser only reports names having a probability of 'Probable' or 'Confident', thus if the CompanyNameProbabilityInterpretation is 'Unlikely' or 'VeryUnlikely', then the OrganizationName will not be reported.
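The scale above maps naturally onto a lookup table. The sketch below assumes the PositionHistoryUserArea has been loaded as a Python dictionary; the action labels ("discard", "review", "accept") and the function name are illustrative choices, not Parser output.

```python
# Recommended actions from the table above, expressed as short labels.
RECOMMENDED_ACTION = {
    "VeryUnlikely": "discard",
    "Unlikely": "discard",
    "Probable": "review",
    "Confident": "accept",
}

def company_name_action(user_area):
    """Map CompanyNameProbabilityInterpretation to a recommended action."""
    node = user_area.get("sov:CompanyNameProbabilityInterpretation")
    if node is None:
        return "unknown"  # element missing from this UserArea
    return RECOMMENDED_ACTION.get(node.get("#text"), "unknown")

user_area = {
    "sov:Id": "POS-1",
    "sov:CompanyNameProbabilityInterpretation": {
        "@internalUseOnly": "SP",
        "#text": "Confident",
    },
}
action = company_name_action(user_area)
```

The same lookup can be reused for PositionTitleProbabilityInterpretation by reading that key instead.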
PositionTitleProbabilityInterpretation
PositionTitleProbabilityInterpretation represents the degree of certainty that the Title element value is accurate. This value uses the same scale described above for CompanyNameProbabilityInterpretation.
IsSelfEmployed
IsSelfEmployed is true when this is a self-employed position; otherwise it is false.
SelfEmploymentPhrase
When IsSelfEmployed is true, SelfEmploymentPhrase contains the exact text from the resume that indicates this is a self-employed position.
NumberOfEmployeesSupervised
NumberOfEmployeesSupervised is the number of employees that the candidate supervised in this position.
NormalizedOrganizationName
The normalized OrganizationName.
NormalizedTitle
The normalized position title (Title).
Subtitles
Any number of subtitles that could be used to categorize the position title. These are useful for grouping positions that have similar titles into buckets for searching and matching.
Bullets
When OutputFormat.CreateBullets = true is set in the config string, the UserArea includes a "bullet"-based interpretation of the Description text in which each significant sentence/line/paragraph is reported as a separate sov:Bullet element. This can be useful when transforming the output into a standard resume document format and you want each major point to be a bullet.
The type attribute of each sov:Bullet element is one of the following values:
- creativeTerm: Bullet text contains one of the phrases from the CREATIVE_ACTION_WORDS data list (such as “implemented”, “initiated”, and “developer on”).
- sentence: This is the default when the type is not creativeTerm.
Here is an example of the output with this feature turned on:
"sov:PositionHistoryUserArea": {
"sov:Id": "POS-1",
...,
"sov:Bullets": {
"sov:Bullet": [
{
"@type": "sentence",
"#text": "Add new technology to website to manage leads, increase response time and provide pertinent information to new customers"
},
{
"@type": "sentence",
"#text": "Convert current HRIS from VB to ASP to create complete web based solution"
},
{
"@type": "sentence",
"#text": "Added custom encryption coding to SQL and ASP web applications"
},
{
"@type": "creativeTerm",
"#text": "Designed custom applicant tracking ASP program for large client"
},
{
"@type": "creativeTerm",
"#text": "Designed customer support application to receive requests/files from clients, divert to appropriate support staff, and track issue from open to resolve"
}
]
}
}
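To illustrate the use case mentioned above (rendering each major point as a bullet), here is a small sketch that turns the sov:Bullets structure into plain text lines. The function name, the `marker` and `only_type` parameters, and the trimmed sample are assumptions made for this example.

```python
def bullets_as_lines(user_area, marker="- ", only_type=None):
    """Render sov:Bullet elements as text lines, optionally filtered by type."""
    bullets = user_area.get("sov:Bullets", {}).get("sov:Bullet", [])
    return [
        marker + bullet["#text"]
        for bullet in bullets
        if only_type is None or bullet.get("@type") == only_type
    ]

user_area = {
    "sov:Id": "POS-1",
    "sov:Bullets": {
        "sov:Bullet": [
            {"@type": "sentence",
             "#text": "Added custom encryption coding to SQL and ASP web applications"},
            {"@type": "creativeTerm",
             "#text": "Designed custom applicant tracking ASP program for large client"},
        ]
    },
}
lines = bullets_as_lines(user_area)
creative = bullets_as_lines(user_area, only_type="creativeTerm")
```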
Education
Info
There are no configuration options for this section type. Here is an explanation of the output.
Degrees
The Parser reports the level of education in the degreeType field of the Degree element.
These values are not very global-friendly, but the Parser normalizes all degrees to one of these pre-defined degreeTypes. The list is sorted, as well as possible, by increasing level of education, although there are certainly ambiguities from one discipline to another, such as whether professional is above or below masters. Here are the possible values:
- specialeducation
- some high school or equivalent
- ged
- secondary
- high school or equivalent
- certification
- vocational
- some college
- HND/HNC or equivalent
- associates
- international
- bachelors
- some post-graduate
- masters
- intermediategraduate
- professional
- postprofessional
- doctorate
- postdoctorate
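Since the list is sorted by increasing level of education, its position can serve as a rough rank. The sketch below finds the highest degreeType among several; the function name is hypothetical, and the ranking inherits the ambiguities noted above (for example, professional versus masters), so treat the result as approximate.

```python
# degreeType values in the documented order (lowest to highest).
DEGREE_TYPES = [
    "specialeducation", "some high school or equivalent", "ged", "secondary",
    "high school or equivalent", "certification", "vocational", "some college",
    "HND/HNC or equivalent", "associates", "international", "bachelors",
    "some post-graduate", "masters", "intermediategraduate", "professional",
    "postprofessional", "doctorate", "postdoctorate",
]
DEGREE_RANK = {name: rank for rank, name in enumerate(DEGREE_TYPES)}

def highest_degree(degree_types):
    """Return the highest-ranked known degreeType, or None if none are known."""
    known = [d for d in degree_types if d in DEGREE_RANK]
    return max(known, key=DEGREE_RANK.__getitem__) if known else None

top = highest_degree(["bachelors", "masters", "some college"])
```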
School Types
"EducationHistory": {
"SchoolOrInstitution": [
{
"@schoolType": "university",
"School": [
{
"SchoolName": "California State University"
}
],
...
}
],
...
}
The Parser uses an enum with the following values to represent school type:
- UNSPECIFIED
- lowerSchool
- highschool
- secondary
- trade
- community
- college
- university
- professional
- vocational
Degree User Area
The Parser outputs additional metadata for the degree section. These sections are documented in this document and defined in the SovrenResumeExtensions.xsd file.
The UserArea content for Degree elements is located at Resume.StructuredXMLResume.EducationHistory.Degree.UserArea.sov:DegreeUserArea. This is what a typical DegreeUserArea element looks like:
"sov:DegreeUserArea": {
"sov:Id": "DEG-1",
"sov:Graduated": false,
"sov:NormalizedGPA": "0.915",
"sov:NormalizedDegreeName": "BSc",
"sov:NormalizedDegreeType": "BSc"
}
Id
Id is a unique identifier assigned to each Degree. Competency elements list the identifier of each Degree element they were found within. The format of the identifier is DEG-#, where # is a number that starts at 1 for the first Degree and increments by 1 for each subsequent Degree.
Graduated
Graduated is a Boolean value that indicates whether the degree was completed. It is not safe to assume that a degree was completed just because it is listed, and there is usually not enough information to determine graduation status from the resume itself, but some candidates do report that they didn’t finish (or haven’t yet finished) the degree. Possible values:
- Element is not output: the Parser has no information.
- false: the degree was not completed or the candidate is still pursuing the degree.
- true: the degree was completed.
NormalizedGPA
NormalizedGPA is a decimal value that is output only when a GPA has been provided. This value is normalized from 0.0 to 1.0, with 1.0 being the top mark, so that all GPAs across all scales can be compared, taking into account different min/max values and whether high or low numbers are ranked higher. For example:
- USA degree with GPA of 3.5 / 4.0 = 0.875
- German degree with 1.5 / 6.0 = 0.916
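Because the value is already normalized, GPAs from different grading scales can be compared directly. The sketch below assumes each DegreeUserArea has been loaded as a Python dictionary and that NormalizedGPA arrives as a string, as in the sample above; the function names are illustrative.

```python
def normalized_gpa(degree_user_area):
    """Return NormalizedGPA as a float, or None when no GPA was provided."""
    raw = degree_user_area.get("sov:NormalizedGPA")
    return float(raw) if raw is not None else None

def best_gpa(degree_user_areas):
    """Return the highest normalized GPA across degrees, ignoring missing values."""
    scores = [g for g in map(normalized_gpa, degree_user_areas) if g is not None]
    return max(scores) if scores else None

# Hypothetical test data: one degree has no GPA at all.
degrees = [
    {"sov:Id": "DEG-1", "sov:NormalizedGPA": "0.915"},
    {"sov:Id": "DEG-2"},
    {"sov:Id": "DEG-3", "sov:NormalizedGPA": "0.875"},
]
best = best_gpa(degrees)
```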
Licenses & Certifications
Info
There are no configuration options for this section type. Here is an explanation of the output.
Licenses and certifications are reported in LicenseOrCertification elements found within Resume.StructuredXMLResume.LicensesAndCertifications.
"LicensesAndCertifications": {
"LicenseOrCertification": [
{
"Name": "Project Management Professional",
"Description": "certification; found in CERTIFICATIONS",
"EffectiveDate": {
"FirstIssuedDate": {
"YearMonth": "2020-09"
}
}
}
]
}
Name
The name or phrase that describes the license or certification. This value is not standardized or mapped to any pre-defined list.
Description
This element reports additional information about the license or certification. It is one of the following values, where the text in square brackets is conditionally output depending on the context:
- license[; found in LICENSES][; matched to list]
- certification[; found in CERTIFICATIONS][; matched to list]
The “found in LICENSES” note indicates that the license was found when parsing the text of a LICENSES section.
The “found in CERTIFICATIONS” note indicates that the certification was found when parsing the text of a CERTIFICATIONS section.
The “matched to list” note indicates that the license or certification was found anywhere within the text of the resume/CV based on matching a specific keyword, key phrase, or pattern as defined in one of the Parser’s data lists.
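The Description value can be split into its parts with a few string operations. This is a sketch based only on the two patterns listed above; the function name and the returned dictionary keys are assumptions for illustration.

```python
def parse_cert_description(description):
    """Split a Description value into its kind and conditional notes."""
    parts = [part.strip() for part in description.split(";")]
    kind, notes = parts[0], parts[1:]
    prefix = "found in "
    # Section name from the optional "found in ..." note, if present.
    found_in = next(
        (note[len(prefix):] for note in notes if note.startswith(prefix)), None
    )
    return {
        "kind": kind,  # "license" or "certification"
        "found_in": found_in,
        "matched_to_list": "matched to list" in notes,
    }

parsed = parse_cert_description("certification; found in CERTIFICATIONS")
```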
EffectiveDate.FirstIssuedDate
The date of the license or certification, if any.
EffectiveDate.ValidFrom & EffectiveDate.ValidTo
The effective date range, if any.
Skills
Where To Look For Skills
By default, the parser looks in the following sections for skills:
Section Type | Config String To Turn Section Off |
---|---|
Achievements | Coverage.FindSkillsInAchievements = False |
Certifications | Coverage.FindSkillsInCertifications = False |
Cover Letter | Coverage.FindSkillsInCoverLetter = False |
Education | Coverage.FindSkillsInEducationHistory = False |
Executive Summary | Coverage.FindSkillsInExecutiveSummary = False |
Languages | Coverage.FindSkillsInLanguages = False |
Licenses | Coverage.FindSkillsInLicenses = False |
Also Report These As Skills
By default, the parser doesn't report any of these data types as skills. To report one of them as a skill, use the corresponding config string value from the table.
Section Type | Config String To Report Data Type as Skill |
---|---|
Position Titles | Coverage.AddPositionTitlesToSkills = True |
Languages | Coverage.AddLanguagesToSkills = True |
Licenses & Certifications | Coverage.AddCertificationsAndLicensesToSkills = True |
Skills Taxonomy Output
This section contains the skill/competency data in the Textkernel-preferred format. You may prefer to consume this data rather than the data in the Competencies section or use a combination of both. Note that both sections contain the same data, only the format is different.
"sov:SkillsTaxonomyOutput": {
"sov:TaxonomyRoot": [
{
"@name": "Sovren",
"sov:Taxonomy": [
{
"@name": "Information Technology",
"@id": "10",
"@percentOfOverall": "80",
"sov:Subtaxonomy": [
{
"@name": "Programming",
"@id": "204",
"@percentOfOverall": "23",
"@percentOfParentTaxonomy": "29",
"sov:Skill": [
{
"@name": "APPLICATIONS DEVELOPMENT",
"@id": "021803",
"@existsInText": "true",
"@totalMonths": "191",
"@lastUsed": "2020-09-09",
"@whereFound": "Found in WORK HISTORY; POS-1"
},
{
"@name": "CODING",
"@id": "013739",
"@existsInText": "true",
"@totalMonths": "191",
"@lastUsed": "2020-09-09",
"@whereFound": "Found in WORK HISTORY; POS-1"
},
{
"@name": "HTML",
"@id": "019115",
"@existsInText": "true",
"@whereFound": "Found in WORK HISTORY"
},
{
"@name": "JAVASCRIPT",
"@id": "025394",
"@existsInText": "true",
"@whereFound": "Found in WORK HISTORY"
},
{
"@name": "PHP",
"@id": "004736",
"@existsInText": "true",
"@whereFound": "Found in WORK HISTORY"
},
{
"@name": "VBSCRIPT",
"@id": "010438",
"@existsInText": "true",
"@whereFound": "Found in WORK HISTORY"
},
{
"@name": "XML",
"@id": "011476",
"@existsInText": "true",
"@whereFound": "Found in WORK HISTORY"
}
]
}
]
}
],
...
}
]
}
As you can see above, this view of the skills is structured in the hierarchical manner that matches the Taxonomy > Subtaxonomy > Skill > Child Skill structure that the parser understands. By default, there will only be one TaxonomyRoot, "Sovren".
The following table lists the elements and attributes associated with each of the elements above.
Element.Attribute | Meaning |
---|---|
*.name | Name of the root data list/taxonomy/subtaxonomy/skill. |
(Taxonomy or Subtaxonomy).id | |
(Taxonomy or Subtaxonomy).percentOfOverall | |
Subtaxonomy.percentOfParentTaxonomy | The weight of a specific subtaxonomy (and its children) divided by the weight of its parent taxonomy, expressed as a percentage. The sum of all percentOfParentTaxonomy values for all siblings (subtaxonomies with the same parent) equals 100%. |
(Skill or ChildSkill).existsInText | |
(Skill or ChildSkill).whereFound | |
(Skill or ChildSkill).lastUsed | |
(Skill or ChildSkill).totalMonths | |
Skill.childrenLastUsed | Most recent date that any of the skill's children were used. |
Skill.childrenTotalMonths | Sum of all the ChildSkill.totalMonths (accounting for overlaps) for all of this skill's children. |
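For many uses a flat list of skills is easier to work with than the nested hierarchy. The sketch below walks the structure shown in the sample output, assuming it has been loaded as a Python dictionary. The function name and the output keys are illustrative, and child skills are not handled because their JSON element name does not appear in the sample.

```python
def iter_skills(skills_output):
    """Yield one flat record per skill in a sov:SkillsTaxonomyOutput object."""
    for root in skills_output.get("sov:TaxonomyRoot", []):
        for taxonomy in root.get("sov:Taxonomy", []):
            for sub in taxonomy.get("sov:Subtaxonomy", []):
                for skill in sub.get("sov:Skill", []):
                    months = skill.get("@totalMonths")
                    yield {
                        "taxonomy": taxonomy.get("@name"),
                        "subtaxonomy": sub.get("@name"),
                        "name": skill.get("@name"),
                        # Attribute values are strings; totalMonths may be absent.
                        "total_months": int(months) if months is not None else None,
                        "last_used": skill.get("@lastUsed"),
                    }

# Trimmed version of the sample output above.
sample = {
    "sov:TaxonomyRoot": [{
        "@name": "Sovren",
        "sov:Taxonomy": [{
            "@name": "Information Technology",
            "sov:Subtaxonomy": [{
                "@name": "Programming",
                "sov:Skill": [
                    {"@name": "CODING", "@totalMonths": "191", "@lastUsed": "2020-09-09"},
                    {"@name": "HTML", "@existsInText": "true"},
                ],
            }],
        }],
    }]
}
skills = list(iter_skills(sample))
```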
Languages & Locales
The Parser includes a language and locale analyzer that is able to accurately detect all supported Parser languages and can detect and set most supported locales based on an analysis of language, phone numbers, and email addresses. It is NEVER necessary or advisable to manually override the Parser's language detection, and it is rarely advisable to override the Parser's locale detection.
For a listing of languages and regions supported, you can refer here.
So, when might it be advisable to override the default locale detection? In some cases, you may be certain that you are parsing a CV from a particular locale and you want to ensure that the Parser "knows" about that locale even if the CV does not have any information on it that would readily tell it that it is from that locale (for example, if the CV contains no contact info).
Here is an example: Australia uses a four-digit postal code, so if you are processing CVs in or from Australia you may want to set Culture.DefaultCountryCode = AU in the config string. This gives better results on the few Australian CVs that lack enough contact info for the Parser to detect that they contain Australian locale data. HOWEVER, a side effect is that, when that switch is "on" and a non-Australian CV is parsed, the Parser may erroneously report Australian contact info rather than the correct locale's contact info.
For instance, the USA address 3017 Sydney Street in Dallas, Texas (postal code 75225) may be reported by the Parser as an address in postal code 3017 in Sydney, Australia.
Our general recommendation is that only the following locale switches are advisable to set "on", and then only when the CV is almost certain to contain that locale’s data:
- Set Culture.DefaultCountryCode = IN if you are parsing in India
- Set Culture.DefaultCountryCode = AU if you are parsing in Australia or New Zealand (you can use either AU or NZ) and you have Australian or New Zealand locale CVs
- Set Culture.DefaultCountryCode = ZA if you are parsing in South Africa
Again, setting these switches assumes that you really have a CV flow that is almost completely from those regions.
Please note that the Parser always outputs a "CountryCode" every time it reports any location information. Unfortunately, it is not always possible to determine the correct country code (Boston, UK or Boston, USA?), so at times the Parser must make an educated guess, since the output standard requires a CountryCode.
Personal Information
The PersonalInformation element contains a variety of information that is commonly used in some cultures and not in other cultures such as the United States. The parser will output the following data fields:
- Ancestor (FathersName and MothersMaidenName)
- Availability
- Birthplace
- DateOfBirth
- DrivingLicense
- FamilyComposition
- Gender
- Hukou (HukouCity and HukouArea)
- Location (CurrentLocation and PreferredLocation)
- MaritalStatus
- MessagingAddresses
- MotherTongue
- NationalIdentityNumber
- Nationality
- Passport
- Politics
- Salary (CurrentSalary and RequiredSalary)
- Visa
Some of the personal information can be inferred from other information within the resume. For example, Gender may be inferred from “Mr.” being part of the name.
Here is a sample PersonalInformation element containing every element that is supported:
"sov:PersonalInformation": {
"sov:DateOfBirth": {
"@inferred": "false",
"#text": "1977-10-20"
},
"sov:Birthplace": "Los Angeles, CA",
"sov:Nationality": {
"@inferred": "false",
"#text": "US"
},
"sov:NationalIdentities": {
"sov:NationalIdentity": [
{
"sov:NationalIdentityNumber": "111-22-3333",
"sov:NationalIdentityPhrase": "SSN"
}
]
},
"sov:Gender": {
"@inferred": "false",
"#text": "Female"
},
"sov:MaritalStatus": {
"@inferred": "false",
"#text": "Married"
},
"sov:DrivingLicense": "CA-123123123",
"sov:CurrentLocation": "Solana Beach, CA",
"sov:PreferredLocation": "Boston, MA",
"sov:WillingToRelocate": "Yes",
"sov:FamilyComposition": "Family Composition: Husband and 2 children",
"sov:FathersName": "John Adams, II",
"sov:MothersMaidenName": "Angela Harris",
"sov:Availability": "Immediate, with 2 weeks notice",
"sov:VisaStatus": "Green Card, expires march 2022",
"sov:PassportNumber": "US-456456456",
"sov:CurrentSalary": {
"@currency": "USD",
"#text": "100000.00"
},
"sov:RequiredSalary": {
"@currency": "USD",
"#text": "110000.00"
},
"sov:HukouCity": "湛江市",
"sov:HukouArea": "海南",
"sov:MessagingAddress": {
"@type": "ICQ",
"#text": "john3@adams.com"
},
"sov:MotherTongue": "en"
}
DateOfBirth
Date of birth in yyyy-MM-dd format. If the optional inferred attribute (Boolean) is true then the DateOfBirth was inferred from an Age using the following formula: [RevisionDate] - [Age years] - [6 months]
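The inference formula above can be worked through as date arithmetic. The sketch below is only an illustration of that formula: the function name is hypothetical, and how the Parser treats the day of the month is not documented here, so clamping to the last valid day is an assumption.

```python
import calendar
from datetime import date

def infer_date_of_birth(revision_date, age_years):
    """Apply [RevisionDate] - [Age years] - [6 months] using whole months."""
    # Count months since year 0, then subtract the age plus six months.
    total_months = (revision_date.year * 12 + revision_date.month - 1
                    - (age_years * 12 + 6))
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    # Assumption: clamp the day so that, e.g., Aug 31 maps to Feb 28/29.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(revision_date.day, last_day))

inferred = infer_date_of_birth(date(2020, 9, 9), 43)
```

For a RevisionDate of 2020-09-09 and a stated age of 43, this yields 1977-03-09. The six-month offset places the estimate in the middle of the year in which the candidate could have been born.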
Birthplace
Freeform text that identifies the candidate’s place of birth.
Nationality
Freeform text that identifies the candidate’s country of citizenship. If the optional inferred attribute (Boolean) is true then the Nationality was inferred rather than explicitly stated.
NationalIdentities
Zero or more NationalIdentity elements.
NationalIdentityNumber
Country-specific national identity number. To prevent false positives, the Parser requires that the numbers be in specific formats. If numbers are not being reported, it may be because the number is in an unsupported format. We will continue adding support for new formats, so please submit any examples to support@textkernel.com.
NationalIdentityPhrase
An optional phrase associated with the NationalIdentityNumber to help identify it.
NationalIdentityType
Currently only “DNI” or “NIE” if issued by Spain.
Gender
Male or Female. If the optional inferred attribute (Boolean) is true then the Gender was inferred from the name affix, marital status, national identity number, given name, or some other means. To customize the inference by given name, customize the MALE_GIVEN_NAMES and FEMALE_GIVEN_NAMES data lists.
MaritalStatus
Married, Single, Divorced, Separated, or Unknown. If the optional inferred attribute (Boolean) is true then the MaritalStatus was inferred from the name affix, family composition, national identity number, or some other means.
DrivingLicense
Freeform text that identifies the candidate’s license to drive. May include a license number, type, qualifications, restrictions or any other explanation.
CurrentLocation
Freeform text that identifies the candidate’s current location(s), if specifically stated as such. This value is NOT derived from the contact information postal address.
PreferredLocation
Freeform text that identifies the candidate’s preferred location(s).
WillingToRelocate
One of the following values indicating the candidate’s willingness to relocate: Yes, No, or Unknown.
FamilyComposition
Freeform text that describes the candidate’s family, such as spouse and children.
FathersName
Freeform text that identifies the name of the candidate’s father.
MothersMaidenName
Freeform text that identifies the maiden name of the candidate’s mother.
Availability
Freeform text that describes when the candidate is available to work.
VisaStatus
Freeform text that describes the candidate’s current visa status, expiry date, etc.
PassportNumber
Freeform text that identifies the candidate’s passport number, expiry date, etc.
CurrentSalary
The candidate’s current salary expressed as a monetary amount. The element value is a number. The currency attribute is a 3-letter ISO 4217 currency code. For a complete list of codes, search the web for "ISO 4217 currency codes". This element does not specify whether the amount is annual, monthly, or hourly; however, that can usually be inferred from the value.
RequiredSalary
The salary the candidate expects for any new position, expressed as a monetary amount. The element value is a number. The currency attribute is a 3-letter ISO 4217 currency code. For a complete list of codes, search the web for "ISO 4217 currency codes". This element does not specify whether the amount is annual, monthly, or hourly; however, that can usually be inferred from the value.
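In the JSON sample above, the salary amount is carried as a string in `#text` and the currency code in `@currency`. A minimal sketch of reading such an element follows; the function name is hypothetical, and `Decimal` is used here simply to avoid floating-point rounding of monetary amounts.

```python
from decimal import Decimal

def parse_salary(node):
    """Return (amount, currency) from a salary element, or None if absent."""
    if not node:
        return None
    return Decimal(node["#text"]), node.get("@currency")

current = parse_salary({"@currency": "USD", "#text": "100000.00"})
required = parse_salary({"@currency": "USD", "#text": "110000.00"})
# Difference between the expected and current salary in the same currency.
increase = required[0] - current[0]
```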
HukouCity
Name of City for Chinese household registration (hukou record).
HukouArea
Area/Province for Chinese household registration (hukou record).
MessagingAddress
Zero or more MessagingAddress elements. The type attribute identifies the messaging system, such as ICQ, MESSENGER, QQ, etc. The element value is the candidate’s address within that messaging system.
MotherTongue
The mother tongue (also known as primary language, native language, or first language) of the candidate. The value is one of the ISO 639-1 codes. For example: Dutch (nl), English (en), French (fr), or the special value Invariant/Unknown (iv).
Training
The Parser will report training elements that are found in the document. For example, this text appearing within a Position Description will also be reported in the Training element of the UserArea as shown in the box below:
Training:
Project Management Professional, Project Management Institute, 2004-2005
Microsoft Visual Basic .NET, 2001
"sov:TrainingHistory": {
"sov:Text": "Project Management Professional, Project Management Institute, 2004-2005 Microsoft Visual Basic .NET, 2001",
"sov:Training": [
{
"sov:Type": "Unknown",
"sov:TrainingName": null,
"sov:Qualifications": {
"sov:Qualification": [
"Project Management Professional"
]
},
"sov:Entity": null,
"sov:Description": "Project Management Professional, Project Management Institute, 2004-2005",
"sov:StartDate": {
"Year": "2004"
},
"sov:EndDate": {
"Year": "2005"
}
},
{
"sov:Type": "Unknown",
"sov:TrainingName": null,
"sov:Entity": null,
"sov:Description": "Microsoft Visual Basic .NET, 2001",
"sov:EndDate": {
"Year": "2001"
}
}
]
}
Each distinct item of training is reported as an Item element within Training.
Typeπ︎
Reserved for future use.
TrainingNameπ︎
Reserved for future use.
Qualificationsπ︎
Any text within Description that is recognized as a qualification (such as DDS), degree (such as B.S.), or a certification (such as Project Management Professional). Each qualification is listed separately.
Entityπ︎
Name of school or company
Descriptionπ︎
All of the text associated with this training item.
StartDateπ︎
Start date of this training item.
EndDateπ︎
End date of this training item.
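As a rough illustration of how the fields above fit together, the following Python sketch walks a TrainingHistory object shaped like the sample output and collects each item's description, qualifications, and year range. The helper name `summarize_training` is hypothetical, and the sketch assumes the JSON has already been loaded into a dict. Keys that are missing or null are treated as absent data.

```python
def summarize_training(training_history):
    """Return one summary dict per training item (hypothetical helper)."""
    summaries = []
    for item in training_history.get("sov:Training") or []:
        # Qualifications, StartDate and EndDate may be missing or null.
        quals = (item.get("sov:Qualifications") or {}).get("sov:Qualification") or []
        start = (item.get("sov:StartDate") or {}).get("Year")
        end = (item.get("sov:EndDate") or {}).get("Year")
        summaries.append({
            "description": item.get("sov:Description"),
            "qualifications": list(quals),
            "start_year": start,
            "end_year": end,
        })
    return summaries


# Sample data trimmed from the TrainingHistory output shown earlier.
sample = {
    "sov:Training": [
        {
            "sov:Description": "Project Management Professional, Project Management Institute, 2004-2005",
            "sov:Qualifications": {"sov:Qualification": ["Project Management Professional"]},
            "sov:StartDate": {"Year": "2004"},
            "sov:EndDate": {"Year": "2005"},
        },
        {
            "sov:Description": "Microsoft Visual Basic .NET, 2001",
            "sov:EndDate": {"Year": "2001"},
        },
    ]
}

for row in summarize_training(sample):
    print(row["end_year"], row["qualifications"], row["description"])
```

Reading every optional field defensively keeps the loop from failing on items such as the second one, which has no Qualifications or StartDate.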
Patents/Publications/Speaking Engagementsπ︎
When parsing of Patents, Publications, and Speaking Engagements is enabled by setting Coverage.PatentsPublicationsAndSpeakingEvents = True in the config string, these sections may be reported.
These sections are impossible to parse at a granular level with any meaningful accuracy. Do not use this data except perhaps as an indicator that the document contains such sections.
Patentsπ︎
For example, this text within a resume results in the following output.
Patents
George Doam and Neil Griffin, inventors, "Method and Apparatus for Removing Corn Kernels From Dentures", Patent 1,064,098.
"PatentHistory": {
"Patent": [
{
"PatentTitle": "Method and Apparatus for Removing Corn Kernels From Dentures",
"Description": "George Doam and Neil Griffin, inventors, \"Method and Apparatus for Removing Corn Kernels From Dentures\", Patent 1,064,098.",
"Inventors": {
"InventorName": [
"George Doam and Neil Griffin"
]
},
"PatentDetail": [
{
"PatentMilestone": [
{
"Id": "1064098"
}
]
}
]
}
]
}
Publicationsπ︎
For example, this text within a resume results in the following output.
Publications
"The Way Home: How GPS Restored My Profits and Saved My Business Life", published in the American Journal of the Lost And Clueless, Volume 1, Number 4.
"PublicationHistory": {
"Article": [
{
"Title": "The Way Home: How GPS Restored My Profits and Saved My Business Life",
"JournalOrSerialName": "published in the American Journal of the Lost And Clueless",
"Issue": "Volume 1, Number 4"
}
]
}
Speaking Engagementsπ︎
For example, this text within a resume results in the following output.
"SpeakingEventsHistory": {
"SpeakingEvent": [
{
"EventName": "",
"EventType": "conference",
"Description": "Main Speaker, AYA Forum, 2006"
}
]
}
Military History & Security Clearanceπ︎
When parsing for Military History and Security Clearance is enabled by setting Coverage.MilitaryHistoryAndSecurityCredentials = True in the config string, these sections may be reported.
Military Historyπ︎
For example, this text within a resume results in the following output.
"MilitaryExperience": [
{
"Country": "US",
"Service": {
"Branch": "US Army",
"Rank": "FIRST LIEUTENANT"
},
"StartDate": {
"Date": "2012-02-01",
"HasValue": true,
"FoundYear": true,
"FoundMonth": true,
"FoundDay": true
},
"EndDate": {
"Date": "2014-09-01",
"HasValue": true,
"FoundYear": true,
"FoundMonth": true,
"FoundDay": true
},
"FoundInContext": "FIRST LIEUTENANT, US Army Schofield Barracks, HI 968576 02/2012-09/2014"
}
]
Security Clearanceπ︎
For example, this text within a resume results in the following output.
Resume User Areaπ︎
The UserArea content for Resume elements is located at Resume.UserArea.sov:ResumeUserArea. The schema is fully defined in the SovrenResumeExtensions.xsd file, but here is how the ResumeUserArea element is structured (with many of the details omitted to keep it short enough to review at a glance):
"sov:ResumeUserArea": {
"sov:Culture" : {},
"sov:Location" : {},
"sov:PersonalInformation" : {},
"sov:ExperienceSummary" : {},
"sov:TrainingHistory" : {},
"sov:Hobbies" : {},
"sov:Sections" : {},
"sov:CustomData" : {},
"sov:ReservedData" : {},
"sov:CoverLetterText" : {},
"sov:ParsedTextLength" : "",
"sov:ParseTime" : "",
"sov:TimedOut" : {},
"sov:ResumeQuality" : {},
"sov:LicenseSerialNumber" : "",
"sov:ParserConfigurationString" : "",
"sov:ParserVersion" : "",
"sov:DigitalSignature" : ""
}
Cultureπ︎
The Culture element describes the Language and Region information that is either:
- Calculated during parsing according to an analysis of the text, or
- The specified default, used in case the culture cannot be calculated.
This culture information influences the way the Parser works, such as how it interprets ambiguous date values such as (5/1/09) or differing linguistic rules for analyzing the text.
A typical Culture element looks like this:
Languageπ︎
The primary language of the parsed text. The value is one of the ISO 639-1 codes. When the language could not be automatically determined, it is reported as the special value Invariant/Unknown (iv). The two-letter ISO codes reported by the Parser, such as "zh" for Chinese, do not differentiate between language variants, such as Mandarin and Cantonese.
The language is also reported in the top-level Resume element:
See the Textkernel CV/Resume Parser User Guide provided with each version of the Parser for a list of languages supported by that version. For a listing of languages and regions supported by the most recent version, you can refer to Languages.
Countryπ︎
The country of origin of the resume, typically determined by the postal address. The value is one of the ISO 3166-1 alpha-2 codes. For example, "US" for United States.
There is one exception, for all builds prior to 8.0: United Kingdom is reported as "UK" instead of "GB" by default. For these pre-8.0 versions, to adhere to the ISO 3166 standard by using "GB" for United Kingdom, you can set Culture.CountryCodeForUnitedKingdomIsUK = false in the config string. This setting defaults to true for backward compatibility.
Culture Infoπ︎
This is an RFC 3066 language tag that represents the actual cultural context regarding formatting of numbers, dates, character symbols, and so on. This value is usually a simple concatenation of the Language and Country codes, such as "en-US" for US English. Beware, however, that CultureInfo can be set independently of Language and Country to achieve fine-tuned cultural control over parsing, so if you use this value you should not assume that it always matches the Language and Country.
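The caution above can be sketched in Python: rather than assuming that CultureInfo equals the Language and Country codes joined together, split the tag and compare it with the separately reported values. Both function names are hypothetical, and the sketch assumes the simple "language-REGION" form such as "en-US".

```python
def split_culture_info(culture_info):
    """Split a tag such as 'en-US' into (language, region); region may be None."""
    parts = culture_info.split("-", 1)
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) == 2 and parts[1] else None
    return language, region


def culture_matches(culture_info, language, country):
    """True only when CultureInfo agrees with the reported Language and Country."""
    ci_language, ci_region = split_culture_info(culture_info)
    return ci_language == language.lower() and ci_region == country.upper()


print(culture_matches("en-US", "en", "US"))
print(culture_matches("en-GB", "en", "US"))
```

A mismatch is not an error; it only signals that CultureInfo was set independently and should be used on its own terms.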
Prefer English Version of Resumeπ︎
When a document contains two versions of the resume, one in English and one in another language, the default behavior is to parse the non-English (presumably native) version. Set this property to true (Culture.PreferEnglishVersionIfTwoLanguagesInDocument = True) to always parse the English version when available. This setting currently only applies to Chinese-English resumes.
Locationπ︎
The Location element provides a place to store the geographic coordinates for the primary PostalAddress. This data is no longer provided by Textkernel but is instead available through the SaaS API. If you are a "self-hosted" customer of Textkernel, you have access to the geocoding API call on your own instance of the Textkernel SaaS service, using your own credentials to Bing or Google.
A typical Location element looks like this:
Latitudeπ︎
The latitude of the primary PostalAddress.
Longitudeπ︎
The longitude of the primary PostalAddress.
Sectionsπ︎
One of the first things the Parser does is split the resume into sections. Each section is then handed to a sub-parser that knows how to handle the type of information in each section. The Sections element contains a collection of Section elements, each of which identifies the types and locations of sections that were found.
"sov:Sections": {
"sov:Section": [
{
"@starts": "11",
"@ends": "12",
"@sectionType": "SUMMARY",
"#text": "Executive Summary"
},
{
"@starts": "13",
"@ends": "54",
"@sectionType": "WORK HISTORY",
"#text": "Experience"
},
{
"@starts": "55",
"@ends": "61",
"@sectionType": "EDUCATION",
"#text": "Education"
}
]
}
startsπ︎
The first line number (zero-based) containing text of this section.
endsπ︎
The last line number (zero-based) containing text of this section.
sectionTypeπ︎
One of the following values: ARTICLES, AVAILABILITY, BOOKS, CERTIFICATIONS, CONFERENCE_PAPERS, CONTACT_INFO, EDUCATION, HOBBIES, IGNORE_DATA_AFTER, LANGUAGES, LICENSES, MILITARY, OBJECTIVE, OTHER_PUBLICATIONS, PATENTS, PERSONAL_INTERESTS_AND_ACCOMPLISHMENTS, PROFESSIONAL_AFFILIATIONS, QUALIFICATIONS_SUMMARY, REFERENCES, SECURITY_CLEARANCES, SKILLS, SPEAKING, SUMMARY, TRAINING, WORK_HISTORY, WORK_STATUS
Valueπ︎
The value is the exact text that was used to identify the beginning of the section. If there was no text indicator and the location was calculated, then the value is "CALCULATED".
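To make the line-number semantics concrete, here is a minimal Python sketch that pulls the lines belonging to one section out of the plain-text resume. It assumes that starts and ends index into the lines of the plain text as split on line breaks and that both bounds are inclusive. The function name `section_lines` and the placeholder resume text are illustrative only.

```python
def section_lines(plain_text, section):
    """Return the lines covered by one Section entry (both bounds inclusive)."""
    lines = plain_text.splitlines()
    start = int(section["@starts"])  # attribute values arrive as strings
    end = int(section["@ends"])
    return lines[start:end + 1]


# Placeholder text: 20 numbered lines standing in for a real resume.
resume_text = "\n".join(f"line {n}" for n in range(20))
section = {
    "@starts": "11",
    "@ends": "12",
    "@sectionType": "SUMMARY",
    "#text": "Executive Summary",
}

print(section_lines(resume_text, section))
```

Because the numbers are zero-based, a section with starts of 11 begins on the twelfth line of the text.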
Reserved Dataπ︎
The Parser uses this section to output all of the URLs, Email Addresses, Phone Numbers, and Twitter handles found anywhere in the document. These values are not necessarily tied to the candidate.
Cover Letter Textπ︎
This element reports all of the text that was determined to be part of a cover letter.
Parsed Text Lengthπ︎
This element reports the number of characters in the plain text resume.
Parse Timeπ︎
This element reports the number of milliseconds that were spent within the Parser. This value does not include network transfer times.
ResumeQualityπ︎
This is an advanced level feature. Please ignore the data in the ResumeQuality output unless/until you have discussed its proper use with Textkernel, and been approved to use it.
The ResumeQuality section output should NEVER IN ANY SENSE WHATSOEVER be used as an indication that the Parser has failed or performed poorly. The sole purpose of the ResumeQuality section is to help you, the integrator, to understand substandard aspects of the candidate's resume. The majority of resumes will have at least one entry in this section. AGAIN, that does not mean that parsing "failed" or that the Parser needs fixing.
Please recall that candidates' resumes fall within a bell curve. Some resumes are really well done. Some are horrible. Most fall into the Good to Pretty Good range. The ResumeQuality section is designed to help you understand where the resume falls in that bell curve. Great resumes will parse great. Horrible resumes will parse poorly. That is a limitation of the quality of the resume. The Parser cannot fix candidate mistakes.
For instance, the ResumeQuality section may report that the candidate provided neither a phone nor an email address. Reporting that fact does not indicate that the Parser failed. The failure was that the candidate did not include a way to be contacted electronically. We cannot fix that, nor can you, the integrator. Only the candidate can.
You should not use the ResumeQuality section to communicate problems/suggestions to candidates unless you have a very sophisticated workflow and step-by-step improvement process. Otherwise, you will frustrate candidates and do more harm than good.
The Resume Quality is a series of assessments of how well the resume conforms to best practices for constructing machine-readable resumes. Assessments are ordered by severity, from fatal problems (which nevertheless may not have caused an actual parsing problem), to suggested improvements. Each assessment contains a list of findings, describing the exact issue with the resume and a recommendation for how the candidate could resolve the issue.
Levelπ︎
The level of severity of the findings for the assessment. Ranging from, in order of most severe to least severe:
- Fatal Problems Found
- Major Issues Found
- Data Issues
- Suggested Improvements
Findingsπ︎
A list of information with a QualityCode, associated Identifiers, and a Message describing the issue or recommendation found. Use these findings to improve the resume.
Codeπ︎
Unique code that identifies a resume quality finding (see the chart below for all codes and identifier meanings).
Section Identifiersπ︎
Identifiers for the data associated with the finding: the Section Type and, if applicable, the Education or Position Id.
Messageπ︎
The display message that explains the issue and the recommendation.
Code | Description | Section Identifiers |
---|---|---|
Fatal Problems Found (400-499) | ||
408 | Indicates that the document was too long and was truncated prior to parsing. | |
411 | Indicates that parsing had to be stopped because the time limit was exceeded and some data may not have been processed. | |
412 | Indicates that no sections were found in the resume. | |
413 | Indicates that a WORK HISTORY section was not found. | |
414 | Indicates that an EDUCATION section was not found. | |
415 | Indicates that WORK HISTORY information was found but had to be calculated as a section. | |
416 | Indicates that EDUCATION information was found but had to be calculated as a section. | |
417 | Indicates that this document is likely a curriculum vitae and prone to errors due to the use of nonstandard headers and the vast amount of data describing patents, speaking engagements, research, advisory roles, publications, etc. Accordingly, only the first WORK HISTORY section was parsed, as that usually results in far greater accuracy. | |
418 | Date ranges were found written vertically on multiple lines. | |
419 | The Employment section did not provide dates for jobs | |
433 | We detected that this document contained data in columnar format. We rearranged this data to be machine readable with greater accuracy. It is a HUGE MISTAKE for candidates to represent data in columns rather than in a simple top-to-bottom, all-across-the-page format. | |
441 | Indicates that neither an email address nor a phone number was found in the contact information. A resume should always include both. | |
Major Issues Found (300-399) | ||
300 | Indicates that the document was PDF. | |
301 | Indicates that the document was Apple Pages. | |
302 | Indicates that the first and last name for the candidate was not found. | |
303 | Indicates that sections were found that appear to be longer than the WORK HISTORY and EDUCATION sections combined. This usually indicates an issue identifying the sections correctly and the majority of the content ended up being contained in the incorrect section. | X |
311 | Indicates if a contact information section was found somewhere other than the top of the resume. Contact information should only be found at the top of the resume. | |
312 | Indicates if a publications section with a significant amount of content appears in the resume. Publications should be avoided in a resume, but if necessary, they can be added but should absolutely be no longer than 10 lines. | X |
323 | Indicates if multiple sections of the same type have been found in the resume. | X |
324 | Indicates if any sections with no text, other than the header, have been found in the resume. | X |
325 | Indicates if any sections with no header have been found in the resume. | X |
331 | Indicates that the number of jobs found in the resume exceeds the threshold of 30 jobs. | |
Data Issues (200-299) | ||
211 | Indicates if no email address was found to contact the candidate. | |
212 | Indicates if no phone number was found to contact the candidate. | |
213 | Indicates if no street level address was found for the candidate. | |
221 | Indicates if any jobs were found without job titles. | X |
222 | Indicates if any jobs were found without job company names. | X |
223 | Indicates if more than one current job was found with the same employer. | X |
224 | Indicates if any jobs were found without start dates. | X |
225 | Indicates if any jobs were found without end dates. | X |
226 | Indicates if no jobs were found within the past year of the document last modified date. | |
227 | Indicates that there is very old work history on the resume. No one cares. Could be harmful. | |
231 | Indicates if any educational degrees were found without degree names. | X |
232 | Indicates if any educational degrees were found without school names. | X |
233 | Do not put dates in your education section. Such dates are not relevant and may be harmful. | |
Suggested Improvements (100-199) | ||
111 | Indicates if a references section was found. A resume does not need to include a references section. | |
112 | Indicates if a separate skills section was found. Skills should be included in the context of work history and education descriptions. | |
113 | Indicates if a publications section without a significant amount of content appears in the resume. Including a publications type section in a resume should always be avoided. | X |
121 | Indicates if a driving license number was found in the resume. Do not include this level of personal information in a resume. (Only applies to US, AU, NZ, and UK resumes) | |
122 | Indicates if a passport number was found in the resume. Do not include this level of personal information in a resume. (Only applies to US, AU, NZ, and UK resumes) | |
123 | Indicates if the candidate's marital status was found in the resume. Do not include this level of personal information in a resume. (Only applies to US, AU, NZ, and UK resumes) | |
124 | Indicates if the candidate's date of birth was found in the resume. Do not include this level of personal information in a resume. (Only applies to US, AU, NZ, and UK resumes) | |
131 | Indicates if multiple addresses were found in the contact information section. Only one contact address should be included in a resume. | |
132 | Indicates if multiple email addresses were found in the contact information section. Only one contact email address should be included in a resume. | |
133 | Indicates if multiple phone numbers were found in the contact information section. Only one contact phone number should be included in a resume. | |
141 | Indicates if any jobs or companies with a street level address were found. Never include a street level address for a job or company in a resume. | X |
142 | Indicates if any schools with a street level address were found. Never include a street level address for a school in a resume. | X |
151 | Indicates if any sections were found with the header not on a separate line above the content for that section. | X |
161 | Indicates the resume contains high school education as well as higher-level education. | X |
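The code ranges in the chart above map directly onto the four severity levels, which the following Python sketch illustrates. It works on bare code values only and makes no assumption about the field names in the ResumeQuality output. The names `LEVEL_RANGES`, `level_for_code`, and `group_codes_by_level` are hypothetical.

```python
# Ranges and level names taken from the chart above, most severe first.
LEVEL_RANGES = [
    (400, 499, "Fatal Problems Found"),
    (300, 399, "Major Issues Found"),
    (200, 299, "Data Issues"),
    (100, 199, "Suggested Improvements"),
]


def level_for_code(code):
    """Return the severity level for a quality code, or None if out of range."""
    code = int(code)
    for low, high, level in LEVEL_RANGES:
        if low <= code <= high:
            return level
    return None


def group_codes_by_level(codes):
    """Group quality codes under their severity level."""
    grouped = {}
    for code in codes:
        grouped.setdefault(level_for_code(code), []).append(int(code))
    return grouped


print(group_codes_by_level([441, 300, 211, 212, 112]))
```

Remember that a finding at any level describes the resume, not the Parser, so a grouping like this is best used for internal triage rather than as a parse-failure signal.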
Timed Outπ︎
If the Parser timed out while processing the resume, then a TimedOut element is reported. The value is the number of milliseconds spent parsing before the timeout was reached.
The type attribute has one of these values:
- soft: The 15 second timeout was reached. The parser stopped at the next checkpoint and returned all information that had been processed up to that moment.
- hard: The 22.5 second timeout was reached. The parser stopped immediately (between checkpoints) and returned all information that had been processed up to that moment.
For example, the following represents a soft timeout that occurred after 15.121 seconds:
Parser Configuration Stringπ︎
This element reports the Parser configuration that was used during parsing. The configuration is output as a string of Name=Value pairs, each representing a parser setting.
This string value is not necessarily the same as the string that was passed in before parsing. It is the final combination of the value you provided plus pre-configured or built-in defaults and settings changed by the Parser at runtime as a consequence of its locale and language detection.
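If you want to inspect the reported configuration programmatically, a small parser along these lines may help. This is only a sketch: the text above says the string consists of Name=Value pairs but does not state the separator, so the semicolon used here is an assumption, as is the function name `parse_config_string`. The two setting names in the example come from elsewhere on this page.

```python
def parse_config_string(config, separator=";"):
    """Parse 'Name = Value' pairs into a dict (the separator is an assumption)."""
    settings = {}
    for pair in config.split(separator):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


example = (
    "Coverage.MilitaryHistoryAndSecurityCredentials = True; "
    "Culture.PreferEnglishVersionIfTwoLanguagesInDocument = True"
)

for name, value in parse_config_string(example).items():
    print(name, "->", value)
```

Comparing the parsed result with the settings you passed in is one way to see which defaults and runtime changes the Parser applied.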
Parser Versionπ︎
This element simply reports the version number of the Parser that produced the output.
Experience Summaryπ︎
The parser performs many calculations to summarize the experience of the candidate. All of those calculations are reported within the ExperienceSummary element, which looks like this:
"sov:ExperienceSummary": {
"sov:Description": "Molly A. Adams's experience appears to be strongly concentrated in Information Technology (mostly Programming) and ...",
"sov:CareerStory": "",
"sov:MonthsOfWorkExperience": "244",
"sov:AverageMonthsPerEmployer": "81",
"sov:FulltimeDirectHirePredictiveIndex": "82",
"sov:MonthsOfManagementExperience": "191",
"sov:CurrentManagementLevel": "mid-level",
"sov:HighestManagementScore": "55",
"sov:ExecutiveType": "business_dev",
"sov:ManagementStory": "Current position is a mid-level management role: Director of Web Applications Development ...",
"sov:AttentionNeeded": "ATTENTION: The candidate appears to have been in management in a past ...",
"sov:SkillsTaxonomyOutput" : {}
}
Descriptionπ︎
The Description element contains a paragraph of text that summarizes the candidate's experience. This paragraph is generated based on the other data points within the ExperienceSummary. The paragraph is generated in the same language as the resume for Dutch, English, French, Spanish, and Swedish (but not yet for German or Greek). To always generate this summary in English, you can set OutputFormat.AllSummariesInEnglish = True in the config string.
Career Storyπ︎
The CareerStory element contains a paragraph of text that summarizes the candidate's entire career.
Months Of Work Experienceπ︎
The number of months of work experience as indicated by the range of StartDate and EndDate values in the various PositionHistory elements. Overlapping date ranges are not double-counted. This value is NOT derived from text like "I have 15 years of experience".
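The idea of not double-counting overlapping date ranges can be sketched in Python as follows. This is an illustration of the general technique, not the Parser's actual algorithm: it counts every calendar month touched by at least one position, with both endpoints inclusive, and the Parser's own rounding rules may differ. The function names and the (year, month) tuple format are assumptions made for the example.

```python
def month_index(year, month):
    """Map a (year, month) pair onto a single running month number."""
    return year * 12 + (month - 1)


def months_of_experience(positions):
    """Count distinct months covered by a list of (start, end) position ranges.

    Each start and end is a (year, month) tuple; both ends are inclusive.
    """
    spans = sorted((month_index(*s), month_index(*e)) for s, e in positions)
    total = 0
    covered_until = None  # last month index that has already been counted
    for start, end in spans:
        if covered_until is not None and start <= covered_until:
            start = covered_until + 1  # skip the part that overlaps
        if end >= start:
            total += end - start + 1
            covered_until = end
    return total


positions = [
    ((2010, 1), (2012, 12)),  # 36 months
    ((2012, 7), (2013, 6)),   # 12 months, 6 of which overlap the first job
]
print(months_of_experience(positions))
```

Summing the two ranges naively would give 48 months, whereas merging the overlap yields 42.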
Months Of Management Experienceπ︎
The number of months of management experience as indicated by the range of StartDate and EndDate values in the various PositionHistory elements that have been determined to be management-level positions. Overlapping date ranges are not double-counted. This value is NOT derived from text like "I have 10 years of management experience".
Current Management Levelπ︎
Computed level of management for the current position. One of the following values:
- low-or-no-level
- low-level
- mid-level
- somewhat high-level
- high-level
- executive-level
Highest Management Scoreπ︎
The highest score calculated from any of the position titles. The score is based on the wording of the title, not on the experience described within the position description.
- 0-29 = Low level
- 30-59 = Mid level
- 60+ = High level
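The score bands above translate into a simple lookup, shown here as a Python sketch. The function name `management_band` is hypothetical, and the input is converted with `int` because the sample output reports the score as a string.

```python
def management_band(score):
    """Map a HighestManagementScore value onto the bands listed above."""
    score = int(score)  # the sample output reports the score as a string
    if score >= 60:
        return "High level"
    if score >= 30:
        return "Mid level"
    return "Low level"


print(management_band("55"))
```

For the sample ExperienceSummary shown earlier, the score of 55 falls in the mid-level band.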
Executive Typeπ︎
If HighestManagementScore is at least 30, then the job titles are examined to determine the best category for the executive experience, from among the following:
- none
- admin
- accounting
- business_dev
- executive
- financial
- general
- it
- learning
- marketing
- operations
Average Months Per Employerπ︎
The average number of months a candidate has spent at each employer. Note that this number is per employer, not per job.
Fulltime Direct Hire Predictive Indexπ︎
A score (0-100), where 0 means a candidate is more likely to have had (and want/pursue) short-term/part-time/temp/contracting jobs and 100 means a candidate is more likely to have had (and want/pursue) traditional full-time, direct-hire jobs.
Management Storyπ︎
The ManagementStory is a plain text line-by-line summary of the management experience.
Attention Neededπ︎
A message containing information about something abnormal about the candidate (e.g. the candidate was in management at one point but not at their most recent position). This does not appear in the results if nothing abnormal is found.