CV/Resume and Job Parser Documentation

CV HTML anonymization🔗

Textkernel can mark PII (personally identifiable information) in the HTML representation provided in the parsed output (see field Document HTML Representation in Additional Information). During parsing, Textkernel identifies PII and annotates it with HTML tags, which customers can use to anonymize the CV/resume as required.

Important

While we strive for 100% parsing quality, we cannot guarantee that all PII is identified. Our parsing techniques may not cover every variation or contextual nuance in the data, so some identifiable information may remain unmarked. Customers should review their anonymized output to confirm that it meets their privacy standards and compliance obligations.

Activation required

This functionality needs to be activated by Textkernel. Please contact the Textkernel Support team to enable it.

HTML redaction tags🔗

Once activated, redaction markers are added to the HTML representation of the document. Text containing personally identifiable information is wrapped in new <span> tags with the class redacted (<span class="redacted"></span>).

For example, the following HTML will be generated for the candidate's name and experience date:

<div>
    <span class="redacted" concept="name">Pamela Woolley</span>
    PROFESSIONAL EXPERIENCE
    <span class="redacted" concept="experience.date" replacement="258" replacement_postfix="months">2003-present</span> FREELANCE PROJECTS
</div>

Customers must implement their own anonymization by using these additional tags. These tags are designed to facilitate the removal or replacement of candidate information, ensuring the content can be presented in an anonymized manner by the customer. It is the customer's responsibility to use these tags appropriately to achieve the desired level of anonymization.
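As an illustration of what such an implementation might look like, the following minimal sketch uses Python's standard html.parser module to re-emit the HTML while replacing the content of every redacted span with a fixed placeholder. The names RedactingParser and anonymize and the placeholder text are hypothetical and not part of the Textkernel product. The sketch drops all attributes of the span except class, so that values such as replacement do not remain in the output.

```python
from html import escape
from html.parser import HTMLParser


class RedactingParser(HTMLParser):
    """Re-emits HTML, replacing the content of redacted spans with a placeholder."""

    def __init__(self, placeholder="[REDACTED]"):
        # convert_charrefs=False so entities are passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.placeholder = escape(placeholder)
        self.out = []
        self.depth = 0  # > 0 while inside a redacted span

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Inside a redacted span: drop nested markup, but track nested spans.
            if tag == "span":
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "redacted" in classes:
            self.depth = 1
            self.out.append('<span class="redacted">' + self.placeholder)
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "span":
                self.depth -= 1
                if self.depth == 0:
                    self.out.append("</span>")
            return
        self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as-is outside redacted spans.
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def anonymize(html_text, placeholder="[REDACTED]"):
    """Return html_text with the content of all redacted spans replaced."""
    parser = RedactingParser(placeholder)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling anonymize on the HTML representation returns a copy in which the marked text no longer appears. Unmarked PII is not affected, so the output should still be reviewed.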

The available attributes on the redaction spans are:

  • class: All spans introduced will have the redacted class.
  • concept: The concept associated with the underlying text (see the list of concepts below).
  • replacement: An optional value that can be used to replace the text, for example specific dates with a tenure period (e.g. replace Jan 1999 - Jan 2000 with 12).
  • replacement_postfix: A postfix associated with the replacement value, e.g. months.
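The way these attributes could drive the substituted text can be sketched as a small helper. This is only an illustration: the function name replacement_text and the fallback placeholder are hypothetical, and attrs is assumed to be a plain dictionary of the span's attributes as produced by whichever HTML parser is in use.

```python
def replacement_text(attrs, placeholder="[REDACTED]"):
    """Return the text to show in place of one redacted span.

    attrs is a dict of the span's attributes, e.g.
    {"class": "redacted", "concept": "name"}.
    """
    value = attrs.get("replacement")
    if not value:
        # No replacement value supplied: fall back to a generic placeholder.
        return placeholder
    postfix = attrs.get("replacement_postfix")
    # Join value and postfix, e.g. "258" + "months" -> "258 months".
    return f"{value} {postfix}" if postfix else value
```

For the experience date span in the example above, this helper would yield 258 months, while the name span, which carries no replacement attribute, would yield the placeholder.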

Redacted concepts🔗

The following concepts have redaction markers in the HTML:

  • name
  • emails
  • phone_numbers
  • location
  • birth date
  • nationality
  • gender
  • national id
  • drivers licenses
  • experience dates
  • education dates
  • hobbies

The following concepts have replacement values in the HTML:

  • experience.date: the replacement value contains the tenure period in months. For example, the date range Jan 1999 - Jan 2000 has a replacement value of 12. The postfix is always set to months (no internationalization).
  • education.date: the replacement value contains the tenure period in months. For example, the date range Jan 1999 - Jan 2000 has a replacement value of 12. The postfix is always set to months (no internationalization).
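Because the postfix is not internationalized, customers who want a different presentation may prefer to format the month count themselves. The sketch below is one possible approach, assuming the replacement value is always a whole number of months given as a string (as in the 258 example above). The function format_tenure and the label dictionaries are hypothetical.

```python
# Hypothetical label sets; add or adjust languages as needed.
EN = {"year": "year", "years": "years", "month": "month", "months": "months"}
NL = {"year": "jaar", "years": "jaar", "month": "maand", "months": "maanden"}


def format_tenure(replacement, labels=EN):
    """Turn a month count such as "258" into a years-and-months phrase.

    Raises ValueError if replacement is not an integer string.
    """
    years, months = divmod(int(replacement), 12)
    parts = []
    if years:
        parts.append(f"{years} {labels['year'] if years == 1 else labels['years']}")
    if months or not parts:
        # Also covers a tenure of 0 months, which would otherwise be empty.
        parts.append(f"{months} {labels['month'] if months == 1 else labels['months']}")
    return " ".join(parts)


print(format_tenure("258"))      # 21 years 6 months
print(format_tenure("12"))       # 1 year
print(format_tenure("258", NL))  # 21 jaar 6 maanden
```

Showing a duration instead of exact dates keeps the length of the tenure visible without revealing when it took place.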