Jobs Data Feeds
About Jobs Data Feeds🔗
Textkernel can provide direct access to a large data set of jobs data. Whereas the Jobs Data API is intended for real-time job search and analytics, a data feed can give full and unlimited access to Textkernel Jobs data. This data can be ingested directly into software systems or analytics platforms.
Technical specifications🔗
Delivery location🔗
Data feeds are made available via AWS S3. Textkernel grants programmatic download access to an S3 bucket which contains data feed files.
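As an illustration of programmatic download, the sketch below lists and downloads every object under a prefix. It assumes the boto3 library and an S3 client created with the credentials Textkernel has granted; the bucket name, prefix and destination directory are placeholders supplied by the caller, and the flattened local file naming is just one possible choice.

```python
import os


def download_prefix(s3_client, bucket, prefix, dest_dir):
    """Download every object under `prefix` into `dest_dir`; return the local paths.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    os.makedirs(dest_dir, exist_ok=True)
    paginator = s3_client.get_paginator("list_objects_v2")
    local_paths = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip folder placeholder objects
            # Flatten the key into a single file name (illustrative choice).
            local_path = os.path.join(dest_dir, key.replace("/", "_"))
            s3_client.download_file(bucket, key, local_path)
            local_paths.append(local_path)
    return local_paths


# Hypothetical usage; the bucket name and prefix are placeholders:
#   import boto3
#   download_prefix(boto3.client("s3"), "<your-bucket>", "<your-prefix>", "feed")
```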
File format🔗
The files are provided in json-lines format, compressed via gzip. Other file formats are available via a customized data feed (see below).
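A gzip-compressed json-lines file can be streamed record by record without decompressing it to disk first. The following is a minimal sketch using only the Python standard library; the file name in the usage comment is a placeholder, and UTF-8 encoding is an assumption.

```python
import gzip
import json


def iter_records(path):
    """Yield one parsed JSON object per non-empty line of a gzipped json-lines file."""
    # "rt" opens the gzip stream in text mode; UTF-8 is assumed here.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical usage; the file name is a placeholder:
#   for record in iter_records("jobs_new.jsonl.gz"):
#       print(sorted(record.keys()))
```

Because the function is a generator, memory use stays roughly constant regardless of file size.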
File types🔗
Data feeds represent a snapshot of Textkernel's Jobs Data live system. We keep historical data in sets of years (S3 location historical) as well as months. In addition, daily updates are provided.
In the S3 location, the following data sets can be found:
- daily/<date>/jobs_new: contains vacancies that were found as new on the day prior to <date>
- daily/<date>/jobs_expired: contains records (job ID only) that were marked as inactive on the day prior to <date>
- monthly/<month date YYYY-MM>/jobs: contains vacancies that were found in the month. Vacancies can be currently new or expired. The expiration state reflects the 1st day of the current month; apply the 'daily' files to bring it up to the most recent state.
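Bringing a monthly snapshot up to date with the daily files could look roughly like the sketch below. It keeps jobs in a dictionary keyed by job ID and tracks expired IDs in a separate set. The field name `job_id` is a hypothetical placeholder (check the data model for the actual name), and applying new records before expired ones within a day is an assumption.

```python
def apply_daily_updates(jobs, expired_ids, new_records, expired_records,
                        id_field="job_id"):
    """Merge one day's jobs_new and jobs_expired records into local state.

    jobs:        dict mapping job ID -> full record (for example from a monthly file)
    expired_ids: set of job IDs currently considered inactive
    id_field:    name of the job ID field (hypothetical default)
    """
    for record in new_records:
        jobs[record[id_field]] = record
    for record in expired_records:
        # jobs_expired records carry only the job ID.
        expired_ids.add(record[id_field])
    return jobs, expired_ids


# Example with made-up records:
jobs = {"a": {"job_id": "a"}}
expired = set()
apply_daily_updates(jobs, expired, [{"job_id": "b"}], [{"job_id": "a"}])
active_ids = set(jobs) - expired
```

Daily folders would be applied in chronological order, starting from the first day not covered by the snapshot's expiration state.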
Delivery frequency🔗
Daily files are generated as follows:
- US and CA: before 7 AM EST
- All other countries: before 7 AM CET
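For scheduling a download job, the latest expected generation time can be converted to UTC. The sketch below takes "EST" and "CET" literally as fixed offsets (UTC-5 and UTC+1); whether the schedule shifts with daylight saving time is not stated here, so treat the offsets as an assumption. The function name and the country-code argument are illustrative.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed offsets, assuming the times are meant literally as EST and CET.
EST = timezone(timedelta(hours=-5), "EST")
CET = timezone(timedelta(hours=1), "CET")


def daily_deadline_utc(day, country_code):
    """Return the time (in UTC) by which the daily files for `day` should exist."""
    tz = EST if country_code.upper() in {"US", "CA"} else CET
    local_deadline = datetime.combine(day, time(7, 0), tzinfo=tz)
    return local_deadline.astimezone(timezone.utc)


us_deadline = daily_deadline_utc(date(2024, 1, 15), "US")   # 12:00 UTC
nl_deadline = daily_deadline_utc(date(2024, 1, 15), "NL")   # 06:00 UTC
```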
Deduplication🔗
Data feeds can be provided as a deduplicated data set or including duplicate postings. Each posting has a Job ID that uniquely identifies a job. Textkernel's duplicate detection mechanism is described in this linked blog post.
Data model and fields🔗
The data model can be found here: data model link.
Textkernel can introduce new fields to Jobs Data. These fields are announced via the Textkernel release notes.
Change policy / versioning🔗
Textkernel does not support versioning of data. Data files can be replaced and updated without prior notice. The AWS S3 timestamp indicates when the file was last updated.
Textkernel can make updates to historical data. This may occasionally be necessary when a change affects substantial portions of the data. Examples of such changes are re-processing due to updates in our machine learning models or deletion of invalid data sources.
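Since files can be replaced without notice, a consumer may want to compare each object's S3 timestamp with the timestamp it last processed. A minimal sketch follows, assuming the listing entries have the shape that boto3's `list_objects_v2` returns (`Key` and `LastModified`); how the previously seen timestamps are stored is left to the caller.

```python
from datetime import datetime, timezone


def find_changed(listing, seen):
    """Return {key: last_modified} for objects that are new or updated.

    listing: iterable of dicts with "Key" and "LastModified" entries
    seen:    dict mapping key -> LastModified value at the time of last processing
    """
    changed = {}
    for obj in listing:
        key, modified = obj["Key"], obj["LastModified"]
        if key not in seen or modified > seen[key]:
            changed[key] = modified
    return changed


# Example with made-up keys and timestamps:
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 2, 1, tzinfo=timezone.utc)
seen = {"x": t1, "y": t1}
listing = [
    {"Key": "x", "LastModified": t1},  # unchanged
    {"Key": "y", "LastModified": t2},  # replaced since last run
    {"Key": "z", "LastModified": t2},  # not seen before
]
changed = find_changed(listing, seen)
```

The caller would re-ingest the changed files and only then record their new timestamps, so that a failed run is retried.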
Data retention and updates🔗
We do not archive / remove data in the historical and monthly folders. Daily files are removed 31 days after file creation.
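The 31-day retention means a consumer that falls too far behind can no longer catch up from daily files alone. The sketch below computes which daily dates to fetch after a given last-synced date and flags a gap that would require falling back to a monthly file. It assumes each daily folder is created on the date in its name, which is an assumption, and the function name is illustrative.

```python
from datetime import date, timedelta

DAILY_RETENTION_DAYS = 31  # daily files are removed 31 days after creation


def daily_dates_to_fetch(last_synced, today):
    """Return (dates, has_gap) for catching up after `last_synced`.

    dates:   daily dates after `last_synced`, up to `today`, still within retention
    has_gap: True if some needed dates are already older than the retention window
    """
    oldest_available = today - timedelta(days=DAILY_RETENTION_DAYS - 1)
    first_needed = last_synced + timedelta(days=1)
    has_gap = first_needed < oldest_available
    start = max(first_needed, oldest_available)
    dates = [start + timedelta(days=i) for i in range((today - start).days + 1)]
    return dates, has_gap


recent, recent_gap = daily_dates_to_fetch(date(2024, 3, 1), date(2024, 3, 4))
stale, stale_gap = daily_dates_to_fetch(date(2024, 1, 1), date(2024, 3, 4))
```

When `has_gap` is true, the missing period would have to be rebuilt from the relevant monthly file before the remaining daily files are applied.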
Customized data feed🔗
Textkernel can provide customized data feeds with respect to fields, data filters and file format. Please contact Textkernel Support for more details.