Indexing
In the Textkernel Search & Match Engine, an index is a collection of documents of the same type, either Resumes or Jobs. You create one or more indexes, you add/update/delete documents in those indexes, and you search/match within those indexes. Each index is an inverted full-text index that provides fast searching of terms within the text of the documents as well as the semantic data, without needing to scan individual files.
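The toy inverted index below shows why a search does not need to scan individual files. Each term maps directly to the set of documents that contain it, so a query only intersects a few sets. This is a simplified sketch for illustration, not how the engine is implemented. Real full-text engines typically add tokenization rules, relevance scoring, and, here, the semantic fields mentioned above. The document IDs are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents keyed by document ID.
docs = {
    "resume-1": "Senior Java developer with Kubernetes experience",
    "resume-2": "Python developer and data engineer",
    "resume-3": "Java architect",
}
idx = build_inverted_index(docs)
print(sorted(search(idx, "java developer")))  # ['resume-1']
```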
Our engine is built on a near real-time full-text search engine, so a document is typically searchable from the API within 1 second of being added to an index. We recommend building the indexing of new documents directly into your application's workflow. For example, if you need to geocode location coordinates, the workflow would be: 1) Parse, 2) Geocode, 3) Index. Whenever you have more than one document to add to an index, use the bulk index API endpoint for the best performance.
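The following Python sketch shows the Parse, Geocode, Index workflow with bulk indexing. `parse`, `geocode`, and `bulk_index` are hypothetical stand-ins for your real API calls, and none of these names come from the Textkernel API. The default `batch_size` of 100 is an arbitrary placeholder, not a documented limit of the bulk endpoint. The key design choice is that documents flow through the pipeline one at a time but reach the index in batches, so there is one bulk request per batch instead of one request per document.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch


def ingest(
    files: Iterable[Tuple[str, bytes]],
    index_id: str,
    parse: Callable[[bytes], dict],                # hypothetical Parse call
    geocode: Callable[[dict], dict],               # hypothetical Geocode call
    bulk_index: Callable[[str, List[dict]], None], # hypothetical bulk-index call
    batch_size: int = 100,  # placeholder, not a documented API limit
) -> None:
    """Run Parse -> Geocode -> Index, sending documents in bulk batches."""
    def prepared():
        for doc_id, raw in files:
            parsed = parse(raw)          # 1) Parse
            located = geocode(parsed)    # 2) Geocode
            yield {"id": doc_id, "document": located}

    for batch in batched(prepared(), batch_size):
        bulk_index(index_id, batch)      # 3) Index: one bulk call per batch


# Usage with stub callables that only record what would be sent.
calls = []
ingest(
    files=[(f"doc-{i}", b"resume text") for i in range(250)],
    index_id="tenant-42-resumes",
    parse=lambda raw: {"text": raw.decode()},
    geocode=lambda doc: {**doc, "coordinates": (0.0, 0.0)},
    bulk_index=lambda idx, batch: calls.append((idx, len(batch))),
)
print(calls)
```

With 250 documents and a batch size of 100, the stub records three bulk calls of 100, 100, and 50 documents.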
Indexing Strategy
Since indexes are logical groups of documents, it's extremely important to group documents properly to ensure efficient searching and security. The idea is to separate your data into the largest subsets that can be queried together. Here are some typical use cases and recommendations on how to group your documents.
If you are an ATS or another type of multi-tenant platform, you should create separate indexes for each of your tenants. Since most queries target a single tenant's data, giving each tenant its own indexes reduces the amount of filtering needed for each request, which in turn speeds up the query. You can specify multiple indexes to query, so this structure doesn't limit the flexibility of querying.
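Here is a minimal sketch of a per-tenant naming scheme. It assumes a hypothetical convention of `<tenant>-<resumes|jobs>`, one index per tenant and document type. The character rules applied to tenant names are also an assumption, so check the API reference for the actual index ID constraints. The second helper builds the list of index IDs for a query. Most requests would pass a single tenant's index, but the same helper covers a query across several tenants.

```python
import re

# Assumed rule: replace anything outside letters, digits, "_" and "-".
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_-]")


def index_id(tenant_id: str, doc_type: str) -> str:
    """Hypothetical naming convention: one index per tenant and document type."""
    if doc_type not in ("resumes", "jobs"):
        raise ValueError(f"unknown document type: {doc_type!r}")
    safe_tenant = _UNSAFE_CHARS.sub("_", tenant_id)
    return f"{safe_tenant}-{doc_type}"


def indexes_to_search(tenant_ids, doc_type):
    """Deduplicated, sorted index IDs for a query covering one or more tenants."""
    return sorted({index_id(t, doc_type) for t in tenant_ids})


print(index_id("acme corp", "resumes"))                # acme_corp-resumes
print(indexes_to_search(["acme", "globex"], "jobs"))   # ['acme-jobs', 'globex-jobs']
```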
For organizations using this tool to recruit internally, it makes sense to either create an index per business unit or use a single index. The decision comes down to how the data will be queried. If the business units are strictly separated, separate indexes make the most sense. If the candidate pool is treated as a single entity and recruiters consider all candidates, use a single index.
An alternative strategy for separating documents is to use custom value IDs. This strategy makes more sense for groupings that change frequently, such as the list of candidates a recruiter is working on or the jobs a candidate has applied for.
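The in-memory model below shows why custom value IDs suit frequently changing groupings. Adding a candidate to a group, or removing one, changes a tag on the document rather than moving the document between indexes. This sketch only models the idea and does not reproduce the API. The class, the helper functions, and the tag values such as `recruiter-17-pipeline` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    """Hypothetical model of an indexed document carrying custom value IDs."""
    doc_id: str
    custom_value_ids: set = field(default_factory=set)


def tag(doc: IndexedDocument, value_id: str) -> None:
    doc.custom_value_ids.add(value_id)


def untag(doc: IndexedDocument, value_id: str) -> None:
    doc.custom_value_ids.discard(value_id)  # no error if the tag is absent


def filter_by_custom_value(docs, value_id):
    """IDs of documents in the group identified by `value_id`."""
    return [d.doc_id for d in docs if value_id in d.custom_value_ids]


# All candidates stay in one pool; groups are expressed as tags.
pool = [IndexedDocument("cand-1"), IndexedDocument("cand-2"), IndexedDocument("cand-3")]
tag(pool[0], "recruiter-17-pipeline")
tag(pool[2], "recruiter-17-pipeline")
tag(pool[2], "applied-job-305")
print(filter_by_custom_value(pool, "recruiter-17-pipeline"))  # ['cand-1', 'cand-3']

untag(pool[0], "recruiter-17-pipeline")  # recruiter drops cand-1 from the pipeline
print(filter_by_custom_value(pool, "recruiter-17-pipeline"))  # ['cand-3']
```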
Deciding how to structure your indexes can have a large impact on performance and efficiency. Please reach out to support@textkernel.com with any questions.