Installation Guide
Important
This section covers on-premise installation of our software. This option is strongly discouraged and only available in exceptional cases. Please use our SaaS REST API for a future-proof solution that is scalable and always up-to-date.
Required Knowledge
Choosing the installed distribution requires a working knowledge of containers and of deploying containers in a production environment. Textkernel Support is intended to help with the Textkernel APIs, not with Kubernetes or other orchestration software.
Software Requirements
The applications run on Linux and can be used from any container runtime such as Docker, or within a container orchestration framework such as Kubernetes.
Setup Guide
You can download the latest versions of the containers from the Tx Console. Contact support@textkernel.com if you need login information for the download site. Below is the simplest way to get started running the services. This is not our recommendation for a production environment. In a production environment, orchestration software such as Kubernetes, Amazon EKS, or Amazon ECS should be used to allow for proper scaling and recovery.
- Install Docker.
- Every setup requires the Preprocessor, Parsing Engine, and one of the Resume Parser REST APIs. Download the required images and docker compose file using the link above, then load them using the following command where {image-name} is the name of the file.
docker load -i {image-name}.tar
- Each image requires a TEXTKERNEL_LICENSE environment variable. Its value should be the base64-encoded contents of your Sovren License file.
- After loading the images and setting the TEXTKERNEL_LICENSE environment variable, run the following command in the directory containing the docker-compose.yml file to start the containers.
docker compose up
- Navigate to http://localhost:53000/ and verify that a Swagger test page is returned. Run the InstallationCheck to verify all components are running and healthy.
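As a rough sketch of the license step above, the helper below base64-encodes a file onto a single line, stripping the line wrapping that some `base64` implementations add. The function name `encode_license` is hypothetical and not part of the product.

```shell
# Print the base64 encoding of a file as one line.
# Some base64 implementations wrap their output at 76 columns,
# so any newlines are removed before the value is used.
encode_license() {
  base64 < "$1" | tr -d '\n'
}
```

One possible use is `export TEXTKERNEL_LICENSE="$(encode_license ./license-file)"` before running `docker compose up`, where `./license-file` is a placeholder for the actual path to your license file.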
More detailed instructions for each container follow.
Preprocessor
The Preprocessor is responsible for converting documents to text. The service listens on port 3000, but you shouldn’t need to call this service directly.
Environment Variables
The Preprocessor requires the environment variables below. The first time you set up the service, the default values should already be correct and should not need updating. However, keep the TXTOR_LICENSE_EXPIRATION date in mind and update the license before it expires.
- TXTOR_LICENSE_KEY: The key for the preprocessor license.
- TXTOR_LICENSE_COMMENCEMENT: The date the license begins, in the following format:
  DayOfWeek Month Day Hour:Minute:Second Year
  ex. Wed Jul 5 09:44:45 2023
- TXTOR_LICENSE_EXPIRATION: The date the license expires, in the following format:
  DayOfWeek Month Day Hour:Minute:Second Year
  ex. Sat Jul 5 09:44:45 2025
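To keep an eye on the expiration date, a small check like the following may help. It is only a sketch: it assumes GNU `date` (as found on most Linux distributions), treats the timestamps as UTC, and uses a hypothetical function name, `days_until_expiration`.

```shell
# Print the number of whole days between a reference time and the expiration.
# Usage: days_until_expiration "<expiration>" ["<reference time>"]
# The reference time defaults to the current time.
# Assumes GNU date, which accepts the format shown above.
days_until_expiration() {
  expires=$(date -u -d "$1" +%s) || return 1
  now=$(date -u -d "${2:-now}" +%s) || return 1
  echo $(( (expires - now) / 86400 ))
}
```

For example, `days_until_expiration "$TXTOR_LICENSE_EXPIRATION"` could be run from a scheduled job that alerts when the result drops below a threshold of your choosing.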
Hardware Requirements Per Instance
- Memory: At least 1 GB
- CPU: At least 0.5 cores. A fast, modern x64 CPU is recommended; the faster the better.
- Disk Space: 1 GB
Health Checks
Health checks are available at /health.
Parsing Service
The Parsing Service is responsible for parsing the text output by the Preprocessor. The majority of the processing happens here. The service listens on port 80, but you shouldn’t need to call this service directly.
Environment Variables
- TEXTKERNEL_LICENSE: The base64-encoded contents of your Sovren License file.
  - NOTE: if you store this value as a secret in some services (such as Kubernetes), the secret is decoded from base64 before it is exposed as an environment variable. In these cases, you can simply apply the base64 encoding a second time so that the environment variable still holds the base64-encoded license file.
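The double encoding described in the note above could look roughly like this. The function name `double_encode_license` is hypothetical, and `base64 -d` in the usage note assumes a GNU-style `base64`.

```shell
# Base64-encode a file twice, each time onto a single line.
# If a secret store decodes the value once, the result is still
# the base64-encoded license file.
double_encode_license() {
  base64 < "$1" | tr -d '\n' | base64 | tr -d '\n'
}
```

Piping the output through `base64 -d` once should give back the singly encoded license, which is the value the container expects in TEXTKERNEL_LICENSE.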
Hardware Requirements Per Instance
- Memory: At least 1.5 GB for each language you need to parse
- CPU: At least 1 core. A fast, modern x64 CPU is recommended; the faster the better.
  - Parsing speed is directly related to processor speed. It improves with faster clock rates and larger CPU caches.
- Disk Space: 0.5 GB
Health Checks
Health checks are available at /ready and /health. The service can take up to a minute before the /ready endpoint is successful.
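Because the readiness endpoint may not succeed right away, a startup script might poll it with a timeout. The sketch below is one way to do that. `wait_until` and `probe` are hypothetical helper names, and the URL in the usage note is a placeholder for wherever your Parsing Service is reachable.

```shell
# Retry a command until it succeeds or the timeout (in seconds) is reached.
# Usage: wait_until <timeout_s> <interval_s> <command> [args...]
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    elapsed=$((elapsed + interval_s))
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}

# Succeed only if the URL answers with a non-error HTTP status.
probe() {
  curl --fail --silent --output /dev/null "$1"
}
```

For example, `wait_until 90 5 probe http://parsing-service/ready` would check every 5 seconds for up to 90 seconds, leaving some margin over the one-minute startup time.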
Self-Hosted REST API
The Self-Hosted REST API is responsible for handling incoming API requests and is the only one that needs to be externally available. The service listens on port 80.
Environment Variables
- TEXTKERNEL_LICENSE: The base64-encoded contents of your Sovren License file.
  - NOTE: if you store this value as a secret in some services (such as Kubernetes), the secret is decoded from base64 before it is exposed as an environment variable. In these cases, you can simply apply the base64 encoding a second time so that the environment variable still holds the base64-encoded license file.
- TX_PARSING_SERVICE: The URL of the Parsing Service set up above
- TX_PREPROCESSOR: The URL of the Preprocessor set up above
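Before starting the container, it can be useful to confirm that all three variables are actually exported. The following is a minimal pre-flight sketch. `require_env` is a hypothetical helper, not something shipped with the images.

```shell
# Return non-zero if any of the named environment variables is unset or empty.
# printenv only sees exported variables, which are the ones a child
# process such as a container runtime would inherit.
require_env() {
  missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A startup script could call `require_env TEXTKERNEL_LICENSE TX_PARSING_SERVICE TX_PREPROCESSOR || exit 1` and stop early with a clear message instead of failing later inside the container.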
Hardware Requirements Per Instance
- Memory: At least 1 GB
- CPU: At least 0.5 cores. A fast, modern x64 CPU is recommended; the faster the better.
- Disk Space: 0.5 GB
Health Checks
Health checks are available at /ready and /health. You can run the Installation Check at /InstallationCheck to verify all components are successfully set up, but it should not be called repeatedly.
Scaling
The above hardware requirements are the minimum needed to run the software. If you need more throughput, add more containers or increase the resources per container to fit your business needs. Most cloud infrastructure providers offer CPU and memory at a 1:2 ratio, and keeping your containers at that ratio makes the most efficient use of the resources. For example, if your Parsing Service requires 8 GB of memory, allocating 4 CPUs is the best use of the hardware ratio. This is a recommendation based on our own experience, not a rule that you must follow.
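The 1:2 ratio can be turned into a quick sizing calculation. The helper below is only an illustration of the arithmetic. The name `cpus_for_memory_gb` is hypothetical, and rounding up for odd memory sizes is an assumption rather than part of the recommendation.

```shell
# Suggest a CPU count for a memory size in whole GB,
# using 1 CPU per 2 GB of memory and rounding up.
cpus_for_memory_gb() {
  echo $(( ($1 + 1) / 2 ))
}
```

With the example above, `cpus_for_memory_gb 8` yields 4.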