Resourcing

Running Locally

When running locally through Docker, we recommend making at least 4 vCPU cores and 10 GB of RAM available to Docker (16 GB is preferred). This can be controlled in the Resources section of the Docker Desktop settings menu.

Single Cloud Instance

We generally recommend setting everything up on a single instance (e.g. an AWS EC2 instance, a Google Compute Engine instance, an Azure VM, etc.) via Docker Compose, as it’s the simplest way to get started. For a step-by-step guide on how to do this, check out our EC2 deployment guide.

For most use cases, a single reasonably sized instance is more than enough for good performance, and should be able to serve a small to medium-sized organization without issue.

If you go with this approach, we recommend:

CPU: >= 4 vCPU cores (we recommend 8 vCPU cores if possible)
Memory: >= 10 GB of RAM (for best performance, 16 GB is recommended)
- Note: If you are switching embedding models, you will need >= 11 GB of RAM, as both sets of models need to be loaded simultaneously
Disk: >= 50 GB + ~2.5x the size of the indexed documents. Disk is generally very cheap, so we would recommend getting extra disk beyond this recommendation to be safe.
- Note: Vespa does not allow writes when disk usage is >75%, so make sure to always have some headroom here
- Note: Old, unused Docker images often take up a lot of space when performing frequent upgrades of RECAP. To clean up these unused images, run: docker system prune --all
- Note: Vespa, used by RECAP for document indexing, requires Haswell (2013) or later CPUs. For older CPUs, use the vespaengine/vespa-generic-intel-x86_64 image in your docker-compose.yml. This generic image is slower but ensures compatibility. For details, see Vespa CPU Support.
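The disk guidance above can be turned into a quick back-of-the-envelope check. The following is a minimal sketch, not part of RECAP: the function names are hypothetical, and the usage ratio is computed from what the operating system reports, which may differ slightly from how Vespa itself measures disk usage.

```python
import shutil

BASE_DISK_GB = 50         # fixed baseline from the recommendation above
INDEX_MULTIPLIER = 2.5    # approximate multiplier on indexed document size
VESPA_WRITE_LIMIT = 0.75  # Vespa does not allow writes above this usage


def recommended_disk_gb(indexed_docs_gb: float) -> float:
    """Minimum disk size in GB: 50 GB + ~2.5x the indexed documents."""
    return BASE_DISK_GB + INDEX_MULTIPLIER * indexed_docs_gb


def vespa_writes_blocked(used_bytes: int, total_bytes: int) -> bool:
    """True if disk usage is above the 75% threshold."""
    return used_bytes / total_bytes > VESPA_WRITE_LIMIT


def check_path(path: str = "/") -> bool:
    """Check the filesystem holding `path` (hypothetical helper).

    Point this at the volume that actually stores the Vespa data.
    """
    usage = shutil.disk_usage(path)
    return vespa_writes_blocked(usage.used, usage.total)
```

For example, recommended_disk_gb(20) gives the minimum for 20 GB of indexed documents. Since disk is cheap, treat the result as a floor rather than a target.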

For reference, we give each RECAP Cloud customer an m7g.xlarge instance by default, which has 4 vCPU cores + 16 GB of RAM + 200 GB of disk space. We’re comfortable using 4 vCPU cores in a production setting since we have dedicated GPU instances that run the embedding / cross-encoder models. If you do not plan on setting that up, we recommend going with 8 vCPU cores (if possible) for a production deployment.

Kubernetes / AWS ECS

If you prefer to give each component its own dedicated resources for more efficient scaling, we recommend giving each container access to at least the following resources:

api_server - 1 CPU, 2Gi Memory

background - 2 CPU, 4Gi Memory

indexing_model_server / inference_model_server - 2 CPU, 4Gi Memory

postgres - 500m CPU, 2Gi Memory

vespa - >= 4 CPU, >= 4Gi Memory. This is the bare minimum, and we would generally recommend higher than this. The resources required here also scale linearly with the number of documents indexed. For reference, with 50GB of documents, we would generally recommend at least 10 CPU, 20Gi Memory + tuning the VESPA_SEARCHER_THREADS environment variable.

nginx - 250m CPU, 128Mi Memory
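To get a rough idea of the total cluster capacity these minimums imply, the per-container values can be summed. The sketch below is illustrative only. It assumes indexing_model_server and inference_model_server each get their own 2 CPU / 4Gi allocation, uses the bare minimum for vespa, and handles only the unit suffixes that appear in the list above (m, Gi, Mi). The helper names are hypothetical.

```python
def parse_cpu(value: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000  # millicores to cores
    return float(value)


def parse_memory_gi(value: str) -> float:
    """Convert a memory quantity such as '4Gi' or '128Mi' to Gi."""
    if value.endswith("Gi"):
        return float(value[:-2])
    if value.endswith("Mi"):
        return float(value[:-2]) / 1024
    raise ValueError(f"unsupported memory unit: {value}")


# Minimums from the list above as (CPU, memory) per container.
MINIMUMS = {
    "api_server": ("1", "2Gi"),
    "background": ("2", "4Gi"),
    "indexing_model_server": ("2", "4Gi"),   # assumed: separate allocation
    "inference_model_server": ("2", "4Gi"),  # assumed: separate allocation
    "postgres": ("500m", "2Gi"),
    "vespa": ("4", "4Gi"),                   # bare minimum; grows with index size
    "nginx": ("250m", "128Mi"),
}


def total_minimums(minimums: dict) -> tuple:
    """Return (total CPU cores, total memory in Gi) across all containers."""
    cpu = sum(parse_cpu(c) for c, _ in minimums.values())
    mem = sum(parse_memory_gi(m) for _, m in minimums.values())
    return cpu, mem


if __name__ == "__main__":
    cpu, mem = total_minimums(MINIMUMS)
    print(f"Total minimum: {cpu} CPU, {mem} Gi memory")
```

Under these assumptions the totals come to 11.75 CPU and 20.125 Gi of memory, before any extra headroom for vespa.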
