Extending RECAP to be compatible with your custom model server
There is a minimal interface to implement, which can support any arbitrary LLM model server: simply update the code here and rebuild. The default implementation is compatible with the blog demo described below.
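The exact interface is defined in the code linked above, but a minimal sketch might look like the following, assuming RECAP only needs a single text-completion call. All names here (`LLMModelServer`, `CustomFastAPIServer`, the `/generate` endpoint and its JSON schema) are illustrative assumptions, not the project's actual API:

```python
# A minimal sketch of a model-server adapter. Class, method, and endpoint
# names are illustrative assumptions, not RECAP's actual interface.
from abc import ABC, abstractmethod

import requests


class LLMModelServer(ABC):
    """Abstract interface that RECAP-style code could call into."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        """Return the model's completion for `prompt`."""


class CustomFastAPIServer(LLMModelServer):
    """Adapter for a self-hosted server exposing a JSON /generate endpoint."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        # POST the prompt to the custom server and return the completion text.
        resp = requests.post(
            f"{self.base_url}/generate",
            json={"prompt": prompt, "max_tokens": max_tokens},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["text"]
```

Under this design, supporting a different backend (an OpenAI-compatible endpoint, vLLM, and so on) only means writing another adapter and rebuilding.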
RECAP with self-hosted Llama-2-13B-chat-GGML using a custom FastAPI server
- See the Medium blog post.
- This demo uses Google Colab to access a free GPU, which is convenient but not suitable for long-term deployments. A minimal sketch of such a server follows this list.
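The blog post walks through the full setup, but the core of a custom FastAPI server over GGML weights can be quite small. The sketch below loads the model with the `ctransformers` library; the model file name, quantization, and `/generate` schema are assumptions chosen for illustration, not necessarily what the demo uses:

```python
# A minimal sketch of a custom FastAPI server for Llama-2-13B-chat-GGML,
# assuming the weights are loaded with `ctransformers`. The model file name
# and request/response schema are illustrative assumptions.
from ctransformers import AutoModelForCausalLM
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load Llama-2-13B-chat in GGML format (GPU offload depends on your build).
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-chat-GGML",
    model_file="llama-2-13b-chat.ggmlv3.q4_0.bin",  # assumed quantization
    model_type="llama",
)


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 256


@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    # Run a blocking completion; fine for a demo, not for production load.
    text = llm(req.prompt, max_new_tokens=req.max_tokens)
    return {"text": text}
```

Served with `uvicorn server:app --host 0.0.0.0 --port 8000`, this exposes the same kind of `/generate` endpoint the adapter sketch above expects.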