Extending RECAP to be compatible with your custom model server
There is a minimal interface to implement, which can support any arbitrary LLM model server: simply update the code here and rebuild. The default implementation is compatible with the blog demo described below.
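The exact interface is defined in the code linked above, but a minimal sketch might look like the following, assuming RECAP only needs a single text-completion call. All names here (`LLMModelServer`, `CustomFastAPIServer`, the `/generate` endpoint and its JSON schema) are illustrative assumptions, not the project's actual API:

```python
# A minimal sketch of a model-server adapter. Class, method, and endpoint
# names are illustrative assumptions, not RECAP's actual interface.
from abc import ABC, abstractmethod

import requests


class LLMModelServer(ABC):
    """Abstract interface that RECAP-style code could call into."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        """Return the model's completion for `prompt`."""


class CustomFastAPIServer(LLMModelServer):
    """Adapter for a self-hosted server exposing a JSON /generate endpoint."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        # POST the prompt to the custom server and return the completion text.
        resp = requests.post(
            f"{self.base_url}/generate",
            json={"prompt": prompt, "max_tokens": max_tokens},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["text"]
```

Under this design, supporting a different backend (an OpenAI-compatible endpoint, vLLM, and so on) only means writing another adapter and rebuilding.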
RECAP with self-hosted Llama-2-13B-chat-GGML using a custom FastAPI server
- See the Medium blog post.
- This demo uses Google Colab to access a free GPU, which is convenient but not suitable for long-term deployments. A minimal sketch of such a server follows this list.
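The blog post walks through the full setup, but the core of a custom FastAPI server over GGML weights can be quite small. The sketch below loads the model with the `ctransformers` library; the model file name, quantization, and `/generate` schema are assumptions chosen for illustration, not necessarily what the demo uses:

```python
# A minimal sketch of a custom FastAPI server for Llama-2-13B-chat-GGML,
# assuming the weights are loaded with `ctransformers`. The model file name
# and request/response schema are illustrative assumptions.
from ctransformers import AutoModelForCausalLM
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load Llama-2-13B-chat in GGML format (GPU offload depends on your build).
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-chat-GGML",
    model_file="llama-2-13b-chat.ggmlv3.q4_0.bin",  # assumed quantization
    model_type="llama",
)


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 256


@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    # Run a blocking completion; fine for a demo, not for production load.
    text = llm(req.prompt, max_new_tokens=req.max_tokens)
    return {"text": text}
```

Served with `uvicorn server:app --host 0.0.0.0 --port 8000`, this exposes the same kind of `/generate` endpoint the adapter sketch above expects.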