Ollama Parameter Mapping Fixed: Personas Now Apply Correctly

Hi everyone,

I’ve merged a fix for Ollama integration that addresses persona configuration issues.

The Problem

Personas were not applying correctly to Ollama models, and Ollama was using default context lengths (2048 or 4096 tokens, depending on the model) instead of the persona’s configured context window.

The root cause: persona LLM config fields follow OpenAI-style formatting, but Ollama expects different field names and structure, with per-request parameters placed inside an `options` object rather than at the root level.

Parameter Mapping Example:

| Persona Config | Ollama Config (in `options`) |
| --- | --- |
| `context_window: 8192` | `num_ctx: 8192` |
| `temperature: 0.7` | `temperature: 0.7` |
| `top_p: 0.9` | `top_p: 0.9` |
| `top_k: 40` | `top_k: 40` |
| `frequency_penalty: 0.5` | `repeat_penalty: 1.25` |
| `presence_penalty: 0.5` | `repeat_penalty: 1.25` |
| `stop_sequences: [...]` | `stop: [...]` |

Note: `max_tokens` doesn’t come from the persona config; it is set separately in the backend and mapped to Ollama’s equivalent (`max_tokens` → `num_predict`).

Without this mapping, Ollama was cutting tokens from the top of the context, including parts of the system prompt and relevant retrieved context.

Context Window Examples:

  • gemma3:1b: 32,768 tokens (BrainDrive UI allows max 30,000)
  • qwen3:8b: 32,768 tokens (BrainDrive UI allows max 30,000)
  • llama3.2:3b: 128,000 tokens

Tip: Run ollama show <model_name> to check the context length for any model.

The Fix

BrainDrive now correctly maps persona configuration to Ollama’s expected format: all parameters are translated and wrapped in the `options` object, per Ollama’s API specification.

What’s Next: RAG Optimization Strategy

This fix revealed opportunities to optimize how we handle context. Here’s what I’m planning:

1. Chunk Size Optimization - BrainDriveAI/Document-Chat-Service

Reduce chunk sizes, optimizing for models with an 8,000-token context window. Most modern Ollama models support at least 8K of context, and many support significantly more.

2. Dynamic Chunk Retrieval - BrainDriveAI/Document-Chat-Service

Update the chat-with-documents backend to dynamically determine the number of chunks to return based on the selected LLM’s capabilities. Different models have different context capacities, and retrieval should adapt accordingly.

3. Reverse Chunk Ordering - BrainDriveAI/Document-Chat-Service

Reverse the order of retrieved chunks so the most relevant content appears last in the context. Since Ollama may strip tokens from the top when approaching context limits, placing the most relevant information at the bottom ensures it’s preserved.

4. Dynamic Context Normalizer with Auto-Compact - BrainDriveAI/BrainDrive-Core

Implement a context manager that:

  • Summarizes previous conversation history up to “n” messages
  • Compresses context so that the system prompt + history + retrieved chunks + latest message fit within the context window
  • Dynamically adjusts based on the selected model (initially optimized for 8000-token models)

Suggested Context Allocation Strategy (for 8K token models):

| Component | Percentage | Approx. Tokens | Notes |
| --- | --- | --- | --- |
| System Prompt | 12.5% | ~1,000 (max) | Protected, never truncated |
| Previous Conversation History | 20% | ~1,600 | Summarized if needed |
| Retrieved Context | 45% | ~3,600 | Most relevant chunks (reversed order) |
| User Query | 7.5% | ~600 (max) | Current message |
| Buffer for LLM Response | 15% | ~1,200 | Critical to prevent response cutoff |
| **Total** | **100%** | **~8,000** | Dynamically managed |

Implementation Plan

These RAG optimizations will be addressed in upcoming PRs. The dynamic context normalizer and allocation strategy will be the priority, followed by chunk optimization.

Related: Issue #192

Let me know your thoughts or suggestions on the context allocation strategy!


Thanks

I do agree with your strategy as a first step toward a solution; in my opinion, go ahead with it and let’s see what the results look like. I would only add that any extra context should go toward the response buffer, since reasoning models tend to produce longer responses.

Thanks @beck, this all sounds good to me as well. It also sounds like it opens up another variable that we could give BrainDrive Owners the ability to customize and evaluate via the system, to improve their results with a specific model that may handle context better or worse than average.