Release Notes
v1.4.0
Features and Enhancements
New embedding model: gte-multilingual-base
The bundled knowledge base has been re-embedded with Alibaba's gte-multilingual-base model, replacing
paraphrase-multilingual-mpnet-base-v2. Cross-language retrieval (e.g., a Chinese question against English
docs) is substantially better, and recall@5 on the bundled ACP corpora rises to ≈ 0.82 with hybrid retrieval.
Existing v1.3.x deployments are migrated automatically — see the Upgrade guide.
Hybrid retrieval (BM25 + dense vector)
A ParadeDB pg_search BM25 index is now built alongside the dense vector index, and answers are retrieved
using a Reciprocal Rank Fusion (RRF) of both signals. This recovers exact-keyword matches that pure semantic
search misses (CRD names, command flags, error strings). Configurable via enableBm25, bm25Weight, bm25K.
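For intuition, RRF scores each document as the sum of 1/(k + rank) across the two ranked lists, so a hit that ranks well in either list surfaces. A minimal Python sketch (the function name and the k = 60 default are illustrative; the shipped fusion presumably also folds in bm25Weight):

```python
from collections import defaultdict

def rrf_fuse(bm25_ranked, dense_ranked, k=60):
    """Merge two ranked lists of doc IDs via Reciprocal Rank Fusion:
    each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranked in (bm25_ranked, dense_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "etcd-backup" appears in both lists, so it fuses above docs that are
# strong in only one signal.
print(rrf_fuse(["etcd-backup", "crd-guide"], ["upgrade-docs", "etcd-backup"]))
```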
Multi-vector retrieval
Each document now contributes both per-chunk vectors and an LLM-generated document-level summary vector. This lets a question about a doc's overall topic match the doc as a whole instead of needing the user's wording to overlap with any specific chunk.
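Conceptually, a document's retrieval score is now its best match across both vector sets. A toy sketch, assuming unit-normalised embeddings (all names here are illustrative, not the shipped implementation):

```python
import numpy as np

def doc_score(query_vec, chunk_vecs, summary_vec):
    """Best cosine match across a doc's per-chunk vectors and its
    LLM-generated summary vector (embeddings assumed unit-normalised,
    so the dot product equals cosine similarity)."""
    candidates = np.vstack([chunk_vecs, summary_vec[None, :]])
    return float((candidates @ query_vec).max())

# A broad "what is this doc about?" question tends to win via the
# summary vector; a narrow question matches a specific chunk vector.
```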
Built-in knowledge base file selector
During install / edit, choose between docvec_gte_cs2000_<date>.dump (default; chunk size 2000) and
docvec_gte_cs3000_<date>.dump (chunk size 3000, slightly better recall on long-form docs).
Auto knowledge-base data swap on upgrade
The init container detects legacy mpnet collections in docvec_sys_kb and atomically swaps them to the new
gte dump in place — keeping the original collection name so pgconnect.pgCollectionName does not have to
change. Idempotent (per-data-version stamp), multi-replica safe (PostgreSQL advisory lock), and crash-safe
(in-flight kb_swap_state row recovers on restart).
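The overall shape of that logic, as a hedged Python sketch (the lock key, stamp value, and the two helper functions are hypothetical; only the advisory-lock and version-stamp pattern is taken from the description above):

```python
import psycopg

LOCK_KEY = 742_001        # hypothetical advisory-lock key
DATA_VERSION = "gte_v1"   # hypothetical per-data-version stamp

def restore_new_dump(cur):
    """Hypothetical: restore the gte dump into staging tables."""

def rename_over_legacy(cur):
    """Hypothetical: atomically rename staging tables over the legacy
    mpnet collections, so pgconnect.pgCollectionName stays valid."""

def swap_kb_if_needed(dsn):
    with psycopg.connect(dsn) as conn, conn.cursor() as cur:
        # Advisory lock: only one replica attempts the swap at a time.
        cur.execute("SELECT pg_advisory_lock(%s)", (LOCK_KEY,))
        try:
            cur.execute("SELECT version FROM kb_swap_state LIMIT 1")
            row = cur.fetchone()
            if row and row[0] == DATA_VERSION:
                return  # stamp already current: idempotent no-op
            restore_new_dump(cur)
            rename_over_legacy(cur)
            cur.execute("UPDATE kb_swap_state SET version = %s",
                        (DATA_VERSION,))
            conn.commit()
        finally:
            cur.execute("SELECT pg_advisory_unlock(%s)", (LOCK_KEY,))
```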
Conversation history compression
When a session approaches the LLM context window, older turns are summarised by the LLM before being re-fed
to the prompt. Tunable via enableHistoryCompression, historyBudgetRatio, historyKeepRecent,
toolOutputTruncateChars, and modelContextWindow. Reduces token cost on long agent-mode conversations
without losing earlier context.
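A rough sketch of the budgeting logic, with parameter names mirroring the settings above (the control flow and default values are illustrative, not the shipped code):

```python
def compress_history(turns, count_tokens, summarise,
                     model_context_window, history_budget_ratio=0.5,
                     history_keep_recent=4):
    """When the transcript exceeds its token budget, replace older
    turns with one LLM-written summary; the most recent turns stay
    verbatim. count_tokens and summarise are caller-supplied."""
    budget = int(model_context_window * history_budget_ratio)
    if sum(count_tokens(t) for t in turns) <= budget:
        return turns                      # under budget: no compression
    old = turns[:-history_keep_recent]
    recent = turns[-history_keep_recent:]
    return [summarise(old)] + recent      # one summary turn + recent tail
```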
Service-to-service API key authentication (HMAC-SHA256)
A new apiKey / apiKeyHeader / apiKeyMaxTimeDiff setting trio lets external services call the smart-doc API
without going through the OAuth proxy. Each token is an HMAC-SHA256 of <username>:<unix_ts>, so the shared
key never travels on the wire. Disabled by default; leave apiKey empty unless you need it.
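The token scheme is straightforward to reproduce. A sketch of both sides (the 300 s default is an assumption; the HMAC-SHA256 over <username>:<unix_ts> is as described above):

```python
import hashlib
import hmac
import time

def make_token(api_key: str, username: str) -> tuple[str, str]:
    """Client side: sign <username>:<unix_ts>. Only the username,
    timestamp, and signature travel on the wire, never api_key."""
    ts = str(int(time.time()))
    sig = hmac.new(api_key.encode(), f"{username}:{ts}".encode(),
                   hashlib.sha256).hexdigest()
    return ts, sig

def verify_token(api_key: str, username: str, ts: str, sig: str,
                 max_time_diff: int = 300) -> bool:
    """Server side: reject stale timestamps (max_time_diff plays the
    role of apiKeyMaxTimeDiff; 300 s is an assumed default), then
    recompute and compare the HMAC in constant time."""
    if abs(time.time() - int(ts)) > max_time_diff:
        return False
    expected = hmac.new(api_key.encode(), f"{username}:{ts}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```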
MCP server exposed via Ingress
The Agent Mode now talks to the bundled acp-mcp-server over the cluster-local service, and external MCP
clients (e.g., IDE-side coding agents) can reach it through Ingress (toggled by Expose MCP). The previous
"MCP K8s API Server Address" install field — which required users to construct an erebus URL by hand — has
been removed.
Documentation: Build a Custom Knowledge Base
A new guide walks admins through ingesting internal Git repositories into the Hyperflux knowledge base, with
two delivery modes (chart-managed swap or manual pg_restore). See
Build a Custom Knowledge Base.
Improvements
- PostgreSQL 18 + ParadeDB upgrade. The bundled vector image is now mlops/paradedb:0.22.6-pg18, bringing pg18 performance and the latest pg_search.
- Tool output truncation. MCP tool outputs are truncated to a configurable character count (toolOutputTruncateChars, default 500) before being re-injected into the LLM context, reducing token cost on verbose K8s responses (see the sketch after this list).
- Init container extracted from inline bash to init-database.sh (shipped via ConfigMap) for easier auditing and customisation.
- Switched the PG driver from pg8000 to psycopg with TCP keepalives enabled, eliminating a class of long-lived-connection drops behind LB / NAT.
- Built-in dump now ships inside the plugin package, removing the separate "download dump file" step that was required up through v1.2.x.
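The truncation itself is simple. A sketch (the marker text is illustrative; the 500 default mirrors toolOutputTruncateChars):

```python
def truncate_tool_output(text: str, limit: int = 500) -> str:
    """Clip verbose MCP tool output before it is re-injected into the
    LLM context. 500 mirrors the toolOutputTruncateChars default."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n...[truncated {len(text) - limit} chars]"
```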
Breaking Changes
- Custom knowledge bases built on v1.3.x must be re-embedded with gte-multilingual-base before upgrading. See the Upgrade guide special-case section.
- The MCP K8s API Server Address install field has been removed. Existing values are ignored on upgrade.
Known Limitations
- The auto KB swap only touches docvec_sys_kb. The docvec_user_kb (BYO Knowledge) database is preserved unchanged across upgrades.
v1.3.1
Features and Enhancements
BYO Knowledge Tool
The BYO Knowledge tool allows enterprises to import private knowledge and use it as a dedicated, searchable knowledge source during question answering. This helps teams provide responses based on internal documents, operational knowledge, and organization-specific context.
Multi-Cluster Support
Multi-cluster support enables users to access information from multiple clusters by cluster name, expanding question-answering capabilities across cluster boundaries. This makes it easier to query and compare resources in different cluster environments.
Token Quota Limits
Token quota limits allow request frequency and token usage to be restricted by user. This helps administrators control costs, manage quotas, and prevent excessive consumption of model resources.
History
History support enables users to review previous conversations and question-answering results. This makes it easier to trace context, continue earlier investigations, and troubleshoot issues based on past interactions.
Improvements
- Optimized the RAG (LangChain) and reranking pipeline to significantly improve answer accuracy and relevance.
- Upgraded the core AI framework to LangChain 1.0 to stay compatible with the latest features and optimizations.
- Added routine system check prompts, and performed comprehensive code polishing and unit-test linting.
- Separated databases for system knowledge base, user knowledge base, and chat history to improve data isolation and performance.
- Redesigned the Smart Doc interaction page for a more intuitive and efficient user experience.
- Upgraded the MCP server, adding support for OAuth authentication and writable tool configurations.
- Enhanced file upload integration for a smoother knowledge ingestion process.
- Added support for IDs in custom elements and resolved related data redundancy issues.
- Implemented a Redis-based rate limiter to enhance system stability and manage API traffic (see the sketch after this list).
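A fixed-window counter over Redis INCR/EXPIRE is the classic shape for such a limiter. A sketch under that assumption (key naming and limits are illustrative, and the shipped limiter may use a different algorithm):

```python
import time

import redis

r = redis.Redis()  # assumes a reachable Redis instance

def allow_request(user: str, limit: int = 60, window_s: int = 60) -> bool:
    """Fixed-window rate limit: at most `limit` requests per user in
    each window_s-second window."""
    key = f"ratelimit:{user}:{int(time.time() // window_s)}"
    pipe = r.pipeline()
    pipe.incr(key)               # count this request
    pipe.expire(key, window_s)   # let the window key expire on its own
    count, _ = pipe.execute()
    return count <= limit
```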
Bug Fixes
- Fixed an issue where model downloading could fail and improved environment variable configuration for embedding models.
- Resolved data processing errors occurring during the merging and unpacking of update values.
- Fixed a bug that caused redundant data prefixes in custom elements.
- Resolved occasional service call failures to improve overall system reliability.
v1.2.1
NOTE: Agent mode is an experimental feature; please use it with caution.
Bug Fixes
- Fixed an issue where setting the knowledge database name might not take effect. The fix adds an option to set the database dump file name during installation and automatically uses the specified dump file to initialize the knowledge base.
- Fixed an issue where MCP tools could create or delete K8s resources without human confirmation in Agent mode.
- Fixed an issue where the server could get stuck when asked for disk space information in Agent mode.
- Fixed an issue where default node taints were not handled when deploying on ACP 4.2 or above.
- Fixed a deployment error: kubeVersion: >=1.20.0 which is incompatible with Kubernetes v1.33.7-1.
- Fixed an issue where the API keys for the LLM service and rerank service appeared in plain text during deployment.
Improvements
- Improved the prompt so Hyperflux identifies itself correctly.
- Removed unused configuration items from the installation page.
v1.2.0
Features and Enhancements
- Use a RAG chain by default to answer user questions, improving answer accuracy.
- Support importing a database dump to initialize the knowledge base, simplifying the setup process.
- Experimental: Support enabling Agent mode to leverage MCP tools to retrieve real-time cluster information.
- Support connecting to a PGVector database deployed outside the Alauda Hyperflux installation.
- Support the Cohere Reranker model to improve answer relevance.
- Support setting RAG chain parameters such as total_search_k.
Known Issues
- When the LLM returns an error, answer generation may fail. Returning to view the chat history then re-sends the question to the LLM, causing duplicated conversations.