Release Notes
v1.4.0
Features and Enhancements
New embedding model: gte-multilingual-base
The bundled knowledge base has been re-embedded with Alibaba's gte-multilingual-base model, replacing
paraphrase-multilingual-mpnet-base-v2. Cross-language retrieval (e.g., a Chinese question against English
docs) is substantially better, and recall@5 on the bundled ACP corpora rises to ≈ 0.82 with hybrid retrieval.
Existing v1.3.x deployments are migrated automatically — see the Upgrade guide.
Hybrid retrieval (BM25 + dense vector)
A ParadeDB pg_search BM25 index is now built alongside the dense vector index, and answers are retrieved
using a Reciprocal Rank Fusion (RRF) of both signals. This recovers exact-keyword matches that pure semantic
search misses (CRD names, command flags, error strings). Configurable via enableBm25, bm25Weight, bm25K.
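For intuition, RRF scores each document as the sum of 1/(k + rank) across the two ranked lists, so a hit that ranks well in either list surfaces. A minimal Python sketch (the function name and the k = 60 default are illustrative; the shipped fusion presumably also folds in bm25Weight):

```python
from collections import defaultdict

def rrf_fuse(bm25_ranked, dense_ranked, k=60):
    """Merge two ranked lists of doc IDs via Reciprocal Rank Fusion:
    each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranked in (bm25_ranked, dense_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "etcd-backup" appears in both lists, so it fuses above docs that are
# strong in only one signal.
print(rrf_fuse(["etcd-backup", "crd-guide"], ["upgrade-docs", "etcd-backup"]))
```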
Multi-vector retrieval
Each document now contributes both per-chunk vectors and an LLM-generated document-level summary vector. This lets a question about a doc's overall topic match the doc as a whole instead of needing the user's wording to overlap with any specific chunk.
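Conceptually, a document's retrieval score is now its best match across both vector sets. A toy sketch, assuming unit-normalised embeddings (all names here are illustrative, not the shipped implementation):

```python
import numpy as np

def doc_score(query_vec, chunk_vecs, summary_vec):
    """Best cosine match across a doc's per-chunk vectors and its
    LLM-generated summary vector (embeddings assumed unit-normalised,
    so the dot product equals cosine similarity)."""
    candidates = np.vstack([chunk_vecs, summary_vec[None, :]])
    return float((candidates @ query_vec).max())

# A broad "what is this doc about?" question tends to win via the
# summary vector; a narrow question matches a specific chunk vector.
```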
Built-in knowledge base file selector
During install / edit, choose between docvec_gte_cs2000_<date>.dump (default; chunk size 2000) and
docvec_gte_cs3000_<date>.dump (chunk size 3000, slightly better recall on long-form docs).
Auto knowledge-base data swap on upgrade
The init container detects legacy mpnet collections in docvec_sys_kb and atomically swaps them to the new
gte dump in place — keeping the original collection name so pgconnect.pgCollectionName does not have to
change. Idempotent (per-data-version stamp), multi-replica safe (PostgreSQL advisory lock), and crash-safe
(in-flight kb_swap_state row recovers on restart).
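The overall shape of that logic, as a hedged Python sketch (the lock key, stamp value, and the two helper functions are hypothetical; only the advisory-lock and version-stamp pattern is taken from the description above):

```python
import psycopg

LOCK_KEY = 742_001        # hypothetical advisory-lock key
DATA_VERSION = "gte_v1"   # hypothetical per-data-version stamp

def restore_new_dump(cur):
    """Hypothetical: restore the gte dump into staging tables."""

def rename_over_legacy(cur):
    """Hypothetical: atomically rename staging tables over the legacy
    mpnet collections, so pgconnect.pgCollectionName stays valid."""

def swap_kb_if_needed(dsn):
    with psycopg.connect(dsn) as conn, conn.cursor() as cur:
        # Advisory lock: only one replica attempts the swap at a time.
        cur.execute("SELECT pg_advisory_lock(%s)", (LOCK_KEY,))
        try:
            cur.execute("SELECT version FROM kb_swap_state LIMIT 1")
            row = cur.fetchone()
            if row and row[0] == DATA_VERSION:
                return  # stamp already current: idempotent no-op
            restore_new_dump(cur)
            rename_over_legacy(cur)
            cur.execute("UPDATE kb_swap_state SET version = %s",
                        (DATA_VERSION,))
            conn.commit()
        finally:
            cur.execute("SELECT pg_advisory_unlock(%s)", (LOCK_KEY,))
```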
Conversation history compression
When a session approaches the LLM context window, older turns are summarised by the LLM before being re-fed
to the prompt. Tunable via enableHistoryCompression, historyBudgetRatio, historyKeepRecent,
toolOutputTruncateChars, and modelContextWindow. Reduces token cost on long agent-mode conversations
without losing earlier context.
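A rough sketch of the budgeting logic, with parameter names mirroring the settings above (the control flow and default values are illustrative, not the shipped code):

```python
def compress_history(turns, count_tokens, summarise,
                     model_context_window, history_budget_ratio=0.5,
                     history_keep_recent=4):
    """When the transcript exceeds its token budget, replace older
    turns with one LLM-written summary; the most recent turns stay
    verbatim. count_tokens and summarise are caller-supplied."""
    budget = int(model_context_window * history_budget_ratio)
    if sum(count_tokens(t) for t in turns) <= budget:
        return turns                      # under budget: no compression
    old = turns[:-history_keep_recent]
    recent = turns[-history_keep_recent:]
    return [summarise(old)] + recent      # one summary turn + recent tail
```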
Service-to-service API key authentication (HMAC-SHA256)
A new apiKey / apiKeyHeader / apiKeyMaxTimeDiff setting trio lets external services call the smart-doc API
without going through the OAuth proxy. Each token is an HMAC-SHA256 of <username>:<unix_ts>, so the shared
key never travels on the wire. Disabled by default; leave apiKey empty unless you need it.
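The token scheme is straightforward to reproduce. A sketch of both sides (the 300 s default is an assumption; the HMAC-SHA256 over <username>:<unix_ts> is as described above):

```python
import hashlib
import hmac
import time

def make_token(api_key: str, username: str) -> tuple[str, str]:
    """Client side: sign <username>:<unix_ts>. Only the username,
    timestamp, and signature travel on the wire, never api_key."""
    ts = str(int(time.time()))
    sig = hmac.new(api_key.encode(), f"{username}:{ts}".encode(),
                   hashlib.sha256).hexdigest()
    return ts, sig

def verify_token(api_key: str, username: str, ts: str, sig: str,
                 max_time_diff: int = 300) -> bool:
    """Server side: reject stale timestamps (max_time_diff plays the
    role of apiKeyMaxTimeDiff; 300 s is an assumed default), then
    recompute and compare the HMAC in constant time."""
    if abs(time.time() - int(ts)) > max_time_diff:
        return False
    expected = hmac.new(api_key.encode(), f"{username}:{ts}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```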
MCP server exposed via Ingress
The Agent Mode now talks to the bundled acp-mcp-server over the cluster-local service, and external MCP
clients (e.g., IDE-side coding agents) can reach it through Ingress (toggled by Expose MCP). The previous
"MCP K8s API Server Address" install field — which required users to construct an erebus URL by hand — has
been removed.
Documentation: Build a Custom Knowledge Base
A new guide walks admins through ingesting internal Git repositories into the Hyperflux knowledge base, with
two delivery modes (chart-managed swap or manual pg_restore). See
Build a Custom Knowledge Base.
Improvements
- PostgreSQL 18 + ParadeDB upgrade. The bundled vector image is now mlops/paradedb:0.22.6-pg18, bringing pg18 performance and the latest pg_search.
- Tool output truncation. MCP tool outputs are truncated to a configurable character count (toolOutputTruncateChars, default 500) before being re-injected into the LLM context, reducing token cost on verbose K8s responses (see the sketch after this list).
- Init container extracted from inline bash to init-database.sh (shipped via ConfigMap) for easier auditing and customisation.
- Switched the PG driver from pg8000 to psycopg with TCP keepalives enabled, eliminating a class of long-lived-connection drops behind LB / NAT.
- Built-in dump now ships inside the plugin package, removing the separate "download dump file" step that was required up through v1.2.x.
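The truncation itself is simple. A sketch (the marker text is illustrative; the 500 default mirrors toolOutputTruncateChars):

```python
def truncate_tool_output(text: str, limit: int = 500) -> str:
    """Clip verbose MCP tool output before it is re-injected into the
    LLM context. 500 mirrors the toolOutputTruncateChars default."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n...[truncated {len(text) - limit} chars]"
```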
Breaking Changes
- Custom knowledge bases built on v1.3.x must be re-embedded with gte-multilingual-base before upgrading. See the Upgrade guide special-case section.
- The MCP K8s API Server Address install field has been removed. Existing values are ignored on upgrade.
Known Limitations
- The auto KB swap only touches docvec_sys_kb. The docvec_user_kb (BYO Knowledge) database is preserved unchanged across upgrades.
v1.3.1
Features and Enhancements
BYO Knowledge Tool
The BYO Knowledge tool allows enterprises to import private knowledge and use it as a dedicated, searchable knowledge source during question answering. This helps teams provide responses based on internal documents, operational knowledge, and organization-specific context.
Multi-Cluster Support
Multi-cluster support enables users to access information from multiple clusters by cluster name, expanding question-answering capabilities across cluster boundaries. This makes it easier to query and compare resources in different cluster environments.
Token Quota Limits
Token quota limits allow request frequency and token usage to be restricted by user. This helps administrators control costs, manage quotas, and prevent excessive consumption of model resources.
History
History support enables users to review previous conversations and question-answering results. This makes it easier to trace context, continue earlier investigations, and troubleshoot issues based on past interactions.
Improvements
- Optimized the RAG (LangChain) and reranking pipeline to significantly improve answer accuracy and relevance.
- Upgraded the core AI framework to LangChain 1.0 to stay compatible with the latest features and optimizations.
- Added routine system check prompts, and performed comprehensive code polishing and unit-test linting.
- Separated databases for system knowledge base, user knowledge base, and chat history to improve data isolation and performance.
- Redesigned the Smart Doc interaction page for a more intuitive and efficient user experience.
- Upgraded the MCP server, adding support for OAuth authentication and writable tool configurations.
- Enhanced file upload integration for a smoother knowledge ingestion process.
- Added support for IDs in custom elements and resolved related data redundancy issues.
- Implemented a Redis-based rate limiter to enhance system stability and manage API traffic (see the sketch after this list).
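A fixed-window counter over Redis INCR/EXPIRE is the classic shape for such a limiter. A sketch under that assumption (key naming and limits are illustrative, and the shipped limiter may use a different algorithm):

```python
import time

import redis

r = redis.Redis()  # assumes a reachable Redis instance

def allow_request(user: str, limit: int = 60, window_s: int = 60) -> bool:
    """Fixed-window rate limit: at most `limit` requests per user in
    each window_s-second window."""
    key = f"ratelimit:{user}:{int(time.time() // window_s)}"
    pipe = r.pipeline()
    pipe.incr(key)               # count this request
    pipe.expire(key, window_s)   # let the window key expire on its own
    count, _ = pipe.execute()
    return count <= limit
```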
Bug Fixes
- Fixed an issue where model downloading could fail and improved environment variable configuration for embedding models.
- Resolved data processing errors occurring during the merging and unpacking of update values.
- Fixed a bug that caused redundant data prefixes in custom elements.
- Resolved occasional service call failures to improve overall system reliability.
v1.2.1
NOTE: Agent mode is an experimental feature; please use it with caution.
Bug Fixes
- Fixed an issue where setting the knowledge database name might not take effect. The fix adds an option to set the database dump file name during installation and automatically uses the specified dump file to initialize the knowledge base.
- Fixed an issue where MCP tools could create or delete K8s resources without human confirmation in Agent mode.
- Fixed an issue where the server could get stuck when asked for disk space information in Agent mode.
- Fixed an issue where default node taints were not handled when deploying on ACP 4.2 or above.
- Fixed a deployment error: kubeVersion: >=1.20.0 which is incompatible with Kubernetes v1.33.7-1.
- Fixed an issue where the API keys for the LLM service and rerank service appeared in plain text during deployment.
Improvements
- Improved the prompt so Hyperflux identifies itself correctly.
- Removed unused configuration items from the installation page.
v1.2.0
Features and Enhancements
- Use a RAG chain by default to answer user questions, improving answer accuracy.
- Support importing a database dump to initialize the knowledge base, simplifying the setup process.
- Experimental: Support enabling Agent mode to leverage MCP tools to retrieve real-time cluster information.
- Support connecting to a PGVector database deployed outside the Alauda Hyperflux installation.
- Support the Cohere Reranker model to improve answer relevance.
- Support setting RAG chain parameters such as total_search_k.
Known Issues
- When the LLM returns an error, answer generation may fail. Returning to view the chat history then re-sends the question to the LLM, causing duplicated conversations.