InnConnect Docs

Knowledge Base

The content your AI draws on to answer customer questions accurately.

Overview

The Knowledge Base (KB) is the foundation of everything your AI assistant says. Every article you publish is transformed into a vector embedding and stored in a semantic search index. When a customer sends a message through the chat widget, the system retrieves the most relevant articles and feeds them to the language model as context — a technique known as Retrieval-Augmented Generation (RAG).

This means the AI does not guess or hallucinate answers. It draws exclusively from content your team has reviewed and published. The quality, clarity, and coverage of your Knowledge Base directly determine the quality of your AI's responses.

Grounded Responses

Every AI answer is backed by a specific article in your Knowledge Base. No training data leakage, no invented facts.

Instant Updates

Publish or update an article and the change takes effect immediately. No retraining, no deployment, no waiting.

Per-Site Isolation

Each site has its own independent Knowledge Base. Articles from one site are never surfaced to visitors of another.

How RAG Works

Retrieval-Augmented Generation is the core mechanism that connects your Knowledge Base to the AI's responses. Understanding the pipeline helps you write better articles and debug cases where the AI gives incomplete or incorrect answers.

| Step | Stage | What Happens |
| --- | --- | --- |
| 1 | Customer message | A visitor types a question in the chat widget. The raw text is sent to the InnConnect API. |
| 2 | Embedding | The message is converted into a 1024-dimensional vector embedding that captures its semantic meaning. |
| 3 | Vector search | The embedding is compared against all article chunks in the site's index using pgvector with an HNSW index. The top-K most similar chunks are retrieved. |
| 4 | Context assembly | Retrieved chunks are assembled into a structured context block and inserted into the system prompt with security delimiters. |
| 5 | LLM generation | The Mistral AI language model generates a natural-language response grounded in the retrieved context. |
| 6 | Response delivery | The sanitized response is returned to the chat widget and displayed to the visitor. |
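
The pipeline in the table can be pictured as a short function that chains steps 2 to 5. The following is a minimal sketch, not InnConnect's implementation: the function names, the `<kb_chunk>` delimiter format, and the prompt wording are all assumptions made for illustration, and the embedding, search, and generation services are passed in as plain callables.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def build_context(chunks: List[str]) -> str:
    """Step 4: wrap each retrieved chunk in a delimiter (hypothetical format)."""
    blocks = [
        f'<kb_chunk index="{i}">\n{text}\n</kb_chunk>'
        for i, text in enumerate(chunks)
    ]
    return "\n".join(blocks)


def answer_message(
    message: str,
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], List[str]],
    generate: Callable[[str, str], str],
    top_k: int = 3,
) -> str:
    """Chain embedding, vector search, context assembly, and generation."""
    query_vector = embed(message)          # step 2: message -> embedding
    chunks = search(query_vector, top_k)   # step 3: top-K similar chunks
    system_prompt = (                      # step 4: context into the system prompt
        "Answer using only the knowledge base context between the delimiters.\n"
        + build_context(chunks)
    )
    return generate(system_prompt, message)  # step 5: grounded response
```

Passing the services in as callables keeps the sketch runnable with stubs, which is also a convenient way to reason about where an incomplete answer comes from: a missing chunk points at retrieval, a wrong summary points at generation.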

Screenshot: Knowledge Base dashboard showing articles and categories with status badges

Why does this matter for content authors? The AI can only answer questions when it finds a semantically similar article in the index. If there is no article about your refund policy, the AI cannot answer refund questions — even if the answer seems obvious. Coverage is everything.

Categories

Categories provide organisational structure to your Knowledge Base. They help your team navigate content quickly and can optionally be surfaced in the admin panel's article browser. Categories do not affect how the AI ranks or retrieves articles: semantic search operates across all active categories simultaneously. The one exception is deactivation, which removes a category's articles from the index entirely.

| Field | Description | Required |
| --- | --- | --- |
| Name | Display name shown in the admin panel (e.g. "Shipping & Delivery") | Yes |
| Slug | URL-friendly identifier, auto-generated from the name | Auto |
| Description | Internal note explaining what content belongs in this category | No |
| Sort Order | Numeric value controlling display order in the admin panel | No |
| Active | Inactive categories and their articles are excluded from the AI search index | Yes |
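
To illustrate what "auto-generated from the name" typically means for the Slug field, here is a small sketch of a common slug rule. The exact rules InnConnect applies are not documented here, so treat `slugify` and its behaviour (ASCII folding, lowercasing, hyphen separators) as assumptions.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a display name into a lowercase, hyphen-separated identifier."""
    # Fold accented characters to their closest ASCII form (assumed behaviour).
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    return slug.strip("-")
```

Under these assumed rules, the example name "Shipping & Delivery" would become `shipping-delivery`.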

Categories are per-site. Each site manages its own independent set of categories. You can create as many categories as needed to reflect your content structure.

Deactivating a category hides all its articles. When you set a category to inactive, every article within it is removed from the semantic search index. The AI will no longer use those articles to answer questions, even if the individual articles are set to "Published". Reactivating the category restores them.
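
The rule above combines two conditions: the article's own status and its category's Active flag. A minimal sketch of that check follows; the class and field names are hypothetical and only mirror the fields described in this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    active: bool


@dataclass(frozen=True)
class Article:
    title: str
    status: str  # "draft" or "published"
    category: Category


def is_in_search_index(article: Article) -> bool:
    """An article is searchable only if it is published AND its category is active."""
    return article.status == "published" and article.category.active
```

This is why a "Published" badge alone does not guarantee the AI can use an article: check the category as well when debugging a missing answer.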

Articles

Articles are the individual knowledge items the AI draws on. Each article is written in a question-and-answer format optimised for semantic retrieval. The question field is the primary search target: together with the title, it produces the embedding (Chunk 0, see Article Chunking) that determines when the article is retrieved.

| Field | Description | Required |
| --- | --- | --- |
| Title | Short label for the article, used in admin panel navigation and search results | Yes |
| Question | The customer question this article answers. This is the primary embedding target, so write it in the language and phrasing your customers would use. | Yes |
| Answer | The full answer content. Supports Markdown formatting (headings, lists, bold, links). This is the text the AI will reference when constructing its response. | Yes |
| Category | Assigns the article to an organisational category | Yes |
| Data Classification | Sensitivity level of the content. Controls whether it appears in customer-facing AI responses. See Data Classification below. | Yes |
| Status | Draft or Published. Only published articles are included in the semantic search index and used by the AI. | Yes |
| Sort Order | Controls display order within its category in the admin panel | No |

Data Classification

Every article must be assigned a data classification level before it can be published. This is a compliance control required for ISO 27001 alignment and GDPR governance. Classification determines how the content is handled in customer-facing AI responses.

Public

General information suitable for anyone. Product descriptions, opening hours, FAQ content, return policies. Safe to include verbatim in AI responses to customers.

Internal

Information intended for staff reference. Internal procedures, pricing guides, escalation paths. Included in AI context but the AI is instructed to handle the content with discretion.

Confidential

Sensitive business information. Contract terms, partner details, internal pricing logic, strategic plans. Excluded from customer-facing AI responses. Visible only to users with the appropriate role.

Restricted

Highly sensitive information. Legal content, security procedures, credentials, PII. Excluded from customer-facing AI responses. For authorised internal reference only.

Confidential and Restricted content is never shown to customers. Regardless of semantic relevance, articles classified as Confidential or Restricted are filtered out before the context reaches the language model for customer-facing responses. This is enforced server-side and cannot be bypassed. Use these levels for content that must exist in the KB for internal purposes but should never leak to the public.
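
The server-side filter described above can be thought of as an allow-list applied to retrieved chunks before the prompt is built. The sketch below is illustrative only: the enum, the tuple layout, and the function name are assumptions, not InnConnect's code.

```python
from enum import Enum
from typing import List, Tuple


class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


# Per this page, only Public and Internal content may reach customer-facing context.
CUSTOMER_FACING_ALLOWED = frozenset(
    {Classification.PUBLIC, Classification.INTERNAL}
)

Chunk = Tuple[str, Classification]  # (chunk text, classification of its article)


def filter_for_customer(chunks: List[Chunk]) -> List[Chunk]:
    """Drop Confidential and Restricted chunks regardless of their relevance."""
    return [chunk for chunk in chunks if chunk[1] in CUSTOMER_FACING_ALLOWED]
```

Using an allow-list rather than a block-list means any classification level added later is excluded by default until someone explicitly permits it.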

Article Chunking

Long articles are split into smaller chunks before embedding. Chunking improves retrieval precision by allowing the system to match on specific paragraphs rather than entire documents. This is especially important for articles with multiple distinct topics or detailed step-by-step instructions.

| Chunk | Content | Purpose |
| --- | --- | --- |
| Chunk 0 | Title + Question | Retrieval anchor. This is the primary embedding used for semantic matching. It captures the intent of the article in the most concentrated form. |
| Chunk 1–N | Answer content, split by paragraph boundaries | Granular retrieval of specific answer sections. Enables the AI to reference the most relevant paragraph, not the entire article. |

Chunking rules

You do not need to manage chunks manually. Chunking is fully automatic and happens whenever an article is published or updated. The system recalculates chunks and regenerates embeddings each time content changes. The information above is provided for transparency and to help you understand how article length affects retrieval.
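
For intuition, the chunk layout in the table above can be sketched as follows. This is a simplified illustration under stated assumptions: paragraphs are taken to be separated by blank lines, and any size limits or overlap the real chunker may apply are not modelled. The function name is hypothetical.

```python
import re
from typing import List


def chunk_article(title: str, question: str, answer: str) -> List[str]:
    """Return chunk 0 (title + question) followed by one chunk per answer paragraph."""
    anchor = f"{title}\n{question}"  # chunk 0: the retrieval anchor
    # Assumed paragraph boundary: one or more blank lines in the answer text.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", answer) if p.strip()]
    return [anchor] + paragraphs
```

One practical consequence: each paragraph of an answer should make sense on its own, because it may be retrieved without its neighbours.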

AI Wizard

The KB Wizard automates content generation by analysing existing content sources and producing structured Q&A articles ready for review. Instead of writing every article from scratch, provide a topic, URL, or PDF and let the AI do the initial drafting.

Five-step flow

| Step | Stage | What Happens |
| --- | --- | --- |
| 1 | Topic Input | Choose one of three input methods: type a topic for the AI to research, paste a URL for the scraper to extract content from, or upload a PDF document. |
| 2 | Research | The web research service scrapes the provided source and gathers relevant information. Credit usage is shown in the wizard UI. |
| 3 | Select | Review the research results and select which findings to use as source material for article generation. |
| 4 | Generate | The AI generates structured Q&A articles from the selected research. Mistral AI is used for generation. |
| 5 | Save | Assign a category, review the generated content, edit as needed, and save the articles to your Knowledge Base. |

Screenshot: AI Wizard five-step flow showing topic input, research results, article selection, generation, and save

SSRF protection. The wizard validates all URLs before making HTTP requests. Internal addresses (localhost, private IP ranges, cloud metadata endpoints such as 169.254.169.254) are blocked to prevent Server-Side Request Forgery attacks. Only public HTTP and HTTPS URLs are permitted. Blocked attempts are logged to the security audit log. See the Security page for details.
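
The kind of check described above can be sketched with Python's standard library. This is a simplified illustration, not the wizard's actual validator: the function name is hypothetical, and it only inspects the scheme, the literal hostname, and literal IP addresses. A production validator would also resolve hostnames and apply the same address checks to every resolved IP.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCKED_HOSTNAMES = frozenset({"localhost"})


def is_url_allowed(url: str) -> bool:
    """Return True only for http(s) URLs that do not point at an internal address."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme not in ("http", "https") or not host:
        return False
    host = host.rstrip(".").lower()
    if not host or host in BLOCKED_HOSTNAMES:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: accepted here, but see the DNS caveat in the lead-in.
        return True
    # Reject loopback, private ranges, link-local (incl. 169.254.169.254), etc.
    return not (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_reserved
        or address.is_multicast
        or address.is_unspecified
    )
```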

The wizard consumes AI credits as it runs: 15 credits for research, 10 credits for topic generation, and 5 credits per article generated. See the Commerce & Billing page for credit package details.
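
As a worked example of the credit figures above, the following sketch estimates the cost of one wizard run. The function and its flags are hypothetical, and it assumes the research and topic generation charges each apply at most once per run; the wizard UI remains the authoritative source for actual usage.

```python
RESEARCH_CREDITS = 15
TOPIC_GENERATION_CREDITS = 10
CREDITS_PER_ARTICLE = 5


def estimate_wizard_credits(
    articles: int, research: bool = True, topic_generation: bool = True
) -> int:
    """Estimate credits for one wizard run under the assumptions stated above."""
    if articles < 0:
        raise ValueError("articles must be zero or greater")
    total = articles * CREDITS_PER_ARTICLE
    if research:
        total += RESEARCH_CREDITS
    if topic_generation:
        total += TOPIC_GENERATION_CREDITS
    return total
```

For instance, a run that performs research and topic generation and produces four articles would come to 15 + 10 + 4 × 5 = 45 credits under these assumptions.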

Publishing Workflow

Articles follow a two-stage lifecycle. Only published articles are included in the semantic search index and used by the AI to answer customer questions.

| Status | Description | Visible to AI | Who Can Set |
| --- | --- | --- | --- |
| Draft | Article is being written or reviewed. Not yet approved for customer-facing use. | No | Any user with manage-kb |
| Published | Article has been reviewed, approved, and embedded into the semantic search index. | Yes | Users with approve-kb (KB Manager or higher) |

This separation ensures a review step before content goes live. Agents and other users with the manage-kb permission can create and edit draft articles, but only a KB Manager or Tenant Admin (users holding the approve-kb permission) can publish them.

Unpublishing an article. Changing a published article's status back to Draft removes it from the search index, and the AI stops using it to answer questions within seconds. The article content is preserved and can be re-published at any time.
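
The permission split in this workflow can be summarised in a few lines. The sketch below is illustrative: the role identifiers and the mapping are assumptions derived from the table above, while `manage-kb` and `approve-kb` are the permission names used in this page.

```python
# Hypothetical role identifiers mapped to the permissions named in this page.
ROLE_PERMISSIONS = {
    "agent": {"manage-kb"},
    "kb_manager": {"manage-kb", "approve-kb"},
    "tenant_admin": {"manage-kb", "approve-kb"},
}


def can_set_status(role: str, new_status: str) -> bool:
    """Check whether a role may move an article to the given status."""
    permissions = ROLE_PERMISSIONS.get(role, set())
    if new_status == "published":
        return "approve-kb" in permissions  # publishing requires approval rights
    if new_status == "draft":
        return "manage-kb" in permissions   # drafting requires edit rights
    return False  # unknown status values are rejected
```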

Semantic Search

InnConnect uses semantic vector search rather than relying on keyword matching alone. This means the system understands meaning, not just words. A customer asking "wat zijn jullie openingstijden?" (Dutch for "what are your opening hours?") will match an article titled "Opening Hours" even though the question and the title share no exact keywords.

Technical details

| Component | Implementation |
| --- | --- |
| Embedding dimensions | 1024-dimensional vectors |
| Vector storage | PostgreSQL with the pgvector extension |
| Index type | HNSW (Hierarchical Navigable Small World) for fast approximate nearest-neighbour search |
| Similarity metric | Cosine similarity |
| Search strategy | Hybrid: vector similarity search combined with PostgreSQL full-text search for maximum recall |
| Multilingual | Embedding models capture cross-lingual semantics. An article in English can match a question in Dutch, and vice versa. |
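
To make the storage and index rows more concrete, here is a sketch of what the vector half of the search could look like with pgvector. The table name `kb_chunks`, its columns, and the parameter names are hypothetical; only the `hnsw` index method, the `vector_cosine_ops` operator class, and the `<=>` cosine distance operator come from pgvector itself. The full-text half of the hybrid strategy is not shown.

```python
from typing import Sequence

# Hypothetical schema: kb_chunks(site_id, article_id, content, embedding vector(1024)).
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS kb_chunks_embedding_idx
    ON kb_chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> returns cosine distance, so similarity is 1 minus that value.
TOP_K_SQL = """
SELECT article_id, content, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM kb_chunks
WHERE site_id = %(site_id)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(top_k)s;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format a sequence of numbers as a pgvector text literal such as [1.0,0.25]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

The `site_id` filter in the query reflects the per-site isolation described earlier: chunks from other sites are never candidates, whatever their similarity.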

How matching works in practice

Synonyms

"Cancel my subscription" matches an article about "How to end your membership" — no keyword overlap required.

Rephrasing

"Where are you located?" matches "Our office address" because the underlying intent is the same.

Cross-language

"Hoe kan ik retourneren?" (Dutch for "How can I return an item?") matches an article titled "Return Policy" (English) because the embedding captures the concept, not the language.

Top-K retrieval. The search returns the most relevant article chunks ranked by cosine similarity. Only chunks above a confidence threshold are included in the AI's context. If no chunks meet the threshold, the AI acknowledges it does not have the information rather than guessing.
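
The ranking and threshold behaviour described above can be sketched in plain Python. The threshold value of 0.75 and the function names are assumptions chosen for illustration; the actual threshold InnConnect uses is not stated here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
    threshold: float = 0.75,  # hypothetical confidence threshold
) -> List[Tuple[str, float]]:
    """Return up to top_k (text, score) pairs at or above the threshold, best first."""
    scored = [(text, cosine_similarity(query, vector)) for text, vector in chunks]
    confident = [item for item in scored if item[1] >= threshold]
    confident.sort(key=lambda item: item[1], reverse=True)
    return confident[:top_k]
```

An empty result from a function like `retrieve` corresponds to the case where the AI acknowledges it does not have the information rather than guessing.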

Best Practices

The quality of your Knowledge Base is the single most important factor in the quality of your AI's responses. Follow these guidelines to get the best results.

Write questions the way customers phrase them

The question field is the primary embedding target. If your customers ask "How much does shipping cost?" then that should be your question — not "Shipping fee information" or "Logistics cost overview". Match the natural language of your audience.

One topic per article

Resist the urge to create a single "FAQ" article with 30 questions. Each article should address exactly one question or topic. This improves retrieval precision because the embedding represents a single, focused concept rather than a diluted mixture.

Keep answers concise and factual

The AI synthesises responses from retrieved articles. Shorter, clearer answers produce better AI output. Avoid marketing language, filler text, or lengthy preambles. Get to the point.

Test with the chat preview

After publishing articles, open your site's chat preview (Sites → your site → Preview) and ask the questions your customers ask most often. If the AI gives an incorrect or incomplete answer, the corresponding article likely needs to be rewritten or a new article needs to be added.

Cover the gaps

Review your escalation history regularly. Questions that the AI could not answer (and escalated to a human agent) are prime candidates for new Knowledge Base articles. Each resolved escalation is an opportunity to make the AI smarter.

Classify data correctly

Take data classification seriously. Incorrectly marking internal procedures as "Public" could expose sensitive information through AI responses. When in doubt, use a more restrictive classification — you can always relax it later after review.

Quick start. If you are building your Knowledge Base from scratch, start with your 10 most frequently asked customer questions. Publish those articles, test with the chat preview, and iterate. A small, high-quality Knowledge Base will outperform a large, unfocused one.