Knowledge Base

The content your AI draws on to answer customer questions accurately.

Overview

The Knowledge Base (KB) is the foundation of everything your AI assistant says. Every article you publish is split into chunks, converted into vector embeddings, and stored in a semantic search index. When a customer sends a message through the chat widget, the system retrieves the most relevant article chunks and feeds them to the language model as context, a technique known as Retrieval-Augmented Generation (RAG).

This means the AI does not guess or hallucinate answers. It draws exclusively from content your team has reviewed and published. The quality, clarity, and coverage of your Knowledge Base directly determine the quality of your AI's responses.

Grounded Responses

Every AI answer is backed by a specific article in your Knowledge Base. No training data leakage, no invented facts.

Instant Updates

Publish or update an article and the change takes effect immediately. No retraining, no deployment, no waiting.

Per-Site Isolation

Each site has its own independent Knowledge Base. Articles from one site are never surfaced to visitors of another.

How RAG Works

Retrieval-Augmented Generation is the core mechanism that connects your Knowledge Base to the AI's responses. Understanding the pipeline helps you write better articles and debug cases where the AI gives incomplete or incorrect answers.

Step	Stage	What Happens
1	Customer message	A visitor types a question in the chat widget. The raw text is sent to the InnConnect API.
2	Embedding	The message is converted into a 1024-dimensional vector embedding that captures its semantic meaning.
3	Vector search	The embedding is compared against all article chunks in the site's index using pgvector with an HNSW index. The top-K most similar chunks are retrieved.
4	Context assembly	Retrieved chunks are assembled into a structured context block and inserted into the system prompt with security delimiters.
5	LLM generation	The Mistral AI language model generates a natural-language response grounded in the retrieved context.
6	Response delivery	The sanitized response is returned to the chat widget and displayed to the visitor.
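The stages in the table can be sketched as a short chain of function calls. This is a minimal illustration, not InnConnect's implementation: the `embed`, `search`, and `generate` callables are hypothetical stand-ins for the embedding service, the vector index, and the language model, and the `<kb_context>` delimiter is an invented example of a security delimiter.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Chunk:
    article_title: str
    text: str


def build_prompt(chunks: Sequence[Chunk], question: str) -> str:
    """Step 4: wrap retrieved chunks in delimiters (the tag name is made up)."""
    context = "\n\n".join(f"[{c.article_title}]\n{c.text}" for c in chunks)
    return (
        "Answer using only the knowledge base context below.\n"
        "<kb_context>\n" + context + "\n</kb_context>\n"
        f"Customer question: {question}"
    )


def answer(
    question: str,
    embed: Callable[[str], List[float]],
    search: Callable[[List[float], int], List[Chunk]],
    generate: Callable[[str], str],
    top_k: int = 5,  # hypothetical default, the real value of K is not documented
) -> str:
    """Steps 2 to 6 chained together using injected, hypothetical services."""
    vector = embed(question)                 # step 2: embedding
    chunks = search(vector, top_k)           # step 3: vector search
    prompt = build_prompt(chunks, question)  # step 4: context assembly
    return generate(prompt).strip()          # steps 5 and 6: generation, delivery
```

Passing the services in as arguments keeps the sketch runnable with simple stubs, which is also a convenient way to reason about where a wrong answer originates: a missing chunk points to retrieval, while a correct chunk with a wrong answer points to generation.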

Screenshot: Knowledge Base dashboard showing articles and categories with status badges

Why does this matter for content authors? The AI can only answer questions when it finds a semantically similar article in the index. If there is no article about your refund policy, the AI cannot answer refund questions — even if the answer seems obvious. Coverage is everything.

Categories

Field	Description	Required
Name	Display name shown in the admin panel (e.g. "Shipping & Delivery")	Yes
Slug	URL-friendly identifier, auto-generated from the name	Auto
Description	Internal note explaining what content belongs in this category	No
Sort Order	Numeric value controlling display order in the admin panel	No
Active	Whether the category is active. Inactive categories and their articles are excluded from the AI search index.	Yes
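Slug generation from a category name can be approximated as follows. The exact rules the product applies are not documented here, so treat `slugify` as a hypothetical helper showing one common approach: lowercase the name, drop accents, and replace every run of other characters with a single hyphen.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a category name into a lowercase, hyphen-separated slug."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    return slug.strip("-")


print(slugify("Shipping & Delivery"))  # shipping-delivery
```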

Articles

Articles are the individual knowledge items the AI draws on. Each article is written in a question-and-answer format optimised for semantic retrieval. The question field is the primary search target — it is used to generate the embedding that determines when this article is retrieved.

Field	Description	Required
Title	Short label for the article, used in admin panel navigation and search results	Yes
Question	The customer question this article answers. This is the primary embedding target — write it in the language and phrasing your customers would use.	Yes
Answer	The full answer content. Supports Markdown formatting (headings, lists, bold, links). This is the text the AI will reference when constructing its response.	Yes
Category	Assigns the article to an organisational category	Yes
Data Classification	Sensitivity level of the content. Controls whether it appears in customer-facing AI responses. See Data Classification below.	Yes
Status	Draft or Published. Only published articles are included in the semantic search index and used by the AI.	Yes
Sort Order	Controls display order within its category in the admin panel	No
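The article fields, together with the rule that only published articles in active categories are indexed, can be modelled roughly like this. The class and function names are illustrative assumptions, not the product's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    PUBLISHED = "Published"


@dataclass
class Category:
    name: str
    active: bool = True


@dataclass
class Article:
    title: str
    question: str  # primary embedding target
    answer: str    # Markdown text the AI references
    category: Category
    data_classification: str  # "Public", "Internal", "Confidential" or "Restricted"
    status: Status = Status.DRAFT
    sort_order: int = 0


def is_indexed(article: Article) -> bool:
    """Only published articles in active categories reach the search index."""
    return article.status is Status.PUBLISHED and article.category.active
```

A new article starts as a draft in this sketch, so `is_indexed` returns `False` until its status is changed to `Status.PUBLISHED`.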

Data Classification

Every article must be assigned a data classification level before it can be published. This is a compliance control required for ISO 27001 alignment and GDPR governance. Classification determines how the content is handled in customer-facing AI responses.

Public

General information suitable for anyone. Product descriptions, opening hours, FAQ content, return policies. Safe to include verbatim in AI responses to customers.

Internal

Information intended for staff reference. Internal procedures, pricing guides, escalation paths. Included in AI context but the AI is instructed to handle the content with discretion.

Confidential

Sensitive business information. Contract terms, partner details, internal pricing logic, strategic plans. Excluded from customer-facing AI responses. Visible only to users with the appropriate role.

Restricted

Highly sensitive information. Legal content, security procedures, credentials, PII. Excluded from customer-facing AI responses. For authorised internal reference only.

Confidential and Restricted content is never shown to customers. Regardless of semantic relevance, articles classified as Confidential or Restricted are filtered out before the context reaches the language model for customer-facing responses. This is enforced server-side and cannot be bypassed. Use these levels for content that must exist in the KB for internal purposes but should never leak to the public.
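The server-side filter described above can be pictured as a step that runs after retrieval and before context assembly. The following is a minimal sketch under that assumption. The names `Classification`, `CUSTOMER_FACING`, and `filter_for_customer` are hypothetical.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Classification(Enum):
    PUBLIC = "Public"
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"
    RESTRICTED = "Restricted"


# Levels that may enter the context of a customer-facing response.
CUSTOMER_FACING = frozenset({Classification.PUBLIC, Classification.INTERNAL})


def filter_for_customer(
    chunks: Iterable[Tuple[str, Classification]],
) -> List[str]:
    """Drop Confidential and Restricted chunks regardless of their relevance."""
    return [text for text, level in chunks if level in CUSTOMER_FACING]
```

Using an allow-list rather than a block-list means that any classification level added later is excluded by default until someone explicitly permits it.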

Article Chunking

Long articles are split into smaller chunks before embedding. Chunking improves retrieval precision by allowing the system to match on specific paragraphs rather than entire documents. This is especially important for articles with multiple distinct topics or detailed step-by-step instructions.

Chunk	Content	Purpose
Chunk 0	Title + Question	Retrieval anchor. This is the primary embedding used for semantic matching. It captures the intent of the article in the most concentrated form.
Chunk 1–N	Answer content, split by paragraph boundaries	Granular retrieval of specific answer sections. Enables the AI to reference the most relevant paragraph, not the entire article.

Chunking rules

Target size: approximately 500 tokens per chunk
Overlap: 50 tokens of overlap between consecutive chunks to preserve context at boundaries
Split strategy: paragraphs are the primary split point. If a single paragraph exceeds 500 tokens, it is further split by sentence boundaries.
Short articles: articles whose answer fits within 500 tokens produce only two chunks (chunk 0 for title+question, chunk 1 for the full answer)

You do not need to manage chunks manually. Chunking is fully automatic and happens whenever an article is published or updated. The system recalculates chunks and regenerates embeddings each time content changes. The information above is provided for transparency and to help you understand how article length affects retrieval.
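To make the chunking rules concrete, here is a simplified sketch. It is not the production chunker: tokens are approximated by whitespace-separated words (a real tokenizer will count differently), paragraph breaks inside a chunk are flattened to spaces, and the function names are hypothetical.

```python
import re
from typing import List


def split_units(answer: str, target: int) -> List[str]:
    """Split into paragraphs, falling back to sentences for oversized ones."""
    units: List[str] = []
    for paragraph in re.split(r"\n\s*\n", answer.strip()):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph.split()) <= target:
            units.append(paragraph)
        else:
            units.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)
    return units


def chunk_article(
    title: str, question: str, answer: str, target: int = 500, overlap: int = 50
) -> List[str]:
    """Return chunk 0 (title + question) followed by the answer chunks."""
    chunks = [f"{title}\n{question}"]
    current: List[str] = []  # words of the chunk being built
    for unit in split_units(answer, target):
        words = unit.split()
        if current and len(current) + len(words) > target:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk into the next one.
            current = current[-overlap:] if overlap else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because the overlap is added on top of the next paragraph, a chunk in this sketch can slightly exceed the target size, which is consistent with the target being approximate.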

AI Wizard

The KB Wizard automates content generation by analysing existing content sources and producing structured Q&A articles ready for review. Instead of writing every article from scratch, provide a topic, URL, or PDF and let the AI do the initial drafting.

Five-step flow

Step	Stage	What Happens
1	Topic Input	Choose one of three input methods: type a topic for the AI to research, paste a URL for the scraper to extract content from, or upload a PDF document.
2	Research	The web research service scrapes the provided source and gathers relevant information. Credit usage is shown in the wizard UI.
3	Select	Review the research results and select which findings to use as source material for article generation.
4	Generate	The AI generates structured Q&A articles from the selected research. Mistral AI is used for generation.
5	Save	Assign a category, review the generated content, edit as needed, and save the articles to your Knowledge Base.

Screenshot: AI Wizard five-step flow showing topic input, research results, article selection, generation, and save

SSRF protection. The wizard validates all URLs before making HTTP requests. Internal addresses (localhost, private IP ranges, cloud metadata endpoints such as 169.254.169.254) are blocked to prevent Server-Side Request Forgery attacks. Only public HTTP and HTTPS URLs are permitted. Blocked attempts are logged to the security audit log. See the Security page for details.
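A URL check of this kind can be sketched with the Python standard library. This is an illustration only and `is_url_allowed` is a hypothetical name. It inspects the scheme and any literal IP address, but it does not resolve hostnames, so a complete validator would also need to check the addresses a hostname resolves to and re-validate after redirects.

```python
import ipaddress
from urllib.parse import urlsplit


def is_url_allowed(url: str) -> bool:
    """Allow only public http(s) URLs; reject internal and metadata addresses."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, for example an unclosed IPv6 bracket
    if parts.scheme not in ("http", "https") or not host:
        return False
    host = host.lower().rstrip(".")
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Assumption: DNS resolution is checked elsewhere.
        return True
    # is_global is False for loopback, private, and link-local ranges,
    # which includes the 169.254.169.254 metadata endpoint.
    return address.is_global
```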

The wizard consumes AI credits as it runs: 15 credits for research, 10 credits for topic generation, and 5 credits for each article generated. See the Commerce & Billing page for credit package details.
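As a worked example of the credit costs, the helper below estimates the total for one wizard run. It assumes that a run is charged once for research and once for topic generation, which the text does not state explicitly. The function name is hypothetical.

```python
RESEARCH_CREDITS = 15
TOPIC_GENERATION_CREDITS = 10
CREDITS_PER_ARTICLE = 5


def wizard_credits(articles_generated: int) -> int:
    """Estimate credits for one run: fixed charges plus a per-article charge."""
    if articles_generated < 0:
        raise ValueError("articles_generated must not be negative")
    return (
        RESEARCH_CREDITS
        + TOPIC_GENERATION_CREDITS
        + CREDITS_PER_ARTICLE * articles_generated
    )


print(wizard_credits(4))  # 15 + 10 + 4 * 5 = 45
```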

Publishing Workflow

Articles follow a two-stage lifecycle. Only published articles are included in the semantic search index and used by the AI to answer customer questions.

Status	Description	Visible to AI	Who Can Set
Draft	Article is being written or reviewed. Not yet approved for customer-facing use.	No	Any user with manage-kb
Published	Article has been reviewed, approved, and embedded into the semantic search index.	Yes	Users with approve-kb (KB Manager or higher)

This separation ensures a review step before content goes live. Agents and other users with the manage-kb permission can create and edit draft articles, but only a KB Manager or Tenant Admin (users holding the approve-kb permission) can publish them.
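The permission rule can be expressed as a small check. This sketch works on permission names only. How roles such as Agent or KB Manager map to permissions is left out because it is not specified here, and `can_set_status` is a hypothetical name.

```python
from typing import Set


def can_set_status(permissions: Set[str], new_status: str) -> bool:
    """Return True if a user holding these permissions may set the status."""
    if new_status == "Draft":
        return "manage-kb" in permissions
    if new_status == "Published":
        return "approve-kb" in permissions
    return False  # unknown status values are rejected


agent = {"manage-kb"}
print(can_set_status(agent, "Draft"))      # True
print(can_set_status(agent, "Published"))  # False
```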

Unpublishing an article. Changing a published article's status back to Draft immediately removes it from the search index. The AI will stop using it to answer questions within seconds. The article content is preserved and can be re-published at any time.

Semantic Search

InnConnect retrieves articles by semantic vector search rather than relying on keyword matching alone. This means the system matches on meaning, not just words. A customer asking "wat zijn jullie openingstijden?" (Dutch for "what are your opening hours?") will match an article titled "Opening Hours" even though the two share no keywords.

Technical details

Component	Implementation
Embedding dimensions	1024-dimensional vectors
Vector storage	PostgreSQL with the pgvector extension
Index type	HNSW (Hierarchical Navigable Small World) for fast approximate nearest-neighbour search
Similarity metric	Cosine similarity
Search strategy	Hybrid: vector similarity search combined with PostgreSQL full-text search for maximum recall
Multilingual	Embedding models capture cross-lingual semantics. An article in English can match a question in Dutch, and vice versa.
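Cosine similarity, the metric listed in the table, measures the angle between two vectors and ignores their length. The pure-Python function below shows the calculation on tiny vectors. In the product the comparison runs inside the database on 1024-dimensional embeddings, so this is purely illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)
```

A result close to 1 means the two texts point in nearly the same semantic direction, a result near 0 means they are unrelated, and a negative result means they point in opposing directions.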

How matching works in practice

Synonyms

"Cancel my subscription" matches an article about "How to end your membership" — no keyword overlap required.

Rephrasing

"Where are you located?" matches "Our office address" because the underlying intent is the same.

Cross-language

"Hoe kan ik retourneren?" (Dutch) matches an article titled "Return Policy" (English) because the embedding captures the concept, not the language.

Top-K retrieval. The search returns the most relevant article chunks ranked by cosine similarity. Only chunks above a confidence threshold are included in the AI's context. If no chunks meet the threshold, the AI acknowledges it does not have the information rather than guessing.
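Top-K selection with a confidence threshold can be sketched as below. The values `top_k=5` and `threshold=0.75` are invented for illustration, since the actual settings are not documented here, and `select_context` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def select_context(
    scored_chunks: Sequence[Tuple[str, float]],
    top_k: int = 5,           # hypothetical value
    threshold: float = 0.75,  # hypothetical value
) -> List[Tuple[str, float]]:
    """Keep chunks at or above the threshold, best first, at most top_k."""
    confident = [pair for pair in scored_chunks if pair[1] >= threshold]
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return confident[:top_k]
```

An empty result corresponds to the case where the AI acknowledges that it does not have the information instead of answering from weak matches.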

Best Practices

The quality of your Knowledge Base is the single most important factor in the quality of your AI's responses. Follow these guidelines to get the best results.

Write questions the way customers phrase them

The question field is the primary embedding target. If your customers ask "How much does shipping cost?" then that should be your question — not "Shipping fee information" or "Logistics cost overview". Match the natural language of your audience.

One topic per article

Resist the urge to create a single "FAQ" article with 30 questions. Each article should address exactly one question or topic. This improves retrieval precision because the embedding represents a single, focused concept rather than a diluted mixture.

Keep answers concise and factual

The AI synthesises responses from retrieved articles. Shorter, clearer answers produce better AI output. Avoid marketing language, filler text, or lengthy preambles. Get to the point.

Test with the chat preview

After publishing articles, open your site's chat preview (Sites → your site → Preview) and ask the questions your customers ask most often. If the AI gives an incorrect or incomplete answer, the corresponding article likely needs to be rewritten or a new article needs to be added.

Cover the gaps

Review your escalation history regularly. Questions that the AI could not answer (and escalated to a human agent) are prime candidates for new Knowledge Base articles. Each resolved escalation is an opportunity to make the AI smarter.

Classify data correctly

Take data classification seriously. Incorrectly marking internal procedures as "Public" could expose sensitive information through AI responses. When in doubt, use a more restrictive classification — you can always relax it later after review.

Quick start. If you are building your Knowledge Base from scratch, start with your 10 most frequently asked customer questions. Publish those articles, test with the chat preview, and iterate. A small, high-quality Knowledge Base will outperform a large, unfocused one.
