Vector Databases and Semantic Search: Building Aria

Aria is SellTrove's AI-powered selling assistant. It helps sellers with product description writing, pricing suggestions, and customer inquiry responses — all contextualised with the seller's actual store data.

The core technical challenge: Aria needs to retrieve relevant context (a seller's product catalogue, past orders, pricing history) before generating a response. This retrieve-then-generate pattern is Retrieval-Augmented Generation (RAG).

The vector database choice: Pinecone for production, with OpenAI's text-embedding-3-small model for generating embeddings. Product descriptions, past order data, and seller preferences are embedded and stored. At query time, the user's question is embedded and the closest vectors are retrieved as context.
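To make "the closest vectors are retrieved" concrete, here is a minimal sketch of nearest-neighbour ranking by cosine similarity. It is not Pinecone's implementation and not Aria's code. The three-dimensional toy vectors, the document IDs, and the choice of cosine as the distance metric are all assumptions for illustration. Real text-embedding-3-small vectors have 1536 dimensions by default, and a vector database typically uses approximate nearest-neighbour search rather than the full scan shown here.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], stored: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings standing in for a seller's stored data.
stored_vectors = {
    "mug-description": [0.9, 0.1, 0.0],
    "order-1042": [0.1, 0.9, 0.0],
    "pricing-note": [0.7, 0.3, 0.1],
}

# Hypothetical embedding of the user's question.
query_vector = [1.0, 0.2, 0.0]

closest = top_k(query_vector, stored_vectors, k=2)
print(closest)  # ['mug-description', 'pricing-note']
```

The full scan is fine for a demo with three vectors. At catalogue scale, that ranking step is the work handed off to the vector database.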

The architecture: a user sends a message to Aria. The message is embedded. The top-k most similar vectors are retrieved from Pinecone, with the seller's tenant ID as a metadata filter so that only that seller's data can match. The retrieved context and the user's message are combined into a prompt. The LLM generates a contextualised response.
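The flow above can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not Aria's actual code: the names answer, build_prompt, and Match, the prompt wording, and the fake_* stand-ins are all hypothetical. The embedding, retrieval, and generation steps are passed in as plain functions so the sketch runs without network access or API keys.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Match:
    """One retrieved chunk of seller context (hypothetical shape)."""
    text: str
    score: float


def build_prompt(context: Sequence[str], message: str) -> str:
    """Combine the retrieved context and the user's message into one prompt."""
    bullet_list = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Context from the seller's store:\n"
        f"{bullet_list}\n\n"
        f"Seller's message:\n{message}"
    )


def answer(
    message: str,
    tenant_id: str,
    embed: Callable[[str], List[float]],
    retrieve: Callable[[List[float], str, int], List[Match]],
    generate: Callable[[str], str],
    top_k: int = 5,
) -> str:
    """Embed the message, retrieve tenant-scoped context, then generate."""
    vector = embed(message)
    matches = retrieve(vector, tenant_id, top_k)  # tenant ID scopes the search
    prompt = build_prompt([m.text for m in matches], message)
    return generate(prompt)


# --- Stand-ins for the real services (assumptions, for demonstration only) ---
_FAKE_STORE = {
    "seller-a": ["Ceramic mug, 350ml, listed at 12.00", "Order: 3 ceramic mugs"],
    "seller-b": ["Leather wallet, listed at 40.00"],
}


def fake_embed(text: str) -> List[float]:
    """Placeholder for an embedding call; returns a dummy vector."""
    return [float(len(text)), 0.0]


def fake_retrieve(vector: List[float], tenant_id: str, top_k: int) -> List[Match]:
    """Placeholder for a filtered vector query; ignores the vector entirely."""
    chunks = _FAKE_STORE.get(tenant_id, [])[:top_k]
    return [Match(text=chunk, score=1.0) for chunk in chunks]


def fake_generate(prompt: str) -> str:
    """Placeholder for the LLM call."""
    return f"[model reply based on a prompt of {len(prompt)} characters]"


reply = answer(
    "How should I price my mugs?", "seller-a",
    fake_embed, fake_retrieve, fake_generate,
)
print(reply)
```

In a real deployment, retrieve would wrap a Pinecone query carrying a metadata filter along the lines of {"tenant_id": {"$eq": tenant_id}}. The field name tenant_id is an assumption here. Applying the filter inside the query, rather than filtering results afterwards, means one seller's data never enters another seller's prompt.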

The latency budget: embedding (50ms) + vector retrieval (30ms) + LLM generation (1-3s) ≈ 1.1-3.1s end-to-end. Acceptable for a conversational assistant. Not acceptable for a synchronous API response.
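The budget arithmetic is simple enough to check in a few lines. The sketch below uses only the figures quoted above. The constant and function names are made up for illustration, and the numbers are budget estimates, not measurements.

```python
# Budget figures from the text, in milliseconds.
EMBEDDING_MS = 50
RETRIEVAL_MS = 30
LLM_MS_MIN, LLM_MS_MAX = 1000, 3000


def end_to_end_ms(llm_ms: int) -> int:
    """Total latency for one request, given the LLM generation time."""
    return EMBEDDING_MS + RETRIEVAL_MS + llm_ms


def llm_share(llm_ms: int) -> float:
    """Fraction of the total latency spent in LLM generation."""
    return llm_ms / end_to_end_ms(llm_ms)


best_case = end_to_end_ms(LLM_MS_MIN)   # 1080 ms
worst_case = end_to_end_ms(LLM_MS_MAX)  # 3080 ms

print(f"end-to-end: {best_case} ms to {worst_case} ms")
print(f"LLM share: {llm_share(LLM_MS_MIN):.0%} to {llm_share(LLM_MS_MAX):.0%}")
```

Generation accounts for more than 90% of the total in both cases, so shaving time off embedding or retrieval barely moves the end-to-end figure.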

— Dick Bassey | DevDick | 2026
