@neural-tools/semantic-cache

v0.1.3

Intelligent LLM response caching using semantic similarity matching.

Installation

npm install @neural-tools/semantic-cache

Usage

Basic Example

import { SemanticCache } from '@neural-tools/semantic-cache';

const cache = new SemanticCache({
  similarityThreshold: 0.9,
  ttl: 3600 // 1 hour
});

// Cache an LLM response
await cache.set(
  'What is the capital of France?',
  'The capital of France is Paris.'
);

// Retrieve a semantically similar query
const query = 'Tell me the French capital';
const cached = await cache.get(query);

if (cached) {
  console.log('Cache hit:', cached);
} else {
  // Cache miss: call your own LLM client (llm is your client instance, not part of this package)
  const response = await llm.generate(query);
  await cache.set(query, response);
}

With OpenAI

import OpenAI from 'openai';
import { SemanticCache } from '@neural-tools/semantic-cache';

const openai = new OpenAI();
const cache = new SemanticCache();

async function getChatResponse(message) {
  // Check cache first
  const cached = await cache.get(message);
  if (cached) return cached;

  // Call API if not cached
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: message }]
  });

  const content = response.choices[0].message.content;
  await cache.set(message, content);

  return content;
}
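
Calling the wrapper twice with paraphrased prompts shows how the cache is exercised; this is a minimal usage sketch assuming top-level await, as in the basic example above:

const first = await getChatResponse('How tall is Mount Everest?');
const second = await getChatResponse('What is the height of Mount Everest?');

// With a high similarity threshold, the second call should be served from the cache,
// so both variables hold the same response string.
console.log(first === second);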

How It Works

Semantic caching uses vector embeddings to match similar queries (see the sketch after this list):

  1. Query is converted to an embedding vector
  2. Vector is compared against cached queries
  3. If similarity exceeds threshold, cached response is returned
  4. Otherwise, new response is generated and cached
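
The lookup step can be pictured as a cosine-similarity scan over stored embeddings. The sketch below is illustrative only and does not reflect the package's internals; embedQuery is a hypothetical embedding call standing in for whichever embedding model is configured:

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function lookup(query, entries, threshold) {
  // entries: [{ embedding: number[], response: string }]
  const queryEmbedding = await embedQuery(query); // hypothetical embedding call

  // Find the stored query whose embedding is closest to the new one
  let best = null;
  let bestScore = -1;
  for (const entry of entries) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }

  // Return the cached response only if similarity exceeds the threshold
  return bestScore >= threshold ? best.response : null;
}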

Benefits

  • Fewer LLM API calls: paraphrased queries are served from the cache instead of triggering a new request
  • Lower latency: a cache hit returns immediately, with no round trip to the model
  • Lower cost: repeated or near-duplicate questions do not incur additional token usage

Configuration

Options

const cache = new SemanticCache({
  // Minimum similarity score (0-1)
  similarityThreshold: 0.9,

  // Cache TTL in seconds
  ttl: 3600,

  // Vector database provider
  vectorDB: 'pinecone', // or 'qdrant', 'chroma', 'local'

  // Embedding model
  embeddingModel: 'text-embedding-3-small'
});

Similarity Threshold

The similarityThreshold option controls how close a new query's embedding must be to a stored one before the cached response is reused. A higher value (e.g. 0.95) makes matching stricter, reducing the chance of returning an answer to a question that is only superficially similar, at the cost of fewer cache hits. A lower value (e.g. 0.8) increases the hit rate but raises the risk of serving a response that does not actually fit the query. The 0.9 used in the examples above is a reasonable starting point; tune it against your own traffic.

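As a rough illustration of the tradeoff (the similarity scores in the comments are hypothetical, not measured output from this package):

import { SemanticCache } from '@neural-tools/semantic-cache';

// Hypothetical scores for two query pairs:
//   'What is the capital of France?' vs 'Tell me the French capital'    -> ~0.93 (paraphrase)
//   'What is the capital of France?' vs 'What is the capital of Spain?' -> ~0.88 (different question)

const strictCache = new SemanticCache({ similarityThreshold: 0.95 });  // both pairs miss
const defaultCache = new SemanticCache({ similarityThreshold: 0.9 });  // paraphrase hits, Spain misses
const looseCache = new SemanticCache({ similarityThreshold: 0.85 });   // both hit; the Spain query would get the wrong answer
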
Related Packages