PGVector

To enable vector search in a generic PostgreSQL database, LangChain.js supports using the pgvector Postgres extension.

Setup

To work with PGVector, you need to install the pg package:


npm install pg

yarn add pg

pnpm add pg

Tip: see this section for general instructions on installing integration packages.

You also need the LangChain OpenAI and community integration packages:

npm install @langchain/openai @langchain/community

yarn add @langchain/openai @langchain/community

pnpm add @langchain/openai @langchain/community

Set up a self-hosted `pgvector` instance with `docker-compose`

pgvector provides a prebuilt Docker image that you can use to quickly set up a self-hosted Postgres instance. Create a file named docker-compose.yml with the following contents:

# Start the database with:
# docker compose up
version: "3"
services:
  db:
    image: pgvector/pgvector:pg16
    ports:
      # Host port 5433 matches the port used in the examples below.
      - 5433:5432
    restart: always
    environment:
      - POSTGRES_DB=api
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=ChangeMe
    volumes:
      # Optional: SQL in init.sql runs once, when the database is first created.
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

Then, from the same directory, run `docker compose up` to start the container.

You can find more information on setting up pgvector in the official repository.

Usage

Security

Do not use user-generated data, such as usernames, as table or column names.
Table and column names are written directly into SQL statements, so doing this can lead to SQL injection!
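One way to add a safety net on top of this rule is to check every table or column name against a strict allow-list pattern before it reaches the store config. The sketch below uses a hypothetical helper, assertSafeIdentifier, which is not part of LangChain.js; the pattern (a letter or underscore followed by letters, digits, or underscores, at most 63 characters, PostgreSQL's default identifier length limit) is an assumption you may want to tighten. Names should still come from trusted constants.

```typescript
// Hypothetical helper (not part of LangChain.js). It only accepts plain
// PostgreSQL-style identifiers: a letter or underscore followed by letters,
// digits, or underscores, at most 63 characters in total.
const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]{0,62}$/;

function assertSafeIdentifier(name: string): string {
  if (!SAFE_IDENTIFIER.test(name)) {
    throw new Error(`Unsafe SQL identifier: ${JSON.stringify(name)}`);
  }
  return name;
}

// Names should still come from trusted constants; the check is a second line of defense.
const tableName = assertSafeIdentifier("testlangchain");
console.log(tableName); // testlangchain

// An input that tries to smuggle SQL is rejected instead of being interpolated.
try {
  assertSafeIdentifier("docs; DROP TABLE users; --");
} catch (err) {
  console.log((err as Error).message);
}
```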

Here is a complete example of using PGVectorStore:

import { OpenAIEmbeddings } from "@langchain/openai";
import {
  DistanceStrategy,
  PGVectorStore,
} from "@langchain/community/vectorstores/pgvector";
import { PoolConfig } from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const config = {
  postgresConnectionOptions: {
    host: "127.0.0.1",
    port: 5433,
    user: "myuser",
    password: "ChangeMe",
    database: "api",
  } as PoolConfig,
  tableName: "testlangchain",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
  // supported distance strategies: cosine (default), innerProduct, or euclidean
  distanceStrategy: "cosine" as DistanceStrategy,
};

const pgvectorStore = await PGVectorStore.initialize(
  new OpenAIEmbeddings(),
  config
);

await pgvectorStore.addDocuments([
  { pageContent: "what's this", metadata: { a: 2, b: ["tag1", "tag2"] } },
  { pageContent: "Cat drinks milk", metadata: { a: 1, b: ["tag2"] } },
]);

const results = await pgvectorStore.similaritySearch("water", 1);

console.log(results);

/*
  [ Document { pageContent: 'Cat drinks milk', metadata: { a: 1, b: [ 'tag2' ] } } ]
*/

// Filtering is supported
const results2 = await pgvectorStore.similaritySearch("water", 1, {
  a: 2,
});

console.log(results2);

/*
  [ Document { pageContent: "what's this", metadata: { a: 2, b: [ 'tag1', 'tag2' ] } } ]
*/

// Filtering on multiple values using "in" is supported too
const results3 = await pgvectorStore.similaritySearch("water", 1, {
  a: {
    in: [2],
  },
});

console.log(results3);

/*
  [ Document { pageContent: "what's this", metadata: { a: 2, b: [ 'tag1', 'tag2' ] } } ]
*/

await pgvectorStore.delete({
  filter: {
    a: 1,
  },
});

const results4 = await pgvectorStore.similaritySearch("water", 1);

console.log(results4);

/*
  [ Document { pageContent: "what's this", metadata: { a: 2, b: [ 'tag1', 'tag2' ] } } ]
*/

// Filtering using arrayContains (?|) is supported
const results5 = await pgvectorStore.similaritySearch("water", 1, {
  b: {
    arrayContains: ["tag1"],
  },
});

console.log(results5);

/*
  [ Document { pageContent: "what's this", metadata: { a: 2, b: [ 'tag1', 'tag2' ] } } ]
*/

await pgvectorStore.end();

API Reference:

OpenAIEmbeddings from @langchain/openai
DistanceStrategy from @langchain/community/vectorstores/pgvector
PGVectorStore from @langchain/community/vectorstores/pgvector
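To make the three filter forms in the example concrete, the sketch below applies their matching rules to in-memory objects: a plain value means equality, { in: [...] } means the field equals one of the listed values, and { arrayContains: [...] } follows the jsonb ?| operator named in the code comment, which matches when the stored array contains at least one of the listed strings. This is an illustration written for this page, not PGVectorStore's implementation, which evaluates filters in SQL; combining keys with AND is an assumption of the sketch.

```typescript
// Illustration only: the matching rules of the metadata filters used above,
// applied to in-memory objects. PGVectorStore evaluates filters in SQL instead.
type Metadata = Record<string, unknown>;
type FieldFilter =
  | string
  | number
  | boolean
  | { in: Array<string | number> }
  | { arrayContains: string[] };
type Filter = Record<string, FieldFilter>;

function matchesField(value: unknown, condition: FieldFilter): boolean {
  if (typeof condition !== "object") {
    // Plain value: the field must equal it.
    return value === condition;
  }
  if ("in" in condition) {
    // `in`: the field must equal one of the listed values.
    return condition.in.some((candidate) => candidate === value);
  }
  // `arrayContains` (jsonb ?|): the stored array must contain at least one listed string.
  if (!Array.isArray(value)) return false;
  const stored: unknown[] = value;
  return condition.arrayContains.some((tag) => stored.includes(tag));
}

function matchesFilter(metadata: Metadata, filter: Filter): boolean {
  // In this sketch, every key in the filter must match (logical AND).
  return Object.entries(filter).every(([key, condition]) =>
    matchesField(metadata[key], condition)
  );
}

const docs = [
  { pageContent: "what's this", metadata: { a: 2, b: ["tag1", "tag2"] } },
  { pageContent: "Cat drinks milk", metadata: { a: 1, b: ["tag2"] } },
];

const select = (filter: Filter): string[] =>
  docs.filter((d) => matchesFilter(d.metadata, filter)).map((d) => d.pageContent);

console.log(JSON.stringify(select({ a: 2 }))); // ["what's this"]
console.log(JSON.stringify(select({ a: { in: [1, 2] } }))); // ["what's this","Cat drinks milk"]
console.log(JSON.stringify(select({ b: { arrayContains: ["tag2"] } }))); // ["what's this","Cat drinks milk"]
console.log(JSON.stringify(select({ b: { arrayContains: ["tag1", "tag3"] } }))); // ["what's this"]
```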

You can also specify a collectionTableName and a collectionName to partition vectors between multiple users or namespaces.

Advanced: reusing connections

You can reuse connections by creating a pool, then creating new PGVectorStore instances directly via the constructor.

Note that you should call .initialize() at least once to create the required tables before creating instances with the constructor.

import { OpenAIEmbeddings } from "@langchain/openai";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
import pg from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const reusablePool = new pg.Pool({
  host: "127.0.0.1",
  port: 5433,
  user: "myuser",
  password: "ChangeMe",
  database: "api",
});

const originalConfig = {
  pool: reusablePool,
  tableName: "testlangchain",
  collectionName: "sample",
  collectionTableName: "collections",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
};

// Uncomment on the first run to set up the DB (this creates the tables).
// You can skip this step once the DB has been initialized.
// await PGVectorStore.initialize(new OpenAIEmbeddings(), originalConfig);

const pgvectorStore = new PGVectorStore(new OpenAIEmbeddings(), originalConfig);

await pgvectorStore.addDocuments([
  { pageContent: "what's this", metadata: { a: 2 } },
  { pageContent: "Cat drinks milk", metadata: { a: 1 } },
]);

const results = await pgvectorStore.similaritySearch("water", 1);

console.log(results);

/*
  [ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 } } ]
*/

const pgvectorStore2 = new PGVectorStore(new OpenAIEmbeddings(), {
  pool: reusablePool,
  tableName: "testlangchain",
  collectionTableName: "collections",
  collectionName: "some_other_collection",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
});

const results2 = await pgvectorStore2.similaritySearch("water", 1);

console.log(results2);

/*
  []
*/

await reusablePool.end();

API Reference:

OpenAIEmbeddings from @langchain/openai
PGVectorStore from @langchain/community/vectorstores/pgvector
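In a larger application you may want one PGVectorStore per collection, all sharing the same pool, with the setup step run only once per process. The sketch below shows one way to structure that. createStoreRegistry and its parameters are hypothetical, and the store type is generic so the sketch runs without a database; in real code, init would await PGVectorStore.initialize(...) once and makeStore would call new PGVectorStore(embeddings, { pool: reusablePool, collectionName, ... }) as in the example above.

```typescript
// Hypothetical helper: runs a one-time setup step (for example, a single
// PGVectorStore.initialize call), then returns one cached store per collection.
// The store type is generic, so this sketch runs without Postgres.
function createStoreRegistry<Store>(
  init: () => Promise<void>,
  makeStore: (collectionName: string) => Store
): (collectionName: string) => Promise<Store> {
  let initPromise: Promise<void> | undefined;
  const stores = new Map<string, Store>();

  return async (collectionName: string) => {
    if (initPromise === undefined) {
      // Share one setup run between concurrent callers; allow a retry if it fails.
      initPromise = init().catch((err) => {
        initPromise = undefined;
        throw err;
      });
    }
    await initPromise;

    let store = stores.get(collectionName);
    if (store === undefined) {
      store = makeStore(collectionName);
      stores.set(collectionName, store);
    }
    return store;
  };
}

// Usage with stand-in callbacks instead of real PGVectorStore calls.
let initCalls = 0;
const getStore = createStoreRegistry(
  async () => {
    initCalls += 1;
  },
  (collectionName) => ({ collectionName })
);

async function main(): Promise<void> {
  const [first, second, other] = await Promise.all([
    getStore("sample"),
    getStore("sample"),
    getStore("some_other_collection"),
  ]);
  console.log(initCalls); // 1
  console.log(first === second); // true
  console.log(other.collectionName); // some_other_collection
}

void main();
```

Keeping the pool outside the registry mirrors the example above: the caller still owns the pool and ends it once, after all stores are done.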

Create HNSW Index

By default, pgvector performs exact nearest neighbor search using a sequential scan, which gives 100% recall. To speed up similaritySearchVectorWithScore, consider creating an HNSW index for approximate nearest neighbor (ANN) search, which trades some recall for speed. To create an HNSW index on your vector column, use the createHnswIndex() method.

The method parameters include:

dimensions: Defines the number of dimensions in your vector data type, up to 2000. For example, use 1536 for OpenAI's text-embedding-ada-002 and Amazon's amazon.titan-embed-text-v1 models.

m?: The max number of connections per layer (16 by default). Index build time improves with smaller values, while higher values can speed up search queries.

efConstruction?: The size of the dynamic candidate list for constructing the graph (64 by default). A higher value can potentially improve the index quality at the cost of index build time.

distanceFunction?: The distance function to use. It is selected automatically based on the distanceStrategy.

More information is available in the pgvector GitHub project and in the HNSW paper: Malkov, Yu. A., and Yashunin, D. A. (2020). Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.
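To see how these parameters map onto pgvector itself, the sketch below assembles a CREATE INDEX ... USING hnsw statement in the form described in the pgvector README, choosing the operator class from the distance strategy (vector_cosine_ops, vector_ip_ops, or vector_l2_ops). The helper name, the index name, and the exact statement are assumptions for illustration; createHnswIndex() may generate different SQL. The sketch casts the column to vector(dimensions) inside the index expression, a form pgvector supports, so it also works if the column is declared without a fixed dimension.

```typescript
// Hypothetical helper: assembles a pgvector HNSW index statement to show how
// dimensions, m, efConstruction, and the distance strategy fit together.
// It is not the SQL that createHnswIndex() is guaranteed to run.
type Strategy = "cosine" | "innerProduct" | "euclidean";

// pgvector operator class used for each distance strategy.
const OPERATOR_CLASS: Record<Strategy, string> = {
  cosine: "vector_cosine_ops",
  innerProduct: "vector_ip_ops",
  euclidean: "vector_l2_ops",
};

interface HnswOptions {
  dimensions: number;
  m?: number; // max connections per layer (pgvector default: 16)
  efConstruction?: number; // candidate list size while building (default: 64)
  distanceStrategy?: Strategy;
}

function hnswIndexSql(table: string, column: string, options: HnswOptions): string {
  const { dimensions, m = 16, efConstruction = 64, distanceStrategy = "cosine" } = options;
  if (!Number.isInteger(dimensions) || dimensions < 1 || dimensions > 2000) {
    throw new RangeError(`dimensions must be an integer from 1 to 2000, got ${dimensions}`);
  }
  // table and column must be trusted constants: identifiers cannot be bound as parameters.
  return (
    `CREATE INDEX IF NOT EXISTS ${table}_${column}_hnsw_idx ON ${table} ` +
    `USING hnsw ((${column}::vector(${dimensions})) ${OPERATOR_CLASS[distanceStrategy]}) ` +
    `WITH (m = ${m}, ef_construction = ${efConstruction});`
  );
}

// Prints one line (wrapped here):
// CREATE INDEX IF NOT EXISTS testlangchain_vector_hnsw_idx ON testlangchain
//   USING hnsw ((vector::vector(1536)) vector_cosine_ops) WITH (m = 16, ef_construction = 64);
console.log(hnswIndexSql("testlangchain", "vector", { dimensions: 1536 }));
```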

import { OpenAIEmbeddings } from "@langchain/openai";
import {
  DistanceStrategy,
  PGVectorStore,
} from "@langchain/community/vectorstores/pgvector";
import { PoolConfig } from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const config = {
  postgresConnectionOptions: {
    host: "127.0.0.1",
    port: 5433,
    user: "myuser",
    password: "ChangeMe",
    database: "api",
  } as PoolConfig,
  tableName: "testlangchain",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
  // supported distance strategies: cosine (default), innerProduct, or euclidean
  distanceStrategy: "cosine" as DistanceStrategy,
};

const pgvectorStore = await PGVectorStore.initialize(
  new OpenAIEmbeddings(),
  config
);

// create the index
await pgvectorStore.createHnswIndex({
  dimensions: 1536,
  efConstruction: 64,
  m: 16,
});

await pgvectorStore.addDocuments([
  { pageContent: "what's this", metadata: { a: 2, b: ["tag1", "tag2"] } },
  { pageContent: "Cat drinks milk", metadata: { a: 1, b: ["tag2"] } },
]);

const model = new OpenAIEmbeddings();
const query = await model.embedQuery("water");
const results = await pgvectorStore.similaritySearchVectorWithScore(query, 1);

console.log(results);

await pgvectorStore.end();

API Reference:

OpenAIEmbeddings from @langchain/openai
DistanceStrategy from @langchain/community/vectorstores/pgvector
PGVectorStore from @langchain/community/vectorstores/pgvector
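The sequential scan mentioned at the start of this section is exact search: it measures the distance from the query to every row and keeps the closest ones, which is why it has 100% recall but slows down as the table grows, while HNSW skips most rows and returns approximate results. The in-memory sketch below illustrates exact search with the three distance strategies; it is not how pgvector executes queries, and the sample vectors are made up. Following pgvector's <#> operator, the inner product is negated so that a smaller value always means closer.

```typescript
// In-memory illustration of exact (brute-force) k-nearest-neighbor search:
// every row is scored, so the true top k is always returned.
type Vector = number[];
type Strategy = "cosine" | "innerProduct" | "euclidean";

const dot = (a: Vector, b: Vector): number => a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = (a: Vector): number => Math.sqrt(dot(a, a));

// Smaller means closer for all three, like pgvector's operators:
// <=> cosine distance, <#> negative inner product, <-> Euclidean (L2) distance.
const DISTANCE: Record<Strategy, (a: Vector, b: Vector) => number> = {
  cosine: (a, b) => 1 - dot(a, b) / (norm(a) * norm(b)),
  innerProduct: (a, b) => -dot(a, b),
  euclidean: (a, b) => Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0)),
};

function exactSearch(
  rows: { id: string; vector: Vector }[],
  query: Vector,
  k: number,
  strategy: Strategy = "cosine"
): { id: string; distance: number }[] {
  const distance = DISTANCE[strategy];
  return rows
    .map((row) => ({ id: row.id, distance: distance(row.vector, query) })) // full scan
    .sort((x, y) => x.distance - y.distance) // closest first
    .slice(0, k);
}

const rows = [
  { id: "a", vector: [1, 0] },
  { id: "b", vector: [0, 1] },
  { id: "c", vector: [0.9, 0.1] },
  { id: "d", vector: [2, 2] },
];
const ids = (strategy: Strategy): string =>
  JSON.stringify(exactSearch(rows, [1, 0], 2, strategy).map((r) => r.id));

console.log(ids("cosine")); // ["a","c"]
console.log(ids("euclidean")); // ["a","c"]
console.log(ids("innerProduct")); // ["d","a"]: without normalization, the longer vector wins
```

The innerProduct result shows why that strategy is usually paired with normalized embeddings: on unnormalized vectors, magnitude can outweigh direction.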
