Document transformers | 🦜️🔗 LangChain

📄️ html-to-text

When ingesting HTML documents for later retrieval, we usually care only about the page's actual content, not its markup semantics.
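
To make the idea of keeping content and discarding markup concrete, here is a minimal TypeScript sketch. It does not use the html-to-text package or any LangChain API; `htmlToPlainText` is a hypothetical helper written only for illustration, and its regex-based approach is an assumption that ignores many cases a real transformer handles (nested or malformed markup, links, tables, most entities).

```typescript
// Hypothetical helper for illustration only; not part of html-to-text or LangChain.
function htmlToPlainText(html: string): string {
  return html
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Replace every remaining tag with a space so adjacent words stay separated.
    .replace(/<[^>]+>/g, " ")
    // Decode a handful of common entities (deliberately not exhaustive).
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    // Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<".
    .replace(/&amp;/g, "&")
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}

const samplePage =
  "<html><head><style>p { color: red; }</style></head>" +
  "<body><h1>Title</h1><p>Hello &amp; welcome.</p></body></html>";

console.log(htmlToPlainText(samplePage)); // prints "Title Hello & welcome."
```

A dedicated library is usually the better choice in practice, since regular expressions cannot parse HTML reliably; the sketch only shows what "content without markup" means for a retrieval pipeline.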

📄️ @mozilla/readability

When ingesting HTML documents for later retrieval, we usually care only about the page's actual content, not its markup semantics.

📄️ OpenAI functions metadata tagger

It can often be useful to tag ingested documents with structured metadata, such as the title, tone, or length of a document, to allow for more targeted similarity search later. However, for large numbers of documents, performing this labelling process manually can be tedious.

📄️ Cohere Rerank

Reranking retrieved documents by their relevance to the query can greatly improve the results of a RAG application or document retrieval system.
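
As a rough illustration of what reranking does, the TypeScript sketch below reorders a list of already retrieved documents by a relevance score and keeps the top results. It does not call Cohere or any LangChain API: the `Doc` shape, `overlapScore`, and `rerank` are hypothetical names, and the term-overlap score is a stand-in assumption for the trained relevance model a hosted reranker would apply.

```typescript
// Hypothetical document shape for this sketch.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Stand-in scorer: the fraction of query terms that appear in the text.
// A real reranker would use a trained relevance model instead.
function overlapScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const words = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  const hits = terms.filter((term) => words.has(term)).length;
  return hits / terms.length;
}

// Score every document against the query, sort by descending score,
// and keep only the topN highest-scoring documents.
function rerank(query: string, docs: Doc[], topN: number): Doc[] {
  return docs
    .map((doc) => ({ doc, score: overlapScore(query, doc.pageContent) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map(({ doc }) => doc);
}

const retrieved: Doc[] = [
  { pageContent: "Bananas are yellow.", metadata: { id: 1 } },
  { pageContent: "Berlin is the capital of Germany.", metadata: { id: 2 } },
  { pageContent: "Paris is the capital of France.", metadata: { id: 3 } },
];

const top = rerank("capital of France", retrieved, 2);
console.log(top.map((doc) => doc.metadata.id)); // prints [ 3, 2 ]
```

The pattern is a two-stage pipeline: a fast first-stage retriever returns a broad candidate set, and a slower but more accurate scorer decides the final order that reaches the model.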

📄️ Mixedbread AI reranking

Overview