html-to-text
When ingesting HTML documents for later retrieval, we are often interested only in the actual content of the webpage, not its HTML markup.
@mozilla/readability
When ingesting HTML documents for later retrieval, we are often interested only in the main content of the webpage, not its markup or surrounding page clutter such as navigation.
OpenAI functions metadata tagger
It can often be useful to tag ingested documents with structured metadata, such as the title, tone, or length of a document, to allow for more targeted similarity search later. However, for large numbers of documents, performing this labelling process manually can be tedious.
Cohere Rerank
Reranking retrieved documents by their relevance to the query can substantially improve the results of RAG applications and other document retrieval systems.
Mixedbread AI reranking
Overview