Module Support
DeepSearcher supports various integration modules including embedding models, large language models, document loaders and vector databases.
📊 Overview
Module Type | Count | Description |
---|---|---|
Embedding Models | 7+ | Text vectorization tools |
Large Language Models | 11+ | Query processing and text generation |
Document Loaders | 5+ | Parse and process documents in various formats |
Vector Databases | 2+ | Store and retrieve vector data |
🔢 Embedding Models
Support for various embedding models to convert text into vector representations for semantic search.
Provider | Required Environment Variables | Features |
---|---|---|
Open-source models | None | Locally runnable open-source models |
OpenAI | OPENAI_API_KEY | High-quality embeddings, easy to use |
VoyageAI | VOYAGE_API_KEY | Embeddings optimized for retrieval |
Amazon Bedrock | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY | AWS integration, enterprise-grade |
FastEmbed | None | Fast lightweight embeddings |
PPIO | PPIO_API_KEY | Flexible cloud embeddings |
Novita AI | NOVITA_API_KEY | Rich model selection |
IBM watsonx.ai | WATSONX_APIKEY, WATSONX_URL, WATSONX_PROJECT_ID | IBM's Enterprise AI platform |
🧠 Large Language Models
Support for various large language models (LLMs) to process queries and generate responses.
Provider | Required Environment Variables | Features |
---|---|---|
OpenAI | OPENAI_API_KEY | GPT model family |
DeepSeek | DEEPSEEK_API_KEY | Powerful reasoning capabilities |
XAI Grok | XAI_API_KEY | Real-time knowledge and humor |
Anthropic Claude | ANTHROPIC_API_KEY | Excellent long-context understanding |
SiliconFlow | SILICONFLOW_API_KEY | Enterprise inference service |
PPIO | PPIO_API_KEY | Diverse model support |
TogetherAI | TOGETHER_API_KEY | Wide range of open-source models |
Google Gemini | GEMINI_API_KEY | Google's multimodal models |
SambaNova | SAMBANOVA_API_KEY | High-performance AI platform |
Ollama | None | Local LLM deployment |
Novita AI | NOVITA_API_KEY | Diverse AI services |
IBM watsonx.ai | WATSONX_APIKEY, WATSONX_URL, WATSONX_PROJECT_ID | IBM's Enterprise AI platform |
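Before selecting an LLM provider, it can help to confirm that its environment variables are actually set. The following is a minimal sketch using only the Python standard library. The mapping is copied from the table above, while `REQUIRED_ENV` and `missing_env_vars` are hypothetical names chosen for illustration and are not part of DeepSearcher.

```python
import os

# Environment variables per LLM provider, copied from the table above.
# Providers listed with "None" map to an empty list.
REQUIRED_ENV = {
    "OpenAI": ["OPENAI_API_KEY"],
    "DeepSeek": ["DEEPSEEK_API_KEY"],
    "XAI Grok": ["XAI_API_KEY"],
    "Anthropic Claude": ["ANTHROPIC_API_KEY"],
    "SiliconFlow": ["SILICONFLOW_API_KEY"],
    "PPIO": ["PPIO_API_KEY"],
    "TogetherAI": ["TOGETHER_API_KEY"],
    "Google Gemini": ["GEMINI_API_KEY"],
    "SambaNova": ["SAMBANOVA_API_KEY"],
    "Ollama": [],
    "Novita AI": ["NOVITA_API_KEY"],
    "IBM watsonx.ai": ["WATSONX_APIKEY", "WATSONX_URL", "WATSONX_PROJECT_ID"],
}


def missing_env_vars(provider, env=None):
    """Return the required variables that are unset or empty for a provider.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED_ENV:
        missing = missing_env_vars(provider)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Run as a script, this prints one status line per provider. The same pattern would apply to the embedding providers if their table were added to the mapping.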
📄 Document Loaders
Support for loading and processing documents from various sources.
Local File Loaders
Loader | Supported Formats | Required Environment Variables |
---|---|---|
Built-in Loader | PDF, TXT, MD | None |
Unstructured | Multiple document formats | UNSTRUCTURED_API_KEY, UNSTRUCTURED_URL (optional) |
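One way to read the table above is as a routing rule: files in the formats the built-in loader handles go to it, and everything else falls back to Unstructured. The sketch below illustrates that rule only. `BUILTIN_EXTENSIONS` and `pick_loader` are hypothetical names, the extension set is an assumption derived from the "PDF, TXT, MD" entry, and the function returns a label rather than calling any DeepSearcher loader.

```python
from pathlib import Path

# Assumed file extensions for the formats the built-in loader lists (PDF, TXT, MD).
BUILTIN_EXTENSIONS = {".pdf", ".txt", ".md"}


def pick_loader(path):
    """Return a label naming which local file loader would suit the given path."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    return "built-in" if suffix in BUILTIN_EXTENSIONS else "unstructured"


if __name__ == "__main__":
    for name in ["report.pdf", "notes.MD", "slides.pptx"]:
        print(name, "->", pick_loader(name))
```

The Unstructured branch would additionally need UNSTRUCTURED_API_KEY to be set, as the table notes.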
Web Crawlers
Crawler | Description | Required Environment Variables/Setup |
---|---|---|
FireCrawl | Crawler designed for AI applications | FIRECRAWL_API_KEY |
Jina Reader | High-accuracy web content extraction | JINA_API_TOKEN |
Crawl4AI | Browser automation crawler | Run crawl4ai-setup for first-time use |
💾 Vector Database Support
Support for various vector databases for efficient storage and retrieval of embeddings.
Database | Description | Features |
---|---|---|
Milvus | Open-source vector database | High-performance, scalable |
Zilliz Cloud | Managed Milvus service | Fully managed, maintenance-free |
Qdrant | Vector similarity search engine | Simple, efficient |