## Configuration Details

### LLM Configuration

```python
config.set_provider_config("llm", "(LLMName)", "(Arguments dict)")
```

The `LLMName` can be one of the following: `["DeepSeek", "OpenAI", "XAI", "SiliconFlow", "Aliyun", "PPIO", "TogetherAI", "Gemini", "Ollama", "Novita"]`.

The `Arguments dict` is a dictionary that contains the arguments required by the corresponding LLM class.
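Conceptually, this call writes into a simple (feature, provider) registry. The sketch below is an illustrative assumption about how such a registry could work, not DeepSearcher's actual implementation — the class and method names here are ours:

```python
# Illustrative sketch only: a minimal (feature, provider) registry that
# mimics the shape of config.set_provider_config(). Not the library's
# actual internals.

class ProviderConfig:
    FEATURES = {"llm", "embedding", "vector_db", "file_loader", "web_crawler"}

    def __init__(self):
        self._providers = {}

    def set_provider_config(self, feature, provider, args):
        """Remember which provider class (and constructor args) to use for a feature."""
        if feature not in self.FEATURES:
            raise ValueError(f"unknown feature: {feature!r}")
        if not isinstance(args, dict):
            raise TypeError("args must be a dict of constructor arguments")
        self._providers[feature] = (provider, args)

    def get_provider(self, feature):
        return self._providers[feature]


config = ProviderConfig()
config.set_provider_config("llm", "OpenAI", {"model": "o1-mini"})
print(config.get_provider("llm"))  # ('OpenAI', {'model': 'o1-mini'})
```

The actual `Configuration` object then instantiates the registered class lazily when the feature is first used.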
#### OpenAI

Make sure you have prepared your OpenAI API key as an env variable `OPENAI_API_KEY`.

```python
config.set_provider_config("llm", "OpenAI", {"model": "o1-mini"})
```

More details about OpenAI models: https://platform.openai.com/docs/models
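Since every provider reads its key from the environment, a quick fail-fast check before constructing the config can save a confusing error later. This helper is our own generic snippet, not part of DeepSearcher:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, raising if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value

# Demo only: in real use the key would be exported in your shell profile.
os.environ.setdefault("OPENAI_API_KEY", "sk-example")
api_key = require_env("OPENAI_API_KEY")
```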
#### Aliyun Bailian

Make sure you have prepared your Bailian API key as an env variable `DASHSCOPE_API_KEY`.

```python
config.set_provider_config("llm", "Aliyun", {"model": "qwen-plus-latest"})
```

More details about Aliyun Bailian models: https://bailian.console.aliyun.com
#### OpenRouter

OpenRouter exposes an OpenAI-compatible API, so it is configured through the `OpenAI` provider with a custom `base_url` and your OpenRouter API key:

```python
config.set_provider_config("llm", "OpenAI", {"model": "qwen/qwen3-235b-a22b:free", "base_url": "https://openrouter.ai/api/v1", "api_key": "OPENROUTER_API_KEY"})
```

More details about OpenRouter models: https://openrouter.ai/qwen/qwen3-235b-a22b:free
#### DeepSeek

Make sure you have prepared your DeepSeek API key as an env variable `DEEPSEEK_API_KEY`.

```python
config.set_provider_config("llm", "DeepSeek", {"model": "deepseek-reasoner"})
```

More details about DeepSeek: https://api-docs.deepseek.com/
#### SiliconFlow

Make sure you have prepared your SiliconFlow API key as an env variable `SILICONFLOW_API_KEY`.

```python
config.set_provider_config("llm", "SiliconFlow", {"model": "deepseek-ai/DeepSeek-R1"})
```

More details about SiliconFlow: https://docs.siliconflow.cn/quickstart
#### TogetherAI

Make sure you have prepared your Together API key as an env variable `TOGETHER_API_KEY`.

```python
config.set_provider_config("llm", "TogetherAI", {"model": "deepseek-ai/DeepSeek-R1"})
```

For Llama 4:

```python
config.set_provider_config("llm", "TogetherAI", {"model": "meta-llama/Llama-4-Scout-17B-16E-Instruct"})
```

You need to install the `together` package before running: `pip install together`. More details about TogetherAI: https://www.together.ai/
#### XAI Grok

Make sure you have prepared your XAI API key as an env variable `XAI_API_KEY`.

```python
config.set_provider_config("llm", "XAI", {"model": "grok-4-0709"})
```

More details about XAI Grok: https://docs.x.ai/docs/overview#featured-models
#### Anthropic Claude

Make sure you have prepared your Anthropic API key as an env variable `ANTHROPIC_API_KEY`.

```python
config.set_provider_config("llm", "Anthropic", {"model": "claude-sonnet-4-0"})
```

More details about Anthropic Claude: https://docs.anthropic.com/en/home
#### Gemini

Make sure you have prepared your Gemini API key as an env variable `GEMINI_API_KEY`.

```python
config.set_provider_config("llm", "Gemini", {"model": "gemini-2.0-flash"})
```

You need to install the Google GenAI SDK before running: `pip install google-genai`. More details about Gemini: https://ai.google.dev/gemini-api/docs
#### PPIO

Make sure you have prepared your PPIO API key as an env variable `PPIO_API_KEY` (you can create one on the PPIO platform).

```python
config.set_provider_config("llm", "PPIO", {"model": "deepseek/deepseek-r1-turbo"})
```

More details about PPIO: https://ppinfra.com/docs/get-started/quickstart.html?utm_source=github_deep-searcher
#### Ollama

Follow these instructions to set up and run a local Ollama instance:

1. Download and install Ollama on one of the supported platforms (including Windows Subsystem for Linux).
2. View the available models in the model library.
3. Fetch a model via `ollama pull <name-of-model>`, e.g. `ollama pull qwen3`.
4. To chat directly with a model from the command line, use `ollama run <name-of-model>`.

By default, Ollama serves a REST API for running and managing models at http://localhost:11434.

```python
config.set_provider_config("llm", "Ollama", {"model": "qwen3"})
```
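An Ollama-backed provider ultimately talks to that REST API. As a rough illustration (the exact request DeepSearcher sends may differ), a non-streaming generation request to Ollama's documented `/api/generate` endpoint looks like this; we only build the request here rather than send it, so no server is needed:

```python
import json
import urllib.request

# Shape of a non-streaming generation request to a local Ollama instance.
payload = {"model": "qwen3", "prompt": "Hello", "stream": False}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With a running Ollama instance you would now call
# urllib.request.urlopen(req) and read a JSON body whose
# "response" field holds the completion text.
```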
#### Volcengine

Make sure you have prepared your Volcengine API key as an env variable `VOLCENGINE_API_KEY` (you can create one in the Volcengine console).

```python
config.set_provider_config("llm", "Volcengine", {"model": "deepseek-r1-250120"})
```

More details about Volcengine: https://www.volcengine.com/docs/82379/1099455?utm_source=github_deep-searcher
#### GLM

Make sure you have prepared your GLM API key as an env variable `GLM_API_KEY`.

```python
config.set_provider_config("llm", "GLM", {"model": "glm-4-plus"})
```

You need to install the `zhipuai` package before running: `pip install zhipuai`. More details about GLM: https://bigmodel.cn/dev/welcome
#### Amazon Bedrock

Make sure you have prepared your AWS credentials as env variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

```python
config.set_provider_config("llm", "Bedrock", {"model": "us.deepseek.r1-v1:0"})
```

You need to install `boto3` before running: `pip install boto3`. More details about Amazon Bedrock: https://docs.aws.amazon.com/bedrock/
#### IBM watsonx.ai

Make sure you have prepared your watsonx.ai credentials as env variables `WATSONX_APIKEY`, `WATSONX_URL`, and `WATSONX_PROJECT_ID`.

```python
config.set_provider_config("llm", "watsonx", {"model": "us.deepseek.r1-v1:0"})
```

You need to install `ibm-watsonx-ai` before running: `pip install ibm-watsonx-ai`. More details about IBM watsonx.ai: https://www.ibm.com/products/watsonx-ai/foundation-models
### Embedding Model Configuration

```python
config.set_provider_config("embedding", "(EmbeddingModelName)", "(Arguments dict)")
```

The `EmbeddingModelName` can be one of the following: `["MilvusEmbedding", "OpenAIEmbedding", "VoyageEmbedding", "SiliconflowEmbedding", "PPIOEmbedding", "NovitaEmbedding"]`.

The `Arguments dict` is a dictionary that contains the arguments required by the embedding model class.
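Whichever provider you pick, all documents in a collection must be embedded with the same model, because retrieval compares vectors of the same dimension. A toy example of the comparison step (plain Python, no provider required — real systems delegate this to the vector database):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must come from the same embedding model")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
```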
#### OpenAI

Make sure you have prepared your OpenAI API key as an env variable `OPENAI_API_KEY`.

```python
config.set_provider_config("embedding", "OpenAIEmbedding", {"model": "text-embedding-3-small"})
```

More details about OpenAI embedding models: https://platform.openai.com/docs/guides/embeddings/use-cases
#### Azure OpenAI

Make sure you have prepared your Azure OpenAI API key as an env variable `OPENAI_API_KEY`.

```python
config.set_provider_config("embedding", "OpenAIEmbedding", {
    "model": "text-embedding-ada-002",
    "azure_endpoint": "https://<your-resource-name>.openai.azure.com/",
    "api_version": "2023-05-15"
})
```
#### Pymilvus built-in models

Use the built-in embedding models in Pymilvus; you can set the model name to `"default"`, `"BAAI/bge-base-en-v1.5"`, `"BAAI/bge-large-en-v1.5"`, `"jina-embeddings-v3"`, etc. See [milvus_embedding.py](deepsearcher/embedding/milvus_embedding.py) for more details.

```python
config.set_provider_config("embedding", "MilvusEmbedding", {"model": "BAAI/bge-base-en-v1.5"})
```

```python
config.set_provider_config("embedding", "MilvusEmbedding", {"model": "jina-embeddings-v3"})
```

For Jina's embedding model, you need to set the env variable `JINAAI_API_KEY`.

You need to install the Pymilvus model package before running: `pip install pymilvus.model`. More details about Pymilvus: https://milvus.io/docs/embeddings.md
#### VoyageAI

Make sure you have prepared your Voyage API key as an env variable `VOYAGE_API_KEY`.

```python
config.set_provider_config("embedding", "VoyageEmbedding", {"model": "voyage-3"})
```

You need to install `voyageai` before running: `pip install voyageai`. More details about VoyageAI: https://docs.voyageai.com/embeddings/
#### Amazon Bedrock

```python
config.set_provider_config("embedding", "BedrockEmbedding", {"model": "amazon.titan-embed-text-v2:0"})
```

You need to install `boto3` before running: `pip install boto3`. More details about Amazon Bedrock: https://docs.aws.amazon.com/bedrock/
#### Novita AI

Make sure you have prepared your Novita AI API key as an env variable `NOVITA_API_KEY`.

```python
config.set_provider_config("embedding", "NovitaEmbedding", {"model": "baai/bge-m3"})
```

More details about Novita AI: https://novita.ai/docs/api-reference/model-apis-llm-create-embeddings?utm_source=github_deep-searcher&utm_medium=github_readme&utm_campaign=link
#### SiliconFlow

Make sure you have prepared your SiliconFlow API key as an env variable `SILICONFLOW_API_KEY`.

```python
config.set_provider_config("embedding", "SiliconflowEmbedding", {"model": "BAAI/bge-m3"})
```

More details about SiliconFlow: https://docs.siliconflow.cn/en/api-reference/embeddings/create-embeddings
#### Volcengine

Make sure you have prepared your Volcengine API key as an env variable `VOLCENGINE_API_KEY`.

```python
config.set_provider_config("embedding", "VolcengineEmbedding", {"model": "doubao-embedding-text-240515"})
```

More details about Volcengine: https://www.volcengine.com/docs/82379/1302003
#### GLM

Make sure you have prepared your GLM API key as an env variable `GLM_API_KEY`.

```python
config.set_provider_config("embedding", "GLMEmbedding", {"model": "embedding-3"})
```

You need to install the `zhipuai` package before running: `pip install zhipuai`. More details about GLM: https://bigmodel.cn/dev/welcome
#### Gemini

Make sure you have prepared your Gemini API key as an env variable `GEMINI_API_KEY`.

```python
config.set_provider_config("embedding", "GeminiEmbedding", {"model": "text-embedding-004"})
```

You need to install the Google GenAI SDK before running: `pip install google-genai`. More details about Gemini: https://ai.google.dev/gemini-api/docs
#### Ollama

```python
config.set_provider_config("embedding", "OllamaEmbedding", {"model": "bge-m3"})
```

You need to install `ollama` before running: `pip install ollama`. More details about the Ollama Python SDK: https://github.com/ollama/ollama-python
#### PPIO

Make sure you have prepared your PPIO API key as an env variable `PPIO_API_KEY`.

```python
config.set_provider_config("embedding", "PPIOEmbedding", {"model": "baai/bge-m3"})
```

More details about PPIO: https://ppinfra.com/docs/get-started/quickstart.html?utm_source=github_deep-searcher
#### FastEmbed

```python
config.set_provider_config("embedding", "FastEmbedEmbedding", {"model": "intfloat/multilingual-e5-large"})
```

You need to install `fastembed` before running: `pip install fastembed`. More details about FastEmbed: https://github.com/qdrant/fastembed
#### IBM watsonx.ai

Make sure you have prepared your watsonx.ai credentials as env variables `WATSONX_APIKEY`, `WATSONX_URL`, and `WATSONX_PROJECT_ID`.

```python
config.set_provider_config("embedding", "WatsonXEmbedding", {"model": "ibm/slate-125m-english-rtrvr-v2"})
```

```python
config.set_provider_config("embedding", "WatsonXEmbedding", {"model": "sentence-transformers/all-minilm-l6-v2"})
```

You need to install `ibm-watsonx-ai` before running: `pip install ibm-watsonx-ai`. More details about IBM watsonx.ai: https://www.ibm.com/products/watsonx-ai/foundation-models
### Vector Database Configuration

```python
config.set_provider_config("vector_db", "(VectorDBName)", "(Arguments dict)")
```

The `VectorDBName` can be one of the following: `["Milvus"]` (others under development).

The `Arguments dict` is a dictionary that contains the arguments required by the vector database class.

#### Milvus

```python
config.set_provider_config("vector_db", "Milvus", {"uri": "./milvus.db", "token": ""})
```
More details about the Milvus config:

- Setting `uri` to a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically uses Milvus Lite to store all data in that file.
- For larger-scale data, you can set up a standalone Milvus server and use the server address, e.g. `http://localhost:19530`, as your `uri`.
- You can also use any other connection parameters supported by Milvus, such as `host`, `user`, `password`, or `secure`.
- If you use Zilliz Cloud, set `uri` and `token` according to the Public Endpoint and API Key in Zilliz Cloud.
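Putting the three deployment options side by side (the endpoint and key values are placeholders you must replace with your own):

```python
# Milvus Lite: store everything in a local file; no server needed.
config.set_provider_config("vector_db", "Milvus", {"uri": "./milvus.db", "token": ""})

# Standalone Milvus server (e.g. started via docker):
config.set_provider_config("vector_db", "Milvus", {"uri": "http://localhost:19530", "token": ""})

# Zilliz Cloud: use the Public Endpoint and API Key shown in the console.
config.set_provider_config("vector_db", "Milvus", {"uri": "<public-endpoint>", "token": "<api-key>"})
```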
#### Azure AI Search

```python
config.set_provider_config("vector_db", "AzureSearch", {
    "endpoint": "https://<your-service-name>.search.windows.net",
    "index_name": "<your-index-name>",
    "api_key": "<your-api-key>",
    "vector_field": "<your-vector-field>"
})
```
### File Loader Configuration

```python
config.set_provider_config("file_loader", "(FileLoaderName)", "(Arguments dict)")
```

The `FileLoaderName` can be one of the following: `["PDFLoader", "TextLoader", "UnstructuredLoader"]`.

The `Arguments dict` is a dictionary that contains the arguments required by the file loader class.
#### Unstructured

You can use Unstructured in two ways: with the Unstructured API (set the env variables `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`), or locally without them.

```python
config.set_provider_config("file_loader", "UnstructuredLoader", {})
```

You need to install `unstructured-ingest` before running: `pip install unstructured-ingest`. To process all document types, run `pip install "unstructured[all-docs]"`; to process PDFs only, `pip install "unstructured[pdf]"` is enough.
#### Docling

```python
config.set_provider_config("file_loader", "DoclingLoader", {})
```

For the currently supported file types, refer to the Docling documentation: https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats

You need to install `docling` before running: `pip install docling`. More details about Docling: https://docling-project.github.io/docling/
### Web Crawler Configuration

```python
config.set_provider_config("web_crawler", "(WebCrawlerName)", "(Arguments dict)")
```

The `WebCrawlerName` can be one of the following: `["FireCrawlCrawler", "Crawl4AICrawler", "JinaCrawler"]`.

The `Arguments dict` is a dictionary that contains the arguments required by the web crawler class.
#### FireCrawl

Make sure you have prepared your FireCrawl API key as an env variable `FIRECRAWL_API_KEY`.

```python
config.set_provider_config("web_crawler", "FireCrawlCrawler", {})
```

More details about FireCrawl: https://docs.firecrawl.dev/introduction
#### Crawl4AI

Make sure you have run `crawl4ai-setup` in your environment.

```python
config.set_provider_config("web_crawler", "Crawl4AICrawler", {"browser_config": {"headless": True, "verbose": True}})
```

You need to install `crawl4ai` before running: `pip install crawl4ai`. More details about Crawl4AI: https://docs.crawl4ai.com/
#### Jina Reader

Make sure you have prepared your Jina Reader API key as an env variable `JINA_API_TOKEN` or `JINAAI_API_KEY`.

```python
config.set_provider_config("web_crawler", "JinaCrawler", {})
```

More details about Jina Reader: https://jina.ai/reader/
#### Docling

```python
config.set_provider_config("web_crawler", "DoclingCrawler", {})
```

For the currently supported file types, refer to the Docling documentation: https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats

You need to install `docling` before running: `pip install docling`. More details about Docling: https://docling-project.github.io/docling/