File Loader Configuration
DeepSearcher supports various file loaders to extract and process content from different file formats.
📝 Basic Configuration
config.set_provider_config("file_loader", "(FileLoaderName)", "(Arguments dict)")
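The call above can be wrapped in a small guard that rejects misspelled loader names before they reach the configuration. This is only a sketch: `set_file_loader` and `KNOWN_FILE_LOADERS` are hypothetical names, the tuple contains only the loaders named on this page (an installed version may offer others), and `config` is assumed to be any object exposing `set_provider_config` with the three-argument form shown above.

```python
# Loader names taken from this page; other loaders may exist in your version.
KNOWN_FILE_LOADERS = ("UnstructuredLoader", "DoclingLoader")


def set_file_loader(config, loader_name, loader_args=None):
    """Select a file loader on a DeepSearcher-style config object.

    Hypothetical helper: `config` is assumed to expose
    set_provider_config(role, provider_name, arguments_dict).
    """
    if loader_name not in KNOWN_FILE_LOADERS:
        raise ValueError(
            f"Unknown file loader {loader_name!r}; expected one of {KNOWN_FILE_LOADERS}"
        )
    # Copy the arguments so the caller's dict is never shared or mutated.
    config.set_provider_config("file_loader", loader_name, dict(loader_args or {}))
    return config


# Usage sketch (requires DeepSearcher; the import path is assumed from the
# project's quick start and may differ between versions):
#   from deepsearcher.configuration import Configuration, init_config
#   config = set_file_loader(Configuration(), "UnstructuredLoader")
#   init_config(config=config)
```

Failing early with a `ValueError` keeps a typo in the loader name from surfacing later as a harder-to-trace initialization error.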
📋 Available File Loaders
| Loader | Description | Supported Formats |
|---|---|---|
| UnstructuredLoader | General purpose document loader with broad format support | PDF, DOCX, PPT, HTML, etc. |
| DoclingLoader | Document processing library with extraction capabilities | See documentation |
🔍 File Loader Options
Unstructured
Unstructured is a powerful library for extracting content from various document formats.
config.set_provider_config("file_loader", "UnstructuredLoader", {})
??? tip "Setup Instructions"

    You can use Unstructured in two ways:

    1. **With API** (recommended for production)
       - Set environment variables:
         - `UNSTRUCTURED_API_KEY`
         - `UNSTRUCTURED_API_URL`

    2. **Local Processing**
       - Leave the API environment variables unset
       - Install the required dependencies:
         ```bash
         # Install core dependencies
         pip install unstructured-ingest
         # For all document formats
         pip install "unstructured[all-docs]"
         # For specific formats (e.g., PDF only)
         pip install "unstructured[pdf]"
         ```

    For more information:

    - [Unstructured Documentation](https://docs.unstructured.io/ingestion/overview)
    - [Installation Guide](https://docs.unstructured.io/open-source/installation/full-installation)
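
Because the choice between API and local processing depends only on environment variables, it can help to check which mode a given environment would select before loading files. The sketch below assumes that API mode requires both variables to be non-empty; the loader's actual decision logic may differ, and `unstructured_mode` is a hypothetical helper, not part of DeepSearcher or Unstructured.

```python
import os


def unstructured_mode(env=None):
    """Return "api" if both Unstructured API variables are set, else "local".

    Assumption: both variables are needed for API mode.
    """
    env = os.environ if env is None else env
    api_vars = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")
    # Empty strings are treated the same as unset variables.
    return "api" if all(env.get(name) for name in api_vars) else "local"


print(f"Unstructured would run in {unstructured_mode()} mode")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real process environment.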
Docling
Docling provides document processing capabilities with support for multiple formats.
config.set_provider_config("file_loader", "DoclingLoader", {})
??? tip "Setup Instructions"

    1. Install Docling:
       ```bash
       pip install docling
       ```

    2. For information on supported formats, see the [Docling documentation](https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats).
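
Before selecting `DoclingLoader`, a quick pre-flight check can confirm that the package is installed. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper, and it assumes the importable module name matches the pip package name `docling`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("docling"):
    print("Docling not found; run `pip install docling` before selecting DoclingLoader")
```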