# File Loader Configuration
DeepSearcher supports various file loaders to extract and process content from different file formats.
## 📝 Basic Configuration
```python
config.set_provider_config("file_loader", "(FileLoaderName)", "(Arguments dict)")
```
## 📋 Available File Loaders
| Loader | Description | Supported Formats |
|--------|-------------|-------------------|
| **UnstructuredLoader** | General purpose document loader with broad format support | PDF, DOCX, PPT, HTML, etc. |
| **DoclingLoader** | Document processing library with extraction capabilities | See [documentation](https://docling-project.github.io/docling/usage/supported_formats/) |
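
As a minimal sketch of how the loader names in the table plug into the `set_provider_config` call shown under Basic Configuration, the helper below checks a name before forwarding it. `KNOWN_FILE_LOADERS`, `apply_file_loader`, and the `RecordingConfig` stand-in are hypothetical names introduced for illustration only. In real use you would pass DeepSearcher's own configuration object instead of the stand-in.

```python
# Loader names taken from the table above.
KNOWN_FILE_LOADERS = ("UnstructuredLoader", "DoclingLoader")


def apply_file_loader(config, name, arguments=None):
    """Validate the loader name, then forward it to set_provider_config."""
    if name not in KNOWN_FILE_LOADERS:
        raise ValueError(
            f"Unknown file loader {name!r}; expected one of {KNOWN_FILE_LOADERS}"
        )
    # Copy the arguments so the caller's dict is never mutated downstream.
    config.set_provider_config("file_loader", name, dict(arguments or {}))


class RecordingConfig:
    """Hypothetical stand-in that only records calls, so the sketch runs alone."""

    def __init__(self):
        self.calls = []

    def set_provider_config(self, role, provider, provider_configs):
        self.calls.append((role, provider, provider_configs))


demo_config = RecordingConfig()
apply_file_loader(demo_config, "DoclingLoader")
print(demo_config.calls)
```

Validating the name up front turns a typo into an immediate `ValueError` rather than a failure later, when documents are actually loaded.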
## 🔍 File Loader Options
### Unstructured
[Unstructured](https://unstructured.io/) is a powerful library for extracting content from various document formats.
```python
config.set_provider_config("file_loader", "UnstructuredLoader", {})
```
??? tip "Setup Instructions"

    You can use Unstructured in two ways:

    1. **With API** (recommended for production)
       - Set environment variables:
         - `UNSTRUCTURED_API_KEY`
         - `UNSTRUCTURED_API_URL`

    2. **Local Processing**
       - Simply don't set the API environment variables
       - Install required dependencies:
         ```bash
         # Install core dependencies
         pip install unstructured-ingest
         # For all document formats
         pip install "unstructured[all-docs]"
         # For specific formats (e.g., PDF only)
         pip install "unstructured[pdf]"
         ```

    For more information:

    - [Unstructured Documentation](https://docs.unstructured.io/ingestion/overview)
    - [Installation Guide](https://docs.unstructured.io/open-source/installation/full-installation)
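
The two setup modes above can be checked before loading any files. The sketch below reports which mode the current environment implies. The function name `unstructured_mode` is hypothetical, and treating a half-configured environment (only one of the two variables set) as local is an assumption of this sketch, not documented loader behavior.

```python
import os


def unstructured_mode(env=None):
    """Return "api" if both API variables are set and non-empty, else "local"."""
    env = os.environ if env is None else env
    has_key = bool(env.get("UNSTRUCTURED_API_KEY"))
    has_url = bool(env.get("UNSTRUCTURED_API_URL"))
    # Assumption: a partially configured environment falls back to local mode.
    return "api" if has_key and has_url else "local"


print(f"Unstructured would run in {unstructured_mode()} mode")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.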
### Docling
[Docling](https://docling-project.github.io/docling/) provides document processing capabilities with support for multiple formats.
```python
config.set_provider_config("file_loader", "DoclingLoader", {})
```
??? tip "Setup Instructions"

    1. Install Docling:
       ```bash
       pip install docling
       ```

    2. For information on supported formats, see the [Docling documentation](https://docling-project.github.io/docling/usage/supported_formats/#supported-output-formats).
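
Before selecting `DoclingLoader`, it can help to confirm that the package from the install step is importable. The following is a small sketch using only the Python standard library; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("docling"):
    print("Docling is not installed; run: pip install docling")
```

Using `find_spec` avoids paying the import cost of a large package just to check that it is present.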